Re: [sqlite] Compressing the DBs?
On 7/6/06, Gussimulator <[EMAIL PROTECTED]> wrote:
> Eduardo said:
> > SQLite has no compression system for free.
> Not for free? I take it that it's possible to implement a compression scheme in the database system, then, which is all I care about. Of course it's possible, but now I know it's been done - that's all I needed to know, since, if every possible solution fails, I might have to implement this myself, low-level. If this "compression-enabled package" uses its own compression algorithm(s), then I'm OK with it not being free. On the other hand, if they use a well-known open-source algorithm, charging for this version is a little off... at least from my point of view - of course it depends mainly on licenses and points of view - I guess that's the case.

You get paid for your time. Someone who adapts a compression algorithm to SQLite, even an open-source one, should be paid for their time too. Seems fair to me.

> About normalization, even if I had a good work-around, that doesn't mean that using a compression scheme wouldn't perform better in most cases. (Although I can't tell speed-wise.)

If you really desire to save space, optimization by normalization will help a great deal - so long as you do it correctly. I worked on one database where they normalized the state names in street addresses. They added a 4-byte integer key to the address table to reference the state, a matching 4-byte integer in the state table to identify the state, and created a btree index to join the two tables with. The keys in the two tables alone were 4 times the size of a 2-byte state name. This design choice resulted in more disk space being used, slower response time, and more difficult programming. You can over-normalize.

--
SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite
Cthulhu Bucks! http://www.cthulhubucks.com
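A minimal sketch of the over-normalization anecdote above, assuming hypothetical table and column names (they are not from the database described in the email):

```python
import sqlite3

# Hypothetical sketch of the over-normalized design described above vs.
# storing the 2-character state code inline. Table and column names are
# illustrative, not from the database in the anecdote.
con = sqlite3.connect(":memory:")
con.executescript("""
-- Over-normalized: a 4-byte key in address plus a 4-byte key in state,
-- plus a btree index to join on: 8 bytes of keys per address row,
-- versus the 2 bytes of the state name itself.
CREATE TABLE state   (state_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE address (addr_id  INTEGER PRIMARY KEY, street TEXT,
                      state_id INTEGER REFERENCES state(state_id));
CREATE INDEX address_state ON address(state_id);

-- Simpler alternative: just store the 2-character code directly.
CREATE TABLE address2 (addr_id INTEGER PRIMARY KEY, street TEXT,
                       state_code TEXT);
""")
con.execute("INSERT INTO state(code) VALUES ('TX')")
con.execute("INSERT INTO address(street, state_id) VALUES ('1 Main St', 1)")
row = con.execute(
    "SELECT street, code FROM address JOIN state USING (state_id)"
).fetchone()
print(row)  # ('1 Main St', 'TX')
```

The join in the last query is the extra work the two keys and the index exist to serve; with `address2` the same answer is a plain single-table read.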
Re: [sqlite] Compressing the DBs?
Christian said:
> For desktop or server use, storage is cheap.

That's true; however, I'd like to compress the database because I need portability. Sure, one could compress the DB and move it onto a mobile storage unit, then decompress it on the destination hard drive when you're done and keep on working with it. But that mostly defeats the purpose: the idea is to compress the database so it can be stored and used on said limited storage units.

Eduardo said:
> SQLite has no compression system for free.

Not for free? I take it that it's possible to implement a compression scheme in the database system, then, which is all I care about. Of course it's possible, but now I know it's been done - that's all I needed to know, since, if every possible solution fails, I might have to implement this myself, low-level. If this "compression-enabled package" uses its own compression algorithm(s), then I'm OK with it not being free. On the other hand, if they use a well-known open-source algorithm, charging for this version is a little off... at least from my point of view - of course it depends mainly on licenses and points of view - I guess that's the case. Anyway, this might not be as easy as one thinks at first. (This happens all the time when you're developing something; I take it that I'm not the only one who thinks so.)

About normalization, even if I had a good work-around, that doesn't mean that using a compression scheme wouldn't perform better in most cases. (Although I can't tell speed-wise...
I don't know how SQLite works internally - never sniffed the sources, I'm afraid, only a few routines I had to implement myself on the software side (the string issues, remember). There would be a lot more memory and processor usage, that's for sure, although memory will depend on the algorithm; if it's buffered then that's not a huge issue. On the other hand, processor usage would go sky-high - again, depending on the algorithm we use. (This might also be true for compression only - for example, with JCalG1.) I'll see what to do this weekend. Thanks for all the feedback.
Re: [sqlite] Compressing the DBs?
I don't really think this is a good way of dealing with this, but thanks for the idea; if everything else doesn't work out, I might end up doing this. However, it's not a solution at all.

- Original Message - From: "Mikey C" <[EMAIL PROTECTED]>
Sent: Wednesday, July 05, 2006 6:44 PM
Subject: Re: [sqlite] Compressing the DBs?

> What platform are you using?
>
> If you are using the NTFS filesystem you can just mark the file for compression and the OS takes care of it transparently.
> --
> View this message in context: http://www.nabble.com/Compressing-the-DBs--tf1897195.html#a5190175
> Sent from the SQLite forum at Nabble.com.
Re: [sqlite] Compressing the DBs?
Gussimulator uttered:
> I've been using SQLite for a very short period of time and so far it's doing a great job for my application (single user, quite a big amount of data though). Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite... So, before I waste my time with something that perhaps has been done already, I decided to ask here. Is there any compression scheme I can use, or am I doomed to implement this myself? (Any tips?)

For desktop or server use, storage is cheap. Unless you're constrained by limited space because of embedded FLASH storage, you're better off just taking the hit, IMO.

Christian
--
/"\
\ /  ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 X   - AGAINST MS ATTACHMENTS
/ \
Re: [sqlite] Compressing the DBs?
[EMAIL PROTECTED] uttered:
> Mikey C <[EMAIL PROTECTED]> wrote:
> > Not sure what you mean there DRH, but I set compression on one of my database files on NTFS and file size shrunk from 1,289,216 bytes to 696,320 bytes.
> > And of course the whole compression / decompression process is completely transparent to SQLite, and if you decide that compression is a bad thing, you just uncheck the box on that file and you are back to where you started.
>
> After turning compression on, try making lots of updates to the database. Does the database stay the same size? Is there a significant I/O performance hit? I'm guessing that the answer in both cases will be "yes". Please let me know.
> --
> D. Richard Hipp <[EMAIL PROTECTED]>

Compression on NTFS and co. is done at a cluster-group level. If the cluster group does not compress, it is stored as is. I think NTFS works with 16-cluster groups, which would be 64KB chunks. My guess is that bigger page sizes will benefit compressing filesystems, as similar keys will be close to each other. Match the page size to the group size, so 64KB in the case of NTFS.

The performance hit should be negligible, if any, especially given modern processors' vast performance advantage over disk I/O. As has been said, the amount of data being read/written should be lower, so performance may marginally improve. But seek latency should be similar in both cases, so performance is probably largely the same. On a full/fragmented filesystem, writing less data may also reduce the number of seeks required - but a full and fragmented filesystem will have other performance issues anyway.

Christian
--
/"\
\ /  ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 X   - AGAINST MS ATTACHMENTS
/ \
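Matching the page size to the 64KB compression unit, as suggested above, might be sketched like this (the pragma only takes effect when the database file is first created, or after a VACUUM on an existing one; 65536 is SQLite's maximum page size):

```python
import os
import sqlite3
import tempfile

# Sketch: set SQLite's page size to 64 KiB to line it up with NTFS's
# 16-cluster compression unit, as suggested above. page_size must be
# issued before the first write to a fresh database (or be followed by
# a VACUUM) to take effect.
path = os.path.join(tempfile.mkdtemp(), "test.db")
con = sqlite3.connect(path)
con.execute("PRAGMA page_size = 65536")
con.execute("CREATE TABLE t (path TEXT)")  # first write fixes the page size
con.commit()
print(con.execute("PRAGMA page_size").fetchone()[0])  # 65536
```

Whether this actually helps the filesystem's compressor is exactly the kind of experiment drh asks for elsewhere in this thread.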
Re: [sqlite] Compressing the DBs?
On 7/6/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I'm guessing that the random I/O behavior of SQLite will frustrate the compression implemented by NTFS and Reiser. The end result will be a database file that might actually take up more space than if it were uncompressed. I'd be interested in hearing about experiments to prove or disprove this conjecture.

Sounds like it would take a bit of work to ensure you're measuring the right thing. I'm not sure how you'd turn off the caching of the database in RAM.
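One way such an experiment might be sketched (portable Python; it only tracks the logical file size - on NTFS you would additionally compare the compressed on-disk size, e.g. with the `compact` tool, which this sketch does not attempt):

```python
import os
import random
import sqlite3
import tempfile

# Sketch of the experiment suggested above: fill a database with highly
# compressible rows, then do lots of random-position updates with
# incompressible payloads -- the workload drh suspects will defeat
# filesystem-level compression -- and watch the file size.
path = os.path.join(tempfile.mkdtemp(), "test.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO t(payload) VALUES (?)",
                [("E:\\DirectX90c\\" * 20,) for _ in range(5000)])
con.commit()
before = os.path.getsize(path)

rng = random.Random(0)
for _ in range(5000):
    rowid = rng.randrange(1, 5001)   # random row, random page
    con.execute("UPDATE t SET payload = ? WHERE id = ?",
                (os.urandom(8).hex() * 40, rowid))
con.commit()
after = os.path.getsize(path)
print(before, after)
```

On a compressed NTFS file you would run the same script and compare on-disk sizes before and after the update storm to see whether the compression survives random rewrites.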
Re: [sqlite] Compressing the DBs?
Mikey C <[EMAIL PROTECTED]> wrote:
> Not sure what you mean there DRH, but I set compression on one of my database files on NTFS and file size shrunk from 1,289,216 bytes to 696,320 bytes.
>
> And of course the whole compression / decompression process is completely transparent to SQLite and if you decide that compression is a bad thing, you just uncheck the box on that file and you are back to where you started.

After turning compression on, try making lots of updates to the database. Does the database stay the same size? Is there a significant I/O performance hit? I'm guessing that the answer in both cases will be "yes". Please let me know.
--
D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Compressing the DBs?
If you have repetition like that, it doesn't mean that you cannot normalize into third normal form. In your simple case, having a table where X = "E:\DirectX90\Feb2006" would work. It would perform better than brute-force compression requiring inflation for each search.

Gussimulator wrote:
> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example. (So does "Feb2006_", on almost every table.)
>
> It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.
>
> What would be an ideal approach for this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform "well" in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KBs could mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KBs and a responsive application than save a few KBs and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).
>
> I'd like to take the right 'path' here...
> Thanks.
>
> - Original Message - From: "Darren Duncan" <[EMAIL PROTECTED]>
> Sent: Thursday, July 06, 2006 12:04 AM
> Subject: Re: [sqlite] Compressing the DBs?
>
>> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>>> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite...
>>
>> First things first: what do you mean by "repetitive"?
>>
>> Do you mean that there are many copies of the same data?
>>
>> Perhaps a better approach is to normalize the database and just store single copies of things.
>>
>> If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data.
>>
>> If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.
>>
>> This, really, is what you should be doing first, and may very well be the only step you need.
>>
>> If you can't do that, then please explain in what way the data is repetitive?
>>
>> -- Darren Duncan
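The normalization advice above, applied to the path data in this thread, might look like the following sketch (table and column names are illustrative, not from the poster's actual schema):

```python
import sqlite3

# Sketch of the advice above: store each directory prefix once in its
# own table and reference it by id. Schema names are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dir  (dir_id INTEGER PRIMARY KEY, prefix TEXT UNIQUE);
CREATE TABLE file (file_id INTEGER PRIMARY KEY, dir_id INTEGER, name TEXT);
""")

def add_file(path):
    # Split "E:\DirectX90c\Feb2006_xact_x64.cab" into prefix and name,
    # storing the prefix only once.
    prefix, _, name = path.rpartition("\\")
    con.execute("INSERT OR IGNORE INTO dir(prefix) VALUES (?)", (prefix,))
    (dir_id,) = con.execute(
        "SELECT dir_id FROM dir WHERE prefix = ?", (prefix,)).fetchone()
    con.execute("INSERT INTO file(dir_id, name) VALUES (?, ?)",
                (dir_id, name))

for p in [r"E:\DirectX90c\Feb2006_xact_x64.cab",
          r"E:\DirectX90c\Feb2006_xact_x86.cab"]:
    add_file(p)

# The prefix "E:\DirectX90c" is now stored once; full paths come back
# via a join.
rows = con.execute(
    "SELECT prefix, name FROM file JOIN dir USING (dir_id) ORDER BY file_id"
).fetchall()
paths = ["\\".join(r) for r in rows]
print(paths)
```

The space saved grows with the number of files per directory, since the shared prefix is paid for only once.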
Re: [sqlite] Compressing the DBs?
Not sure what you mean there DRH, but I set compression on one of my database files on NTFS and the file size shrunk from 1,289,216 bytes to 696,320 bytes.

And of course the whole compression / decompression process is completely transparent to SQLite, and if you decide that compression is a bad thing, you just uncheck the box on that file and you are back to where you started.
--
View this message in context: http://www.nabble.com/Compressing-the-DBs--tf1897195.html#a5199615
Sent from the SQLite forum at Nabble.com.
Re: [sqlite] Compressing the DBs?
"Jay Sprenkle" <[EMAIL PROTECTED]> wrote: > On 7/5/06, Mikey C <[EMAIL PROTECTED]> wrote: > > > > What platform are you using? > > > > If you are using NTFS filesystem you can just mark the file for compression > > and the OS takes care of it transparently. > > The linux OS supports Reiser file systems. This will compress > on the fly with the compression being faster than the transfer to disk. > Thus it's actually faster than a non compressing file system. > I'm guessing that the random I/O behavior of SQLite will frustrate the compression implemented by NTFS and Reiser. The end result will be a database file that might actually take up more space than if it were uncompressed. I'd be interested in hearing about experiments to prove or disprove this conjecture. -- D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Compressing the DBs?
On 7/5/06, Mikey C <[EMAIL PROTECTED]> wrote:
> What platform are you using?
>
> If you are using the NTFS filesystem you can just mark the file for compression and the OS takes care of it transparently.

Linux supports Reiser file systems. These will compress on the fly, with the compression being faster than the transfer to disk - thus it's actually faster than a non-compressing file system.
Re: [sqlite] Compressing the DBs?
At 07:07 06/07/2006, you wrote:
> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example. (So does "Feb2006_", on almost every table.)
>
> It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.
>
> What would be an ideal approach for this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform "well" in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KBs could mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KBs and a responsive application than save a few KBs and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).
>
> I'd like to take the right 'path' here...
> Thanks.

SQLite has no compression system for free. Also, any compression must be done at the page level, not the data level, because most compression algorithms use statistics of past data for compression, and if you try to do it at the data level, when a row is eliminated all rows after it become garbage.

There are a lot of compression algorithms, but I think the best for this is an arithmetic or range coder with order 0 or 1; the page size is too low for greater orders or LZ algorithms. Both (arithmetic and range) are pretty fast, with no FPU code (integer only, for embedded devices), and I think it will not slow file I/O down too much.

On text-only data you can expect 2.5 bpb, or near 65% size reduction - more when the page size is greater. The code is about 10 KB, but I don't know where to "plug it in" ;) HTH
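Lacking built-in support, compression can at least be bolted on at the application level with custom SQL functions. A sketch using zlib as a stand-in - note this is per-value, LZ-based compression, not the page-level arithmetic/range coder suggested above, so it cannot exploit redundancy across rows:

```python
import sqlite3
import zlib

# Per-value compression via custom SQL functions: an application-level
# stand-in, NOT the page-level arithmetic/range coder suggested above.
con = sqlite3.connect(":memory:")
con.create_function("deflate", 1, lambda s: zlib.compress(s.encode()))
con.create_function("inflate", 1, lambda b: zlib.decompress(b).decode())

con.execute("CREATE TABLE t (path BLOB)")
con.execute("INSERT INTO t VALUES (deflate(?))",
            ("E:\\DirectX90c\\Feb2006_MDX1_x86_Archive.cab\\" * 100,))

# Every read must inflate -- the cost the poster above worries about.
(original,) = con.execute("SELECT inflate(path) FROM t").fetchone()
(stored,)   = con.execute("SELECT length(path) FROM t").fetchone()
print(len(original), stored)  # stored is far smaller for repetitive text
```

The obvious drawback, raised earlier in the thread, is that compressed columns can no longer be searched or indexed without inflating each value first.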
Re: [sqlite] Compressing the DBs?
Gussimulator wrote:
> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example. (So does "Feb2006_", on almost every table.)
>
> It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.
>
> What would be an ideal approach for this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform "well" in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KBs could mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KBs and a responsive application than save a few KBs and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).
>
> I'd like to take the right 'path' here...
> Thanks.
>
> - Original Message - From: "Darren Duncan" <[EMAIL PROTECTED]>
> Sent: Thursday, July 06, 2006 12:04 AM
> Subject: Re: [sqlite] Compressing the DBs?
>
>> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>>> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite...
>>
>> First things first: what do you mean by "repetitive"?
>>
>> Do you mean that there are many copies of the same data?
>>
>> Perhaps a better approach is to normalize the database and just store single copies of things.
>>
>> If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data.
>>
>> If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.
>>
>> This, really, is what you should be doing first, and may very well be the only step you need.
>>
>> If you can't do that, then please explain in what way the data is repetitive?
>>
>> -- Darren Duncan

We came across this with our filesystem meta-info system. What we ended up doing was creating a sub-table called "location". With this, the location can be used either via a join in the SQL statement, or via a cached internal structure, to recreate the path of a file. (Easy enough with a map<> or a hash<> style bucket class.) No major overhead costs, but a definite savings in space. (Each directory becomes an entry in the location table, so a file is then stored as a location key value + filename.)

--
Bill King, Software Engineer
Trolltech, Brisbane Technology Park
26 Brandl St, Eight Mile Plains, QLD, Australia, 4113
Tel +61 7 3219 9906 (x137)
Fax +61 7 3219 9938
mobile: 0423 532 733
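The cached-lookup variant described above (rebuilding paths from a location key without a join) might look like this in outline - the schema and names are illustrative, not Trolltech's actual system:

```python
import sqlite3

# Sketch of the cached-lookup variant described above: load the
# location table into an in-memory map once, then rebuild full paths
# without a SQL join. Schema and names are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript(r"""
CREATE TABLE location (loc_id INTEGER PRIMARY KEY, dir TEXT UNIQUE);
CREATE TABLE file     (loc_id INTEGER, name TEXT);
INSERT INTO location(dir) VALUES ('E:\DirectX90c');
INSERT INTO file VALUES (1, 'Feb2006_xact_x64.cab');
INSERT INTO file VALUES (1, 'Feb2006_xact_x86.cab');
""")

# The map<>/hash<>-style bucket the author mentions -- here just a dict.
dirs = dict(con.execute("SELECT loc_id, dir FROM location"))

paths = [dirs[loc] + "\\" + name
         for loc, name in con.execute(
             "SELECT loc_id, name FROM file ORDER BY name")]
print(paths)
```

Loading the (small) location table once trades a little memory for avoiding a join on every path lookup, which matches the "no major overhead costs" claim above.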
Re: [sqlite] Compressing the DBs?
Hello there,

This is what I mean by repetitive data:

Tables:
E:\DirectX90c\
E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
E:\DirectX90c\Feb2006_xact_x64.cab\
E:\DirectX90c\Feb2006_MDX1_x86.cab\
E:\DirectX90c\Feb2006_xact_x86.cab\

And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example. (So does "Feb2006_", on almost every table.)

It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.

What would be an ideal approach for this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform "well" in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KBs could mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KBs and a responsive application than save a few KBs and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).

I'd like to take the right 'path' here...
Thanks.

- Original Message - From: "Darren Duncan" <[EMAIL PROTECTED]>
Sent: Thursday, July 06, 2006 12:04 AM
Subject: Re: [sqlite] Compressing the DBs?

> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite...
>
> First things first: what do you mean by "repetitive"?
>
> Do you mean that there are many copies of the same data?
>
> Perhaps a better approach is to normalize the database and just store single copies of things.
>
> If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data.
>
> If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.
>
> This, really, is what you should be doing first, and may very well be the only step you need.
>
> If you can't do that, then please explain in what way the data is repetitive?
>
> -- Darren Duncan
Re: [sqlite] Compressing the DBs?
At 6:04 PM -0300 7/5/06, Gussimulator wrote:
> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue, however, would be integrating any of the libraries into SQLite...

First things first: what do you mean by "repetitive"?

Do you mean that there are many copies of the same data?

Perhaps a better approach is to normalize the database and just store single copies of things.

If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data.

If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.

This, really, is what you should be doing first, and may very well be the only step you need.

If you can't do that, then please explain in what way the data is repetitive?

-- Darren Duncan
Re: [sqlite] Compressing the DBs?
What platform are you using?

If you are using the NTFS filesystem you can just mark the file for compression and the OS takes care of it transparently.
--
View this message in context: http://www.nabble.com/Compressing-the-DBs--tf1897195.html#a5190175
Sent from the SQLite forum at Nabble.com.