Re: [sqlite] Compressing the DBs?
I don't really think this is a good way of dealing with it, but thanks for the idea; if everything else fails, I might end up doing this. It isn't a real solution, though.

----- Original Message -----
From: "Mikey C" <[EMAIL PROTECTED]>
To: <sqlite-users@sqlite.org>
Sent: Wednesday, July 05, 2006 6:44 PM
Subject: Re: [sqlite] Compressing the DBs?

> What platform are you using?
>
> If you are using the NTFS filesystem, you can just mark the file for compression and the OS takes care of it transparently.

--
View this message in context: http://www.nabble.com/Compressing-the-DBs--tf1897195.html#a5190175
Sent from the SQLite forum at Nabble.com.
Re: [sqlite] Compressing the DBs?
Gussimulator uttered:

> I've been using SQLite for a very short period of time and so far it's doing a great job for my application (single user, though quite a big amount of data). Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, as we all know, one of the first principles of data compression is getting rid of repetitive data. I was wondering if this is possible with SQLite, or whether it would be quite a pain to implement a compression scheme by myself. I have worked with many compression libraries before, so that wouldn't be an issue; the issue would be to integrate any of the libraries into SQLite. So, before I waste my time on something that has perhaps been done already, I decided to ask here. Is there any compression scheme I can use, or am I doomed to implement this myself? (Any tips?)

For desktop or server use, storage is cheap. Unless you're constrained by limited space because of embedded flash storage, you're better off just taking the hit, IMO.

Christian
Re: [sqlite] Compressing the DBs?
[EMAIL PROTECTED] uttered:

> Mikey C <[EMAIL PROTECTED]> wrote:
>> Not sure what you mean there, DRH, but I set compression on one of my database files on NTFS and the file size shrank from 1,289,216 bytes to 696,320 bytes. And of course the whole compression/decompression process is completely transparent to SQLite; if you decide that compression is a bad thing, you just uncheck the box on that file and you are back where you started.
>
> After turning compression on, try making lots of updates to the database. Does the database stay the same size? Is there a significant I/O performance hit? I'm guessing that the answer in both cases will be "yes". Please let me know.
>
> -- D. Richard Hipp <[EMAIL PROTECTED]>

Compression on NTFS and co. is done at the cluster-group level. If a cluster group does not compress, it is stored as is. I think NTFS works with 16-cluster groups, which would be 64 KB chunks. My guess is that bigger page sizes will benefit compressing filesystems, as similar keys will be close to each other. Match the page size to the group size, so 64 KB in the case of NTFS.

The performance hit should be negligible, if any, especially given modern processors' vast performance advantage over disk I/O. As has been said, the amount of data being read and written should be lower, so performance may marginally improve. But seek latency should be similar in both cases, so performance is probably largely the same. On a full or fragmented filesystem, writing less data may also reduce the number of seeks required; but a full and fragmented filesystem will have other performance issues anyway.

Christian
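Matching SQLite's page size to the 64 KB compression unit, as suggested above, can be tried with the page_size pragma. A minimal sketch using Python's sqlite3 module (the file name is made up; the pragma only sticks if issued before the database's first page is written):

```python
import os
import sqlite3
import tempfile

# Sketch: match SQLite's page size to NTFS's 64 KB compression unit.
# PRAGMA page_size only takes effect before the first page is written
# (or after a VACUUM), so it is set here on a brand-new file.
# 64 KB pages require SQLite >= 3.7.1.
path = os.path.join(tempfile.mkdtemp(), "demo.db")  # throwaway file
conn = sqlite3.connect(path)
conn.execute("PRAGMA page_size = 65536")
conn.execute("CREATE TABLE paths (path TEXT)")  # first write fixes the size
conn.commit()
print(conn.execute("PRAGMA page_size").fetchone()[0])
conn.close()
```

Running the pragma against a database that already has pages on disk silently leaves the old size in place, which is an easy mistake to make when testing this.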
Re: [sqlite] Compressing the DBs?
On 7/6/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> I'm guessing that the random I/O behavior of SQLite will frustrate the compression implemented by NTFS and Reiser. The end result will be a database file that might actually take up more space than if it were uncompressed. I'd be interested in hearing about experiments to prove or disprove this conjecture.

Sounds like it would take a bit of work to ensure you're measuring the right thing. I'm not sure how you'd turn off the caching of the database in RAM.
Re: [sqlite] Compressing the DBs?
If you have repetition like that, it doesn't mean you cannot normalize into third normal form. In your simple case, having a table where X="E:\DirectX90\Feb2006" would work, and it would perform better than brute-force compression, which requires inflating the data for each search.

Gussimulator wrote:

> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example (so does "Feb2006_", on almost every table).
>
> It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.
>
> What would be an ideal approach to this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform well in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KB can mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KB and a responsive application than save a few KB and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).
>
> I'd like to take the right 'path' here... Thanks.
>
> ----- Original Message -----
> From: "Darren Duncan" <[EMAIL PROTECTED]>
> To: <sqlite-users@sqlite.org>
> Sent: Thursday, July 06, 2006 12:04 AM
> Subject: Re: [sqlite] Compressing the DBs?
>
>> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>>> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, as we all know, one of the first principles of data compression is getting rid of repetitive data. I was wondering if this is possible with SQLite, or whether it would be quite a pain to implement a compression scheme by myself. I have worked with many compression libraries before, so that wouldn't be an issue; the issue would be to integrate any of the libraries into SQLite.
>>
>> First things first: what do you mean by "repetitive"? Do you mean that there are many copies of the same data?
>>
>> Perhaps a better approach is to normalize the database and just store single copies of things. If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data. If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.
>>
>> This, really, is what you should be doing first, and may very well be the only step you need. If you can't do that, then please explain in what way the data is repetitive.
>>
>> -- Darren Duncan
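Darren's 'quantity column' suggestion above can be sketched in a few lines (table and column names are invented for the demo):

```python
import sqlite3

# Sketch of the 'quantity' column idea: collapse duplicate rows into one
# row each plus a count. Table and column names are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (path TEXT)")
conn.executemany("INSERT INTO raw VALUES (?)",
                 [("E:\\DirectX90c\\",)] * 3 + [("E:\\Other\\",)])

# One row per distinct value, with the number of copies alongside it.
conn.execute("""CREATE TABLE deduped AS
                SELECT path, COUNT(*) AS quantity
                FROM raw GROUP BY path""")

rows = conn.execute("SELECT path, quantity FROM deduped ORDER BY path").fetchall()
print(rows)  # one row per distinct path, with its multiplicity
```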
Re: [sqlite] Compressing the DBs?
Not sure what you mean there, DRH, but I set compression on one of my database files on NTFS and the file size shrank from 1,289,216 bytes to 696,320 bytes. And of course the whole compression/decompression process is completely transparent to SQLite; if you decide that compression is a bad thing, you just uncheck the box on that file and you are back where you started.

--
View this message in context: http://www.nabble.com/Compressing-the-DBs--tf1897195.html#a5199615
Re: [sqlite] Compressing the DBs?
"Jay Sprenkle" <[EMAIL PROTECTED]> wrote: > On 7/5/06, Mikey C <[EMAIL PROTECTED]> wrote: > > > > What platform are you using? > > > > If you are using NTFS filesystem you can just mark the file for compression > > and the OS takes care of it transparently. > > The linux OS supports Reiser file systems. This will compress > on the fly with the compression being faster than the transfer to disk. > Thus it's actually faster than a non compressing file system. > I'm guessing that the random I/O behavior of SQLite will frustrate the compression implemented by NTFS and Reiser. The end result will be a database file that might actually take up more space than if it were uncompressed. I'd be interested in hearing about experiments to prove or disprove this conjecture. -- D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Compressing the DBs?
On 7/5/06, Mikey C <[EMAIL PROTECTED]> wrote:

> What platform are you using?
>
> If you are using the NTFS filesystem, you can just mark the file for compression and the OS takes care of it transparently.

Linux supports Reiser file systems. These compress on the fly, with the compression being faster than the transfer to disk, so it's actually faster than a non-compressing file system.
Re: [sqlite] Compressing the DBs?
At 07:07 06/07/2006, you wrote:

> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example (so does "Feb2006_", on almost every table).
>
> It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.
>
> What would be an ideal approach to this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform well in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KB can mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KB and a responsive application than save a few KB and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).
>
> I'd like to take the right 'path' here... Thanks.

SQLite has no compression built in. Also, any compression must be done at the page level, not the data level, because most compression algorithms use statistics gathered from previously seen data; if you compress at the data level, then when a row is deleted, every row after it becomes garbage. There are a lot of compression algorithms, but I think the best fit here is an arithmetic or range coder of order 0 or 1; the page size is too small for higher orders or for LZ algorithms. Both arithmetic and range coders are pretty fast, with no FPU code (integer only, good for embedded devices), and I don't think they would slow file I/O down much.

On text-only data you can expect about 2.5 bits per byte, or roughly a 65% size reduction, more when the page size is larger. The code is about 10 KB, but I don't know where to plug it in. ;)

HTH
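The proposal above is a custom page-level coder; as a crude stand-in, the following deflates an existing database page by page with zlib to estimate what page-level compression of repetitive text might save (file, table, and path names are invented, and zlib is a different algorithm from the arithmetic/range coders being suggested):

```python
import os
import sqlite3
import tempfile
import zlib

# Crude estimate of page-level compression savings: build a database of
# repetitive paths, then deflate its file one page at a time with zlib.
db = os.path.join(tempfile.mkdtemp(), "pages.db")
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE files (path TEXT)")
conn.executemany("INSERT INTO files VALUES (?)",
                 [("E:\\DirectX90c\\Feb2006_pkg_%d.cab" % i,)
                  for i in range(2000)])
conn.commit()
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
conn.close()

# A SQLite main database file is always a whole number of pages.
with open(db, "rb") as f:
    image = f.read()
pages = [image[i:i + page_size] for i in range(0, len(image), page_size)]
saved = sum(len(p) - len(zlib.compress(p)) for p in pages)
print("%.0f%% saved across %d pages" % (100.0 * saved / len(image), len(pages)))
```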
Re: [sqlite] Compressing the DBs?
Gussimulator wrote:

> Hello there,
>
> This is what I mean by repetitive data:
>
> Tables:
> E:\DirectX90c\
> E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
> E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
> E:\DirectX90c\Feb2006_xact_x64.cab\
> E:\DirectX90c\Feb2006_MDX1_x86.cab\
> E:\DirectX90c\Feb2006_xact_x86.cab\
>
> And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example (so does "Feb2006_", on almost every table).
>
> It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.
>
> What would be an ideal approach to this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform well in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KB can mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KB and a responsive application than save a few KB and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).
>
> I'd like to take the right 'path' here... Thanks.
>
> ----- Original Message -----
> From: "Darren Duncan" <[EMAIL PROTECTED]>
> To: <sqlite-users@sqlite.org>
> Sent: Thursday, July 06, 2006 12:04 AM
> Subject: Re: [sqlite] Compressing the DBs?
>
>> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>>> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, as we all know, one of the first principles of data compression is getting rid of repetitive data. I was wondering if this is possible with SQLite, or whether it would be quite a pain to implement a compression scheme by myself. I have worked with many compression libraries before, so that wouldn't be an issue; the issue would be to integrate any of the libraries into SQLite.
>>
>> First things first: what do you mean by "repetitive"? Do you mean that there are many copies of the same data?
>>
>> Perhaps a better approach is to normalize the database and just store single copies of things. If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data. If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.
>>
>> This, really, is what you should be doing first, and may very well be the only step you need. If you can't do that, then please explain in what way the data is repetitive.
>>
>> -- Darren Duncan

We came across this with our filesystem meta-info system. What we ended up doing was creating a sub-table called "location"; the location can then be used, either via a join in the SQL statement or via a cached internal structure, to recreate the path of a file (easy enough with a map<> or hash<> style bucket class). No major overhead costs, but a definite saving in space. (Each directory becomes an entry in the location table, so a file is then stored as a location key value plus a filename.)

--
Bill King, Software Engineer
Trolltech, Brisbane Technology Park
26 Brandl St, Eight Mile Plains, QLD, Australia, 4113
Tel +61 7 3219 9906 (x137)  Fax +61 7 3219 9938  mobile: 0423 532 733
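The location-table scheme described above might look like the following (sketched via Python's sqlite3; the table and column names are guesses for the demo, not Trolltech's actual schema):

```python
import sqlite3

# Sketch: each directory becomes one row in a "location" table, and a
# file stores only the location key plus its own name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, dir TEXT UNIQUE);
    CREATE TABLE file (loc INTEGER REFERENCES location(id), name TEXT);
""")

def add_file(full_path):
    directory, _, name = full_path.rpartition("\\")
    conn.execute("INSERT OR IGNORE INTO location (dir) VALUES (?)",
                 (directory,))
    loc = conn.execute("SELECT id FROM location WHERE dir = ?",
                       (directory,)).fetchone()[0]
    conn.execute("INSERT INTO file VALUES (?, ?)", (loc, name))

for p in ["E:\\DirectX90c\\Feb2006_xact_x64.cab",
          "E:\\DirectX90c\\Feb2006_xact_x86.cab"]:
    add_file(p)

# Recreate the full paths with a join, as the message describes.
for d, n in conn.execute("SELECT dir, name FROM file JOIN location ON loc = id"):
    print(d + "\\" + n)
```

Both files here share a single location row, which is where the space saving comes from; the shared prefix is stored once no matter how many files sit in the directory.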
Re: [sqlite] Compressing the DBs?
Hello there,

This is what I mean by repetitive data:

Tables:
E:\DirectX90c\
E:\DirectX90c\Feb2006_MDX1_x86_Archive.cab\
E:\DirectX90c\Feb2006_d3dx9_29_x64.cab\
E:\DirectX90c\Feb2006_xact_x64.cab\
E:\DirectX90c\Feb2006_MDX1_x86.cab\
E:\DirectX90c\Feb2006_xact_x86.cab\

And so on. As you can see, the string E:\DirectX90c\ repeats all the time in this example (so does "Feb2006_", on almost every table).

It's just an example of the type of repetitive data I have to deal with; they are normally paths. Since there are directories within directories, the paths repeat.

What would be an ideal approach to this situation? I would like to save space, but I wouldn't like to waste a big amount of processing power to do so. One must keep in mind that my system must perform well in various situations (which I can't predict, at least not all of them), so I can't have a very elaborate database scheme. Sometimes saving a few KB can mean wasting a few tons of cycles, and I can't deal with that. I'd rather have those extra KB and a responsive application than save a few KB and fall asleep at the keyboard (don't worry, it's a multi-threaded environment, but it's important to keep it optimized; I'm just over-sizing the problem a little).

I'd like to take the right 'path' here... Thanks.

----- Original Message -----
From: "Darren Duncan" <[EMAIL PROTECTED]>
To: <sqlite-users@sqlite.org>
Sent: Thursday, July 06, 2006 12:04 AM
Subject: Re: [sqlite] Compressing the DBs?

> At 6:04 PM -0300 7/5/06, Gussimulator wrote:
>> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, as we all know, one of the first principles of data compression is getting rid of repetitive data. I was wondering if this is possible with SQLite, or whether it would be quite a pain to implement a compression scheme by myself. I have worked with many compression libraries before, so that wouldn't be an issue; the issue would be to integrate any of the libraries into SQLite.
>
> First things first: what do you mean by "repetitive"? Do you mean that there are many copies of the same data?
>
> Perhaps a better approach is to normalize the database and just store single copies of things. If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data. If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.
>
> This, really, is what you should be doing first, and may very well be the only step you need. If you can't do that, then please explain in what way the data is repetitive.
>
> -- Darren Duncan
Re: [sqlite] Compressing the DBs?
At 6:04 PM -0300 7/5/06, Gussimulator wrote:

> Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, as we all know, one of the first principles of data compression is getting rid of repetitive data. I was wondering if this is possible with SQLite, or whether it would be quite a pain to implement a compression scheme by myself. I have worked with many compression libraries before, so that wouldn't be an issue; the issue would be to integrate any of the libraries into SQLite.

First things first: what do you mean by "repetitive"? Do you mean that there are many copies of the same data?

Perhaps a better approach is to normalize the database and just store single copies of things. If you have tables with duplicate rows, then add a 'quantity' column and reduce to one copy of the actual data. If some columns are unique and some are repeated, perhaps try splitting the tables into more tables that are related.

This, really, is what you should be doing first, and may very well be the only step you need. If you can't do that, then please explain in what way the data is repetitive.

-- Darren Duncan
Re: [sqlite] Compressing the DBs?
What platform are you using?

If you are using the NTFS filesystem, you can just mark the file for compression and the OS takes care of it transparently.
[sqlite] Compressing the DBs?
I've been using SQLite for a very short period of time and so far it's doing a great job for my application (single user, though quite a big amount of data). Now, since there's a lot of repetitive data, I thought that compressing the database would be a good idea, since, as we all know, one of the first principles of data compression is getting rid of repetitive data. So I was wondering: is this possible with SQLite, or would it be quite a pain to implement a compression scheme by myself? I have worked with many compression libraries before, so that wouldn't be an issue; the issue would be to integrate any of the libraries into SQLite.

So, before I waste my time on something that has perhaps been done already, I decided to ask here. Is there any compression scheme I can use, or am I doomed to implement this myself? (Any tips?)

Thanks a lot.

PS: Sorry if I didn't read the manual enough.
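One scheme that needs no changes to SQLite itself, assuming the application rather than the database does the compressing: deflate repetitive values before INSERT and inflate them after SELECT. A minimal sketch with Python's sqlite3 and zlib (table and column names are invented for the demo):

```python
import sqlite3
import zlib

# Application-level compression: store zlib-deflated blobs and inflate
# them on the way out. SQLite just sees opaque BLOB values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body BLOB)")

text = "E:\\DirectX90c\\Feb2006_MDX1_x86_Archive.cab\n" * 100
conn.execute("INSERT INTO doc (body) VALUES (?)",
             (zlib.compress(text.encode()),))

blob = conn.execute("SELECT body FROM doc WHERE id = 1").fetchone()[0]
restored = zlib.decompress(blob).decode()
print(len(blob), "bytes stored for", len(text), "bytes of text")
```

The obvious cost is that SQL can no longer see inside the column: LIKE, comparisons, and indexes on the compressed value all stop working, so this only suits data that is read and written whole.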