Re: [sqlite] Performance Improvement
Could you define "awfully slow"? That's pretty hard to tell if your speed is what one should expect. Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Paul Linehan [lineh...@tcd.ie] Sent: Thursday, July 21, 2011 1:01 PM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] Performance Improvement > I can at least store all the data, for the cases I have tested till > now (600 billion entries for now), but it is awfully slow. I'm not surprised. Maybe you should consider some sort of partitioning scheme? Take a look at VoltDB.com - it might be an approach? Paul... -- Hmmm a "life": wonder where I can download one of those? lineh...@tcd.ie Mob: 00 353 86 864 5772 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance Improvement
> I can at least store all the data, for the cases I have tested till
> now (600 billion entries for now), but it is awfully slow.

I'm not surprised. Maybe you should consider some sort of partitioning
scheme? Take a look at VoltDB.com - it might be an approach?

Paul...

--
Hmmm a "life": wonder where I can download one of those?
lineh...@tcd.ie  Mob: 00 353 86 864 5772
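If "partitioning" is taken to mean sharding the trace across several
database files, SQLite's ATTACH mechanism is one way to sketch the idea
(the file names are hypothetical, and this is only an illustration of
the concept, not necessarily what Paul had in mind):

    -- split the trace across separate database files, e.g. by SeqNo range
    ATTACH DATABASE 'trace_part0.db' AS p0;
    ATTACH DATABASE 'trace_part1.db' AS p1;
    CREATE TABLE p0.AddressSet(SeqNo INTEGER PRIMARY KEY, Address INTEGER);
    CREATE TABLE p1.AddressSet(SeqNo INTEGER PRIMARY KEY, Address INTEGER);

Note that the number of simultaneously attached databases is limited
(SQLITE_MAX_ATTACHED, 10 by default), so shards would have to be
attached and detached as the scan proceeds.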
Re: [sqlite] Performance Improvement
On Mon, Jul 18, 2011 at 9:30 AM, Igor Tandetnik wrote:
> On 7/18/2011 11:10 AM, abhisek...@gmail.com wrote:
>> Sorry the mail was sent prematurely. Continuing ...
>> 2. In a loop, read each address,
>> for that address find the next time this address was used
>> ( to implement the replacement policy)
>
> Sort the list by address then sequence number, probably using some form
> of the merge sort algorithm (which allows one to sort a data set larger
> than the available RAM). Then do a single pass over the sorted list,
> looking for sequences of repeated addresses.

Yes - make the database do your address comparison; don't pull back
every record, just the records you need. Maybe add a timestamp or
other number you can increment, and rather than deleting the old
records, just reuse each old record, updated as appropriate.

> --
> Igor Tandetnik
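In SQL terms, one way to avoid the DELETE entirely (a sketch of my own
with hypothetical parameter names, using the AddressSet table described
later in the thread) is to let the WHERE clause skip past the current
position instead:

    -- next use of address :m strictly after sequence number :n;
    -- no DELETE needed, since SeqNo already encodes time
    SELECT SeqNo FROM AddressSet
     WHERE Address = :m AND SeqNo > :n
     ORDER BY SeqNo LIMIT 1;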
Re: [sqlite] Performance Improvement
On 07/18/2011 07:58 AM, abhisek...@gmail.com wrote:
> So I am coding up a cache simulator.

I suggest having a look at cachegrind, which does something similar.
You should be able to learn from their data structures. It is also
open source, so maybe you could adapt it to meet your needs:

http://valgrind.org/docs/manual/cg-manual.html

Roger
Re: [sqlite] Performance Improvement
On 7/18/2011 11:10 AM, abhisek...@gmail.com wrote:
> Sorry the mail was sent prematurely. Continuing ...
> 2. In a loop, read each address,
> for that address find the next time this address was used
> ( to implement the replacement policy)

Sort the list by address, then by sequence number, probably using some
form of the merge sort algorithm (which allows one to sort a data set
larger than the available RAM). Then do a single pass over the sorted
list, looking for sequences of repeated addresses.

--
Igor Tandetnik
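In SQLite terms, that single sorted pass might look like the following
(a sketch assuming the AddressSet table described later in the thread;
SQLite's sorter spills to temporary storage, so the ORDER BY works even
when the table is far larger than RAM):

    -- let the database do the external sort
    SELECT Address, SeqNo
      FROM AddressSet
     ORDER BY Address, SeqNo;

The application then walks the cursor once: whenever two consecutive
rows share an Address, the second row's SeqNo is the "next use" time of
the first.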
Re: [sqlite] Performance Improvement
Unless you're caching at byte level, you don't need the full addresses.
For a 4K page size, mask off the lower 12 bits. You should actually be
able to reduce your memory usage by a LOT if you track pages and not
bytes. That will also speed you up along with it.

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate

From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of abhisek...@gmail.com [abhisek...@gmail.com]
Sent: Monday, July 18, 2011 10:50 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Performance Improvement

> Thanks Jay. I guess you are right. I have very little experience in
> dealing with such huge data-sets.
> [snip]
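Concretely, that might look like the following at insert time (a sketch
with hypothetical parameter names; SQLite does support these bitwise
operators):

    -- with a 4 KiB page, store the page number instead of the byte address
    INSERT INTO AddressSet VALUES(:seqno, :address >> 12);

    -- or keep page-aligned addresses by zeroing the low 12 bits
    INSERT INTO AddressSet VALUES(:seqno, :address & ~4095);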
Re: [sqlite] Performance Improvement
Thanks Jay. I guess you are right. I have very little experience in
dealing with such huge data-sets. So what I meant was: initially I was
doing this with C++ data structures like vectors and maps, and I could
not even store all the entries - the program would crash with a
bad_alloc exception. With sqlite, I can at least store all the data,
for the cases I have tested till now (600 billion entries for now),
but it is awfully slow.

Also, continuing with the mail, this is what I am doing:

2. In a loop, read each address; for that address find the next time
this address was used (to implement the replacement policy).

To do this:

1. I am creating a table:

    CREATE TABLE AddressSet (SeqNo UNSIGNED BIG INT PRIMARY KEY,
                             Address UNSIGNED BIG INT);

2. Inserting all the addresses:

    INSERT INTO AddressSet VALUES(SeqNo, Address);
    ...

3. In the loop I am doing this (for each of the sequence numbers
n = 1, 2, ...):

    SELECT SeqNo, Address FROM AddressSet WHERE SeqNo=n;
    -- let the address selected be m

    DELETE FROM AddressSet WHERE SeqNo=n;
    -- delete it so that the next select statement returns the next
    -- time this address was seen

    SELECT SeqNo, Address FROM AddressSet WHERE Address=m
     ORDER BY SeqNo ASC LIMIT 1;
    -- here SeqNo gives us the next time this address was seen

I was wondering if this flow can be modified so as to see some
improvement in performance.

Just to give some context: all these issues come in because we need
the entire history of the trace to do the processing - when an address
is seen, we need to find out when in the future it is used again. In
the worst case the very first address is never used again, or is used
only at the end of the address trace. This is because I am trying to
simulate an "optimal" replacement policy in the cache, unlike the real
ones such as LRU, for which I could have maintained a limited history
and been done with it.

Thanks again,
Abhisek

On Mon, Jul 18, 2011 at 11:23 AM, Jay A. Kreibich wrote:
> Using SQLite for this type of data seems very questionable, however.
> [snip]

--
Abhisek
Live Long and Prosper
Re: [sqlite] Performance Improvement
On Mon, Jul 18, 2011 at 5:14 PM, abhisek...@gmail.com wrote:
> file I am trying to implement, but that is fine. What would be the limit of
> the file size that sqlite can create, assuming disc space is not
> an issue.

http://www.sqlite.org/limits.html

Apparently I was wrong. In any case, that page claims the file size
limit is theoretically 2^64 bytes, but that 14 TB is the practical
limit. I had thought the limit was somewhere under 4 GB.

--
- stephan beal
http://wanderinghorse.net/home/stephan/
Re: [sqlite] Performance Improvement
On Mon, Jul 18, 2011 at 05:01:55PM +0200, Stephan Beal scratched on
the wall:
> On Mon, Jul 18, 2011 at 4:58 PM, abhisek...@gmail.com wrote:
>
> > These are addresses accessed by a program. There will be 100 billion
> > entries
>
> You won't be able to fit that many in your database - sqlite3 cannot
> scale to the file size you will need for that. Assuming 10-byte
> addresses (as you demonstrated), 10 bytes x 100B records = 1 terabyte
> JUST for the addresses (not including any sqlite3-related overhead
> per record, which is probably much larger than the 10 bytes you're
> saving).

In theory, the maximum size of an SQLite database is 128 TB:
2^31 pages (2 giga-pages) @ 2^16 bytes (64 KB) = 128 TB, or ~140e12
bytes. (I know http://sqlite.org/limits.html says 14 TB, but I think
they dropped a digit.)

Whether your file system can handle this is a different story.

Using SQLite for this type of data seems very questionable, however.
As Stephan points out, the database with just the addresses is likely
to be in the 3 to 4 TB range. You said "There will be 100 billion
entries or so like this, which makes it necessary to use the
database," but I think just the opposite is true. If you have a *very*
large number of data points with a very specific access pattern, using
a general-purpose tool seems like exactly the wrong choice. You need
some custom system that is highly optimized for both storage space and
your specific access patterns.

-j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
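Reconciling Stephan's ~1 TB raw figure with the 3-4 TB estimate above
(back-of-the-envelope arithmetic of my own; the per-row overhead is an
assumption, not a measurement):

    100e9 rows x ~10 B of address data           = ~1.0 TB
    100e9 rows x ~20-30 B record/B-tree overhead = ~2.0-3.0 TB
                                          total  = ~3-4 TB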
Re: [sqlite] Performance Improvement
Thanks a lot, Stephan. I guess I have to do this on parts of the input
file at a time then, which I will try - though it will be an
approximation of what I am trying to implement, that is fine. What
would be the limit of the file size that sqlite can create, assuming
disc space is not an issue?

-Abhisek

On Mon, Jul 18, 2011 at 11:01 AM, Stephan Beal wrote:
> You won't be able to fit that many in your database - sqlite3 cannot
> scale to the file size you will need for that.
> [snip]
Re: [sqlite] Performance Improvement
Sorry, the mail was sent prematurely. Continuing ...

2. In a loop, read each address; for that address find the next time
this address was used (to implement the replacement policy).

To do this:

1. I am creating a table:

    CREATE TABLE AddressSet (SeqNo UNSIGNED BIG INT PRIMARY KEY,
                             Address UNSIGNED BIG INT);

2. Inserting all the addresses:

    INSERT INTO AddressSet VALUES(SeqNo, Address);
    ...

3. In the loop I am doing this (for each of the sequence numbers
n = 1, 2, ...):

    SELECT SeqNo, Address FROM AddressSet WHERE SeqNo=n;
    -- let the address selected be m

    DELETE FROM AddressSet WHERE SeqNo=n;
    -- delete it so that the next select statement returns the next
    -- time this address was seen

    SELECT SeqNo, Address FROM AddressSet WHERE Address=m
     ORDER BY SeqNo ASC LIMIT 1;
    -- here SeqNo gives us the next time this address was seen

That is all. Please let me know if these steps can be optimized
somehow.

Thanks a lot for your help!
Abhisek

On Mon, Jul 18, 2011 at 10:58 AM, abhisek...@gmail.com wrote:
> Hi,
>
> I am a new user of SQLite and I have a series of steps to do on a
> single table, and it takes way too long.
> [snip]
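Two standard SQLite speedups apply directly to this flow as written (a
sketch under assumptions of my own, not advice taken from the message
above): batch the INSERTs inside explicit transactions, and index
Address so the step-3 lookup is not a full-table scan:

    BEGIN;
    -- one INSERT per trace record; commit in large batches (say every
    -- 100000 rows) instead of paying one implicit transaction per row
    INSERT INTO AddressSet VALUES(1, 74841);   -- hypothetical rows
    INSERT INTO AddressSet VALUES(2, 13056);
    COMMIT;

    -- without an index on Address, the query
    --   SELECT ... WHERE Address=m ORDER BY SeqNo ASC LIMIT 1
    -- has to scan the whole table for every single trace entry
    CREATE INDEX AddrSeq ON AddressSet(Address, SeqNo);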
[sqlite] Performance Improvement
Hi,

I am a new user of SQLite and I have a series of steps to do on a
single table, and it takes way too long. I will outline the steps
below; please let me know if I can do anything differently to speed
things up a bit.

So I am coding up a cache simulator. The input basically consists of a
series of addresses:

    Seq No    Address
    1         0x12459
    2         0x03300
    ...

These are addresses accessed by a program. There will be 100 billion
entries or so like this, which makes it necessary to use the database.
The processing is as follows:

1. Read the addresses into the table.
Re: [sqlite] Performance Improvement
On Mon, Jul 18, 2011 at 4:58 PM, abhisek...@gmail.com wrote:
> These are addresses accessed by a program. There will be 100 billion
> entries

You won't be able to fit that many in your database - sqlite3 cannot
scale to the file size you will need for that. Assuming 10-byte
addresses (as you demonstrated), 10 bytes x 100B records = 1 terabyte
JUST for the addresses (not including any sqlite3-related overhead per
record, which is probably much larger than the 10 bytes you're
saving).

--
- stephan beal
http://wanderinghorse.net/home/stephan/
Re: [sqlite] Performance improvement In Sqlite 3.6.4 version
On Nov 18, 2008, at 5:10 AM, Roshan Sullad wrote:
> [...]
> I have another Visual studio test application where I am using this
> Sqlite3.dll functionality, I have linked statically to Sqlite3.dll by
> including *sqlite3.c, sqlite3.h, sqlite3ext.h* files in to my test
> application project. And am linking through *Sqlite3.lib* which I
> generated using *Implib.exe*. Everything is linked and built
> successfully.

If you include sqlite3.c in the project, then you are not using
Sqlite3.dll at all.

> So my doubt Is I have enabled above macros in the project settings of
> Sqlite3.dll, do I need to define above macros in the project settings

Yes, you must either use the DLL or the macro definitions.

e
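One way to check which compile-time defaults actually took effect in
the library your application ends up using (a sketch of my own; run
these against a freshly created database, since page_size is fixed per
file):

    PRAGMA page_size;   -- expect 4096 if SQLITE_DEFAULT_PAGE_SIZE took effect
    PRAGMA cache_size;  -- expect 6000 if SQLITE_DEFAULT_CACHE_SIZE took effect
    PRAGMA temp_store;  -- expect 2 if SQLITE_TEMP_STORE took effect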
[sqlite] Performance improvement In Sqlite 3.6.4 version
Hi Roger,

I got your mail id from one of my friends. I am facing some issues
while working with Sqlite 3.6.4 - could you please clarify a few
things?

I am upgrading one product from Sqlite 3.2.5 to Sqlite 3.6.4. I have
downloaded *sqlite-amalgamation-3_6_4* from the sqlite site; it
contains 3 files, *sqlite3.c, sqlite3.h, sqlite3ext.h*, and I prepared
a Visual Studio DLL solution by including these files in the project
to create *Sqlite3.dll* as the output. For the same project I added
the macros below to the project settings to improve the performance of
Sqlite queries:

    SQLITE_EXPORTS
    NO_TCL
    SQLITE_ENABLE_COLUMN_METADATA
    SQLITE_DEFAULT_CACHE_SIZE=6000
    SQLITE_DEFAULT_MEMSTATUS=0
    SQLITE_DEFAULT_TEMP_CACHE_SIZE=1500
    SQLITE_DEFAULT_PAGE_SIZE=4096
    SQLITE_THREADSAFE=1
    SQLITE_TEMP_STORE=2
    SQLITE_ENABLE_RTREE

I have another Visual Studio test application where I use this
Sqlite3.dll functionality. I have linked statically to Sqlite3.dll by
including the *sqlite3.c, sqlite3.h, sqlite3ext.h* files in my test
application project, and I am linking through *Sqlite3.lib*, which I
generated using *Implib.exe*. Everything is linked and built
successfully.

If I compare the performance of Sqlite 3.2.5 and Sqlite 3.6.4 on the
same set of select queries, Sqlite 3.2.5 is better than Sqlite 3.6.4.
So my doubt is: I have enabled the above macros in the project
settings of Sqlite3.dll - do I also need to define them in the project
settings of my client application, since I am linking statically to
Sqlite3.dll?

Do you know the right way to measure query performance? In the Sqlite
editor, the time shown for the same query executed several times is
not consistent. I tried executing queries against the Sqlite 3.2.5 and
Sqlite 3.6.4 dll's directly through the Sqlite editor, taking the
average of 5 executions per query, but there too Sqlite 3.2.5 was
better than Sqlite 3.6.4.

I have 70 queries in total, and I have to provide the maximum
performance improvement of Sqlite 3.6.4 over Sqlite 3.2.5. So what is
the right approach (which macros will boost performance) to measure
and improve the performance? And on the client side, do I need to
enable the macros while statically linking to Sqlite3.dll?

Please reply as soon as possible - I need to complete this task within
this week. Thank you very much.

Regards,
Roshan
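On the measurement question: one low-tech option, assuming your
sqlite3 command-line shell build includes the .timer dot-command, is
to let the shell report per-statement times:

    sqlite> .timer on
    sqlite> SELECT ...;   -- substitute one of the 70 queries here

Run each query several times and discard the first (cold-cache) run;
timings reported by a GUI editor may include the editor's own
overhead.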