Re: [sqlite] Performance Improvement

2011-07-21 Thread Black, Michael (IS)
Could you define "awfully slow"?  Without a number it's pretty hard to tell
whether your speed is what one should expect.

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Paul Linehan [lineh...@tcd.ie]
Sent: Thursday, July 21, 2011 1:01 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Performance Improvement


> I can at least store all the data, for the cases I have tested so far
> (600 billion entries at present), but it is awfully slow.

I'm not surprised. Maybe you should consider some sort of
partitioning scheme? Take a look at VoltDB.com - it might
be one approach.


Paul...

--

Hmmm a "life": wonder where I can download one of those?


lineh...@tcd.ie

Mob: 00 353 86 864 5772
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-21 Thread Paul Linehan
> I can at least store all the data, for the cases I have tested so far
> (600 billion entries at present), but it is awfully slow.

I'm not surprised. Maybe you should consider some sort of
partitioning scheme? Take a look at VoltDB.com - it might
be one approach.


Paul...

--

Hmmm a "life": wonder where I can download one of those?


lineh...@tcd.ie

Mob: 00 353 86 864 5772
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-18 Thread J Decker
On Mon, Jul 18, 2011 at 9:30 AM, Igor Tandetnik  wrote:
> On 7/18/2011 11:10 AM, abhisek...@gmail.com wrote:
>> Sorry the mail was sent prematurely. Continuing ...
>> 2. In a loop, read each address,
>>                     for that address find the next time this address was used
>> ( to implement the replacement policy)
>
> Sort the list by address then sequence number, probably using some form
> of the merge sort algorithm (which allows one to sort a data set larger
> than the available RAM). Then do a single pass over the sorted list,
> looking for sequences of repeated addresses.

Yeah - make the database do your address comparison; don't pull back each
record, just the records you need. Maybe add a timestamp or some other
number you can increment rather than deleting the old records - just
update the old record as appropriate.
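
For instance (a sketch of that idea against the AddressSet table described
elsewhere in the thread; the index is an addition for illustration, not
something from the original posts):

  -- Let the index cover the address comparison, so a single query
  -- returns only the record that is needed: the next use of address
  -- :m after sequence number :n, with no per-row DELETE required.
  CREATE INDEX IF NOT EXISTS AddrIdx ON AddressSet(Address, SeqNo);

  SELECT SeqNo FROM AddressSet
   WHERE Address = :m AND SeqNo > :n
   ORDER BY SeqNo LIMIT 1;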

> --
> Igor Tandetnik
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-18 Thread Roger Binns

On 07/18/2011 07:58 AM, abhisek...@gmail.com wrote:
> So I am coding up a cache simulator. 

I suggest having a look at cachegrind, which does something similar.  You
should be able to learn from its data structures.  It is also open source,
so maybe you could adapt it to meet your needs:

  http://valgrind.org/docs/manual/cg-manual.html

Roger
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-18 Thread Igor Tandetnik
On 7/18/2011 11:10 AM, abhisek...@gmail.com wrote:
> Sorry, the mail was sent prematurely. Continuing ...
> 2. In a loop, read each address, and for that address find the next
> time this address was used (to implement the replacement policy).

Sort the list by address then sequence number, probably using some form 
of the merge sort algorithm (which allows one to sort a data set larger 
than the available RAM). Then do a single pass over the sorted list, 
looking for sequences of repeated addresses.
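
A sketch of that single pass (assuming the externally sorted (Address,
SeqNo) pairs are available; shown over an in-memory array for brevity):

  #include <cstddef>
  #include <cstdint>
  #include <cstdio>
  #include <vector>

  struct Ref { std::uint64_t addr; std::uint64_t seq; };

  // refs must already be sorted by (addr, seq): within a run of equal
  // addresses, each entry's next use is simply the entry that follows it.
  void emitNextUses(const std::vector<Ref>& refs) {
      for (std::size_t i = 0; i < refs.size(); ++i) {
          if (i + 1 < refs.size() && refs[i + 1].addr == refs[i].addr)
              std::printf("seq %llu: next use at seq %llu\n",
                          (unsigned long long)refs[i].seq,
                          (unsigned long long)refs[i + 1].seq);
          else
              std::printf("seq %llu: never used again\n",
                          (unsigned long long)refs[i].seq);
      }
  }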
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-18 Thread Black, Michael (IS)
Unless you're caching at byte level, you don't need all the addresses.

For a 4K page size, mask off the lower 12 bits.

You should actually be able to reduce your memory usage by a LOT if you
track pages rather than individual byte addresses, and that will speed
things up as well.
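
In code that is just a shift (a sketch; 4096 = 2^12, hence 12 low bits):

  #include <cstdint>

  constexpr unsigned kPageShift = 12;   // 4 KiB pages

  // Collapse a byte address to its page number before storing it, so
  // every access that lands in the same page counts as one address.
  inline std::uint64_t pageOf(std::uint64_t addr) {
      return addr >> kPageShift;
  }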

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of abhisek...@gmail.com [abhisek...@gmail.com]
Sent: Monday, July 18, 2011 10:50 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Performance Improvement

Thanks Jay.

I guess you are right. I have very little experience in dealing with such
huge data-sets. What I meant was: initially I was doing this with C++ data
structures like vectors and maps, and I could not even store all the
entries - the program would crash with a bad_alloc exception. With sqlite,
I can at least store all the data, for the cases I have tested so far
(600 billion entries at present), but it is awfully slow.

Also, continuing with the mail, this is what I am doing:

2. In a loop, read each address, and for that address find the next time
   this address was used (to implement the replacement policy).

To do this:

1. I am creating a table:
   CREATE TABLE AddressSet (SeqNo UNSIGNED BIG INT PRIMARY KEY,
                            Address UNSIGNED BIG INT)
2. Inserting all the addresses:
   INSERT INTO AddressSet VALUES(SeqNo, Address) ...

3. In the loop I am doing this, for each of the sequence numbers n = 1, 2, ...
   (let the address selected be m):
   SELECT SeqNo, Address FROM AddressSet WHERE SeqNo=n
   DELETE FROM AddressSet WHERE SeqNo=n
   // delete it so that the next SELECT finds the next time this address was seen
   SELECT SeqNo, Address FROM AddressSet WHERE Address=m ORDER BY SeqNo ASC LIMIT 1
   // here SeqNo gives us the next time this address was seen

I was wondering if this flow can be modified so as to see some improvements
in performance.

Just to give some context: all these issues arise because we need the
entire history of the trace to do the processing. When an address is seen,
we need to find out when in the future it is used again; in the worst case
the very first address is never used again, or is used only at the end of
the address trace. This is because I am trying to simulate an "optimal"
replacement policy in the cache, unlike real ones such as LRU, for which I
could have maintained a limited history and been done with it.

Thanks again,
Abhisek

On Mon, Jul 18, 2011 at 11:23 AM, Jay A. Kreibich  wrote:

> On Mon, Jul 18, 2011 at 05:01:55PM +0200, Stephan Beal scratched on the
> wall:
> > On Mon, Jul 18, 2011 at 4:58 PM, abhisek...@gmail.com
> > wrote:
> >
> > > These are addresses accessed by a program. There will be 100 billion
> > > entries
> > >
> >
> > You won't be able to fit that many in your database - sqlite3 cannot
> > scale to the file size you will need for that. Assuming 10-byte
> > addresses (as you demonstrated), 10 bytes x 100B records = 1 terabyte
> > JUST for the addresses (not including any sqlite3-related overhead per
> > record, which is probably much larger than the 10 bytes you're saving).
>
>   In theory, the maximum size of an SQLite database is 128 TB.
>
>  2^31 pages (2 giga-pages) @ 2^16 bytes (64K) = 128 TB, or ~140e12.
>
>  (I know http://sqlite.org/limits.html says 14TB, but I think they
>   dropped a digit)
>
>  Whether your file system can handle this is a different story.
>
>  Using SQLite for this type of data seems very questionable, however.
>  As Stephan points out, the database with just the addresses is likely
>  to be in the 3 to 4 TB range. You said "There will be 100 billion
>  entries or so like this, which makes it necessary to use the
>  database," but I think just the opposite is true.  If you have a
>  *very* large number of data points with a very specific access
>  pattern, using a general-purpose tool seems like exactly the wrong
>  choice.  You need some custom system that is highly optimized for
>  both storage space and your specific access patterns.
>
>   -j
>
> --
> Jay A. Kreibich < J A Y  @  K R E I B I.C H >
>
> "Intelligence is like underwear: it is important that you have it,
>  but showing it to the wrong people has the tendency to make them
>  feel uncomfortable." -- Angela Johnson



--
Abhisek
Live Long and Prosper

Re: [sqlite] Performance Improvement

2011-07-18 Thread abhisek...@gmail.com
Thanks Jay.

I guess you are right. I have very little experience in dealing with such
huge data-sets. What I meant was: initially I was doing this with C++ data
structures like vectors and maps, and I could not even store all the
entries - the program would crash with a bad_alloc exception. With sqlite,
I can at least store all the data, for the cases I have tested so far
(600 billion entries at present), but it is awfully slow.

Also, continuing with the mail, this is what I am doing:

2. In a loop, read each address, and for that address find the next time
   this address was used (to implement the replacement policy).

To do this:

1. I am creating a table:
   CREATE TABLE AddressSet (SeqNo UNSIGNED BIG INT PRIMARY KEY,
                            Address UNSIGNED BIG INT)
2. Inserting all the addresses:
   INSERT INTO AddressSet VALUES(SeqNo, Address) ...

3. In the loop I am doing this, for each of the sequence numbers n = 1, 2, ...
   (let the address selected be m):
   SELECT SeqNo, Address FROM AddressSet WHERE SeqNo=n
   DELETE FROM AddressSet WHERE SeqNo=n
   // delete it so that the next SELECT finds the next time this address was seen
   SELECT SeqNo, Address FROM AddressSet WHERE Address=m ORDER BY SeqNo ASC LIMIT 1
   // here SeqNo gives us the next time this address was seen

I was wondering if this flow can be modified so as to see some improvements
in performance.

Just to give some context: all these issues arise because we need the
entire history of the trace to do the processing. When an address is seen,
we need to find out when in the future it is used again; in the worst case
the very first address is never used again, or is used only at the end of
the address trace. This is because I am trying to simulate an "optimal"
replacement policy in the cache, unlike real ones such as LRU, for which I
could have maintained a limited history and been done with it.

Thanks again,
Abhisek

On Mon, Jul 18, 2011 at 11:23 AM, Jay A. Kreibich  wrote:

> On Mon, Jul 18, 2011 at 05:01:55PM +0200, Stephan Beal scratched on the
> wall:
> > On Mon, Jul 18, 2011 at 4:58 PM, abhisek...@gmail.com
> > wrote:
> >
> > > These are addresses accessed by a program. There will be 100 billion
> > > entries
> > >
> >
> > You won't be able to fit that many in your database - sqlite3 cannot
> > scale to the file size you will need for that. Assuming 10-byte
> > addresses (as you demonstrated), 10 bytes x 100B records = 1 terabyte
> > JUST for the addresses (not including any sqlite3-related overhead per
> > record, which is probably much larger than the 10 bytes you're saving).
>
>   In theory, the maximum size of an SQLite database is 128 TB.
>
>  2^31 pages (2 giga-pages) @ 2^16 bytes (64K) = 128 TB, or ~140e12.
>
>  (I know http://sqlite.org/limits.html says 14TB, but I think they
>   dropped a digit)
>
>  Whether your file system can handle this is a different story.
>
>  Using SQLite for this type of data seems very questionable, however.
>  As Stephan points out, the database with just the addresses is likely
>  to be in the 3 to 4 TB range. You said "There will be 100 billion
>  entries or so like this, which makes it necessary to use the
>  database," but I think just the opposite is true.  If you have a
>  *very* large number of data points with a very specific access
>  pattern, using a general-purpose tool seems like exactly the wrong
>  choice.  You need some custom system that is highly optimized for
>  both storage space and your specific access patterns.
>
>   -j
>
> --
> Jay A. Kreibich < J A Y  @  K R E I B I.C H >
>
> "Intelligence is like underwear: it is important that you have it,
>  but showing it to the wrong people has the tendency to make them
>  feel uncomfortable." -- Angela Johnson



-- 
Abhisek
Live Long and Prosper
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-18 Thread Stephan Beal
On Mon, Jul 18, 2011 at 5:14 PM, abhisek...@gmail.com
wrote:

> what I am trying to implement, but that is fine. What would be the limit
> of the file size that sqlite can create, assuming disk space is not
> an issue?
>

http://www.sqlite.org/limits.html

Apparently I was wrong; in any case, that page claims that the file size
limit is theoretically 2^64, but that 14TB is the practical limit. I
thought the limit was somewhere under 4GB.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-18 Thread Jay A. Kreibich
On Mon, Jul 18, 2011 at 05:01:55PM +0200, Stephan Beal scratched on the wall:
> On Mon, Jul 18, 2011 at 4:58 PM, abhisek...@gmail.com
> wrote:
> 
> > These are addresses accessed by a program. There will be 100 billion
> > entries
> >
> 
> You won't be able to fit that many in your database - sqlite3 cannot scale
> to the file size you will need for that. Assuming 10-byte addresses (as you
> demonstrated), 10 bytes x 100B records = 1 terabyte JUST for the addresses
> (not including any sqlite3-related overhead per record, which is probably
> much larger than the 10 bytes you're saving).

  In theory, the maximum size of an SQLite database is 128 TB.

  2^31 pages (2 giga-pages) @ 2^16 bytes (64K) = 128 TB, or ~140e12.

  (I know http://sqlite.org/limits.html says 14TB, but I think they
   dropped a digit)

  Whether your file system can handle this is a different story.

  Using SQLite for this type of data seems very questionable, however.
  As Stephan points out, the database with just the addresses is likely
  to be in the 3 to 4 TB range. You said "There will be 100 billion
  entries or so like this, which makes it necessary to use the
  database," but I think just the opposite is true.  If you have a
  *very* large number of data points with a very specific access
  pattern, using a general-purpose tool seems like exactly the wrong
  choice.  You need some custom system that is highly optimized for
  both storage space and your specific access patterns.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-18 Thread abhisek...@gmail.com
Thanks a lot, Stephan.

I guess I will have to do this on parts of the input file at a time then,
which I will try. It will be an approximation of
what I am trying to implement, but that is fine. What would be the limit
of the file size that sqlite can create, assuming disk space is not
an issue?

-Abhisek

On Mon, Jul 18, 2011 at 11:01 AM, Stephan Beal wrote:

> On Mon, Jul 18, 2011 at 4:58 PM, abhisek...@gmail.com
> wrote:
>
> > These are addresses accessed by a program. There will be 100 billion
> > entries
> >
>
> You won't be able to fit that many in your database - sqlite3 cannot scale
> to the file size you will need for that. Assuming 10-byte addresses (as you
> demonstrated), 10 bytes x 100B records = 1 terabyte JUST for the addresses
> (not including any sqlite3-related overhead per record, which is probably
> much larger than the 10 bytes you're saving).
>
> --
> - stephan beal
> http://wanderinghorse.net/home/stephan/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-18 Thread abhisek...@gmail.com
Sorry, the mail was sent prematurely. Continuing ...

2. In a loop, read each address, and for that address find the next time
   this address was used (to implement the replacement policy).

To do this:

1. I am creating a table:
   CREATE TABLE AddressSet (SeqNo UNSIGNED BIG INT PRIMARY KEY,
                            Address UNSIGNED BIG INT)
2. Inserting all the addresses:
   INSERT INTO AddressSet VALUES(SeqNo, Address) ...

3. In the loop I am doing this, for each of the sequence numbers n = 1, 2, ...
   (let the address selected be m):
   SELECT SeqNo, Address FROM AddressSet WHERE SeqNo=n
   DELETE FROM AddressSet WHERE SeqNo=n
   // delete it so that the next SELECT finds the next time this address was seen
   SELECT SeqNo, Address FROM AddressSet WHERE Address=m ORDER BY SeqNo ASC LIMIT 1
   // here SeqNo gives us the next time this address was seen

That is all.
Please let me know if these can be optimized somehow.
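
(For reference, the step-3 lookup written against the SQLite C API - a
sketch, not from the original mail: it assumes an extra index on
(Address, SeqNo) so the next-use query does not scan the table, folds the
DELETE into a SeqNo > n condition, and omits all error handling. Preparing
the statement once and re-binding it per iteration also avoids re-parsing
the SQL text on every pass through the loop.)

  #include <sqlite3.h>
  #include <cstdint>
  #include <cstdio>

  int main() {
      sqlite3 *db = nullptr;
      sqlite3_open("trace.db", &db);   // "trace.db" is a placeholder name

      // Assumed: CREATE INDEX AddrIdx ON AddressSet(Address, SeqNo);
      sqlite3_stmt *nextUse = nullptr;
      sqlite3_prepare_v2(db,
          "SELECT SeqNo FROM AddressSet"
          " WHERE Address = ?1 AND SeqNo > ?2"
          " ORDER BY SeqNo LIMIT 1",
          -1, &nextUse, nullptr);

      std::int64_t n = 1, m = 0x12459;  // one (SeqNo, Address) pair of the trace
      sqlite3_bind_int64(nextUse, 1, m);
      sqlite3_bind_int64(nextUse, 2, n);
      if (sqlite3_step(nextUse) == SQLITE_ROW)
          std::printf("next use at seq %lld\n",
                      (long long)sqlite3_column_int64(nextUse, 0));
      sqlite3_reset(nextUse);

      sqlite3_finalize(nextUse);
      sqlite3_close(db);
  }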

Thanks a lot for your help!
Abhisek

On Mon, Jul 18, 2011 at 10:58 AM, abhisek...@gmail.com  wrote:

> Hi,
>
> I am a new user of SQLite and I have a series of steps to do on a single
> table, and it takes way too long. I will outline the steps below and please
> let me know if I can do anything different to speed things up a bit.
>
> So I am coding up a cache simulator. The input basically consists of a
> series of addresses:
> Seq No   Address
> 1        0x12459
> 2        0x03300
> ...
>
> These are addresses accessed by a program. There will be 100 billion
> entries or so like this, which makes it necessary to use the database. The
> processing is as follows:
> 1. Read the addresses into the table.
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance Improvement

2011-07-18 Thread abhisek...@gmail.com
Hi,

I am a new user of SQLite and I have a series of steps to do on a single
table, and it takes way too long. I will outline the steps below and please
let me know if I can do anything different to speed things up a bit.

So I am coding up a cache simulator. The input basically consists of a
series of addresses:
Seq No   Address
1        0x12459
2        0x03300
...

These are addresses accessed by a program. There will be 100 billion entries
or so like this, which makes it necessary to use the database. The
processing is as follows:
1. Read the addresses into the table.
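
(A note on step 1: at this row count the INSERTs want to be batched inside
explicit transactions - in autocommit mode SQLite commits, and syncs to
disk, after every single INSERT. A sketch, assuming the AddressSet table
defined in the follow-up mail; values shown in decimal since the hex forms
above are just the trace notation:)

  BEGIN;
  INSERT INTO AddressSet VALUES(1, 74841);   -- 0x12459
  INSERT INTO AddressSet VALUES(2, 13056);   -- 0x03300
  -- ... a few hundred thousand rows per batch ...
  COMMIT;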
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Improvement

2011-07-18 Thread Stephan Beal
On Mon, Jul 18, 2011 at 4:58 PM, abhisek...@gmail.com
wrote:

> These are addresses accessed by a program. There will be 100 billion
> entries
>

You won't be able to fit that many in your database - sqlite3 cannot scale
to the file size you will need for that. Assuming 10-byte addresses (as you
demonstrated), 10 bytes x 100B records = 1 terabyte JUST for the addresses
(not including any sqlite3-related overhead per record, which is probably
much larger than the 10 bytes you're saving).

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance improvement In Sqlite 3.6.4 version

2008-11-18 Thread Doug Currie

On Nov 18, 2008, at 5:10 AM, Roshan Sullad wrote:

> [...]
> I have another Visual Studio test application where I am using this
> Sqlite3.dll functionality. I have linked statically to Sqlite3.dll by
> including sqlite3.c, sqlite3.h, and sqlite3ext.h in my test application
> project, and am linking through Sqlite3.lib, which I generated using
> Implib.exe. Everything is linked and built successfully.

If you include sqlite3.c in the project, then you are not using  
Sqlite3.dll at all.

> So my doubt is: I have enabled the above macros in the project settings of
> Sqlite3.dll - do I need to define them in the project settings

Yes - you must either actually use the DLL, or repeat the macro
definitions when you compile sqlite3.c into your application.
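
For instance, the client compile of sqlite3.c would need the same defines
(a sketch reusing the macros from your list, not a tested build line):

  cl /c sqlite3.c ^
     /DSQLITE_THREADSAFE=1 /DSQLITE_TEMP_STORE=2 ^
     /DSQLITE_DEFAULT_CACHE_SIZE=6000 /DSQLITE_DEFAULT_PAGE_SIZE=4096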

e

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance improvement In Sqlite 3.6.4 version

2008-11-18 Thread Roshan Sullad
Hi Roger,

I got your mail id from one of my friends - I am facing some issues while
working with Sqlite 3.6.4; could you please clarify a few things?

Actually I am upgrading one product from Sqlite 3.2.5 to Sqlite 3.6.4.

I have downloaded sqlite-amalgamation-3_6_4 from the sqlite site. It
contains 3 files - sqlite3.c, sqlite3.h, sqlite3ext.h - and I prepared a
Visual Studio DLL solution that includes these files to build the
Sqlite3.dll output.

For the same project I have added the macros below to the project settings
to improve the performance of Sqlite queries:

SQLITE_EXPORTS
NO_TCL
SQLITE_ENABLE_COLUMN_METADATA
SQLITE_DEFAULT_CACHE_SIZE=6000
SQLITE_DEFAULT_MEMSTATUS=0
SQLITE_DEFAULT_TEMP_CACHE_SIZE=1500
SQLITE_DEFAULT_PAGE_SIZE=4096
SQLITE_THREADSAFE=1
SQLITE_TEMP_STORE=2
SQLITE_ENABLE_RTREE

I have another Visual Studio test application where I am using this
Sqlite3.dll functionality. I have linked statically to Sqlite3.dll by
including sqlite3.c, sqlite3.h, and sqlite3ext.h in my test application
project, and am linking through Sqlite3.lib, which I generated using
Implib.exe. Everything is linked and built successfully.

If I compare the performance of Sqlite 3.2.5 and Sqlite 3.6.4 on the same
set of select queries, Sqlite 3.2.5 is better than Sqlite 3.6.4.

So my doubt is: I have enabled the above macros in the project settings of
Sqlite3.dll - do I need to define them in the project settings of my
client application as well, since I am linking statically to Sqlite3.dll?

Do you know the right way to check query performance? In the Sqlite
editor, the time shown for the same query executed several times is not
consistent.

I tried executing queries on the Sqlite 3.2.5 and Sqlite 3.6.4 DLLs
directly through the Sqlite editor - there I took the average of 5
executions per query, but there too Sqlite 3.2.5 is better than Sqlite
3.6.4.

I have 70 queries in total, and I have to deliver the maximum possible
performance improvement of Sqlite 3.6.4 over Sqlite 3.2.5. So what is the
right approach (macros which will boost the performance) to follow to
measure and improve the performance?

And on the client side, do I need to enable the macros while statically
linking to Sqlite3.dll?

Please reply as soon as possible - I need to complete this task within
this week.

Thank you very much.

Regards/Roshan
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users