Re: [sqlite] Write performance question for 3.7.15

2013-01-01 Thread Dan Frankowski
Ah, interesting. However, yes, we need something production-ready. Good luck
with sqlite4, though.

On Tue, Jan 1, 2013 at 11:43 AM, Richard Hipp  wrote:

> On Tue, Jan 1, 2013 at 12:33 PM, Dan Frankowski 
> wrote:
>
> >
> > We are comparing to leveldb, which seems to have much better write
> > performance even in a limited-memory situation. Of course it offers much
> > less than sqlite. It is a partially-ordered key/value store, rather than
> a
> > relational database.
> >
>
> The default LSM storage layer for SQLite4 gives much better performance
> than LevelDB on average.  Most LevelDB inserts are a little faster than
> LSM's; however, every now and then LevelDB encounters a really, really
> slow insert.  SQLite4 LSM avoids these spikes and hence is able to
> perform significantly faster in the long run.  SQLite4 LSM also gives you
> concurrent access and transactions - capabilities that are missing from
> LevelDB.
>
> SQLite4 gives you the same high-level schema and querying capabilities as
> SQLite3, with enhancements.
>
> OTOH, SQLite4 is not anything close to being production ready at this time.
>
>
> --
> D. Richard Hipp
> d...@sqlite.org


Re: [sqlite] Write performance question for 3.7.15

2013-01-01 Thread Richard Hipp
On Tue, Jan 1, 2013 at 12:33 PM, Dan Frankowski  wrote:

>
> We are comparing to leveldb, which seems to have much better write
> performance even in a limited-memory situation. Of course it offers much
> less than sqlite. It is a partially-ordered key/value store, rather than a
> relational database.
>

The default LSM storage layer for SQLite4 gives much better performance
than LevelDB on average.  Most LevelDB inserts are a little faster than
LSM's; however, every now and then LevelDB encounters a really, really
slow insert.  SQLite4 LSM avoids these spikes and hence is able to
perform significantly faster in the long run.  SQLite4 LSM also gives you
concurrent access and transactions - capabilities that are missing from
LevelDB.

SQLite4 gives you the same high-level schema and querying capabilities as
SQLite3, with enhancements.

OTOH, SQLite4 is not anything close to being production ready at this time.


-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Write performance question for 3.7.15

2013-01-01 Thread Dan Frankowski
I appreciate everyone's thoughts about this.

Knowing that larger batch sizes help is interesting. Unfortunately, we don't
always control the batch size. We're using 1000 as an optimistic estimate,
but we receive records as they arrive and may just have to commit after a while.
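A minimal sketch of that commit policy (commit at a row budget or after a
deadline, whichever comes first; the names and the 1000-row/5-second
thresholds here are invented for illustration, not from our actual system):

#include <stdio.h>
#include <time.h>
#include "sqlite3.h"

typedef struct {
  sqlite3 *db;
  long pending;        /* rows inserted since the last COMMIT */
  time_t last_commit;  /* wall-clock time of the last COMMIT */
} Batcher;

/* Call after each insert; commits when either budget is exhausted. */
static void maybe_commit(Batcher *b, long max_rows, double max_wait_sec) {
  if (b->pending >= max_rows ||
      difftime(time(NULL), b->last_commit) >= max_wait_sec) {
    sqlite3_exec(b->db, "COMMIT", NULL, NULL, NULL);
    sqlite3_exec(b->db, "BEGIN", NULL, NULL, NULL);
    b->pending = 0;
    b->last_commit = time(NULL);
  }
}

So each arrival would do b.pending++; maybe_commit(&b, 1000, 5.0); and the
commit fires on whichever limit is hit first.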

Knowing that more OS file cache or a faster disk helps is also interesting.
Unfortunately, it is non-trivial to switch to SSDs. We will have a whole
fleet of machines, each storing several hundred terabytes. The sqlite
databases are meta-data about that. We might be able to use one SSD just
for the meta-data. We haven't explored that yet. We also can't count on lots
of OS disk cache, as it will probably be consumed by writes other than this
meta-data.

Still, all of your observations are useful.

We are comparing to leveldb, which seems to have much better write
performance even in a limited-memory situation. Of course it offers much
less than sqlite. It is a partially-ordered key/value store, rather than a
relational database.

Michael Black writes:

Referencing the C program I sent earlier... I've found a COMMIT every 1M
records does best.  I had an extra zero on my 100,000, which gives the EKG
appearance.
I averaged 25,000 inserts/sec over 50M records with no big knees in the
performance (there is a noticeable knee on the commit, though, around 12M
records).  But the average performance curve is pretty smooth.
Commit more often than that and you're flushing out the index too
frequently, which causes an awful lot of disk thrashing, it would seem.
During the 1M commit the CPU drops to a couple percent and the disk I/O is
pretty constant...albeit slow.

P.S. I'm using 3.7.15.1


Re: [sqlite] Write performance question for 3.7.15

2012-12-29 Thread Simon Slavin

On 29 Dec 2012, at 9:45pm, Michael Black  wrote:

> During the 1M commit the CPU drops to a couple percent and the disk I/O is
> pretty constant...albeit slow.

For the last few years, since multi-core processors have become common in
computers, SQLite performance has usually been limited by the performance of
storage.  Several times I've recommended to users that rather than pouring
their money and development effort into unintuitive programming (e.g. splitting
one TABLE into smaller ones) they just upgrade from spinning disks to SSD.

Simon.


Re: [sqlite] Write performance question for 3.7.15

2012-12-29 Thread Michael Black
Referencing the C program I sent earlier... I've found a COMMIT every 1M
records does best.  I had an extra zero on my 100,000, which gives the EKG
appearance.
I averaged 25,000 inserts/sec over 50M records with no big knees in the
performance (there is a noticeable knee on the commit, though, around 12M
records).  But the average performance curve is pretty smooth.
Commit more often than that and you're flushing out the index too
frequently, which causes an awful lot of disk thrashing, it would seem.
During the 1M commit the CPU drops to a couple percent and the disk I/O is
pretty constant...albeit slow.
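One standard knob that bears on how often index pages get evicted is the
page cache. A hedged sketch only; as noted earlier in the thread, cache size
didn't change the overall shape much, so treat the 100000-page figure
(roughly 100 MB at SQLite's default 1 KB page size) as an illustrative
guess, not a recommendation:

#include <stdio.h>
#include "sqlite3.h"

static int set_big_cache(sqlite3 *db) {
  char *err = NULL;
  /* PRAGMA cache_size is measured in pages; a larger page cache means
     fewer index pages get evicted between commits */
  int rc = sqlite3_exec(db, "PRAGMA cache_size=100000", NULL, NULL, &err);
  if (rc != SQLITE_OK) {
    fprintf(stderr, "cache_size: %s\n", err ? err : sqlite3_errmsg(db));
    sqlite3_free(err);
  }
  return rc;
}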

P.S. I'm using 3.7.15.1




Re: [sqlite] Write performance question for 3.7.15

2012-12-29 Thread Michael Black
I wrote a C program doing your thing (with random data, so each key is
unique).

I see some small knees at 20M and 23M -- but nothing like what you're seeing
as long as I don't do the COMMIT.
Seems the COMMIT is what's causing the sudden slowdown.
When doing the COMMIT I see your dramatic slowdown (an order of magnitude)
at around 5M records...regardless of cache size...so cache size isn't the
problem.
I'm guessing the COMMIT is paging out the index, which starts thrashing the
disk.
Increasing the COMMIT interval to every 100,000 records seems to help a lot.
The plot looks almost like an EKG then, with regular slowdowns.


And... when not doing the commit, is it normal for memory usage to grow the
way the WAL file does?


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include "sqlite3.h"

time_t base_seconds;
suseconds_t base_useconds;

// record a base timestamp
void tic() {
  struct timeval tv;
  gettimeofday(&tv,NULL);
  base_seconds=tv.tv_sec;
  base_useconds=tv.tv_usec;
}

// returns time in seconds since tic() was called
double toc() {
  struct timeval tv;
  gettimeofday(&tv,NULL);
  double mark=(tv.tv_sec-base_seconds)+(tv.tv_usec-base_useconds)/1.0e6;
  return mark;
}

// print an error (and optionally exit) when rc differs from the expected code
void checkrc(sqlite3 *db,int rc,int checkrc,int flag,char *msg,char *str) {
  if (rc != checkrc) {
    fprintf(stderr,msg,str);
    fprintf(stderr,"%s\n",sqlite3_errmsg(db));
    if (flag) { // then fatal
      exit(1);
    }
  }
}

int main(int argc, char *argv[]) {
  int rc;
  long i;
  char *sql,*errmsg=NULL;
  char *databaseName="data.db";
  sqlite3 *db;
  sqlite3_stmt *stmt1,*stmt2;
  remove(databaseName);
  rc=sqlite3_open_v2(databaseName,&db,
                     SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE,NULL);
  checkrc(db,SQLITE_OK,rc,1,"Error opening database '%s': ",databaseName);
  sql = "create table if not exists t_foo "
        "(key binary(16) primary key, value binary(16))";
  rc=sqlite3_prepare_v2(db,sql,-1,&stmt1,NULL);
  checkrc(db,SQLITE_OK,rc,1,"Error preparing statement '%s': ",sql);
  rc=sqlite3_step(stmt1);
  checkrc(db,SQLITE_DONE,rc,1,"Error executing statement '%s': ",sql);
  rc=sqlite3_finalize(stmt1);
  checkrc(db,SQLITE_OK,rc,1,"Error finalizing statement '%s': ",sql);
  rc=sqlite3_exec(db, "PRAGMA journal_mode=WAL",NULL,NULL,&errmsg);
  checkrc(db,SQLITE_OK,rc,1,"Error on WAL mode statement '%s': ",sql);
  rc=sqlite3_exec(db, "PRAGMA synchronous=OFF",NULL,NULL,&errmsg);
  checkrc(db,SQLITE_OK,rc,1,"Error on synchronous mode statement '%s': ",sql);
  rc=sqlite3_exec(db, "PRAGMA cache_size=10",NULL,NULL,&errmsg);
  checkrc(db,SQLITE_OK,rc,1,"Error on cache size statement '%s': ",sql);
  sql="BEGIN";
  rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
  checkrc(db,SQLITE_OK,rc,1,"Error executing statement '%s': ",sql);
  sql = "insert or replace into t_foo(key,value) values(?,?)";
  rc=sqlite3_prepare_v2(db,sql,-1,&stmt2,NULL);
  checkrc(db,SQLITE_OK,rc,1,"Error preparing statement '%s': ",sql);
  tic();
  for(i=0; i<5000; ++i) {
    char key[16],value[16];
    long number = random();
    if (i>0 && (i % 10) == 0) {
      printf("%ld,%g \n",i,10/toc()); // inserts/sec since the last tic()
      tic();
    }
#if 0 // COMMIT?
    if (i>0 && (i % 1000)==0) { // try 100,000
      sql="COMMIT";
      rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
      checkrc(db,SQLITE_OK,rc,1,"Error executing statement '%s': ",sql);
      sql="BEGIN";
      rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
      checkrc(db,SQLITE_OK,rc,1,"Error executing statement '%s': ",sql);
    }
#endif
    memcpy(key,&number,8);      // first half of the 16-byte key
    memcpy(&key[8],&number,8);  // second half
    memcpy(value,&number,8);
    rc=sqlite3_bind_blob(stmt2,1,key,16,SQLITE_STATIC);
    checkrc(db,SQLITE_OK,rc,1,"Error bind1 statement '%s': ",sql);
    rc=sqlite3_bind_blob(stmt2,2,value,16,SQLITE_STATIC);
    checkrc(db,SQLITE_OK,rc,1,"Error bind2 statement '%s': ",sql);
    rc=sqlite3_step(stmt2);
    checkrc(db,SQLITE_DONE,rc,1,"Error executing statement '%s': ",sql);
    rc=sqlite3_reset(stmt2);
    checkrc(db,SQLITE_OK,rc,1,"Error resetting statement '%s': ",sql);
  }
  sql="COMMIT"; // commit whatever remains of the open transaction
  rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
  checkrc(db,SQLITE_OK,rc,1,"Error executing statement '%s': ",sql);
  sqlite3_finalize(stmt2);
  sqlite3_close(db);
  return 0;
}


-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin
Sent: Saturday, December 29, 2012 8:19 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Write performance question for 3.7.15


On 29 Dec 2012, at 12:37pm, Stephen Chrzanowski <pontia...@gmail.com> wrote:

> My guess would be the OS slowing things down with write caching.  The
> system will hold so much data in memory as a cache to write to the disk,
> and when the cache gets full, the OS slows down and waits on the HDD.  Try
> doing a [dd] to a few gig worth of random data and see if you get the same
> kind of slow down.

Re: [sqlite] Write performance question for 3.7.15

2012-12-29 Thread Valentin Davydov
On Fri, Dec 28, 2012 at 03:35:17PM -0600, Dan Frankowski wrote:
> 
> 3. Would horizontal partitioning (i.e. creating multiple tables, each for a
> different key range) help?

This would seriously impair read performance (you'd have to access two indices
instead of one).
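For concreteness, a sketch of what the proposal would look like (the table
names, the partition count, and routing by leading key byte are all invented
for illustration; a point lookup that knows the key can compute its
partition, but any query that spans ranges must probe every table's index):

#include <stdio.h>
#include "sqlite3.h"

#define NPART 4  /* hypothetical number of key-range partitions */

/* Route an insert to one of t_foo_0 .. t_foo_3 by the leading key byte. */
static int insert_partitioned(sqlite3 *db, const unsigned char key[16],
                              const unsigned char value[16]) {
  char sql[128];
  sqlite3_stmt *stmt;
  int rc, part = key[0] % NPART;
  snprintf(sql, sizeof(sql),
           "insert or replace into t_foo_%d(key,value) values(?,?)", part);
  if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK) return 1;
  sqlite3_bind_blob(stmt, 1, key, 16, SQLITE_STATIC);
  sqlite3_bind_blob(stmt, 2, value, 16, SQLITE_STATIC);
  rc = sqlite3_step(stmt);  /* expect SQLITE_DONE */
  sqlite3_finalize(stmt);
  return rc == SQLITE_DONE ? 0 : 1;
}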

Valentin Davydov.


Re: [sqlite] Write performance question for 3.7.15

2012-12-29 Thread Simon Slavin

On 29 Dec 2012, at 12:37pm, Stephen Chrzanowski  wrote:

> My guess would be the OS slowing things down with write caching.  The
> system will hold so much data in memory as a cache to write to the disk,
> and when the cache gets full, the OS slows down and waits on the HDD.  Try
> doing a [dd] to a few gig worth of random data and see if you get the same
> kind of slow down.

Makes sense.  It's revealing of how much memory the operating system is using
for caching.  Once you hit 30M rows you exceed the amount of memory the system
is using for caching, and it has to start reading or writing disk for every
operation, which is far slower.  Or it's the amount of memory that the
operating system is allowing the benchmarking process to use.  Or some other
OS limitation.

But the underlying point of our responses is that it's not a decision built
into SQLite.  There's nothing in SQLite which says it uses a fast strategy
for up to 25M rows and then a slower one from then on.

A good way to track it down would be to close the database at the point where
performance starts to tank and look at how big the file is.  That size should
give a clue about which resource the OS is capping.  Another would be to add
an extra unindexed column to the test database and fill it with a fixed text
string in each row, as sketched below.  If this changes the number of rows
before the cliff edge, then it's dependent on total filesize.  If it doesn't,
then it's dependent on the size of the index being searched for each INSERT.
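A minimal sketch of that second experiment (the filename and the padding
string are invented for illustration):

#include <stdio.h>
#include "sqlite3.h"

int main(void) {
  sqlite3 *db;
  char *err = NULL;
  if (sqlite3_open("padtest.db", &db) != SQLITE_OK) return 1;
  /* Same schema as the benchmark plus an unindexed filler column.  If the
     cliff moves to fewer rows, the limit tracks total file size; if it
     stays put, it tracks the size of the primary-key index. */
  const char *sql =
    "create table if not exists t_foo ("
    " key binary(16) primary key,"
    " value binary(16),"
    " pad text default 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')";
  if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
    fprintf(stderr, "%s\n", err);
    sqlite3_free(err);
  }
  sqlite3_close(db);
  return 0;
}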

Simon.


Re: [sqlite] Write performance question for 3.7.15

2012-12-28 Thread Dan Frankowski
On Fri, Dec 28, 2012 at 3:34 PM, Dan Frankowski  wrote:

> I am running a benchmark of inserting 100 million (100M) items into a
> table. I am seeing performance I don't understand. Graph:
> http://imgur.com/hH1Jr. Can anyone explain:
>
> 1. Why does write speed (writes/second) slow down dramatically around 28M
> items?
> 2. Are there parameters (perhaps related to table size) that would change
> this write performance?
>

3. Would horizontal partitioning (i.e. creating multiple tables, each for a
different key range) help?