Re: [sqlite] generic advice for insert performance in-memory database optimization

2017-12-21 Thread Keith Medcalf
381
  9100   969.031142100 9.844972
  9200   978.891677100 9.860535
  9300   988.752464100 9.860787
  9400   998.597371100 9.844907
  9500  1008.457877100 9.860506
  9600  1018.318437100 9.860560
  9700  1028.163434100 9.844996
  9800  1038.024322100 9.860888
  9900  1047.854005100 9.829683
 1  1057.761420100 9.907415
 10100  1067.637771100 9.876351
 10200  1078.08940010010.451629
 10300  1088.75689210010.667492
 10400  1099.58277110010.825879
 10500  1110.23725710010.654486
 10600  1121.18147910010.944222
 10700  1132.09262710010.911147
 10800  1142.93457910010.841952
 10900  1153.86677410010.932196
 11000  1164.80899810010.942224
 11100  1175.66799910010.859001
 11200  1186.50393710010.835938
 11300  1197.44014310010.936207
 11400  1208.19286010010.752717
 11500  1218.89143210010.698572
 11600  1229.73740110010.845968
 11700  1240.84606510011.108665
 11800  1251.64888510010.802819
 11900  1262.13191210010.483027
 12000  1273.30306110011.171149

Source code:

#! python3

from __future__ import print_function

import apsw
import apswrow
import random
import sys
import time

sql_create = """
create table data
(
id int,
path int,
month int,
val1 double,
val2 double,
val3 double,
val4 double,
val5 double,
val6 double,
val7 double,
val8 double,
val9 double,
val10 double,
val11 double,
val12 double,
val13 double,
val14 double,
val15 double,
val16 double,
val17 double,
val18 double,
val19 double,
val20 double
);
"""
print(sys.version_info)
db = apsw.Connection(':memory:')
db.cursor().execute(sql_create)

for row in db.cursor().execute('pragma journal_mode=off; pragma 
synchronous=off;'):
print(row)

for row in db.cursor().execute('pragma journal_mode; pragma synchronous;'):
print(row)

stmt = 'insert into data values (' + ','.join('?'*23) + ');'
print(stmt)

st = time.time()
lt = time.time()
rows = 100

# db.cursor().execute('begin immediate;')
id = 0
while id < 12000:
id += 1
data = [id, id, id]
for i in range(20):
data.append(random.random())
db.cursor().execute(stmt, data)
if id % rows == 0:
# db.cursor().execute('commit;')
ct = time.time()
print('%10d %12.6f %10d %12.6f' % (id, ct-st, rows, ct-lt))
lt = ct
# db.cursor().execute('begin immediate;')

# db.cursor().execute('commit;')
db.close()

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Nelson, Erik - 2
>Sent: Thursday, 21 December, 2017 12:06
>To: SQLite mailing list
>Subject: [sqlite] generic advice for insert performance in-memory
>database optimization
>
>I've got an in-memory database with a single table that I need to
>fill with ~500 million rows.  There are no indexes and the table
>definitions is
>
>create table data(id int, path int, month int, val1 double, val2
>double, val3 double... val20 double)
>
>I'm running on linux with the OS page size configured to 4096 and
>~380 GB of ram (much more than required for the table so I think I'm
>not swapping) and haven't altered the sqlite page size.  I am using
>sqlite version 3.13.0 and these pragmas immediately after database
>creation.
>
>pragma temp_store = MEMORY
>pragma journal_mode = off
>
>With these settings I'm seeing nonlinear (in a bad way) times for the
>insert. Is that expected?  I've fiddled about with various
>performance-related settings like described on
>
>https://blog.devart.com/increasing-sqlite-performance.html
>
>with varying results but haven't managed to arrive at fairly-linear
>insert behavior.  It's a single-threaded insert on a prepared query
>with bound arguments in a tight loop.
>
>Is linear-ish insert time a reasonable goal for an in-memory
>database?
>
>-
>-
>This message, and any attachments, is for the intended recipient(s)
>only, may contain information that is privileged, confidential and/or
>proprietary and subject to important terms and conditions available
>at http://www.bankofamerica.com/emaildisclaimer.   If you are not the
>intended recipient, please delete this message.
>___

Re: [sqlite] generic advice for insert performance in-memory database optimization

2017-12-21 Thread Simon Slavin


On 21 Dec 2017, at 7:06pm, Nelson, Erik - 2  
wrote:

> I'm running on linux with the OS page size configured to 4096 and ~380 GB of 
> ram (much more than required for the table so I think I'm not swapping) and 
> haven't altered the sqlite page size.  I am using sqlite version 3.13.0 and 
> these pragmas immediately after database creation.
> 
> pragma temp_store = MEMORY
> pragma journal_mode = off
> 
> With these settings I'm seeing nonlinear (in a bad way) times for the insert. 
> Is that expected? 

Thank you for your detailed description of your setup, which saved a lot of 
back-and-forth.  Your blog page shows a good understanding of SQLite and I can 
find no obvious errors in your code or your reasoning.  Your Blog post includes 
a very nice report of comparative testing.

The inconsistency you’re seeing is the result of caching.  Once the amount of 
data busts the amount of cache memory available, the operations have to wait 
for storage access, which takes a lot longer than RAM access.  I’m not talking 
about the cache under control of SQLite, but the amount of memory allocated to 
the program by the OS.  The differences you show for the various PRAGMA changes 
are not unexpected and don’t suggest any fault with your setup.

I’m not familiar with Windows as a dev platform, but from what I see there, 
Windows is busting a cache (OS process memory ? storage ? both ?) near the 
upper end of your testing range.

As you say in your comments, your macOS and Windows platforms are not 
comparable because the hardware isn’t identical.  I thought it worth repeating 
that here before someone gets hooked up on the faster figures for the Mac.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] generic advice for insert performance in-memory database optimization

2017-12-21 Thread Nelson, Erik - 2
I've got an in-memory database with a single table that I need to fill with 
~500 million rows.  There are no indexes and the table definitions is

create table data(id int, path int, month int, val1 double, val2 double, val3 
double... val20 double)

I'm running on linux with the OS page size configured to 4096 and ~380 GB of 
ram (much more than required for the table so I think I'm not swapping) and 
haven't altered the sqlite page size.  I am using sqlite version 3.13.0 and 
these pragmas immediately after database creation.

pragma temp_store = MEMORY
pragma journal_mode = off

With these settings I'm seeing nonlinear (in a bad way) times for the insert. 
Is that expected?  I've fiddled about with various performance-related settings 
like described on

https://blog.devart.com/increasing-sqlite-performance.html

with varying results but haven't managed to arrive at fairly-linear insert 
behavior.  It's a single-threaded insert on a prepared query with bound 
arguments in a tight loop.

Is linear-ish insert time a reasonable goal for an in-memory database?

--
This message, and any attachments, is for the intended recipient(s) only, may 
contain information that is privileged, confidential and/or proprietary and 
subject to important terms and conditions available at 
http://www.bankofamerica.com/emaildisclaimer.   If you are not the intended 
recipient, please delete this message.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users