Re: [sqlite] 100% CPU utilization sqlite running in VMWare

2012-06-07 Thread IQuant
ESXI 5
Disk is on a local host, running raid 6 on a LSI raid card
Machine has HT turned off so only scheduling full cores
System takes up less than 500mb of RAM and uses less than 1% of CPU

Intel Xeon X5482 3.2GHz
64GB RAM
6TB Raid 6
10GB NIC
VMXNET NIC#2
Windows Server 2008 R2

Dedicated Development Box
VM #1: 8 Core 6GB  (Writes SQLITE DB's to C and shares directory)
VM#2:  4 Core 4GB (Writes SQLITE DB's out to Share on VM#1)

The Tick Tapes are on 40TB NAS

Looking through the code I noticed sqlite3_exec(db, zSQL, 0, 0, 0);
wrapped inside a transaction.
Suspect this is causing the CPU load; going to try recoding using
prepared statements.
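The intended change can be sketched with Python's stdlib sqlite3 module (table and data are made up); the C-API analogue is a single sqlite3_prepare_v2() followed by bind/step/reset for each row, instead of compiling a fresh statement on every insert:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ticks (symbol TEXT, ts REAL, price REAL)")

rows = [("AAPL", 1.0 + i, 100.0 + i) for i in range(10_000)]

# One transaction around the whole batch; the parameterized INSERT is
# compiled once and re-bound for each row rather than re-parsed.
with con:
    con.executemany("INSERT INTO ticks VALUES (?, ?, ?)", rows)

count = con.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]
print(count)
```

The same pattern holds regardless of batch size; only the transaction boundaries change.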


 Response to Keith Medcalf 
What version of ESX and where is your disk coming from?  Are you scheduling
full cores or fake cores?  What is the split between USER/SYSTEM CPU on the VM
when it tanks?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 100% CPU utilization sqlite running in VMWare

2012-06-07 Thread IQuant
There was a significant increase in throughput converting the
sqlite3_exec inserts into prepared statements:

I suspect previously every insert was being compiled, versus now only 1
in every 10,000 inserts.
I'll bump this up to 100K next revision.

I've changed many variables so it's hard to baseline the differential:

PRAGMA synchronous=OFF
PRAGMA count_changes=OFF
PRAGMA page_size=32768
PRAGMA temp_store=MEMORY
PRAGMA journal_mode=OFF

Library compiled with # define SQLITE_THREADSAFE 0
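A minimal sketch of applying those pragmas from code, using Python's sqlite3 module against a throwaway database file (path and table are illustrative; note page_size only takes effect before the database is populated, and count_changes is a deprecated no-op in recent SQLite builds, so it is omitted here):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "extract.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit mode

con.execute("PRAGMA page_size=32768")   # must precede the first write
con.execute("PRAGMA synchronous=OFF")   # no fsync: fast, unsafe on crash
con.execute("PRAGMA temp_store=MEMORY")
con.execute("PRAGMA journal_mode=OFF")  # no rollback journal at all
con.execute("CREATE TABLE ticks (symbol TEXT, ts REAL)")

page = con.execute("PRAGMA page_size").fetchone()[0]
mode = con.execute("PRAGMA journal_mode").fetchone()[0]
print(page, mode)
```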


The CPU is still pegged at 100% but the extraction databases appear to
be generated much faster.
Currently testing 30 extractors running on a 4 core VM.


 Reply to Simon 
There's nothing wrong with sqlite3_exec that would cause excessive CPU usage,
but the above usage of it suggests someone put that together quickly.  The last
three parameters should be more like

NULL, 0, zErrMsg

showing no callback, a dummy argument for the non-existent callback, and
somewhere for SQLite to put an error message if there is one.  Even if you
burst the _exec() into _prepare(), _step(), _finalize(), you'll still have to
provide the equivalent parameters somewhere.

I don't know whether using 0, 0, 0 when there is no error will cause any
problems.  I suspect not.

Simon.


Re: [sqlite] 100% CPU utilization sqlite running in VMWare

2012-06-06 Thread IQuant
No indexes...  create db and table structure - process tick tapes
inserting records - then create index.  Windows 2008 R2 Servers ... We
can throw hardware at this problem.

We need to be able to run 1000's of extractors concurrently processing
different tick tapes and symbol sets, aka a service bureau. The daily
tick tapes are approx 20GB each.. 30TB repository and growing. An
extraction run takes 1 - 5 minutes for small symbol sets per tape.

An example would be to concurrently extract 5 years of a particular
stock's tick data. 1500 days x 2 minute extraction jobs across 15 VMs
each running 100 extractors.

I'll try journal mode off and increasing page size.

 Response to:

So you want to minimize the CPU usage? It looks like your OS is
running on the last core; it has work to do too ;) Apart from using
a hard disk directly instead of a VMDK virtual hard disk, I don't
know of other tricks on VMware, sorry.

Did you create the indexes before the mass inserts? If yes, SQLite is
reindexing during your 10,000 inserts. Create the tables without any
indexes, and create the indexes after the bulk insert. Set PRAGMA
automatic_index=FALSE too, to stop the query planner from building
automatic indexes against the not-yet-indexed table.
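The create-indexes-after advice can be sketched as follows (hypothetical table and row counts): build the (symbol, ts) index either before or after a bulk insert and time the load; the index-after variant normally wins because SQLite isn't updating the index B-tree on every row:

```python
import sqlite3
import time

def bulk_load(index_first):
    """Insert 50,000 rows; build the (symbol, ts) index before or after."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ticks (symbol TEXT, ts REAL)")
    if index_first:
        con.execute("CREATE INDEX idx_sym_ts ON ticks(symbol, ts)")
    rows = [("SYM%03d" % (i % 500), float(i)) for i in range(50_000)]
    start = time.perf_counter()
    with con:  # one transaction around the whole batch
        con.executemany("INSERT INTO ticks VALUES (?, ?)", rows)
    if not index_first:
        con.execute("CREATE INDEX idx_sym_ts ON ticks(symbol, ts)")
    elapsed = time.perf_counter() - start  # insert plus any index build
    count = con.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]
    return count, elapsed

n_before, t_before = bulk_load(index_first=True)
n_after, t_after = bulk_load(index_first=False)
print(t_before, t_after)
```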

Calculate how much the database size increases when you insert the
10,000 rows. If you are using WAL mode, set the WAL file size to this
size plus a bit more; this way only one WAL file will be used. Other
journal modes (try OFF mode) may be faster but riskier.

With 9MB of memory, perhaps the system is copying data between kernel
structures and SQLite, like disk cache entries (I don't know what OS
you are using; it's a blind shot). Try increasing PRAGMA cache_size.

Increase the page size to 8K-32K; this way SQLite will reduce the
amount of internal administrative work on pages.

Of course, read the documentation, and if you have any question about it, ask.


Re: [sqlite] 100% CPU utilization sqlite running in VMWare

2012-06-06 Thread IQuant
Running a single extractor on a VM runs near bare metal speeds.  I can
run concurrent extractors up to the # of cores allocated to the VM and
sustain decent throughput.  Once I go above this count the VM's get
crippled..  Pegged at 100% - Disk, RAM and LAN loads are miniscule.

For a test we ran 50 extractors across 2 VM's (8 core and 4 core) and
it took 14 hours to extract a single symbol.  A single extractor
running on one 4 core VM can do the job in 50 hours.  The same run
limited to running 12 extractors finished in 4 hours.

There is obviously an issue with the virtualization, but no clear fix
or workaround we're aware of. Our objective is to extract and process
all tick tapes in one shot.


 Response to 
Unless pegging the CPU causes the hypervisor to preempt the guest's
synchronization primitives and cause priority inversion, in which case
OP's throughput would plummet. If throughput remains reasonably stable
in the VM, though, I would agree that a pegged CPU isn't necessarily a
problem. Another possibility: the guest OS idle loop will probably
count as CPU usage on the host, and with heavy load the guest won't
necessarily attempt to park the cores it was using in between
scheduling decisions; on bare metal, the spinning idle loop doesn't
get reported as CPU usage, even though it is.


Ryan


Re: [sqlite] 100% CPU utilization sqlite running in VMWare

2012-06-05 Thread IQuant
Yes:  Inserts wrapped in transactions 10,000 at a shot.

Tried pragma synchronous=OFF

And 3 different threadsafe compilations for sqlite:
# define SQLITE_THREADSAFE 0 /* IMP: R-07272-22309 */
# define SQLITE_THREADSAFE 1 /* IMP: R-07272-22309 */
# define SQLITE_THREADSAFE 2 /* IMP: R-07272-22309 */

All other settings are default.

All 6 versions peg the CPU when I run concurrent instances equal to the # of cores.

Bare Metal wrote to local disk.  Will test writing to same share next.
VM's write to NAS (Windows server 2008 mapped network drive).

  Response to: 
Have you run your bare metal test using the same share?

And have you tried pragma synchronous=OFF ??

And I assume you are using transactions?

Have you profiled your app to see where it's spinning?  Probably
trying to get the lock on the share.

What flags did you use to compile the sqlite library?  In particular
for threadsafe?

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems


[sqlite] 100% CPU utilization sqlite running in VMWare

2012-06-04 Thread IQuant
Have a market data extractor running tick tapes and saving data to sqlite db.
All coded in C++ and previously ran fine and fast on bare metal.

Processes tick tapes from NAS and saves to local sqlite db's on the C drive.

We moved program onto VMWare VM's... same nas tick tapes.  Writing
db's to a share.

On a 4 core vm we can run 3 extractors concurrently at 25 - 75% CPU
utilization.  The 4th pegs the CPU at 100%

Same on an 8 core VM ... seems we can run Cores - 1 without pegging the CPU.

The memory footprint for the program is a tiny 9mb...  On bare metal
we can run 20 extractors concurrently.   Looking for suggestions on
how to optimize for VMware.


Re: [sqlite] Update Query

2012-05-29 Thread IQuant
Keith,

Trying to work through your suggestions:
I'm getting Misuse of aggregate max()


CREATE VIEW TICKMAX
AS
 SELECT ASK, BID, TRADEPRICE, TIMESTAMP, SYMBOL
FROM TICKDATA
   WHERE TIMESTAMP = MAX(TIMESTAMP)
GROUP BY SYMBOL;
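For reference, the error arises because an aggregate like MAX() cannot appear in a WHERE clause. Moving it into the SELECT list works: under SQLite's documented bare-column rule, the non-aggregated columns are then taken from the row holding each group's maximum. A sketch with made-up data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TICKDATA "
            "(SYMBOL TEXT, TIMESTAMP REAL, ASK REAL, BID REAL, TRADEPRICE REAL)")
con.executemany("INSERT INTO TICKDATA VALUES (?,?,?,?,?)", [
    ("AAPL", 1.0, 10, 9, 9.5),
    ("AAPL", 2.0, 11, 10, 10.5),   # latest AAPL tick
    ("MSFT", 1.0, 20, 19, 19.5),
])

# MAX() goes in the SELECT list, not in WHERE; the bare columns come
# from the row carrying the per-SYMBOL maximum TIMESTAMP.
con.execute("""
    CREATE VIEW TICKMAX AS
      SELECT ASK, BID, TRADEPRICE, MAX(TIMESTAMP) AS TIMESTAMP, SYMBOL
        FROM TICKDATA
       GROUP BY SYMBOL
""")
latest = con.execute(
    "SELECT SYMBOL, TIMESTAMP, ASK FROM TICKMAX ORDER BY SYMBOL").fetchall()
print(latest)
```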


Re: [sqlite] Update Query

2012-05-28 Thread IQuant
Not sure how to combine the subqueries.

Our TickData table looks close to:  TimeStamp, Symbol, Ask, Bid, Last
...  IQ_A, IQ_B, IQ_T, IQ_X

We need to update the IQ fields with calculations always made between
the current record and the previous symbol record, ordered by timestamp.
I don't know how to eliminate the 4 subqueries...  Anyone have any code
for implementing Oracle-style LAG / LEAD analytic functions?

This update query has become a monster...  string functions, math
calcs, sign checks, case, case etc.  All very simple to do in Excel
but not so easy as a Sql update query.

The IQ fields hold arrays of values formatted into a string and are
used to feed different downstream processes: ie.  charting engine and
price analysis.

IQ_A =
{1,-2,3,-19,25,-28,175,-228,359,-611,7105.5,-14165,21247.5;41019.5416927083,41019.5416741898,41019.5416241319,41019.5415546875,41019.5037196181,41019.4764363426,41019.939236,41018.6088674769,41016.6075604745,41009.610889,41002.4481603009,41002.395833912,41002.395833912}

IQ_B =
{1,-2,3,-19,25,-28,175,-228,359,-611,7105.5,-14163,21244.5;41019.5416927083,41019.5416741898,41019.5416241319,41019.5415546875,41019.5037196181,41019.4764363426,41019.939236,41018.6088674769,41016.6075604745,41009.610889,41002.4481603009,41002.395833912,41002.395833912}

IQ_T
{-1,3,-19,25,-29,175,-227.5,359.5,-611,7105,-14164,21246;41019.5416741898,41019.5416232639,41019.5415546875,41019.5038145255,41019.4764427083,41019.939236,41018.6088674769,41016.6077332176,41009.6107991898,41002.4481603009,41002.395833912,41002.395833912}

IQ_X
{39653,-13516,6737,-3974,-2650,-1874,-1376,-1084,-886,-726,-592,-522,-454,-376,-316,-302,-266,-248,-232,215;39667,-13512,6737,-3974,-2650,-1874,-1378,-1086,-886,-726,-592,-522,-454,-376,-316,-302,-266,-248,-232,215;-19456,9205,5157,-3240,-2234,-1618,-1258,-998,-808,-682,-564,-490,-412,-352,-312,-290,-262,-248,-230,209}

I'm able to run the core calculation portions of the update query very
fast.  ie.   5 seconds to update 1 ...  The string / array
manipulations are another story.
I may try a 2 pass approach:  Pass 1 update query  to record the rowid
of the previous symbol record and pass 2 use a join for performing the
IQ calcs.  I think we might have hit the wall and need to code this
externally.

You appear to be doing the same sub-select four times.  I don't know
whether SQLite optimizes these into one or not, but if you're using a
programming language I bet you could speed the query up a great deal
by doing that subselect first, and substituting the results into the
UPDATE command.

Simon.


[sqlite] (no subject)

2012-05-24 Thread IQuant
How can I construct an update query to calculate and set a record's
latency field with the difference between timestamps by DeviceID?
It appears sqlite doesn't support LAG and LEAD.

ie.  I have a table with 1,000,000+ records collecting real-time
stats from many devices, with many columns, but the main ones of
interest are 'timestamp', 'latency' and 'DeviceID'.

timestamp                latency  DeviceID
2012-05-01 13:12:11.103  Null     14356
2012-05-01 13:12:11.103  Null     14372
2012-05-01 13:12:11.103  Null     4356
2012-05-01 13:12:07.103  Null     14356
2012-05-01 13:12:11.221  Null     14356

The data is collected inconsistently...  many times out of sequence

I want to update the latency field for each record with the difference
from the latest previous timestamp for the same DeviceID.

Run this update query every time new data is appended to the table.
Periodically recalc the entire table if an out of sequence file is
processed or at minimum recalc from the earliest out of sequence time
to present.

I can do this fairly easily in excel but am struggling in sql/sqlite.
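In SQLite versions without window functions, LAG() can be emulated with a correlated subquery that finds the newest earlier timestamp for the same device. A sketch against a made-up stats table (column names assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stats (ts REAL, latency REAL, device INTEGER)")
con.executemany("INSERT INTO stats VALUES (?, ?, ?)", [
    (100.0, None, 1),
    (101.5, None, 1),
    (100.0, None, 2),
    (104.0, None, 1),   # out-of-sequence arrivals are handled the same way
])
con.execute("CREATE INDEX idx_dev_ts ON stats(device, ts)")

# For each row, subtract the newest earlier timestamp for the same
# device; rows with no predecessor keep a NULL latency.
with con:
    con.execute("""
        UPDATE stats SET latency = ts - (
            SELECT s2.ts FROM stats s2
             WHERE s2.device = stats.device AND s2.ts < stats.ts
             ORDER BY s2.ts DESC LIMIT 1)
    """)

dev1 = con.execute(
    "SELECT ts, latency FROM stats WHERE device = 1 ORDER BY ts").fetchall()
print(dev1)
```

Re-running the UPDATE after appending data simply recomputes every row; restricting it with a WHERE on timestamp would cover the incremental case.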


Re: [sqlite] Update Query

2012-05-24 Thread IQuant
Thanks for your suggestion Igor Tandetnik:

Scope creep expanded the original query to the actual trading
instruments, and the refactored code has evolved to:

update TICKDATA set IQ_A = ROUND(ASK - (
   select t2.ASK from TICKDATA t2
   where t2.SYMBOL = TICKDATA.SYMBOL and t2.TIMESTAMP < TICKDATA.TIMESTAMP
   ORDER BY t2.SYMBOL, t2.TIMESTAMP DESC LIMIT 1), 4),
IQ_B = ROUND(BID - (
   select t2.BID from TICKDATA t2
   where t2.SYMBOL = TICKDATA.SYMBOL and t2.TIMESTAMP < TICKDATA.TIMESTAMP
   ORDER BY t2.SYMBOL, t2.TIMESTAMP DESC LIMIT 1), 4),
IQ_T = ROUND(TRADEPRICE - (
   select t2.TRADEPRICE from TICKDATA t2
   where t2.SYMBOL = TICKDATA.SYMBOL and t2.TIMESTAMP < TICKDATA.TIMESTAMP
   ORDER BY t2.SYMBOL, t2.TIMESTAMP DESC LIMIT 1), 4),
IQ_X = (select t2.TIMESTAMP from TICKDATA t2
   where t2.SYMBOL = TICKDATA.SYMBOL and t2.TIMESTAMP < TICKDATA.TIMESTAMP
   ORDER BY t2.SYMBOL, t2.TIMESTAMP DESC LIMIT 1);

The consolidated tick database is 1.5GB / 32M records and the above
query takes 5 minutes to run.
Found that a proper index is key for performance: (Symbol, Timestamp).
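One way to confirm that a (SYMBOL, TIMESTAMP) index is what the correlated subquery actually uses is EXPLAIN QUERY PLAN; a sketch (index name and reduced schema assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TICKDATA (SYMBOL TEXT, TIMESTAMP REAL, ASK REAL)")
con.execute("CREATE INDEX idx_sym_ts ON TICKDATA(SYMBOL, TIMESTAMP)")

# The inner subquery of the big UPDATE, in isolation: with the composite
# index it becomes a single B-tree descent instead of a full table scan.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT t2.ASK FROM TICKDATA t2
     WHERE t2.SYMBOL = ? AND t2.TIMESTAMP < ?
     ORDER BY t2.TIMESTAMP DESC LIMIT 1
""", ("AAPL", 0.0)).fetchall()
print(plan)
```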