Re: [sqlite] Enhancements for SQLite

2005-09-27 Thread Christopher Petrilli
On 9/27/05, Jay Sprenkle [EMAIL PROTECTED] wrote:
  I'm developing an open source project (a Data Access Grid) that
  internally uses the SQLite library as its storage method, and I'm
  interested in your opinion on 4 questions:
 
  1) SQLite can't deal with raw devices. Would it be hard to patch the
  source to deal with raw devices?

 You'll have to write your own file system code (with locking).

Seriously though, there's no reason for this.  It's not going to help
performance unless you also add all sorts of multi-spindle management
code.  Having worked a lot with Oracle on raw devices, I've found it's
only beneficial when you can throw lots of different spindles at it.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]


[sqlite] 128-bit numbers

2005-05-03 Thread Christopher Petrilli
I'm wondering if the availability of 128-bit numbers has ever been
contemplated? The reason I'm asking is that I have an application that
needs to store IPv6 addresses, which are 128 bits.  Right now, I'm
storing them in two 64-bit fields, but this obviously complicates the
query quite a bit.
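
For what it's worth, the two-column workaround looks roughly like this
(a minimal sketch using Python's sqlite3 module and the ipaddress
library, which are just my choices for illustration, not what the
application actually uses):

import ipaddress
import sqlite3

def split_ipv6(text):
    """Split an IPv6 address into high and low unsigned 64-bit halves."""
    n = int(ipaddress.IPv6Address(text))
    return (n >> 64), (n & 0xFFFFFFFFFFFFFFFF)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (addr_hi INTEGER, addr_lo INTEGER)")
hi, lo = split_ipv6("2001:db8::1")
conn.execute("INSERT INTO hosts VALUES (?, ?)", (hi, lo))

# Even a simple range check has to reason about both halves, which is
# where the queries get awkward.  (Halves above 2**63-1 would also need
# re-biasing, since SQLite integers are signed.)
row = conn.execute(
    "SELECT count(*) FROM hosts WHERE addr_hi = ? AND addr_lo BETWEEN ? AND ?",
    (hi, 0, 0xFFFF)).fetchone()
print(row[0])  # 1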

Thoughts?

Chris
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] 128-bit numbers

2005-05-03 Thread Christopher Petrilli
On 5/3/05, D. Richard Hipp [EMAIL PROTECTED] wrote:
 On Tue, 2005-05-03 at 17:52 -0400, Christopher Petrilli wrote:
  I'm wondering if the availability of 128-bit numbers has ever been
  contemplated? The reason I'm asking is that I have an application that
  needs to store IPv6 addresses, which are 128 bits.  Right now, I'm
  storing them in two 64-bit fields, but this obviously complicates the
  query quite a bit.
 
 
 Declare the field as CLOB or TEXT and store the numbers
 that way.  You can store any size number you want in a TEXT
 column.  You just cannot do arithmetic on them if they are bigger
 than 64 bits.  Do you need to do arithmetic on your IPv6
 addresses?

What I was hoping for is something like what PostgreSQL can do with its
inet address type: range scanning, etc. Storing it in a blob doesn't
let you do comparison operations in SQL, only exact retrieval,
unfortunately.

I wonder how hard it would be to add a specific IP data type?
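
Short of a new type, one stopgap would be an application-defined SQL
function that does the comparison; a rough sketch (Python's sqlite3
module and a hypothetical ip_in_range helper, not a real SQLite
feature).  It can't use an index, though, so it's no substitute for a
proper inet type:

import ipaddress
import sqlite3

def ip_in_range(addr_blob, lo_text, hi_text):
    """Return 1 if a stored 16-byte address falls within [lo, hi]."""
    addr = int.from_bytes(addr_blob, "big")
    lo = int(ipaddress.IPv6Address(lo_text))
    hi = int(ipaddress.IPv6Address(hi_text))
    return 1 if lo <= addr <= hi else 0

conn = sqlite3.connect(":memory:")
conn.create_function("ip_in_range", 3, ip_in_range)
conn.execute("CREATE TABLE hosts (addr BLOB)")
conn.execute("INSERT INTO hosts VALUES (?)",
             (ipaddress.IPv6Address("2001:db8::1").packed,))
count = conn.execute(
    "SELECT count(*) FROM hosts WHERE ip_in_range(addr, ?, ?)",
    ("2001:db8::", "2001:db8::ffff")).fetchone()[0]
print(count)  # 1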

Chris
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] quote() function

2005-03-28 Thread Christopher Petrilli
On Mon, 28 Mar 2005 11:57:10 -0500, Iulian Popescu [EMAIL PROTECTED] wrote:
 That doesn't work either - I get the same error.
 From the documentation:
 
 This routine returns a string which is the value of its argument suitable
 for inclusion into another SQL statement. Strings are surrounded by
 single-quotes with escapes on interior quotes as needed. BLOBs are encoded
 as hexadecimal literals. The current implementation of VACUUM uses this
 function. The function is also useful when writing triggers to implement
 undo/redo functionality.
 
 Therefore I was expecting that something like:
 
 SELECT quote(AA'AA)
 
 would work in the sense that the actual SQL statement executed will be:
 
 SELECT 'AA''AA'
 
 but it doesn't seem to.

I suspect it was intended to be used like this:

select quote(columnname) from table;
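
For example (a minimal sketch using Python's sqlite3 module; the table
and column names are made up):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.execute("INSERT INTO t VALUES (?)", ("AA'AA",))

# quote() is applied to a stored value and returns SQL-ready text.
print(conn.execute("SELECT quote(name) FROM t").fetchone()[0])  # 'AA''AA'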

Chris
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] thoughts on a web-based front end to sqlite3 db?

2005-03-08 Thread Christopher Petrilli
On Tue, 8 Mar 2005 09:17:40 -0500 (EST), Clay Dowling
[EMAIL PROTECTED] wrote:

  So, I was wondering if any of the more opinionated among you would care
  to suggest an interface language. It'll be on a Linux box, presumably
  running apache although I'm open to alternatives. The app itself uses
  sqlite3 for scheduling jobs and storing job data, so the web interface
  only needs to be able to insert some data and do visualization
  (pretty standard stuff I think).
 
 See this month's Linux journal for my opinions on the matter of interface
 languages.  SQLite3 is a dream to work with in a C++ CGI program,
 especially true if you make use of my web template library
 (http://www.lazarusid.com/libtemplate.shtml).
 
 Python is probably fine as well, as long as you aren't concerned about
 performance.

Bah, this is a red herring.  I've built Python sites that handle
millions of hits a day on top of Zope, and Zope isn't that efficient.
There are sites running SkunkWeb that handle tons of load, and we're not
talking about anything close to that here.

Algorithms are 99% of the problem; language performance is 1%.  People
often blame the language (Python, Perl, Tcl, etc.) for their own
inability to understand algorithm behavior.  O(N^2) is going to suck on
every platform, even written in assembly, once N gets to be any
reasonable size.

Use what you're comfortable writing in, because that's what you'll be
productive in. If that's C++, great, but don't rule out a language
over performance concerns unless you truly have a C10K problem.

Chris
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] Version 3.1.3 is a headache

2005-02-28 Thread Christopher Petrilli
On Sat, 28 Feb 2004 11:52:03 +0100, Jakub Adamek [EMAIL PROTECTED] wrote:
 Is it really the case that some database servers return a result set
 with two identical column names? Seems very strange. And the lovely
 SQLite 3.0.8 didn't do such things ...

PostgreSQL, which holds closer to the SQL spec than any other DB I'm
aware of, refuses this syntax:

SELECT * FROM a INNER JOIN b;

You are depending on implicit join syntax that I believe is illegal in
SQL.  The fact that it happens to work doesn't make it a good idea.
If you are explicit in the join:

petrilli=# SELECT * FROM a, b WHERE a.id = b.id;
 id | x | id | y 
----+---+----+---
  1 | 1 |  1 | 2


As you'll notice, it returns both id columns, because you selected ALL
columns.  This is the correct behavior.  In this case, you've simply
asked for the wrong thing.  As several other people have commented,
you should, in all join cases, explicitly call out the columns that
you are interested in:

petrilli=# SELECT a.id, x, y FROM a, b WHERE a.id = b.id;
 id | x | y 
----+---+---
  1 | 1 | 2

Good luck.

Chris
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] Feature request

2005-02-22 Thread Christopher Petrilli
On Tue, 22 Feb 2005 08:58:45 -0800 (PST), Jay [EMAIL PROTECTED] wrote:
 
 This is probably going to be hard but you did ask...
 
 SQLite version 3.0.8
 Enter ".help" for instructions
 sqlite> create table x( a text );
 sqlite> insert into x(a) values('one');
 sqlite> create view y as select a from x;
 sqlite> select * from y;
 one
 sqlite> insert into y(a) values('two');
 SQL error: cannot modify y because it is a view
 sqlite>
 
 It would be really nice if I could join multiple tables into a view
 and insert data into the view. :)

And which table did you plan for the data to go into?  What you're
asking for is data partitioning, really.  That's a totally different
concept, and I suspect outside the goals of SQLite.
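
That said, newer versions of SQLite do support INSTEAD OF triggers on
views, which let you answer that question yourself by routing the
insert wherever you decide.  A minimal single-table sketch (Python's
sqlite3 module, reusing the names from the example above):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (a TEXT);
    CREATE VIEW y AS SELECT a FROM x;

    -- You decide explicitly where rows inserted through the view go.
    CREATE TRIGGER y_insert INSTEAD OF INSERT ON y
    BEGIN
        INSERT INTO x(a) VALUES (NEW.a);
    END;
""")
conn.execute("INSERT INTO y(a) VALUES ('two')")
print(conn.execute("SELECT a FROM y").fetchall())  # [('two',)]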

-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] Python bindings for SQLite 3?

2005-02-17 Thread Christopher Petrilli
On Thu, 17 Feb 2005 10:00:08 -0800, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 On Feb 17, 2005, at 9:53 AM, H. Wade Minter wrote:
  I'm playing around with some Python stuff, and was wondering if there
  were any reasonably stable bindings for SQLite 3?  I've got an
  existing SQLite 3 database that I want to work against, so I'd rather
  not drop back to SQLite 2?
 
 I have been using Roger Binns' Another Python SQLite Wrapper.  It is a
 direct wrapper of the SQLite APIs while still bringing the API up to
 Python's level of abstraction.
 
 It has worked flawlessly for me.
 
 http://www.rogerbinns.com/apsw.html
 

I'd second this, and have found it more reliable than the other Python
wrapper (PySQLite)...
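
A minimal sketch of what using it looks like (from memory, so
double-check against the APSW docs):

import apsw  # http://www.rogerbinns.com/apsw.html

connection = apsw.Connection(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE t (a TEXT)")
cursor.execute("INSERT INTO t VALUES (?)", ("hello",))
for (a,) in cursor.execute("SELECT a FROM t"):
    print(a)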

Chris
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] speedtest result is obsolete

2005-02-07 Thread Christopher Petrilli
On Mon, 7 Feb 2005 10:09:58 -0800 (PST), Jay [EMAIL PROTECTED] wrote:
 
 I did a small test to see if performance was linear with time.
 I wanted to make sure it was suitable for my application.
 It seems with both indexed and unindexed tables it doesn't take
 significantly longer to do the 1,000,000th insert than it did the
 first.

My experience is that it depends heavily on the number of indexes.
With just a primary key index, it's pretty linear (though it does
degrade); with 4-5 indexes, however, it degrades very quickly after a
certain point.  I could try to generalize my test rig and publish
some numbers if that would be interesting.
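
The rig would be something like this shape (purely illustrative:
in-memory database, made-up columns, not the actual harness):

import sqlite3
import time

def insert_rate(n_rows, n_indexes):
    """Time n_rows inserts with a given number of secondary indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c INT, d INT)")
    for col in "abcd"[:n_indexes]:
        conn.execute("CREATE INDEX idx_%s ON t(%s)" % (col, col))
    start = time.time()
    for i in range(n_rows):
        # The sqlite3 module keeps these in one transaction until commit().
        conn.execute("INSERT INTO t(a, b, c, d) VALUES (?, ?, ?, ?)",
                     (i % 97, i % 89, i % 83, i % 79))
    conn.commit()
    return n_rows / (time.time() - start)

for k in range(5):
    print("%d secondary indexes: %.0f rows/sec" % (k, insert_rate(100000, k)))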

Chris
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] Row count in database

2004-12-21 Thread Christopher Petrilli
Oy! After doing some research and actually changing machines, I
isolated the problem.

Linux.

Well, no, not really, but the drivers for the SATA controller I have
are so horrendously bad that they crap out under load.  How they ever
got considered releasable is another question entirely.  Once I
moved to another box with a different controller but nearly identical
drives, the problem went away, and performance went up by a factor of
4+.

Chris
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] Changing table and security

2004-12-21 Thread Christopher Petrilli
On Tue, 21 Dec 2004 14:48:33 -0500, D. Richard Hipp [EMAIL PROTECTED] wrote:
 wiseguy wrote:
 
  I've been using sqlite a bit since the php5 beta. Now I am making a
  website, and at this point I must decide which database I should use. My
  wish is to use sqlite, but friends of mine say that I might have problems
  down the road if I need to add a column to the database or something,
  since ALTER TABLE is not available.
 
 
 Tell your friends that they need to read the FAQ.
 http://www.sqlite.org/faq.html#q13

I would also add that, the last time I worked with MySQL, it did this
same rebuild-the-table dance under the hood, which is why ALTER TABLE
can be very expensive compared to PostgreSQL or Oracle. It's a
trade-off for simplicity.
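
The FAQ's workaround is essentially that same rebuild done by hand; a
rough sketch (Python's sqlite3 module, made-up table, and it assumes a
SQLite new enough to have ALTER TABLE ... RENAME TO):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users(name) VALUES ('alice');

    -- "Add" an email column by rebuilding the table.
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO users_new(id, name) SELECT id, name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
""")
print(conn.execute("SELECT * FROM users").fetchall())  # [(1, 'alice', None)]
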
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] Row count in database

2004-12-15 Thread Christopher Petrilli
On Wed, 15 Dec 2004 08:47:34 -0500, D. Richard Hipp [EMAIL PROTECTED] wrote:
 Christopher Petrilli wrote:
  Has anyone had any experience in storing a million or more rows in a
  SQLite3 database?  I've got a database that I've been building, which
  gets 250 inserts/second, roughly, and which has about 3M rows in it.
  At that point, the CPU load is huge.
 
 
 The other thing to remember is that when a table has 5 separate
 indices (4 explicit indices + 1 primary key) then each INSERT or
 DELETE operation is really doing 6 inserts or deletes.  There
 is one insert/delete for the main table and one for each of the
 indices.  So you would expect insert performance to be at least
 six times slower on a table with 5 indices versus a table with
 no indices.
 
 The other thing to remember is that when you insert on a table,
 the new row goes at the very end of the table, which is typically
 very fast.  (The BTree backend for SQLite is optimized for the
 common case of inserting a row at the end of a table.)  But an
 insert into an index will usually occur somewhere in the middle
 of the index, and thus will likely involve some rebalancing
 operations to make space for the new entry and to keep the tree
 level.  Inserting into an index is thus typically a little
 slower than inserting into a table.  Hence we expect inserting
 into a table with 5 indices to be more than 6 times slower than
 inserting into a table with no indices.
 
 So the base insert rate of SQLite is about 25000 rows/second.
 Divide by 6 because you have 5 indices.  Divide by 4 because
 you are using synchronous=OFF instead of BEGIN...COMMIT.  This
 leaves us with an expected insert rate in your application of
 about 1000 rows/second.  We are still missing a factor of 4.
 
 Could there be a hardware difference?  What kind of equipment
 are you running on?

Actually, I do wrap inside BEGIN/COMMIT, as that was the first thing I
tried.  That created a HUGE increase in performance.  There are some
numbers up on my blog, BTW:

http://www.amber.org/~petrilli/archive/2004/11/28/sqlite_insertion_performance.html

Notice the odd behavior?  Anyway...

Past that, I've decided to do some of my own database manipulation,
and only add the additional indices after the database has been
closed for insertions.  That seems to help a lot, and only takes
a couple of seconds (I throw it into a separate thread).
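
The pattern is just load first, index later; a rough sketch of the idea
(Python's sqlite3 module, made-up schema, not my actual loader):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (src_ip INT, dst_ip INT, info TEXT)")

# Bulk-load everything in one transaction with no secondary indexes in place...
rows = ((i, i * 2, "event %d" % i) for i in range(100000))
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
conn.commit()

# ...then build the indexes once the table is closed for insertions.
conn.execute("CREATE INDEX events_src_ip_idx ON events(src_ip)")
conn.execute("CREATE INDEX events_dst_ip_idx ON events(dst_ip)")
conn.commit()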

Hardware is an AMD64/3000, 2GB RAM, SATA drives, Fedora Core 3.

One thing I've noticed is that if I turn off synchronous, the
filesystem gradually slows down, which is fun, but not enough
that it's a major issue.

I'm using the APSW wrapper for Python, which is basically a very thin
wrapper over the basic API, and does nothing special, so I'm 99% sure
it's not that.  You'll notice in the web page that performance seems
to change radically at several points.

Chris

-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


[sqlite] Row count in database

2004-12-14 Thread Christopher Petrilli
Has anyone had any experience in storing a million or more rows in a
SQLite3 database?  I've got a database that I've been building, which
gets 250 inserts/second, roughly, and which has about 3M rows in it. 
At that point, the CPU load is huge.

Note that I've got syncing turned off, because I'm willing to accept
the risks.
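
(By "syncing turned off" I just mean the synchronous pragma; something
like this, with a made-up file name:)

import sqlite3

conn = sqlite3.connect("events.db")
conn.execute("PRAGMA synchronous = OFF")  # skip fsync; faster inserts, riskier on crash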

Thoughts?  

Chris

-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] Row count in database

2004-12-14 Thread Christopher Petrilli
On Tue, 14 Dec 2004 12:03:01 -0700 (MST), Ara.T.Howard
[EMAIL PROTECTED] wrote:
 On Tue, 14 Dec 2004, Christopher Petrilli wrote:
 
  Has anyone had any experience in storing a million or more rows in a
  SQLite3 database?  I've got a database that I've been building, which
  gets 250 inserts/second, roughly, and which has about 3M rows in it.
  At that point, the CPU load is huge.
 
  Note that I've got syncing turned off, because I'm willing to accept
  the risks.
 
  Thoughts?
 
  Chris
 
  --
  | Christopher Petrilli
  | [EMAIL PROTECTED]
 
 on linux perhaps?
 
    cp ./db /dev/shm && a.out /dev/shm/db && mv /dev/shm/db ./db
 
 this will be fast.

Right, but not really workable when total DB size is in gigabytes. :-)

 are you sure it's not YOUR 'building' code which is killing the cpu?  can you
 gperf it?

Yes, my code is using under 20% of the CPU.  The rest is basically
blocked up in sqlite3 code and kernel time.  In order to eliminate
all possibility of my code being the issue, I actually built a rig
that prebuilds 10,000 rows and inserts them in sequence repeatedly,
putting new primary keys on them as it goes along.  So the system
basically just runs in a loop doing sqlite calls.
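
The rig is roughly this shape (a from-memory sketch using Python's
sqlite3 module, not the real code):

import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, "
             "src_ip INT, dst_ip INT, info TEXT)")

# Prebuild 10,000 rows once so row construction stays out of the loop.
batch = [(i % 65536, (i * 7) % 65536, "canned event %d" % i) for i in range(10000)]

key = itertools.count()
for _ in range(100):                 # replay the same batch over and over
    for src, dst, info in batch:
        # Only the primary key changes between passes.
        conn.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                     (str(next(key)), src, dst, info))
    conn.commit()                    # one transaction per 10,000-row pass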

Chris
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]


Re: [sqlite] Row count in database

2004-12-14 Thread Christopher Petrilli
On Tue, 14 Dec 2004 13:54:35 -0500, D. Richard Hipp [EMAIL PROTECTED] wrote:
 Christopher Petrilli wrote:
  Has anyone had any experience in storing a million or more rows in a
  SQLite3 database?  I've got a database that I've been building, which
  gets 250 inserts/second, roughly, and which has about 3M rows in it.
  At that point, the CPU load is huge.
 
 
 I just ran a test case inserting 3 million rows in a database.
 Wall-clock time was 122 seconds for a rate just shy of 25000 inserts
 per second.  The final database size was 222428160.  To check to see
 if performance was falling off with increases size, I then inserted
 an additional million rows.  41 seconds: 24390 inserts per second.
 New file size 297440256.  This is on three year old hardware.

That was about my insert performance as well (AMD64), as it's
basically disk-limited at this point.  I found substantial differences
between my SATA drive and my PATA drive, though.

 Inserts can be significantly slower if you have indices.  The more
 indices you have the slower things might go.  (Depending on what
 your indices and your data look like.) If possible, it is recommended
 that you do all your inserts first, then do the CREATE INDEX statements
 as a separate step afterwards.

Alas, I think it is the indexing that's killing me.  I'm contemplating
shrinking the size of the database (i.e., partitioning the data into
multiple databases and doing some in-memory joining in my
application), which would make it easier to add the indexes after each
database has been filled, if that makes sense.
 
 What does your schema look like?

Here's the schema, or mostly... I've removed a bunch of columns that
aren't really public... think of it as another 8 INTEGER columns,
without indexes.

CREATE TABLE events (
    event_id VARCHAR(32) PRIMARY KEY,
    sensor_ts INTEGER NOT NULL,
    my_ts INTEGER NOT NULL,
    sensor_id INTEGER NOT NULL,
    src_ip INTEGER NOT NULL,
    dst_ip INTEGER NOT NULL,
    event_class INTEGER NOT NULL,
    event_type INTEGER NOT NULL,
    user_name TEXT,
    info TEXT);

CREATE INDEX events_sensor_ts_idx ON events(sensor_ts);
CREATE INDEX events_my_ts_idx ON events(my_ts);
CREATE INDEX events_src_ip_idx ON events(src_ip);
CREATE INDEX events_dst_ip_idx ON events(dst_ip);
-- 
| Christopher Petrilli
| [EMAIL PROTECTED]