Re: [sqlite] Efficient hash lookup table emulation with SQLite - how?

2010-11-26 Thread Black, Michael (IS)
sqlite> CREATE TABLE t (key TEXT PRIMARY KEY, count INTEGER);
sqlite> INSERT OR REPLACE INTO t VALUES("key1",coalesce((SELECT count FROM t 
WHERE key="key1"),0)+1);
sqlite> SELECT * FROM t;
key1|1
sqlite> INSERT OR REPLACE INTO t VALUES("key1",coalesce((SELECT count FROM t 
WHERE key="key1"),0)+1);
sqlite> SELECT * FROM t;
key1|2
 
It's not a hash table lookup though...it's a b-tree
 
 
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
 



From: sqlite-users-boun...@sqlite.org on behalf of Alexei Alexandrov
Sent: Thu 11/25/2010 8:07 AM
To: sqlite-users@sqlite.org
Subject: EXTERNAL:[sqlite] Efficient hash lookup table emulation with SQLite - 
how?



Hi,

I'm trying to solve efficiently the following task with SQlite:
* There is a trace file which contains a big number of some objects.  Each
object has a number of fields which constitute its primary key (PK).
* The objects are loaded into a table which has a number of PK columns (mapped
from the trace object PK properties) and also has a number of non-key columns
which are used to aggregate certain information about objects - e.g. count of
objects, or min/max timestamp of the object instance in the trace.
* The loading function needs to:
** Understand whether the object is present or not in the table already.  This
is done by the object PK fields.
** If it is present, update its non-key fields.
** If it is not present, insert new object filling the non-key fields with
default values.

As an example, consider table

CREATE TABLE t (key TEXT PRIMARY KEY, count INTEGER);

which would be used for counting how many times a certain word appeared in the
text.  We need to walk over word list and either set the counter to 1 or
increase it if the value is already present.

Ideally, I would like to do the following:
* INSERT operation for the primary key like

  INSERT INTO t (key, count) VALUES (?, 1);

and if primary key already exists, get the rowid of that row so that I can do

  UPDATE t SET count = count+1 WHERE rowid = ?;

passing the rowid found during failed insertion operation.

But, it's not possible now - rowid is not returned during failed PK contraint
insertion.  I cannot do "INSERT OR REPLACE" because REPLACE removes the row and
so count will be lost.

So currently I have to do:
* First, SELECT operation to try to find the row by primary key and return its
rowid.
* If select operation returned rowid, use that to do UPDATE.
* If select operation did not return anything, do INSERT.

I feel that there should be more efficient ways to do this hash table emulation.
 Are there?  Or am I trying to get something irrelevant from a SQL database?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient hash lookup table emulation with SQLite - how?

2010-11-26 Thread Simon Slavin

On 25 Nov 2010, at 2:07pm, Alexei Alexandrov wrote:

> * There is a trace file which contains a big number of some objects.  Each
> object has a number of fields which constitute its primary key (PK).
> * The objects are loaded into a table which has a number of PK columns (mapped
> from the trace object PK properties) and also has a number of non-key columns
> which are used to aggregate certain information about objects - e.g. count of
> objects, or min/max timestamp of the object instance in the trace.


> * The loading function needs to:
> ** Understand whether the object is present or not in the table already.  This
> is done by the object PK fields.
> ** If it is present, update its non-key fields.
> ** If it is not present, insert new object filling the non-key fields with
> default values.

Okay, first you really have two TABLEs here.  You have a trace TABLE, and a 
summary TABLE.

The summary table doesn't really need to exist as a real SQL table.  You can 
produce everything in it using SELECT queries and GROUP BY.  For instance

SELECT count(*),max(nosesize) FROM trace GROUP BY keycol1,keycol2,keycol3

So you don't need to actually make your summary table at all.  You can do it 
all with queries.  Take a look at



No, hold on, take a look at



That's a serious solution: just have SQL do all the summary calculations for 
you whenever you need them.

However, it may be efficient to keep the summary table around, for example if 
you do queries to it again and again and don't want to require all the 
processing that would be involved in the GROUP BY queries.  One way to do this 
would be to create the trace TABLE, and to use a TRIGGER on it so that when you 
insert a new row in the trace TABLE, the summary TABLE was automatically 
updated.  See



However, unless you keep all your trace data in TABLE it will be impossible to 
'remove' a line from the trace TABLE from the summary TABLE.  Because if you're 
storing only max(nosesize) and remove the biggest nose, you don't know how big 
the second-biggest nose was without access to the other trace rows.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Efficient hash lookup table emulation with SQLite - how?

2010-11-26 Thread Alexei Alexandrov
Hi,

I'm trying to solve efficiently the following task with SQlite:
* There is a trace file which contains a big number of some objects.  Each
object has a number of fields which constitute its primary key (PK).
* The objects are loaded into a table which has a number of PK columns (mapped
from the trace object PK properties) and also has a number of non-key columns
which are used to aggregate certain information about objects - e.g. count of
objects, or min/max timestamp of the object instance in the trace.
* The loading function needs to:
** Understand whether the object is present or not in the table already.  This
is done by the object PK fields.
** If it is present, update its non-key fields.
** If it is not present, insert new object filling the non-key fields with
default values.

As an example, consider table

CREATE TABLE t (key TEXT PRIMARY KEY, count INTEGER);

which would be used for counting how many times a certain word appeared in the
text.  We need to walk over word list and either set the counter to 1 or
increase it if the value is already present.

Ideally, I would like to do the following:
* INSERT operation for the primary key like

  INSERT INTO t (key, count) VALUES (?, 1);

and if primary key already exists, get the rowid of that row so that I can do

  UPDATE t SET count = count+1 WHERE rowid = ?;

passing the rowid found during failed insertion operation.

But, it's not possible now - rowid is not returned during failed PK contraint
insertion.  I cannot do "INSERT OR REPLACE" because REPLACE removes the row and
so count will be lost.

So currently I have to do:
* First, SELECT operation to try to find the row by primary key and return its
rowid.
* If select operation returned rowid, use that to do UPDATE.
* If select operation did not return anything, do INSERT.

I feel that there should be more efficient ways to do this hash table emulation.
 Are there?  Or am I trying to get something irrelevant from a SQL database?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users