Re: [sqlite] Fastest way to insert MASSIVE data quantities

2011-05-02 Thread Eric Smith
On Thu, Apr 21, 2011 at 2:33 PM, Phoenix phoenix.ki...@gmail.com wrote:

 Hi,

 I have a database with 130 million rows. It's an RDBMS.

 I am thinking of using Sqlite3 as a kind of a backup, just for the
 important bits of data.

 Questions.

 1. Is this advisable? Will Sqlite3 hold up to this volume?


I wrote a smallish application that ran on big iron and read a bunch of data
from flat text files into a sqlite database to support future relational
queries.

I was also pre-calculating some results that I knew would be asked for and
putting those into a little side table.

The data sets were usually on the order of 1.5 billion* to 2.0 billion
records of around 300 bytes each spread across maybe 10 or 15 columns.  I
was parsing and inserting at a rate of like 200k rows/sec -- so a couple
hours to read in the whole data set.
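
For anyone attempting something similar, the usual bulk-load knobs -- big
transactions and relaxed durability while the load runs -- are a good place
to start.  A sketch, with a made-up table:

CREATE TABLE flatData(id INTEGER PRIMARY KEY, data TEXT);

PRAGMA synchronous = OFF;      -- trade durability for speed during the load
PRAGMA journal_mode = MEMORY;  -- keep the rollback journal off the disk
BEGIN;
INSERT INTO flatData VALUES(1, 'alpha');
INSERT INTO flatData VALUES(2, 'beta');
-- ...millions more inserts, all inside one big transaction...
COMMIT;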

So, no problem from sqlite's end in my application fwiw.

Eric

* I'm American -- billion == thousand million == 10^9.


Re: [sqlite] Is this normal?

2011-04-02 Thread Eric Smith
On Sat, Apr 2, 2011 at 8:01 PM, Matt Young youngsan...@gmail.com wrote:

 sqlite> select 1 as type union select 2;
 type
 1
 2
 sqlite> select 'tr' as type union select 2;
 type
 2
 tr
 sqlite>

 The order of the rows changes when text replaces a numeric.


My understanding is that if you don't use an ORDER BY clause, you're asking
for an unordered set.  So SQLite can print your result set in whatever order
it likes.
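
If you want a particular order, ask for one explicitly:

sqlite> select 'tr' as type union select 2 order by type;
type
2
tr

(In SQLite's cross-type ordering, numbers sort before text, which is why
2 comes out first either way in this example.)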

Eric


Re: [sqlite] Using indexed fields in a table.

2011-03-05 Thread Eric Smith
On Sat, Mar 5, 2011 at 8:14 AM, BareFeetWare list@barefeetware.com wrote:

 On 05/03/2011, at 1:59 AM, J Trahair j.trah...@foreversoftware.co.uk
 wrote:

  I understand about Primary keys and Unique keys attributed to their own
 field. Is there a need to have other fields indexed, for faster searching?
 Eg. a table containing your favourite music. Say you have 9 Beethoven
 symphonies (one row in the table for each), 14 Bach partitas and 100 Haydn
 symphonies, and you want to select the Bach ones.
  You have a table called AllMusic containing columns called Composer,
 NameOfPiece, YearComposed, etc.
 
  SELECT * FROM AllMusic_tbl WHERE Composer = 'Bach' ORDER BY YearComposed


Watch this talk by an author of SQLite for a great explanation in his own
words:

http://www.youtube.com/watch?v=Z_cX3bzkExE
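
For the example above, the short answer (if measurement shows that query
matters) is a composite index the query can drive off of -- using the
names from the example schema:

CREATE INDEX AllMusicIdxComposerYear ON AllMusic_tbl(Composer, YearComposed);

With that, SQLite can jump straight to the 'Bach' rows and return them
already ordered by YearComposed instead of scanning the whole table.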

Eric

--
Eric A. Smith

There is no likelihood man can ever tap the power of the atom.
-- Robert Millikan, Nobel Prize in Physics, 1923


[sqlite] [documentation] very minor docs bug

2011-02-15 Thread Eric Smith
http://sqlite.org/features.html

"Database For Gadgets. SQLite is popular choice for the ..."

should have an article, like 

"Database For Gadgets. SQLite is a popular choice for the ..."

Eric

--
Eric A. Smith

Windows is *NOT* a virus - viruses are small and efficient.


Re: [sqlite] Feature suggestion for the Tcl interface : ability to use a list variable in combination with IN

2011-02-09 Thread Eric Smith
Fredrik Karlsson wrote:

 package require sqlite3
 sqlite3 db :memory:
 db eval {create table a (id INTEGER);}
 db eval {insert into a values (1);}
 db eval {insert into a values (2);}
 db eval {select * from a where id in (1,3);} vals {parray vals}
 vals(*)  = id
 vals(id) = 1
 set alist [list 1 3]
 1 3
 db eval {select * from a where id in $alist;} vals {parray vals}
 near "$alist": syntax error
 --

This implies that the manner in which $alist gets expanded should be
sensitive to the SQL context in which the expansion happens (and also,
for the purposes of backward compatibility, to the value of the
variable itself).  

Unless I'm mistaken that would require pushing the expansion logic down 
much further into sqlite, and probably would still fail in a number of 
cases.

So I doubt you'll get much traction there, especially since this can be 
pretty easily done from your application.

Here's an option off the top of my head:

proc qSqlList {alistname} {
  set magic_array_name _qSql_${alistname}_arr
  upvar $alistname alist $magic_array_name alist_arr
  #assert {[string is list $alist]} ;# or whatever your infra provides
  array unset alist_arr
  set i 0
  set out [list]
  foreach item $alist {
set alist_arr($i) $item
lappend out \$${magic_array_name}($i)
incr i
  }
  return ([join $out ,])
}

So your call becomes:

db eval "select * from a where id in [qSqlList alist]" vals {parray vals}

SQLite does the expansion on the underlying list values with the proper 
sqlite3_bind_* calls etc.

The proc isn't 100% ideal because:

1. it creates this magic array in the caller's scope (not the prettiest
   thing in the world), and

2. for that reason it disallows dumb copies of the return value to float
   around.  You need to make the sqlite call close to where you do the
   quoting.

Still, it might be good enough for your purposes.  Or maybe you can 
expand on the idea, wrap it up into a cleaner interface, and go from there.

Eric

--
Eric A. Smith

Money is the root of all wealth.


Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-07 Thread Eric Smith
Yuzem wrote:

  
  Test#1
  create index tags_index on tags(tag);
  You should have an index for any fields you query on like this.
  
 
 Thanks Michael but I don't see any speed improvement:
 create index test on tags(tag);
 select count(distinct tag) from tags;
 
 This is much faster:
 select count(*) from tags;
 
 Am I doing something wrong?

As Michael mentioned, you might be getting killed by string comparisons.

I'm no SQL expert, so gurus are welcome to add to the following without
insulting me.

To beef up Michael's suggestion, try something like this:

CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id));

(BTW, you should consider making movie_id INTEGER PRIMARY KEY and then
remove unique(movie_id) -- as long as that otherwise makes sense for 
your architecture.  For SQLite-specific reasons that will probably be
faster.)

CREATE TABLE tagNames(tagId INTEGER PRIMARY KEY, tagName TEXT UNIQUE);
CREATE INDEX tagNamesIdxtagName on tagNames(tagName);
CREATE TABLE tags(movie_id INTEGER,
  tagId INTEGER REFERENCES tagNames,
  unique(movie_id,tagId));

CREATE INDEX tagsIdxTagId on tags(tagId);

Then see how fast it is to ask 
SELECT COUNT(*) FROM tagNames;

Note: The index tagNamesIdxtagName is there because you'll probably want it 
to speed up insertions into 'tags'.

Note: If foreign key checking is on, be careful about deletes on table
tagNames.  Without an index on tags(tagId) a delete of a single row on
tagNames implies a full table scan on tags (to make sure there are no
rows referencing it).  That bit me in the past.

Eric

--
Eric A. Smith

The concept is interesting and well-formed, but in order to earn 
better than a 'C,' the idea must be feasible.
-- A Yale University management professor in response to Fred Smith's paper
   proposing reliable overnight delivery service.
   (Smith went on to found Federal Express Corp.)


Re: [sqlite] Speed up count(distinct col)

2011-02-07 Thread Eric Smith
Yuzem wrote: 

 I thought sqlite didn't handle foreign keys correctly, I am using 
 triggers to automatically delete tags when a movie gets deleted.  

There's a new thing, 'PRAGMA foreign_keys=1', that causes SQLite to 
enforce them.  Check out the docs on that.  

 Anyway, to use integers in the tags table is very complicated because 
 I will have to assign the corresponding number to each tag that I insert 
 and I have to insert lots of keywords for every movie.  

Well, technically you can have SQLite do that assignment for you.  But
yes, you'll have to deal with that mapping somewhere.  You can abstract
that away at a very low level in your infrastructure though.

It's not so terrible: again it's something like (in one high level 
language)

set tagId [db one {SELECT tagId FROM tagNames WHERE tagName=$tagName}]
db eval {INSERT INTO tags VALUES($movieId, $tagId)}

That one SELECT lookup will probably save you tons of string comparisons
later on.  Depends of course on the nature of your data set and query
stats.

 Does sqlite have any function to convert a text string into a unique 
 number?  

If you run (again correcting for your preferred language)

db eval {INSERT INTO tagNames(tagName) VALUES($tagName)}

then SQLite will generate the tagId column value for you if the column
is declared INTEGER PRIMARY KEY.  Again, you can get it back out at some
point later using SELECT as above, or if it's needed immediately (which
it probably is) you can use last_insert_rowid. 

So, again in Tcl, your whole insertion path will probably look something
like:

db eval {INSERT OR IGNORE INTO tagNames(tagName) VALUES($tagName)}
if {[db changes]} {
  set tagId [db last_insert_rowid]
} else {
  set tagId [db one {SELECT tagId FROM tagNames WHERE tagName=$tagName}]
}
assert {$tagId!=""}
db eval {INSERT OR IGNORE INTO tags VALUES($movieId, $tagId)}

That could possibly be optimized further.  But you could give it a go as
a first pass and see how far it takes you.  It's not so complicated, and
in 7 lines (plus 1 sanity-check line) we've abstracted the ID-text 
mapping.

Eric

-- 
Eric A. Smith

Carperpetuation (kar' pur pet u a shun), n.:
The act, when vacuuming, of running over a string at least a
dozen times, reaching over and picking it up, examining it, then
putting it back down to give the vacuum one more chance.
-- Rich Hall, Sniglets


Re: [sqlite] Is this SQLite implementation doing something stupid?

2011-02-05 Thread Eric Smith
Ian Hardingham wrote:

 I'm using an SQLite implementation that someone else made for my 
 high-level language of choice.
 
 While looking through the imp, I've just found this function, which is 
 used as the callback argument to sqlite3_exec.  Does this look like an 
 ok useage?  It seems to me like this might be doing a lot of work for 
 some data I may never use.

If it's not buggy, and you don't need it to perform better, then leave
it alone.

If you need it to perform better, don't do any optimizing until you've 
measured where your hotspots really are by using, e.g., gprof and/or
whatever profiling tools exist for your language.

If you find that this interface is really your issue, scope out how
feasible/cost-effective it is to change it.  This will be 
determined by a great many variables -- size of the code base & number
of entry points to this interface, quality of client code, level of
abstraction client code provides from this interface to higher layers, 
availability of a covering test suite, amount of time you have to do 
the work, number of round-trips you can make with your QA team (if 
they exist) before delivery, how much new client code you think will
be written that eventually calls down to this interface over the next 
weeks/months/years, etc etc etc.

If so, and if it makes sense, consider creating a new interface instead
of editing the existing one.  This allows you to upgrade the client code 
a little bit at a time to use the new interface.  The drawback is that
now you're maintaining two interfaces.

Probable hot spots in the existing code: string scans (dStrlen), string 
copies (dStrcpy) and calls to 'new'.  The fewer of those you can do, the 
better off you'll be.  

I'd say a good reference as to the proper way to hook SQLite into a
high-level language is the SQLite devs' own implementation for Tcl.

You want to grab e.g. http://sqlite.org/sqlite-tea-3070500.tar.gz
and have a look at tclsqlite3.c, function DbEvalNextCmd on line 124807
and function dbEvalColumnValue on line 124743.

Things this gets right that your implementation doesn't, from my quick
glance:

1. It keeps a cache of prepared statements and re-uses statements
   that have been run recently.  You are re-preparing statements
   every time.

2. It only asks for columns that the client is asking for (only you
   can know whether higher layers are getting this right).

3. It only computes the column names once during the statement.  You
   are computing them on every row.

4. It only processes one result row at a time (synchronously calling up
   to the high-level language), keeping memory usage low.  You are 
   stuffing the whole result set into memory before returning it to the 
   client.  Fixing this might be the biggest hassle on this list,
   because it might impact many layers above. 

5. It does not force all values to strings.  You are (and, depending
   on which language you're using, you'll probably convert them back
   to native types higher up somewhere).

Shared drawback:

1. Text data is copied.  This can't be helped for Tcl because it's 
   enforced by the Tcl extension API.  Maybe your language lets you
   just point directly at a const char* and use copy-on-write semantics
   or some such.

Eric

 Any help much appreciated,
 Thanks,
 Ian
 
 int Callback(void *pArg, int argc, char **argv, char **columnNames)
 {
 // basically this callback is called for each row in the SQL query result.
 // for each row, argc indicates how many columns are returned.
 // columnNames[i] is the name of the column
 // argv[i] is the value of the column
 
 sqlite_resultrow* pRow;
 sqlite_resultset* pResultSet;
 char* name;
 char* value;
 int i;
 
 if (argc == 0)
return 0;
 
 pResultSet = (sqlite_resultset*)pArg;
 if (!pResultSet)
return -1;
 
 // create a new result row
 pRow = new sqlite_resultrow;
 pResultSet->iNumCols = argc;
 // loop through all the columns and stuff them into our row
 for (i = 0; i < argc; i++)
 {
// DEBUG CODE
 //  Con::printf("%s = %s\n", columnNames[i], argv[i] ? argv[i] : "NULL");
name = new char[dStrlen(columnNames[i]) + 1];
dStrcpy(name, columnNames[i]);
pRow->vColumnNames.push_back(name);
if (argv[i])
{
   value = new char[dStrlen(argv[i]) + 1];
   dStrcpy(value, argv[i]);
   pRow->vColumnValues.push_back(value);
}
else
{
   value = new char[10];
   dStrcpy(value, "NULL");
   pRow->vColumnValues.push_back(value);
}
 }
 pResultSet->iNumRows++;
 pResultSet->vRows.push_back(pRow);
 
 // return 0 or else the sqlexec will be aborted.
 return 0;
 }


--
Eric A. Smith

Stocks have reached what looks like a permanently high plateau.
-- Irving Fisher, Professor of Economics, Yale University, 1929.

Re: [sqlite] Multiple threads sharing one DB

2011-01-26 Thread Eric Smith
On Wed, Jan 26, 2011 at 11:02 AM, Ian Hardingham i...@omroth.com wrote:

 Hey guys.

 I am under the impression that there is no concurrent access to a single
 SQLite DB.  I.e. if thread A is performing a query, and thread B tries to
 query, it will block until thread A is finished, no matter the query.

 1.  Is this correct?


Certainly not!  By default, an arbitrary number of readers can execute
queries concurrently.  But writers block readers and other writers.

Under the WAL journaling mode you can have readers executing concurrently
with a writer.  See http://www.sqlite.org/wal.html#concurrency for details
on that.
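
Turning it on is a one-liner, and the setting is persistent (it's
recorded in the database file):

PRAGMA journal_mode=WAL;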

SQLite isn't the king of concurrency by any stretch, but it does much better
than at-most-one-querier-at-a-time.

Eric

--
Eric A. Smith

Dad said, To fish you must be patient.  Since Dad is not patient,
and since he was catching no fish, I figured he must be right.
-- Paige Smith, age 14


[sqlite] [3.7.4] [BUG] [TCL] busy handler not called on lock escalation

2011-01-26 Thread Eric Smith
SQLite version 3.7.4 with TEA.  gcc was run like this:

gcc -DPACKAGE_NAME=\"sqlite\" -DPACKAGE_TARNAME=\"sqlite\" 
-DPACKAGE_VERSION=\"3.7.4\" -DPACKAGE_STRING=\"sqlite\ 3.7.4\" 
-DPACKAGE_BUGREPORT=\"\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_LIMITS_H=1 -DHAVE_SYS_PARAM_H=1 -DUSE_THREAD_ALLOC=1 -D_REENTRANT=1 
-D_THREAD_SAFE=1 -DTCL_THREADS=1 -DSQLITE_THREADSAFE=1 -DMODULE_SCOPE=extern\ 
__attribute__\(\(__visibility__\(\"hidden\"\)\)\) -D_LARGEFILE64_SOURCE=1 
-DTCL_WIDE_INT_TYPE=long\ long -DHAVE_STRUCT_STAT64=1 -DHAVE_OPEN64=1 
-DHAVE_LSEEK64=1 -DHAVE_TYPE_OFF64_T=1 -DUSE_TCL_STUBS=1 -DSQLITE_ENABLE_FTS3=1 
-I./generic -I/home/eas/tcl8.5.9/generic -pipe -O0 -g -Wall -fPIC -c 
`echo ./generic/tclsqlite3.c` -o tclsqlite3.o

(All those parameters were auto-generated by 'configure' except -O0,
which I changed from the default -O2.)

$ uname -a
Linux hudson 2.6.32.26-175.fc12.i686 #1 SMP Wed Dec 1 21:52:04 UTC 2010 i686 
athlon i386 GNU/Linux

I register a Tcl busy handler that fails to be called when a connection 
attempts to escalate its lock status.  This may indicate a bug 
in the C API as well.

Steps to reproduce:

1. Write the following script into a file named 'foo.tcl':

===
#!/usr/bin/tclsh

load sqlite-tea-3070400/libsqlite3.7.4.so

proc ::busy {ntries} {
puts stderr "[pid] finds the db busy, ntries=$ntries.  try again."
after 1
return 0
}

if {[catch {
  puts stderr "[pid] hello, world"
  sqlite3 db /tmp/foo.db
  db busy ::busy
  db eval begin
  db eval {create table if not exists t(a)}
  db eval commit
  db eval begin
  db eval {select count(*) from t}
  db eval {insert into t values(100)}
  after 1000
  db eval commit
  puts stderr "[pid] exits successfully"
}]} {
  puts stderr "[pid] error in script: $::errorInfo"
} 
===

2. Make sure that the 'load' command above will work by changing the path 
name as appropriate.

3. Make sure /tmp is a normal filesystem and the file 'foo.db' does not
exist there, e.g. as the remnant of some previous debugging exercise.

4. Make the script executable:

   $ chmod +x ./foo.tcl

5. Run the script twice in parallel:

   $ ./foo.tcl & ./foo.tcl

Expected result: 

Both scripts indicate that they have completed successfully by 
printing "PID exits successfully".  This is because we expect
the busy handler to be called when the database is locked,
and the busy handler always returns 0, indicating that SQLite
should keep retrying until successful.

Actual result:

In almost every test run, neither instance ever enters ::busy.  
Instead, I get the following output:

[hudson:~] $ ./foo.tcl & ./foo.tcl
[1] 15308
15309 hello, world
15308 hello, world
15308 error in script: database is locked
while executing
db eval {insert into t values(100)}
15309 exits successfully

I say "almost every test run" because if I remove that 
'after 1000' line, sometimes the script fails like this and sometimes 
it doesn't.  Adding the 'after' line increases the failure frequency to 
nearly 100%, probably because the winning process is holding a certain 
flavor of lock for longer.

Please let me know if you need more information.

Eric

--
Eric A. Smith

Electricity is actually made up of extremely tiny particles 
called electrons, that you cannot see with the naked eye unless 
you have been drinking.
-- Dave Barry


[sqlite] Tcl API doc bug (was Re: [3.7.4] [BUG] [TCL] busy handler not called on lock escalation)

2011-01-26 Thread Eric Smith
Igor Tandetnik wrote:

 On 1/26/2011 6:39 PM, Eric Smith wrote:
  busy handler not called on lock escalation
 
 This behavior is by design. See
 
 http://sqlite.org/c3ref/busy_handler.html
 
 the part that talks about a deadlock.

Understood & agreed.

This is a bug in the Tcl API documentation, which can lead the unwary
Tcl programmer to believe the busy callback will be called in all
cases of contention.  I recommend adding some text to the section 
'The busy method'.  This example was largely lifted from the C API
doc:

==
The presence of a busy handler does not guarantee that it will be
invoked when there is lock contention.  If SQLite determines that
invoking the busy handler could result in a deadlock, it will go ahead
and throw an error instead of invoking the busy handler. 

Consider a scenario where one process is holding a read lock that it is 
trying to promote to a reserved lock and a second process is holding a 
reserved lock that it is trying to promote to an exclusive lock. The first
process cannot proceed because it is blocked by the second and the
second process cannot proceed because it is blocked by the first. If
both processes invoke the busy handlers, neither will make any progress.
Therefore, SQLite throws an error for the first process, hoping that
this will induce the first process to release its read lock and allow
the second process to proceed.

See http://www.sqlite.org/lockingv3.html for more details.
==
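
It might also be worth a sentence on avoiding the deadlock in the first
place: open the transaction with BEGIN IMMEDIATE, which takes the
RESERVED lock up front (a point where the busy handler CAN safely be
invoked), so the read-to-write promotion never happens:

BEGIN IMMEDIATE;
SELECT count(*) FROM t;
INSERT INTO t VALUES(100);
COMMIT;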

Eric

--
Eric A. Smith

Slurm, n.:
The slime that accumulates on the underside of a soap bar when
it sits in the dish too long.
-- Rich Hall, Sniglets


Re: [sqlite] Patch to allow DISTINCT in multi-argument aggregate functions

2011-01-20 Thread Eric Smith
Stephen Oberholtzer wrote:

 Good news, everyone! I've taught SQLite to feel love!

FINALLY.  I put in that feature request like 3 years ago.

--
Eric A. Smith

Computer programs don't like being anthropomorphized.


Re: [sqlite] 64 bit sqlite 3

2010-12-17 Thread Eric Smith
On Fri, Dec 17, 2010 at 4:36 AM, giuseppe500 giuseppe...@yahoo.it wrote:
 Is there a version of SQLite 3 for 64-bit systems?
 or, you can simply compile the source of sqlite3 at 64-bit with c++ 2008?
 thanks.

FWIW I compiled sqlite 3.6.23.1 along with its tcl hooks and have been
happily using it in a (single-threaded, multi-process) 64-bit
application in tcl and C on both FreeBSD6 and linux 2.6.18-164,
RHEL5.4 for 6 or 8 months with no issues whatsoever.  The application
parses a superset of csv (with arbitrary optional field separators,
arbitrary optional quotation characters, arbitrary optional padding
characters, arbitrary record separators, field data type checking etc)
and exposes relational queries on the data set.

The app screams along at 16 records per second on the parse side,
and I'm still pretty sure it's my parse code that's the bottleneck and
not sqlite.

Eric


Re: [sqlite] EXTERNAL:Re: Cross-compiling SQLite forBlackfin+uClinux?

2010-12-02 Thread Eric Smith
Gilles Ganault wrote: 

 Sorry about that :-/ Apparently, gcc will compile and link in one go.  
 Is a Makefile required to add the -lpthread switch, or can this be 
 done on the command line?  

You can do it on the command line -- just try it.  

make(1) (and its typical input, a set of makefiles) is just a clever 
wrapper for shell commands.  It's not required to do anything at all, 
though life soon becomes miserable without it or something similar.  

 BTW, am I correct in understanding that originally, gcc stood for GNU 
 C compiler, while now, gcc is just a language-neutral front-end that 
 can compile different source files (C, C++, Fortran, etc.)  and is 
 smart enough to guess what type of source files it is handed and call 
 the right compiler accordingly?  

gcc == command line tool for preprocessing/compiling/linking C 
programs.  

GCC == GNU Compiler Collection, a suite of tools for compiling tons of 
different languages.  See http://gcc.gnu.org.  

Eric 

-- 
Eric A. Smith

Wright, rightly righting waiting wights' weighty writing rights 
writ white, writes right. 
-- artifex2004


Re: [sqlite] SQLite really SLOW on Solaris?

2010-11-20 Thread Eric Smith
Lynton Grice wrote:

 Many thanks for your feedback, much appreciated ;-)
 
 But why would that happen on Solaris and not Debian?

Did you try it on real OSs & not VMs?  Are you sure the host OS is
giving the same compute & disk slices to the VMs?  Any other differences
in those mappings?

Are the VMs running on the same host box?  If not, are they running on
the same hardware and host OS?

Along the lines of Roger's comment, some VM implementations have 
OS clocks that simply don't work right -- they run too fast or too slow 
(I have personally observed differences of up to 3x), because they 
assume they are on real hardware.  Have you tested that?

Did you try compiling it with profiling data turned on to see where the
bottlenecks are (using gcc's -pg flag plus gprof(1))?

Or, as a simpler test, you could run your app from strace(1) (assuming
solaris has such a thing) to see if you are spending loads of time in
system calls as Roger hypothesized.

 Also, how would I pass the -D flags? Do I do that when I compile SQLite?

Yes -- those are arguments to GCC that are similar to writing a #define 
compiler directive into your code.  So saying

gcc -DFOO=bar ... 

would be like writing

#define FOO bar

into the code.

The sqlite.org docs describe what they do pretty well iirc.

Eric

--
Eric A. Smith

You can observe a lot by watching.
-- Yogi Berra


Re: [sqlite] DB rotate / incremental

2010-10-28 Thread Eric Smith

 I currently use a single database and it works fine.  
 The trouble is replication.  
 Or the sharing of an ever growing database.  

I think your question is more architectural than sqlite-specific.

If your user base / data set is going to be getting more serious and 
if you have some time on your hands, it doesn't seem like sqlite is 
the right tool for you.  You may be better off with a nice fat 
traditional rdbms with built-in replication, load balancing, etc.  

So what follows is a not-as-good hack, but you could probably get it 
up in a few hours.  



If you're only doing inserts and selects then (as far as I understand) 
existing pages of the database should largely remain constant, 
regardless of your schema.  I'm sure the sqlite experts can expound on 
that or explain in what circumstances it's wrong.

If that's the case, then you could leverage an incremental file transfer 
program like rsync(1) to do replication.  

Keep two copies of your database around, one with all the latest stuff 
(call it 'current') and one replication candidate that you'll only 
update every once in a while (call it 'historical').  When you want,
attach 'historical' and sync the latest stuff from 'current' into it.
Obviously you want that to be fast (not scan the whole table), so 
hopefully your schema permits that.  Then detach 'historical' and rsync
'historical' to wherever it needs to go.  Only the new bytes will go over 
the wire.

You implied you'd already tried something like that -- just wanted to
point out rsync in particular if you hadn't looked into it. 



Another point is that, depending on the specifics of your data set and 
the metrics you're calculating on it, you may be able to accumulate the 
answers to your big 'select' queries as you go along, in a separate metrics
table.  E.g. use triggers, or run the 'select' only on each new chunk of 
data and update the metrics table accordingly.

Then you can replace your big select on the main table with a simple
(hopefully constant-time) select on the metrics table.

This only works if your metrics have the nice property that they can be 
incrementally calculated using O(1) space.  E.g. sum, count, mean, stddev, 
but not (afaik) median.
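
A sketch of the trigger flavor, with made-up table and column names:

CREATE TABLE data(value REAL);
CREATE TABLE metrics(n INTEGER, total REAL);
INSERT INTO metrics VALUES(0, 0);

CREATE TRIGGER accumulate AFTER INSERT ON data
BEGIN
  UPDATE metrics SET n = n + 1, total = total + new.value;
END;

-- the big select over 'data' becomes a constant-time read:
SELECT n, total, total/n AS mean FROM metrics;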

In fact, if you are only inserting (never deleting), and you already know
what all your metrics are, and they have that property, then you don't
need to keep the main table at all -- just the metrics table :)

Eric

-- 
Eric A. Smith

Keeping Young #1:
Avoid fried meats which angry up the blood.
-- Satchel Paige


Re: [sqlite] What happens when PK reaches max integer

2010-10-14 Thread Eric Smith

The sqlite int space is 64 bits.  You'd have to generate 1 billion keys 
per second for ~600 years to exhaust the space.  Is my math right?
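
For the record: 2^64 keys at 10^9 keys/second is about 1.8 * 10^10
seconds, and a year is about 3.15 * 10^7 seconds, so the full space
lasts roughly 585 years (about 292 if you only count the positive half).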

Eric

 Hello, 
 
 My Primary Key is an integer.  In the lifetime of a product, it may 
 increase to the maximum possible value of an integer.  Our processor will 
 make it overflow into a very large negative number and so on.  
 
 My specific question is, if overflow occurs, how does sqlite deal with 
 it?  
 
 1.  What if its a large negative number ?  
 2.  What if there is already a PK with the rolled over value, say 1?  
 
 This may be a common problem you all may have run into.  Hope you can 
 help answer.  
 
 Thanks, 
 Kavita 


-- 
Eric A. Smith

Aeropalmics (ayr o palm' iks), n.: 
The study of wind resistance conducted by holding a cupped 
hand out the car window.


Re: [sqlite] trouble when i insert during long time

2010-09-02 Thread Eric Smith
My Spanish isn't perfect but it looks like you're running out of file 
descriptors.  I.e. you (or a library or module you are using) are creating file 
descriptors using e.g. open(2) and never close(2)ing them. 

Please forgive typos--I sent this from my smart phone.

On Sep 1, 2010, at 15:34, andres felipe tamayo cortes 
andrewt12...@hotmail.com wrote:

 
 Hi i made a program on C language who inserts some data on sqlite database, 
 it works well but after a while (10,15 minutes) it stops to work and throws 
 me this mistake:
 
 on console:
 unknown:11373): Gtk-WARNING **: Error loading theme icon 'gtk-ok' for 
 stock: No se ha podido cargar el módulo de carga de imágenes: 
 /usr/lib/gtk-2.0/2.10.0/loaders/svg_loader.so: 
 /usr/lib/gtk-2.0/2.10.0/loaders/svg_loader.so: no se puede abrír el archivo 
 de objeto compartido: Demasiados archivos abiertos
 
 on a window:
 unable to open database file Unable to fetch row
 
 I don't know what I'm doing wrong, or if maybe it's a mistake of sqlite?
 
 
 this its part of my code:
 
 
 
 
 QSqlQuery inserta;
 inserta.exec("insert into t1 (voltaje) values (1);");
 // first you prepare the query like this:
 inserta.prepare("insert into t1 (voltaje, direccion) VALUES(:voltaje,:direccion)");
 // inserta.bindValue(":voltaje",dest16[i]); // here you assign the variable
 inserta.bindValue(":voltaje",gua); // here you assign the variable
 inserta.bindValue(":direccion",addr1);
 addr1=addr1+1;
 if(!inserta.exec())
 {
   QMessageBox::critical(this,"Error al grabar","No se pudo grabar en la base de datos:\n"+inserta.lastError().text());
 }
 I hope you can help me
 
 thanks!!
 



Re: [sqlite] next value in sequence

2010-08-31 Thread Eric Smith
Scott Frankel wrote:

 Hi all,
 
 How does one find the next value of a serial item?  Given a simple  
 table with a serial primary key, I'd like to get the next available  
 integer key value.  eg:
 
 CREATE TABLE foo (
   foo_id  SERIAL  PRIMARY KEY,
   nametextDEFAULT NULL);
 
 SELECT foo_id, name FROM foo;
  
 1|red
 2|green
 3|blue
 
 SELECT ???;
  
 4

You probably don't want 'SELECT max(foo_id)+1' because I think that does 
a full table scan.

I recommend you re-define your key as 'foo_id INTEGER PRIMARY KEY'.  

If you want to INSERT a new value with the next id you can just say

INSERT INTO foo(foo_id,name) VALUES(NULL, 'name')

and get the newly-inserted id by saying SELECT last_insert_rowid().

If you only want to get what *would* be the next value, I believe

SELECT 1 + (SELECT foo_id FROM foo ORDER BY foo_id DESC LIMIT 1)

will run in constant time.

Eric

--
Eric A. Smith

Some people have told me they don't think a fat penguin really 
embodies the grace of Linux, which just tells me they have never 
seen an angry penguin charging at them in excess of 100mph. They'd 
be a lot more careful about what they say if they had. 
-- Linus Torvalds, 1996


Re: [sqlite] What is the most efficient way to get the close by numbers?

2010-08-20 Thread Eric Smith
Peng Yu wrote:

 I have the following code to search for neighboring positions
 (distance <=10). But it is slow for large data set. I'm wondering what
 is the most efficient query for such a search. Note that I don't
 create an index, as I'm not sure what index to create on table A.

I haven't used it myself, but I'm pretty sure this is what the R*tree 
module was designed for:

http://www.sqlite.org/rtree.html
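
Untested, but I'd expect 1-D usage to look roughly like this (names
invented; it needs a build with the rtree module compiled in):

CREATE VIRTUAL TABLE posIndex USING rtree(id, minPos, maxPos);

-- store each position as a degenerate interval
INSERT INTO posIndex VALUES(1, 12345.0, 12345.0);

-- rows within distance 10 of position 12340
SELECT id FROM posIndex WHERE maxPos >= 12340 - 10 AND minPos <= 12340 + 10;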

--
Eric A. Smith

Drill for oil? You mean drill into the ground to try and find oil? You're
crazy.
-- Drillers who Edwin L. Drake tried to enlist to his project 
   to drill for oil in 1859.


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith
Igor Tandetnik wrote: 

  How would you find a row whose column X contained value Y if the 
  partial index on column X specified that rows containing value Y 
  in column X should never be returned?  
 
 No one suggests partial index should be capable of hiding anything.  The 
 idea is that, when the query can be proven to only involve rows covered 
 by the partial index, the index can be used to speed up the query.  
 Otherwise, it simply won't be used.  

Right.

-- 
Eric A. Smith

Well then, let's go on.  Sorry, there's nothing to go on to.
Let's digress.
-- Tim Pulju, Winter 2005


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith
Stephen Oberholtzer wrote:

 I believe what he's getting at is this:
{snip explanation}

You exactly understand what I'm going for and my use case.

Is there a better way to implement it in sql itself than what I
outlined?  I.e. create my own index table that points to the proper
rows and keep it updated via triggers?

Eric

--
Eric A. Smith

Gnagloot, n.:
A person who leaves all his ski passes on his jacket just to
impress people.
-- Rich Hall, Sniglets


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith

Sorry, let me amend that:

 The schema is roughly
 
 create table records(__recno INTEGER PRIMARY KEY, fileId, data);

Forget the INTEGER PRIMARY KEY.  My partial index would reference 
the _rowid_.  I don't permit vacuums on the database so, if I'm not
mistaken, this shouldn't be an issue.

--
Eric A. Smith

Parkinson's Fourth Law:
The number of people in any working group tends to increase
regardless of the amount of work to be done.


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith
Kees Nuyt wrote:

 You could add a deleted column with value range (0,1) and
 create an index on it if benchmarks show that makes it
 faster. As a bonus it is easier to code and maintain than a
 separate table with references and triggers.
 
 Alternatively, you can create a composite index with the
 deleted column as one of the components.
 
 From a theoretical view, if you care about the visibility of
 a row, you should express it as an attribute of the entity.
 The solutions above comply with that notion.

I think you misunderstand what I want.

I don't care about keeping the row around after it is deleted.  I don't
care about visibility.

I only want to find rows quickly in order to delete them.

--
Eric A. Smith

Bowlikinetics (boh lih kih neh' tiks), n.:
The act of trying to control a released bowling ball 
by twisting one's body in the direction one wants it to go.


Re: [sqlite] partial index?

2010-08-20 Thread Eric Smith
Simon Slavin wrote: 

 http://www.sqlite.org/lang_createview.html 
 
 This is the SQL standard way to reduce your view of a table to just 
 certain rows.  If I understand your request, this feature should provide 
 exactly what you want.  Appropriate indexes will be used when consulting 
 any VIEW you've defined.  

I don't think that helps either.  A view in sqlite is just syntactic
sugar for a select statement.  I don't want to define any real indices 
-- they are a performance burden.

  Something like DELETE FROM records WHERE __recno IN (SELECT __recno 
  FROM idxTable), where __recno is the INTEGER PRIMARY KEY on records.  
 
 I don't understand what you're looking up here.  If you have some 
 method of recognising which rows of a table should be deleted just use the 
 appropriate 
 
 DELETE FROM ...  WHERE ...  
 
 command.  No need for any sub-SELECT clause.  

Maybe it'll be clearer if I describe my (quite simple) use case.  Our
app is caching what are basically csv files.  Hundreds of files, about 2m 
records per file.  Sometimes we want to delete all the cache rows for one 
of the files.  We know ahead of time which file it will be -- let's say
it's file 7.

The schema is roughly

create table records(__recno INTEGER PRIMARY KEY, fileId, data);

So sometimes we would *like* to say "delete from records where
fileId=7".

But that is bad because it does a full table scan.

So the next cut is to say "create index recordsIdxFileId on
records(fileId)".

But that is bad because it is a huge burden during INSERT and is not
used often enough (or with enough values) to justify its existence.

What I really want is to be able to say "create index
recordsIdxFileIdOnFile7 on records(fileId) where fileId=7".  
But sqlite doesn't do that.

So let's assume SQLite is much faster at deleting rows by the INTEGER
PRIMARY KEY than it is by deleting rows by some other value.  Then we
can optimize by keeping track of which __recnos we will want to delete.

Hence my idea for having a separate table that maps fileId -> __recno,
but only for the fileId we care about.

Eric

-- 
Eric A. Smith

Keeping Young #6:
Don't look back. Something might be gaining on you.
-- Satchel Paige


[sqlite] partial index?

2010-08-19 Thread Eric Smith
Afaict sqlite doesn't support indices on subsets of rows in a table, à
la http://en.wikipedia.org/wiki/Partial_index -- right?

Any plans to implement that?

Are there any known hacks to implement something similar?

--
Eric A. Smith

Keeping Young #3:
Keep the juices flowing by janglin round gently as you move.
-- Satchel Paige


Re: [sqlite] partial index?

2010-08-19 Thread Eric Smith
Tim Romano wrote: 

 The partial index is one very messy thing, fraught with ambiguities, 
 something to avoid.  

I want an index that only can be used to find rows with a particular 
value or set of values.  In what way is that ambiguous?  Other databases 
(e.g. postgres) seem to support this kind of thing.  

 I can imagine other business rules being really 
 bollixed up by the sudden reappearance of zombie rows.  

This isn't a 'business rule', this is an optimization.  No high 
level logic will change.  Just like when we use other sql indices.  

 Under the partial index method, how would 
 you ever find a row again once it has become invisible, unless you were 
 perhaps to change or suspend the partial index rule, and cause the missing 
 rows to reappear?  

Become invisible, meaning it no longer contains data that I care 
about?  I don't need to find it quickly because it no longer contains 
data that I care about.  

So, I'm not sure I understand your concerns.  

Since SQLite doesn't support partial indices directly, I'm 
thinking about making my own index as a separate table and 
populating/depopulating it using triggers on the main table.  I only 
need it for fast lookups during deletion of the relevant rows, so I'll 
hijack the app logic that wants to delete those rows and instead use 
the secondary table to get the row ids, and delete those directly.  

Something like "DELETE FROM records WHERE __recno IN (SELECT __recno 
FROM idxTable)", where __recno is the INTEGER PRIMARY KEY on records.
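
Spelled out, hard-coding the interesting fileId for illustration:

CREATE TABLE idxTable(__recno INTEGER PRIMARY KEY);

CREATE TRIGGER trackDoomedRows AFTER INSERT ON records
WHEN new.fileId = 7
BEGIN
  INSERT INTO idxTable VALUES(new.__recno);
END;

-- later, when those rows must go:
DELETE FROM records WHERE __recno IN (SELECT __recno FROM idxTable);
DELETE FROM idxTable;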

Am I missing something?

Eric 

-- 
Eric A. Smith

Furbling, v.:
Having to wander through a maze of ropes at an airport or 
bank even when you are the only person in line.
-- Rich Hall, Sniglets


Re: [sqlite] SQLite version 3.7.0

2010-07-22 Thread Eric Smith
Darren Duncan wrote: 

 I don't have time to investigate right now, but both failing tests seem 
 to be connected with concurrent access to a table by two forked processes 
 (the test script forks a child, which does concurrent access).  
 
 At least in the second case, the DROP TABLE and CREATE TABLE commands 
 are issued by the main process (after the child has dropped table2) and 
 are supposed to succeed, so I believe there's something else going on than 
 changed error codes (unless they trigger a bug within SQLite itself).  

I assume you aren't sharing a single database connection object between 
the parent and child processes.  That would violate one of SQLite's
use assumptions and could definitely cause problems.

Eric

-- 
Eric A. Smith

Money is the root of all wealth.


Re: [sqlite] Help with complex UPDATE question

2010-07-22 Thread Eric Smith
peterwinson1 wrote:

 Hello,
 
 I have a somewhat complex question about UPDATE.  I have the following
 table
 
 table1 (KEY, COL1)
 
 0, 1
 1, 2
 2, 3
 3, 4
 
 What I would like to do is to UPDATE COL1 by subtracting the COL1 value
 where KEY = 0 from the COL1 value of the current row so that the result
 would be.
 
 0, 0
 1, 1
 2, 2
 3, 3
 
 Can this be done in SQL?  It does not have to be one UPDATE/SELECT
 statement.  

Not sure if it's optimal, but it works:

update table1 set col1=col1-(SELECT col1 FROM table1 WHERE key=0);

The full session:

[hudson:~] $ sqlite3 
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table table1(key integer primary key, col1 integer);
sqlite> insert into table1 values(0,1);
sqlite> insert into table1 values(1,2);
sqlite> insert into table1 values(2,3);
sqlite> insert into table1 values(3,4);
sqlite> select * from table1;
0|1
1|2
2|3
3|4
sqlite> begin;
sqlite> update table1 set col1=col1-(SELECT col1 FROM table1 WHERE key=0);
sqlite> select * from table1;
0|0
1|1
2|2
3|3

Eric

--
Eric A. Smith

(1) Auto da fe (2) Beating with clubs (3) Beheading: Decapitation 
(4) Blowing from cannon (5) Boiling (6) Breaking on the wheel (7) Burning 
(8) Burying alive (9) Crucifixion (10) Decimation (11) Dichotomy 
(12) Dismemberment (13) Drowning (14) Exposure to wild beasts etc. 
(15) Flaying alive (16) Flogging: Knout (17) Garrote (18) Guillotine 
(19) Hanging (20) Hari kari (21) Impalement (22) Iron Maiden 
(23) Peine Forte et Dure (24) Poisoning (25) Pounding in mortar 
(26) Precipitation (27) Pressing to death (28) Rack 
(29) Running the gauntlet (30) Shooting (31) Stabbing (32) Stoning 
(33) Strangling (34) Suffocation.
-- List of execution methods compiled by the New York State 
   Commission to Investigate and Report the Most Humane and 
   Practical Methods of Carrying into Effect the Sentence 
   of Death, 1888


Re: [sqlite] Sqlite Insert Speed Optimization

2010-07-14 Thread Eric Smith
Black, Michael (IS) wrote: 

 Does anybody know how to make the journal file go to a different 
 location than the database?  Apparently it's not treated as a temporary 
 file.  Perhaps it should be??  

Seems like you'd have to communicate the journal location to other 
processes, meaning you'd have to write the name of the journal file 
into the main db, in the header or something.  I think sqlite doesn't 
do that at the moment, which means you'd have to change the file 
format, which sqlite devs are loath to do.

-- 
Eric A. Smith

Worthless.
-- Sir George Bidell Airy, KCB, MA, LLD, DCL, FRS, FRAS
   (Astronomer Royal of Great Britain), estimating for the
   Chancellor of the Exchequer the potential value of the
   analytical engine invented by Charles Babbage, September
   15, 1842.


Re: [sqlite] Sqlite Insert Speed Optimization

2010-07-14 Thread Eric Smith
Werner Smit wrote: 

 After taking out count(*) and adding a few pragmas and saving 6000 
 records rather than 500 at a time I've got it down to 34 minutes.  
 If I build it on a local drive it takes 28 minutes (with chunks of 500).

Why not do an apples-to-apples test and commit the same number of 
records per batch in each test?  The idea was to vary only one thing 
(the mount point) and keep all other variables constant.  

250ms ping times, wow.  SQLite write speeds will be better if the NFS 
server is on the same planet as the client.  

-- 
Eric A. Smith

Software is like entropy.  It is difficult to grasp, weighs nothing,
and obeys the Second Law of Thermodynamics, i.e., it always increases.


Re: [sqlite] work-arounds for (possibly buggy) nfs?

2010-07-13 Thread Eric Smith
Richard Hipp wrote:

 Try using:
 
 sqlite3 db ./foo -vfs unix-dotfile

When my application dies while holding a lock under unix-dotfile, all
subsequent runs always think the database is locked.  I guess this is
because the lock file exists.

Is there an obvious/good way to know whether the lock is really being
held?

Eric

--
Eric A. Smith

Stocks have reached what looks like a permanently high plateau.
-- Irving Fisher, Professor of Economics, Yale University, 1929.


Re: [sqlite] Sqlite Insert Speed Optimization

2010-07-13 Thread Eric Smith
Griggs, Donald wrote: 

 Is the percentage of the final rowcount really a criterion?  

The answer to that, according to my brief exploration, is somewhere 
between yes and very much yes, depending on various factors.

-- 
Eric A. Smith

The number of UNIX installations has grown to 10, with more expected.
-- The Unix Programmer's Manual, 2nd Edition, June 1972


Re: [sqlite] Sqlite Insert Speed Optimization

2010-07-13 Thread Eric Smith
Werner Smit wrote: 

 My question(s) 
 If I want to use the insert or replace to populate my database of 
 around 1 million records.  
 And I want to do it as fast as possible.  
 What are all the tricks I can use?  

Obey the first rule of optimization: don't do it unless you're 
sure you need to.

If you're sure you need to, here are a few hints from my own 
experience.  They all come with trade-offs.  

Buy faster hardware with more memory.  I am serious.  Sometimes it is 
much cheaper to wait for 6 months for CPU speeds and memory sizes to 
increase than to spend time optimizing your code.  

Don't define triggers, indices, unique constraints, check constraints, 
or primary keys.  Make sure foreign key checking is off (which it is by 
default).  

Give sqlite a large page cache.  If you are on a 32-bit system then sqlite 
can't use more than 4 GB of memory, so keep that in mind.  If you give sqlite more 
memory than exists on your machine, you might go to swap hell, so don't 
do that.  
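
For example (the unit is pages, so the right number depends on your page
size and on how much RAM the box really has):

PRAGMA page_size = 4096;     -- must be set while the database is still empty
PRAGMA cache_size = 250000;  -- roughly 1 GB of page cache at 4 KiB pages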

If you are CPU bound and if you can split your problem into orthogonal
chunks and if you have multiple CPUs, consider farming out the work to 
worker processes and incorporating their results into the main database.

Depending on your use-case you can consider telling your operating
system to favor the disk cache over processes' memory when you are
running low on RAM.  In linux this is accomplished by setting 'swappiness' 
high, not sure about other OSs.

 I had a count(*) to check how many inserts were actually done (for a progress 
 bar) - and this slowed me down very much.  

That's because count(*) doesn't run in constant time.  I'm not sure, but
it may be linear in the table size, which would imply that your algo as a
whole becomes quadratic instead of its original (likely linear) asymptotic
behavior.

 Took it out, and want to use select total_changes()  to keep track of 
 inserts.  Any problem with that?  

You sound like you are writing a multi-threaded program.  Are you sure 
that total_changes() is only counting changes due to your insertions?  

Consider keeping a loop execution counter and using that for your status 
bar.  

Good luck,
Eric 

-- 
Eric A. Smith

Electricity is actually made up of extremely tiny particles 
called electrons, that you cannot see with the naked eye unless 
you have been drinking.
-- Dave Barry


Re: [sqlite] Sqlite Insert Speed Optimization

2010-07-13 Thread Eric Smith
Griggs, Donald wrote: 

 I guess I was wondering if the fastest records-per-transaction value 
 would depend on the page cache and be more or less independent of the 
 total records to be imported.  

I think the page cache is one of a great many variables.

 So, the records-per-transaction for import to a 20 million row table 
 should be twenty times the size for a 1 million row table?  

I'm no sqlite or sql guru myself, so with a grain of salt: 

If you have no reason to commit in the middle of a batch, then don't 
do it.  I think inserting all the rows in a single go will give you the 
best insert performance in most use cases.  

The idea is that there is some fixed overhead (call it O) that SQLite 
has to go through every time it commits a transaction.  The overhead is 
'fixed' because it is independent of the number of rows you inserted.  
If you insert 1m rows and commit every 500, the total commit overhead is 
2000*O.  If you commit just once, the total commit overhead is just O.  

This argument is likely a small or big lie for a number of reasons, but 
is at least a push in the right direction.  

Eric 

-- 
Eric A. Smith

The problem with engineers is that they tend to cheat in order to get results.

The problem with mathematicians is that they tend to work on toy problems 
in order to get results.

The problem with program verifiers is that they tend to cheat at toy problems 
in order to get results.


Re: [sqlite] quickly insert multiple rows into multiple table

2010-07-12 Thread Eric Smith
W.-H.  Gu wrote: 

 For (1), after I wrapped inserts into a transaction, I face an issue: 
 the cpu usage is too high.  I think the reason is that I use a prepared 
 statement to insert a row at a time; it then does sqlite3_step 
 thousands of times every second.  

If I'm reading this right, you're saying you're re-preparing the 
statement on every row.  Don't do that.  Prepare it once, and only 
re-bind the variables for each insert.  Then finalize the statement 
after your loop.  

I gather that statement preparation is a non-trivial percentage of 
the compute time taken to insert a single row.

 Every second I insert 9184 rows, 

Your optimal insertion rates will vary strongly depending on your 
hardware, your operating system, the efficiency of data generation 
between calls to the sqlite api, amount of data you're inserting per 
row, amount of RAM sqlite is allowed to use, any indices, triggers, 
check constraints, unique constraints, primary keys, and foreign keys 
you have defined, compile time flags, and dozens of other variables.

I'm no SQL or SQLite expert myself -- the gurus might add many items 
to that list off the tops of their heads.

We have no way of knowing what number you should be shooting for.  

 which leads to cpu usage ~30%.  
 
 Is there any good way to resolve this issue?  

Obviously there are many variables here.  Why do you think you want low 
CPU usage?  

Generally, I think you *want* to be CPU-bound for the duration of your 
insert batch.  Anything else implies you're doing something besides 
useful computation (like waiting for disk i/o (due to a cache spill or
one of many other possible reasons), or scheduled behind some other 
process, or something).

At commit time, sqlite will write the new rows back to the disk, by 
default blocking til the OS tells SQLite that the write is complete 
(though this is configurable).  During this time you expect cpu usage 
to go down while disk usage goes up.  

Another point is that you need to make sure the OS is really telling you
what you think it's telling you.  E.g. a common mistake for a Linux user
is to think that the load average is the average CPU usage over some 
period of time.  It isn't.

Eric 

-- 
Eric A. Smith

One of my most productive days was throwing away 1000 lines of code.
-- Ken Thompson


Re: [sqlite] open db cx to fork(2)d children

2010-07-12 Thread Eric Smith
Roger Binns wrote: 

 I'd have no problem contributing the code to SQLite, but there isn't 
 very much of it 

I don't think the devs would complain about that.  This code has too
few lines!

 and it is an open issue as to how you report the cross fork usage should 
 it happen.  (In my case I know Python is being used.)  

I didn't look at it deeply, but out of curiosity: what's wrong with just 
returning (or causing SQLite to return) a standard unhappy return code?  

Eric 

-- 
Eric A. Smith

Usisahau ubaharia kwa sababu ya unahodha.


Re: [sqlite] Noobie question about importing data into table

2010-07-10 Thread Eric Smith
Calimeron wrote:

 Chinese Char. No.   Chinese Char.   English Def. No.   English Def.
 1                   Char1           1                  Def1
 2                   Char2           2                  Def2
 2                   Char2           3                  Def3
 3                   Char3           4                  Def4
 4                   Char4           5                  Def5
 4                   Char4           6                  Def6
 5                   Char5           7                  Def7
 
 Can this be done? 

What you want is called a join.

[hudson:~] $ sqlite3 
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table chChar(id INTEGER PRIMARY KEY, char TEXT);
sqlite> create table englCharDefn(id INTEGER PRIMARY KEY, chChar REFERENCES chChar, defn TEXT);
sqlite> INSERT INTO chChar VALUES(null, 'char1');
sqlite> INSERT INTO chChar VALUES(null, 'char2');
sqlite> INSERT INTO chChar VALUES(null, 'char3');
sqlite> INSERT INTO chChar VALUES(null, 'char4');
sqlite> INSERT INTO chChar VALUES(null, 'char5');
sqlite> INSERT INTO englCharDefn VALUES(null, 1, 'def1');
sqlite> INSERT INTO englCharDefn VALUES(null, 2, 'def2');
sqlite> INSERT INTO englCharDefn VALUES(null, 2, 'def3');
sqlite> INSERT INTO englCharDefn VALUES(null, 3, 'def4');
sqlite> INSERT INTO englCharDefn VALUES(null, 4, 'def5');
sqlite> INSERT INTO englCharDefn VALUES(null, 4, 'def6');
sqlite> INSERT INTO englCharDefn VALUES(null, 5, 'def7');
sqlite> select chChar.id, chChar.char, englCharDefn.id, englCharDefn.defn from englCharDefn join chChar ON (englCharDefn.chChar = chChar.id);
1|char1|1|def1
2|char2|2|def2
2|char2|3|def3
3|char3|4|def4
4|char4|5|def5
4|char4|6|def6
5|char5|7|def7

--
Eric A. Smith

I still maintain the point that designing a monolithic kernel in 
1991 is a fundamental error.  Be thankful you are not my student.  
You would not get a high grade for such a design.
-- Andrew Tanenbaum, to Linus Torvalds
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-10 Thread Eric Smith
Roger Binns wrote: 

 I'll bet you are actually getting exit(3) which means anything 
 registered with atexit will be run.  (SQLite does not register with 
 atexit.)  

I knew what Nico meant :) just repeated him for expositional clarity 
(deeming the distinction to be unimportant for my particular question,
and figuring that most people mean exit(3) when they say exit(2)).

 In my wrapper I provide functionality that can check SQLite objects are 
 not being used across processes.  
 
 I'd recommend doing something like this 

My fork call is from Tcl -- you've convinced me to add a line in the 
child unsetting the Tcl variable containing the name of the parent's 
database connection.  :) 

Your wrapper is nice -- have you considered folding something like it 
into the core (disabled by default, enabled by a compile-time flag) and 
submitting it to drh for official adoption?  

Thanks much to both of you!

Eric 

-- 
Eric A. Smith

More people would come here if it weren't so crowded.
-- Yogi Berra
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Noobie question about importing data into table

2010-07-10 Thread Eric Smith
Calimeron wrote: 

 I've understood that the program (I'm 
 using http://www.sqlite.org/sqlite-3_6_23_1.zip) will first compile the 
 entire list 30.000 x 50.000 = 1.5 BILLION lines, after which it filters 
 it.  

One of SQLite's primary goals in life is to fit into a tiny memory 
footprint.  I don't know for sure, but strongly doubt that it will 
create the whole cross product of the two tables before filtering.
I guess it's more likely that either the video is wrong or you're
misunderstanding it.

 --After you've selected the lines you want, how do you save the 
 table (instead of only selecting it)?

What do you mean, save?

-- 
Eric A. Smith

gleemites, n.:
Petrified deposits of toothpaste found in sinks.
-- Sniglets, Rich Hall  Friends
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Noobie question about importing data into table

2010-07-10 Thread Eric Smith
Calimeron wrote: 

 What do you mean, save?  
 
 So that when you're done, you have 3 tables Chinese, English, Merged 
 or the original 2, but with data from the one into the other.  
 
 I don't know the terms or the procedure, but I'd like to have a new table 
 that has the Chinese and English mixed, which I then can continue to 
 manipulate.  

The answer to that depends strongly on your application.  It could be
that you should just do whatever manipulations you need in your main
language.

It could be that your use case will be simplified using what's called 
a view.  This will essentially be a wrapper around your join query.  
You can read from the view but you can't write to it.  The view is 
always consistent with the underlying real tables.  

But the view, as far as I understand, doesn't provide a performance 
benefit over the query that defines it.  It only exists for simplified 
understanding of your problem space and your code.  

So if you think that you will run your JOIN query a lot, and you think 
that it will be a performance bottleneck, then a view is probably not 
the right way to go.

Here's what a view looks like (using my previous example code):

[hudson:~] $ sqlite3 ./foo.db 
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE chChar(id INTEGER PRIMARY KEY, char TEXT);
INSERT INTO chChar VALUES(1,'char1');
INSERT INTO chChar VALUES(2,'char2');
INSERT INTO chChar VALUES(3,'char3');
INSERT INTO chChar VALUES(4,'char4');
INSERT INTO chChar VALUES(5,'char5');
CREATE TABLE chCharDefn(id INTEGER PRIMARY KEY, chChar REFERENCES
chChar, defn TEXT);
INSERT INTO chCharDefn VALUES(1,1,'def1');
INSERT INTO chCharDefn VALUES(2,2,'def2');
INSERT INTO chCharDefn VALUES(3,2,'def3');
INSERT INTO chCharDefn VALUES(4,3,'def4');
INSERT INTO chCharDefn VALUES(5,4,'def5');
INSERT INTO chCharDefn VALUES(6,4,'def6');
INSERT INTO chCharDefn VALUES(7,5,'def7');
COMMIT;
sqlite> create view ChineseCharsAndDefns as select chChar.id, chChar.char, 
chCharDefn.id, chCharDefn.defn from chCharDefn join chChar ON 
(chCharDefn.chChar = chChar.id);
sqlite> select * from ChineseCharsAndDefns;
1|char1|1|def1
2|char2|2|def2
2|char2|3|def3
3|char3|4|def4
4|char4|5|def5
4|char4|6|def6
5|char5|7|def7
sqlite> 

As you can see, I just prefixed my join query with "create view
ChineseCharsAndDefns as".  Now watch what happens when I change one of
the underlying tables:

sqlite> begin;
sqlite> update chCharDefn set defn='def7CHANGED' where id=7;
sqlite> select * from ChineseCharsAndDefns;
1|char1|1|def1
2|char2|2|def2
2|char2|3|def3
3|char3|4|def4
4|char4|5|def5
4|char4|6|def6
5|char5|7|def7CHANGED
sqlite> rollback;
sqlite> 

Note again that, as far as I know, views in SQLite only exist as 
'syntactic sugar' -- they don't provide any functional difference, 
just a textual difference in your code.  Others can confirm or refute
that.

It could be that you want to create a real table that can be read from
and written to.  I would then ask why you think you need to do that.

Eric

-- 
Eric A. Smith

Sendmail may be safely run set-user-id to root.
-- Eric Allman, Sendmail Installation Guide
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Null character problem

2010-07-09 Thread Eric Smith
Kavita Raghunathan wrote: 

 sprintf(SqlStr, "INSERT INTO %s (AttrName, AttrEnum, AttrType, 
 AttrValue, ReadWrite, Entity_id) VALUES('%s', %d, %d, '%s', %d, 
 %d);", tbl_name, db[i]->attr_name, db[i]->attr_num, db[i]->attr_type, 
 db[i]->attr_value, db[i]->attr_src, entity_id); 

Don't do that.  What if attr_name contains a ' character (or, as you
say, some other weird character)?

Instead, prepare a statement with sqlite variables, and bind values to
those variables using the sqlite3_bind* family of interfaces:
http://sqlite.org/c3ref/bind_blob.html
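
Roughly like this (a sketch, error checks omitted; note that you can 
bind values but not identifiers, so the table name -- "my_tbl" here, a 
stand-in -- stays in the SQL text):

/* assumes #include <sqlite3.h> and an open connection pDb */
sqlite3_stmt *pStmt;
sqlite3_prepare_v2(pDb,
    "INSERT INTO my_tbl(AttrName, AttrEnum) VALUES(?1, ?2);",
    -1, &pStmt, 0);
sqlite3_bind_text(pStmt, 1, db[i]->attr_name, -1, SQLITE_TRANSIENT);
sqlite3_bind_int (pStmt, 2, db[i]->attr_num);
sqlite3_step(pStmt);        /* runs the INSERT */
sqlite3_finalize(pStmt);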

I have no specific knowledge on whether sqlite handles null characters
within the variables' values--but if I were a betting man, I'd bet that
it handles them quite cleanly.

-- 
Eric A. Smith

I think there's a world market for about five computers.
-- attr. Thomas J. Watson (Chairman of the Board, IBM), 1943
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Null character problem

2010-07-09 Thread Eric Smith
Simon Slavin wrote: 

 BLOBs can handle any sequences of bytes without problems, 
 including nulls, ETX, and sequences which be illegal if they were used to 
 express Unicode characters.  You can put anything you like in a BLOB.  

I assume, due to the manifest typing semantics of the library, that 
the declared type of the column will make no difference when I bind a 
weird datum to a variable (as long as it's not an 'INTEGER PRIMARY KEY' 
column).  

Will sqlite3_bind_text work properly if the string contains (internal) 
nulls?  What if I did something like: 

char zText[100];
memset(zText, 0, sizeof(zText)); 
sqlite3_bind_text(stmt, idx, zText, sizeof(zText), SQLITE_TRANSIENT); 

According to a strict reading of the doc, sqlite will blindly copy
sizeof(zText) characters (starting from zText[0]) into the column.  
That is, this will store 100 null bytes into the column.  Is that 
right?

Eric 

-- 
Eric A. Smith

When you come to a fork in the road, take it.
-- Yogi Berra
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Null character problem

2010-07-09 Thread Eric Smith
Simon Slavin wrote: 

 (according to your earlier post) 

I'm not OP.  I'm Eric.  OP was someone else.  In this context, I don't 
care about blobs or about the right way of doing anything.

 Read the documentation for memset().  

I know quite well how memset works.  I know character!=byte.  These 
matters are irrelevant to my question.  

My question came purely from a mild curiosity.  I was wondering about 
the behavior of sqlite call sqlite3_bind_text when it is passed a range 
of BYTES that includes nulls.

-- 
Eric A. Smith

It's up.  It sorta works.  That's a start.
-- BJ Premore
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite query with c++ variable

2010-07-09 Thread Eric Smith
smengl90 wrote:

 Hi, I want to compose a query that will use a c++ variable. For example I
 have:
 
 int i= 5;
 char * query = "SELECT * from userInfo WHERE count>'i'".
 
 The syntax does not work. How do I do that?

I think your primary problem is lack of experience in C++.  I 
strongly recommend that you read up on the language itself before
attempting to write SQLite applications.
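
That said, the usual shape of what you're after is a prepared 
statement with a bound parameter.  A rough sketch against the C API, 
assuming an open connection pDb:

int i = 5;
sqlite3_stmt *pStmt;
sqlite3_prepare_v2(pDb, "SELECT * FROM userInfo WHERE count > ?1;",
                   -1, &pStmt, 0);
sqlite3_bind_int(pStmt, 1, i);    /* your C++ variable goes in here */
while (sqlite3_step(pStmt) == SQLITE_ROW) {
    /* read columns with the sqlite3_column_*() family */
}
sqlite3_finalize(pStmt);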

Are you required to use C++?  You may find it (much, much) easier to 
use Tcl at first.

If I may ask: are you a student, or doing this for fun in your spare 
time, or is this a part of your job?

Eric

--
Eric A. Smith

Windows is *NOT* a virus - viruses are small and efficient.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-09 Thread Eric Smith

 I strongly recommend that you always make the child side of fork(2)
 either exit(2) or exec(2) immediately.  

Sorry Nico, I never saw this response -- I appreciate it!

What do you mean, immediately?  As I said, my child comes to life,
does some work without touching (its copy of) existing SQLite strucures,
and then calls exit(2).  The lifetime of the child is small wrt the
lifetime of the parent.

Let's assume for the moment that I don't care about safety wrt
non-sqlite libraries (except of course any libraries on which 
sqlite depends).

 With respect to SQLite3, there are two sets of fork-safety issues: file
 descriptor offsets (use USE_PREAD to avoid this issue), 

I take you to mean that the child and parent's fds point to the same
underlying file description, and if the child changes the file
description then it will have a side effect in the parent.

But I have assumed that the child does not make any sqlite api calls
against existing sqlite structures.  I believe this assumption allows 
me to conclude that sqlite will not touch any existing fd, and hence
will not bear such an impact on the parent (even if the child makes
sqlite api calls against structures the child creates on its own).  
Am I right? 

 and POSIX file byte range locks.

I'm not using POSIX locks, so I'm good to go there.  But even if I were,
I believe my above reasoning applies equally well here, since I believe
your reason for being concerned about it is similar.  The fds that were 
duplicated across the fork refer to the same underlying file
description, so we are technically in a dangerous state: the child
*could*, at its whim, release the parent's lock (for example).  But if 
it guarantees not to do so (by guaranteeing to make no sqlite calls
against existing structures), then no harm will result.
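
Concretely, the discipline I'm describing is (a sketch; assumes 
#include <unistd.h>, and pDb is the parent's open handle):

pid_t pid = fork();
if (pid == 0) {
    /* child: work that never touches pDb or statements prepared
    ** on it */
    do_child_work();   /* hypothetical stand-in for my child's job */
    _exit(0);          /* end the child without exit(3) cleanup */
}
/* parent: keeps using pDb; its locks and fd offsets stay untouched */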

Thanks,
Eric

--
Eric A. Smith

Impartial, adj.:
Unable to perceive any promise of personal advantage from
espousing either side of a controversy or adopting either of two
conflicting opinions.
-- Ambrose Bierce, The Devil's Dictionary
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Reg: In Memory Database Using SQLite

2010-07-08 Thread Eric Smith
Subhadeep Ghosh wrote: 

 I finally managed to create a wrapper around the SQLite core to support 
 the creation of in-memory databases.  The wrapper comprises of three 
 functions - one to serialize the database, one to de-serialize a database 
 and the third one to do the cleanup job.  
 
 The function which serializes a database, copies the entire contents of 
 the database to a block of memory.  The function which de-serializes takes 
 a block of memory as input and creates a database object out of it.  
 
 I am including the source code for the interested souls and I've not put 
 comments as I hate doing it.  
 
 I hope you guys find the code useful or at-least interesting.  

I hope you didn't spend too much time on that.  SQLite supports 
in-memory databases natively: if you open the file called ":memory:" you
get an in-memory database that dies when the connection is closed:

http://sqlite.org/c3ref/open.html
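
That is, nothing more than this (sketch):

/* assumes #include <sqlite3.h> */
sqlite3 *pDb;
sqlite3_open(":memory:", &pDb);  /* lives only until sqlite3_close(pDb) */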

-- 
Eric A. Smith

More people would come here if it weren't so crowded.
-- Yogi Berra
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] docs bug: tclsqlite.html

2010-07-08 Thread Eric Smith
tclsqlite.html lists an unlock_notify method with no other
documentation.  Trying to use it gives me this:

-bash-2.05b$ tcl
% package require sqlite
3.6.23
% sqlite3 db /tmp/foo
% db unlock_notify
unlock_notify not available in this build
% 

--
Eric A. Smith

The concept is interesting and well-formed, but in order to earn 
better than a 'C,' the idea must be feasible.
-- A Yale University management professor in response to Fred Smith's paper
   proposing reliable overnight delivery service.
   (Smith went on to found Federal Express Corp.)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] work-arounds for (possibly buggy) nfs?

2010-07-08 Thread Eric Smith
I'm forced into a situation where I have to use an nfs server that I 
think is buggy.  

I can read/write files normally using fopen() on the exported 
filesystem, but can't do anything useful with sqlite 3.6.23.1:

-bash-2.05b$ tclsh
% package require sqlite
3.6.23
% sqlite3 db ./foo
% db eval {pragma synchronous=off}
disk I/O error
% puts $::errorInfo
disk I/O error
while executing
db eval {pragma synchronous=off}
% db eval {create table t(a)}
disk I/O error
% ^C
-bash-2.05b$ lh
total 0
-rw-r--r--  1 esmith  1005 0B Jul  8 10:19 foo
-bash-2.05b$ echo foobar > foo
-bash-2.05b$ lh
total 2
-rw-r--r--  1 esmith  1005 7B Jul  8 10:23 foo
-bash-2.05b$ tclsh
% package require sqlite
3.6.23
% 
% file delete -force foo
% sqlite3 db ./foo
% db eval {pragma locking_mode=exclusive}
exclusive
% db eval {pragma synchronous=off}
disk I/O error

Everything works fine when I use a database on the local disk:

% db close
% sqlite3 db /tmp/foo
% db eval {pragma synchronous=off}
% db eval {create table t(a)}
% db eval {insert into t values('blah')}
% db eval {select count(*) from t}
1
% 

-bash-2.05b$ uname -a
FreeBSD  5.40. FreeBSD 5.40. #0: Fri Sep 19 03:14:59 EDT 2008  i386

Can anyone offer any hints?

Thanks!
Eric

-- 
Eric A. Smith

Furbling, v.:
Having to wander through a maze of ropes at an airport or 
bank even when you are the only person in line.
-- Rich Hall, Sniglets
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] work-arounds for (possibly buggy) nfs?

2010-07-08 Thread Eric Smith
Richard Hipp wrote:

 sqlite3 db ./foo -vfs unix-dotfile

Works like a charm!

 That uses an alternative VFS that uses dot-file locking instead of posix
 advisory locks.  The dot-file locks are somewhat slower and have less
 concurrency, so the use of PRAGMA locking_mode=EXCLUSIVE might also be a
 good idea in such a configuration.

My use-case requires no concurrency whatsoever.  I'll circle back if I
notice any perf changes.

Many thanks!

Eric

--
Eric A. Smith

fenderberg, n.:
The large glacial deposits that form on the insides
of car fenders during snowstorms.
-- Sniglets, Rich Hall  Friends
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Reg: In Memory Database Using SQLite

2010-07-08 Thread Eric Smith
Subhadeep Ghosh wrote: 

 I am aware of the functionality which you mentioned.  But the reason why 
 I ended up coding the feature was because of a very specific requirement.  
 My requirement was such that some of my client applications were running 
 on disk-less systems where I was not using any kind of network storage.  
 But my applications used SQLite.  So I coded the below mentioned feature 
 such that the database was being stored on the server and when the 
 disk-less clients wanted to use them, they fetched the database from the 
 server over the network and directly create a database out of the fetched 
 data (basically serializing and de-serializing the database).  
 
 I am guessing that SQLite does not support such functionality.  The 
 closest thing would have been to fetch the database over the network, 
 store it in a file (which in this case is not possible) and then open the 
 database.  
 
 I maybe wrong, but if such a feature already exists then I would be more 
 than happy to adopt it in my code.  

I see.  

Firstly, you may be better off using a client/server db, since that 
seems to map more naturally into your use-case.  But let's forget about 
that for the moment.  

Had I written your application, I first would have considered dumping 
the original database to raw SQL text and reading it into the in-mem 
db on the other end.  That would be pretty trivial to do, and if it 
suits your needs perf-wise, you're golden.  This also gives the extra 
advantage of having a textual communication format between the master 
and the slave, which can be a useful debugging tool.  

I would then have considered using a RAM-backed filesystem on the 
slave.  You can copy the sqlite db as-is to the slave and open it 
using normal sqlite api calls.  SQLite and your app think the db is 
disk-backed because the OS is faking the existence of a disk.  Whether 
this option works well depends on how easy it is to get a ram-backed fs 
up and running on your slave.  (In Linux this is very easy, not sure 
about your deployment OS.)  
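
(And for completeness: once the fetched image is sitting on a 
RAM-backed fs, the online backup API -- http://www.sqlite.org/backup.html, 
available since 3.6.11 -- can pull it into a pure ":memory:" handle.  
A rough sketch, assuming the file landed at /dev/shm/fetched.db:)

/* assumes #include <sqlite3.h>; error checks omitted */
sqlite3 *pFile, *pMem;
sqlite3_open("/dev/shm/fetched.db", &pFile);
sqlite3_open(":memory:", &pMem);
sqlite3_backup *pBk = sqlite3_backup_init(pMem, "main", pFile, "main");
if (pBk) {
    sqlite3_backup_step(pBk, -1);   /* -1 == copy every page in one pass */
    sqlite3_backup_finish(pBk);
}
sqlite3_close(pFile);
/* pMem is now a private in-memory copy of the fetched database */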

Eric 

-- 
Eric A. Smith

The people can always be brought to the bidding of the leaders. That 
is easy. All you have to do is tell them they are being attacked and 
denounce the pacifists for lack of patriotism and exposing the 
country to danger. It works the same way in any country. 
-- Herman Goering, at the Nuremberg trials
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] work-arounds for (possibly buggy) nfs?

2010-07-08 Thread Eric Smith
Richard Hipp wrote: 

 sqlite3 db ./foo -vfs unix-dotfile 
 
 That uses an alternative VFS that uses dot-file locking instead of 
 posix advisory locks.  The dot-file locks are somewhat slower and have less 
 concurrency, so the use of PRAGMA locking_mode=EXCLUSIVE might also be a 
 good idea in such a configuration.  

Looking at your code, I see that unix-dotfile locks are in reality 
always exclusive.  My guess is that the existence of a separate file 
indicates that a process owns the (unique) lock, and non-existence 
indicates no one owns the lock.  

I see you defer to open(2) with O_EXCL to get atomicity here.  
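
(That is, the idiom is roughly -- my sketch, not SQLite's literal code:

/* assumes #include <fcntl.h> and <errno.h> */
int fd = open("foo.db.lock", O_RDWR | O_CREAT | O_EXCL, 0600);
if (fd < 0 && errno == EEXIST) {
    /* some other process holds the exclusive lock */
}

where removing the file releases the lock.)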

The docs leave something unclear: 

If I ATTACH a database with pragma locking_mode=EXCLUSIVE under the 
unix-dotfile VFS, get a lock on the attached db, execute a write 
transaction on it, and then DETACH the database, then I'm assuming the 
lock on the attached db is released.  Is that right?  

Eric 

-- 
Eric A. Smith

I still maintain the point that designing a monolithic kernel in 
1991 is a fundamental error.  Be thankful you are not my student.  
You would not get a high grade for such a design.
-- Andrew Tanenbaum, to Linus Torvalds
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SqLite 3.7.0 amalgamation build warnings

2010-07-07 Thread Eric Smith
Nick Shaw wrote: 

 Realised I also fixed this warning before posting too: 

You may not be getting a lot of responses on this because the SQLite 
devs have a philosophy that, for this project, chasing down compiler 
warnings is generally a waste of time.  

See http://sqlite.org/testing.html#staticanalysis for details on that.  

If you want action regarding the warnings you see, I recommend arguing 
how the relevant code will cause a real-world behavioral problem.  And 
the best argument is to write a bit of code exercising the problem.  

-- 
Eric A. Smith

Where a calculator on the ENIAC is equipped with 18,000 vaccuum 
tubes and weighs 30 tons, computers in the future may have only 
1,000 vaccuum tubes and perhaps weigh 1 1/2 tons.
-- Popular Mechanics, March 1949
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SqLite 3.7.0 amalgamation build warnings

2010-07-07 Thread Eric Smith
Nick Shaw wrote: 

 If it's safe to ignore 

Note that I was careful not to say that the warnings are safe to 
ignore.  :) I only said that the SQLite devs may ignore them without 
further evidence of problems.  

I'm not an expert on the SQLite code, so wouldn't make any specific 
claims about it.  

Eric 

-- 
Eric A. Smith

Peace, n.:
In international affairs, a period of cheating between two
periods of fighting.
-- Ambrose Bierce, The Devil's Dictionary
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] ARM Cross compilation problem

2010-07-01 Thread Eric Smith
Matheus Ledesma wrote:

 arm-none-linux-gnueabi-gcc -o balanca 
 balanca_simula_peso_io_paralela.c  -Wall -W -O2 
 -Wl,-R/home/generic/CodeSourcery/Sourcery_G++_Lite/lib -lsqlite3 
 -lpthread -static

Try adding '-ldl' to your args.

Eric

--
Eric A. Smith

Where a calculator on the ENIAC is equipped with 18,000 vaccuum 
tubes and weighs 30 tons, computers in the future may have only 
1,000 vaccuum tubes and perhaps weigh 1 1/2 tons.
-- Popular Mechanics, March 1949
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] ARM Cross compilation problem

2010-07-01 Thread Eric Smith
Matheus Ledesma wrote: 

 With -ldl after -static argument I receive the following error: 
 
 balanca_simula_peso_io_paralela.c:632: warning: 'mseg1' may be used 
 uninitialized in this function 
 
 /home/generic/CodeSourcery/Sourcery_G++_Lite/bin/../arm-none-linux-gnueabi/libc/lib/libsqlite3.a(sqlite3.o):
  
 In function `unixDlOpen': 
 /home/generic/OpenController/garten/teste/sqlite3.c:26401: warning: 
 Using 'dlopen' in statically linked applications requires at runtime the 
 shared libraries from the glibc version used for linking 

Not an error, a warning :) If you check, your app binary was probably 
created in your build directory.  

Here's my understanding: 

I can't speak to whether you really need to, but you are passing 
-static.  According to the gcc manpage, this means it "prevents 
linking with ... shared libraries."  

This is a lie.  gcc still does a dynamic link to libgcc, which is 
assumed to be so widely available that it's a waste to statically 
link it.  But gcc is nice and warns you that your application is not 
completely statically linked.

So you can try running your application on the target platform.  If the 
proper shared libs exist there, then you're good to go.  

If not, you need to statically link libgcc as well at compile time.  Do 
that by passing -static-libgcc to your compiler.  

Disclaimer: I'm not a guru in the area, so ymmv.  

Good luck,

Eric 

-- 
Eric A. Smith

The concept is interesting and well-formed, but in order to earn 
better than a 'C,' the idea must be feasible.
-- A Yale University management professor in response to Fred Smith's paper
   proposing reliable overnight delivery service.
   (Smith went on to found Federal Express Corp.)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] minor website bug (was Re: SQLite 3.7.0 coming soon....)

2010-06-30 Thread Eric Smith
 Please provide feedback - positive, negative, or indifferent - to this  
 mailing list.

Minor: the link in about.html from fopen() to 
http://man.he.net/man3/fopen is broken.

Eric

--
Eric A. Smith

Mandelbug, n.:
A bug whose underlying causes are so complex and obscure as to
make its behavior appear chaotic or even non-deterministic.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] open db cx to fork(2)d children

2010-06-28 Thread Eric Smith
From the docs:

 Under Unix, you should not carry an open SQLite database across a
 fork() system call into the child process. Problems will result if you
 do.

What if I fork a process that promises not to use the handle, and
furthermore the child process certainly dies before the parent does?

Will this still cause problems?

What if I add an additional assumption that the parent process is the
only process that ever accesses the database?

--
Eric A. Smith

Slurm, n.:
The slime that accumulates on the underside of a soap bar when
it sits in the dish too long.
-- Rich Hall, Sniglets
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 64 bit libsqlite for AIX and Solaris

2010-06-24 Thread Eric Smith
Black, Michael (IS) wrote: 

 Though I'm not sure if there's any advantage/disadvantage to 64-bit 
 binaries for sqlite3, is there?  

64-bit SQLite can cache more than 4Gb of your db.
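
E.g., a sketch (assumes an open sqlite3 *pDb; cache_size counts pages, 
1Kb each by default):

/* ~6Gb of page cache -- only addressable by a 64-bit process */
sqlite3_exec(pDb, "PRAGMA cache_size=6000000;", 0, 0, 0);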

Eric

-- 
Eric A. Smith

Finagle's First Law: If an experiment works, something has gone wrong.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How accept sqlite3 commands from stdin

2010-06-24 Thread Eric Smith
Peng Yu wrote:

 On Thu, Jun 24, 2010 at 5:05 AM, Simon Slavin slav...@bigfraud.org wrote:
 
  On 24 Jun 2010, at 4:50am, Peng Yu wrote:
 
  Is there a way to use Shebang for sqlite3 script?
 
  http://en.wikipedia.org/wiki/Shebang_%28Unix%29
 
  SQLite comes with a command-line tool.  You can feed it with individual 
  commands on a Unix command-line or tell it to read commands from a file.
 
 I was asking whether it is possible to use Shebang with sqlite script.
 If it is possible, would you please show me how to modify the
 following script to do so?
 
 $ ./main.sql
 Error: unknown command or invalid arguments: "/main.sql". Enter
 ".help" for help
 $ cat main.sql
 #!/usr/bin/sqlite3 main.db
 
 create table tbl1(one varchar(10), two smallint);
 .quit

You want to pretend a .sql file is an executable and send commands to
sqlite3?  Try something like this:

[hudson:~] $ cat foo.sql
#!/bin/sh

sqlite3 main.db <<EOF

create table tbl1(col1 text);
insert into tbl1 values('foobar');

EOF
[hudson:~] $ chmod +x foo.sql
[hudson:~] $ ./foo.sql 
[hudson:~] $ lh main.db
-rw-r--r--. 1 eas eas 2.0K Jun 24 10:18 main.db
[hudson:~] $ sqlite3 main.db "select * from tbl1"
foobar
[hudson:~] $ 

--
Eric A. Smith

Peace, n.:
In international affairs, a period of cheating between two
periods of fighting.
-- Ambrose Bierce, The Devil's Dictionary
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
I have confirmed that INSERT times are roughly logarithmic in 
the number of existing records after creating my specific user 
indices.

But INSERT times appeared to be *linear* in the number of existing 
records before I had created any user indices (and with no primary 
keys or unique indices defined).

Can anyone explain this?

Eric

   time (minutes) to insert 2m records
 10 ++--+--+---+--+---+-++
+   +  +   +  A   +   +  +
  9 ++..++
|  AAA A |
  8 ++..AAA..A..++
| A      |
  7 ++.A.A..AAA.++
|    |
  6 ++..AA...A.A++
|AAA |
  5 ++A...AA++
  4 ++.AA.AA..A.A...++
|   A AAA  AA|
  3 ++..++
|   AAA A  A |
  2 ++.AA...++
| AA |
  1 ++...AAA++
AAA +  +   +  +   +  +
  0 ++--+--+---+--+---+-++
0  100200 300400 500600
millions of existing records

--
Eric A. Smith

A nickel ain't worth a dime anymore.
-- Yogi Berra
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Jim Wilcoxson wrote: 

 Insert times should be constant for the 2nd case: no primary key, no 
 indexes; ie, it doesn't matter how many records are already in the 
 database.  I confirmed this with SQLite 3.6.18.  

Definitely not constant.  Looks linear to me -- you saw the plot, you
can decide for yourself.

I'm in SQLite 3.6.23.1.  How do I score an old version to test it?

 Did you see my earlier note about combining your two integers into the 
 primary key?  This will also give you constant insert times, if you 
 insert items in the order: 

Hey sorry, I didn't see that.  Cute idea, but my accessors are in Tcl, 
I don't want to do bit twiddling or query mangling on the read side from 
Tcl, and I don't want to re-write it in C.  Plus a host of other reasons
that would bore the SQLite community.  I'm actually rather happy without
any primary key definition right now.
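
(For anyone following along: my reading of Jim's packing idea in C, 
where hi and lo stand for the slower- and faster-varying of the two 
integers, would be roughly:)

/* hypothetical sketch: pack two 32-bit ints into the 64-bit rowid
** so that inserts arrive in ascending rowid order */
sqlite3_int64 rowid = ((sqlite3_int64)hi << 32) | (unsigned int)lo;
sqlite3_bind_int64(pStmt, 1, rowid);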

Thanks also for the tip on insertion order.  Does that also hold for 
multi-column indices (and not single-column indices transformed from two 
integers)?  I assume it's because we get more cache hits and fewer tree 
rebalances when we insert in key-order?

Before I make any more adjustments, I want to understand why I'm linear
with no indices!

I'm pretty sure I'm not doing anything stupid, like setting evil 
compile-time options or whatever.  But then again most stupid people 
don't think their results come from being stupid.  

Eric 

-- 
Eric A. Smith

Aeropalmics (ayr o palm' iks), n.: 
The study of wind resistance conducted by holding a cupped 
hand out the car window.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] where to get historical versions?

2010-06-22 Thread Eric Smith
In another thread in this forum, someone says they noticed a behavior 
in sqlite version 3.6.18 different (better) than what I've observed in 
3.6.23.1.  

Where can I find version 3.6.18 (or, more generally, any old version) 
for testing?  

Eric 

-- 
Eric A. Smith

It's up.  It sorta works.  That's a start.
-- BJ Premore
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Richard Hipp wrote: 

 When there are no indices, SQLite can put the rows into the database in 
 any order it wants, and it chooses to put them in ascending order.  Thus, 
 each insert becomes a constant-time append.  (Approximately - the truth is 
 a little more complicated, but by waving our arms vigorously, we can still 
 claim constant time per row.)  But if there are indices, each row must be 
 inserted in index order, which involves a b-tree search to find the right 
 spot and possibly a rebalancing operation - O(logN) per row.  

Again, my observation is that you are *not* doing constant-time inserts 
when there are no indices.  

What do you mean, ascending order?  

The only constraint on the relevant table is a foreign-key ref to a tiny 
table.  But the asymptotic behavior is the same with pragma foreign_keys 
off or on.  

I double-checked sqlite_master and there are no indices (not even 
auto-indices) on the table.  

Inserts are *faster* at high row counts when there *are* indices.  

I am using 3.6.23.1.  I haven't tested earlier versions (waiting on a 
reply in another thread to find out where to get them).  

How vigorously are you waving?  Can you describe the real algorithm, or 
at least a second-order approximation?  

Eric 

-- 
Eric A. Smith

What the hell is it good for?
-- Robert Lloyd (engineer of the Advanced Computing Systems
   Division of IBM), to colleagues who insisted that the
   microprocessor was the wave of the future, c. 1968
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Jay A.  Kreibich wrote: 

 What OS/filesystem are you using?  
 
 SQL inserts should be near-constant, assuming the table does not 
 have an INTEGER PRIMARY KEY with explicit values.  The table's root 
 B-Tree needs to re-balance every now and then, but if the inserts are 
 in-order (which they will be with an automatic ROWID) this should be 
 rare and cheap-- should get more rare as the number of rows 
 increases.  
 
 Many *filesystems* do not provide linear access times, however, 
 especially with larger files.  

Interesting.  But behavior is better (logarithmic) with indices 
defined.  

Right now I'm in 64-bit linux 2.6.18, rhel 5.4.  The fs is ext3.  Not 
sure if this particular box has a raid5 array like the other box did.  
But again, I think it's a moot point: even when I'm completely in the 
page cache, behavior is linear, and it improves with indices.  This 
suggests a software algo issue.  

(Just got your corrections, I knew what you meant. :-)

Eric

-- 
Eric A. Smith

This non-pronunciation of initial _h_ is especially common among 
French and British people, who can't pronounce English very well.
-- Tim Pulju
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Scott Hess wrote: 

 You should reduce your demonstration case to something you'd be 
 willing to post the code for.  Probably using synthetic data 
 (preferably generated data).  There's something missing in the thread 
 right now, and it's unlikely to be exposed by random shots in the 
 dark.  

I'll start doing that and reply here with an obfuscated schema.  In 
the meantime, where can I find version 3.6.18 (which someone 
claimed definitely does constant-time insertions)?  

-- 
Eric A. Smith

You will never amount to much. 
-- Munich Schoolmaster, to Albert Einstein, age 10
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Scott Hess wrote: 

 You should reduce your demonstration case to something you'd be 
 willing to post the code for.  Probably using synthetic data 
 (preferably generated data).  There's something missing in the thread 
 right now, and it's unlikely to be exposed by random shots in the 
 dark.  

As I was putting together the answer to this request, I decided to 
double-check the result by #ifdef'ing out the sqlite calls.

Turns out I *was* being stupid.

An old app-level error check ran after a hunk of data was inserted.
The check was supported by the PK definition, which I had just removed.
So sqlite was doing a table scan every batch.  Measurements were 
better with user-level indices because one of the indices was usable 
in the error check.  

*sigh* kill me.  Sorry for wasting your time there. :/

So the summary of this thread for those who follow is: 

1. Primary keys cause implicit indices to be defined.
2. If you insert data out of order (according to the index) then you
   have to read/write more pages per insertion.  This, among other
   things, means the journal file can grow with the starting db size,
   not just with the size of the transaction.
3. Consider reducing churn against OS-level caches (or the disk) 
   by increasing sqlite's cache_size.

Thanks again, everyone, for your help!

Eric 

-- 
Eric A. Smith

You made the fatal mistake of assuming that your friends are genuine.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unexpected large journal file

2010-06-20 Thread Eric Smith
Eric Smith wrote: 

 I'd like to show it to the forum -- is it possible to send emails with 
 attachments here?  It's a 60kb jpg file.  

God bless the gnuplot developers, who provided an ascii output option:

  time (minutes) to insert 2m records
10 ++--+--+---+--+---+-++
   +   +  +   +  A   +   +  +
 9 ++..++
   |  AAA A |
 8 ++..AAA..A..++
   | A      |
 7 ++.A.A..AAA.++
   |    |
 6 ++..AA...A.A++
   |AAA |
 5 ++A...AA++
 4 ++.AA.AA..A.A...++
   |   A AAA  AA|
 3 ++..++
   |   AAA A  A |
 2 ++.AA...++
   | AA |
 1 ++...AAA++
   AAA +  +   +  +   +  +
 0 ++--+--+---+--+---+-++
   0  100200 300400 500600
   millions of existing records


You can see how we ran out of RAM at 100m records.  From 100m to 200m
there was a cron job running that was causing the disk cache to get
thrown out; I killed it at around 200m records.  

You can see the linear progression quite clearly.

At ~480m records I halted the process, built user indices, and restarted.
And voila, we're back down in happy-land.

Eric

-- 
Eric A. Smith

The people can always be brought to the bidding of the leaders. That 
is easy. All you have to do is tell them they are being attacked and 
denounce the pacifists for lack of patriotism and exposing the 
country to danger. It works the same way in any country. 
-- Herman Goering, at the Nuremberg trials
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unexpected large journal file

2010-06-20 Thread Eric Smith
Black, Michael (IS) wrote: 

 Hmmma 6.5X speed diff between RAM and disk?  Sounds pretty good to 
 me.  Not sure why you expect better.  

I don't expect better.  As I said, I'm not really sure what I should 
expect here, or how to find out what I should expect.  

 10,000/sec is crappy?  And you think this because  

Because I think anything besides instantaneous results is crappy.  :) 

But I'm willing for now to live with the max I should reasonably expect 
from modern hardware.  I'd like to know e.g.  how much raid5 is hurting, 
so I can know whether to recommend that the user move to a single disk.  

Jay, btw, I saw your email on stripe sizes -- thanks so much for the 
suggestion -- I'll look into it.  But see below for a recent update to 
app behavior.  

 #1 What OS are you on?  There are numerous disk speed testing programs 
 depending on your OS.  

64-bit RHEL5.4, Linux 2.6.18 for the moment.  I could probably convince 
the user to switch to FreeBSD6 if there are big wins there.  My app 
already works in both OSs.  

 #2 Are you multi-threading?  A seperate reader process could help.  

Just one process, one thread.  

I'm sure you're right -- adding a reader process would certainly help 
run time by a few percentage points, but at the (unjustified at the 
moment) expense of increased maintenance cost & dev time.  This is an 
extremely cheap project so far and I'm trying to keep it that way.  

 #3 How many records total?  

Probably a couple billion, where each record is about 300 bytes across 
about 15 columns.  

 #4 Final size of database?  

After indices I'm guessing we'll be at ~500Gb.  When the db has 0.5b 
records and all indices are defined, we're at around 130Gb.  

 #5 How fast can you read your input file?  

I can generate inputs to SQLite at a rate of at least 65k records/sec.  
I haven't measured the input generation separately from the sqlite 
calls.  

 #6 What happens if you just insert the same records the same # of times 

Haven't tested it.  

 #7 What does your CPU usage show?  

After dropping the indices (per Jay et al's suggestion) I think SQLite 
is actually spending more than half its time on the CPU under large 
record sets.  Obviously I need to measure that more carefully under the 
new DB schema.  

My initial guess that I was inserting in O(1) was wrong -- time to 
insert 2m records went up by about a minute per 100m existing records.  

And this part is interesting: I finished the initial seed and created 
my user indices.  Then I added some more records, and found that insert 
times went down from 9 minutes to 2 minutes per 2 million records.  The 
plot is *striking*.  (I'd like to show it to the forum -- is it possible 
to send emails with attachments here?  It's a 60kb jpg file.)  I'm back 
up to inserting 17k records/second and am almost entirely CPU-bound.  I 
think I'm back in RAM!  

What is going on here?  Is SQLite using index data to do inserts more 
quickly?  Do you think that's causing it to need to read fewer pages on 
a particular insert?  

I'm very interested to see how this would look if I defined the user 
indices before the initial seed.  

 I assume you're multicore (as most are now I think).  

Yes, I have multiple CPUs, but I'm in one process in one thread so I'm 
only using one CPU at a time.  

Now that I'm seeing this CPU-bound behavior after adding indices, I'm 
reconsidering the whole multi-process thing.  Still, at least 75% of CPU 
usage is in SQLite.  More testing needed.  

 Depending on what you're doing with this data are you sure you need a 
 database solution?  

No.  But let's assume for now it's the best thing available to solve 
my problem under tight time constraints -- because in any case they're 
interesting questions, right?  :) 

 I don't recall you really explaining your ultimate goal...  

I tried to state the question as generally as possible while capturing 
the relevant specifics of my problem, so that gurus' answers will be 
useful to more people (including future Eric who is writing another 
application).  

I'll try starting off with those user indices and see how we do.

Thanks again!

Eric 

-- 
Eric A. Smith

I have always wished that my computer would be as easy to use as 
my telephone. My wish has come true. I no longer know how to use 
my telephone.
-- Bjarne Stroustrup
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unexpected large journal file

2010-06-19 Thread Eric Smith
Jay A.  Kreibich wrote: 

 Try getting rid of the PK definition and see how much that buys you.  
 It might be worth it, or it might not.  

and Simon Slavin wrote: 

 We know you are doing a huge amount of writing to this database.  Are 
 you also reading it frequently ?  If not, then it might be worth making an 
 index on that primary key only when you're about to need it.  

I think the use case will usually be (only) writes followed by (only) 
reads.  There may be incremental writes later, but they will hopefully 
be small compared to the initial seed.  I intend to create a set of 
maybe 7 indices to serve the particular queries I think the user intends 
to run.  Let's all hope we can update the db with those indices at a 
higher rate than the user can generate data.  :p 

I tried removing the PK definition as you both suggested, and the 
journal stays fixed at less than 20Kb, even against a db size of (at 
the moment) 37Gb.  My insert batch run times are improved by a factor of 
~2.5 and seem to be O(1).  

So, bingo.  :) 

Insert rates are still a rather crappy 10k records/second (when we were 
in RAM we were doing ~65k recs/sec, which I think was bound by my app's 
speed).  I think I'm at the stupid raid5's limit -- not really sure what 
I should expect there, or how to find out what I should expect.  

Anyway, I'll build the indices after the seed.  I'll go out on a limb 
and assume that'll be a lot faster than it would've been under my 
initial approach.  

You guys were incredibly helpful -- thanks very much!  

Eric 

-- 
Eric A. Smith

Carperpetuation (kar' pur pet u a shun), n.:
The act, when vacuuming, of running over a string at least a
dozen times, reaching over and picking it up, examining it, then
putting it back down to give the vacuum one more chance.
-- Rich Hall, Sniglets
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] unexpected large journal file

2010-06-18 Thread Eric Smith
I have no user-defined indices in my db, and want to do a largish number 
of inserts (a few billion).  I COMMIT every 10 million INSERTs or so -- 
so if my app dies (or I want to kill it) I don't have to start over.  

Row sizes are small, a couple hundred bytes across 15ish columns.  
The primary key on the table is a pair of integers.

After a few BEGIN/INSERT/COMMIT cycles the journal file grows 
mid-transaction to a pretty big size, e.g.  around 1Gb against a 14Gb db 
file, meaning (right?) that sqlite wrote to ~1Gb of the existing 
pages during that round of INSERTs.  

This means the time spent doing a batch of INSERTs goes up with the number 
of existing rows, which is a big frowny-face.* 

I'd really love to avoid writing a big journal file.  And I'd love to 
avoid doing a billion-row insert in one transaction.

So, if I'm only doing INSERTs and they're always on brand new rows and 
there are no indices, why does SQLite need to update existing pages in 
the DB?  (Is it updating an auto-index based on the primary key?)  Is 
there a way to avoid it?  

Thanks!

Eric 

PS I'm using 3.6.23.1 with defaults, except PRAGMA synchronous=OFF and 
foreign_keys=1 (my only foreign key refs are to tiny tables).  I'm using 
the Tcl API, which probably doesn't matter for this question.  

% db eval {pragma compile_options} 
ENABLE_FTS3 ENABLE_LOCKING_STYLE=0 TEMP_STORE=1 THREADSAFE=1 

* I found out today my users are using a raid5 array on the deployment 
box, so it's an even bigger frowny face than it would have been by 
default.  

-- 
Eric A. Smith

We don't like their sound, and guitar music is on the way out.
-- Decca Recording Co. rejecting the Beatles, 1962.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unexpected large journal file

2010-06-18 Thread Eric Smith

Terribly sorry to self-reply, but I have a correction: 

 I'm using the Tcl API, which probably doesn't matter for this question.  

Don't know what I was thinking when I typed that.  I'm sharing a
connection in a single thread, mixing C API calls and Tcl API calls.  
The C API calls drive the INSERTs; Tcl API calls drive BEGIN/COMMIT.  

That all works fine, so don't worry about it.

Thanks again!  

Eric 

-- 
Eric A. Smith

Absurdity, n.:
A statement or belief manifestly inconsistent with one's 
own opinion.
-- Ambrose Bierce, The Devil's Dictionary
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unexpected large journal file

2010-06-18 Thread Eric Smith
Jay A. Kreibich wrote:

  I'd really love to avoid writing a big journal file.  And I'd love to 
  avoid doing a billion-row insert in one transaction.
 
   So turn journaling off.

... which implies possible corruption on app failure, right?

I want progress to be saved every once in a while so I don't have to
re-build the db from scratch when my app comes back to life.

--
Eric A. Smith

Don Daniels: How stable are tense systems in languages?
Tim Pulju:   I'd say about a 42 on a scale from 1 to 212.
-- Winter 2005
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unexpected large journal file

2010-06-18 Thread Eric Smith
Simon Slavin wrote: 

 How do you pass the handle from your C code to your Tcl code ?  

I don't.  I pass it from Tcl to C.  

The handle in Tcl is a command name registered with the interpreter.  
SQLite attaches a structure to the registration that contains the 
underlying handle as used by the C API.  

I'm writing a Tcl extension in C and I don't want to touch a bunch of 
Tcl code.  The Tcl caller gives me its command-name handle to the DB.  
I ask the Tcl core for the registration data for that command name, and 
use it to get the underlying SQLite handle for my C calls.  

(Checking whether I've got a real Tcl db handle is not 100% foolproof, 
but takes care of basic caller stupidity and is good enough for my 
app.)  

With error checks removed:

Tcl_CmdInfo cmdInfo;
const char* zDbHandle = Tcl_GetStringFromObj(objv[2], 0);
Tcl_GetCommandInfo(pInterp, zDbHandle, &cmdInfo);
sqlite3 *pDb = *((sqlite3**)cmdInfo.objClientData);
/* Carry on, using pDb in C API calls.
*/

I was considering asking about this a while ago on this group, but opted 
not to for fear of being publicly flogged by drh. :)

Actually, I was hoping the SQLite devs would promise not to change the
way they do Tcl command registration in future releases, so this kind 
of thing will continue to work.

 You are sharing the same connection to the database between the two 
 languages, right ?  

Yep.  

 You're not doing your INSERTs from one connection and your COMMITs from 
 another ?  

Right, the one connection is shared.  

Eric 

-- 
Eric A. Smith

Substitute damn every time you're inclined to write very; your
editor will delete it and the writing will be just as it should be.
-- Mark Twain
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unexpected large journal file

2010-06-18 Thread Eric Smith
Jay A.  Kreibich wrote: 

 Yes.  Hence the and this is the important part comment.  Most of 
 the time when people are building billion-row files, they're building 
 a new DB by importing a static source of data.  If things go wrong, 
 you just throw out the database and try again.  

That's kinda like doing it all in a single big transaction, which I 
wanted to avoid.  :) 

But my take-away from this is conversation is generally you're SOL: 
we are backed by a tree, so we have to update existing nodes to point 
to the new ones, so we have to touch existing pages on INSERT.  Is that 
about right?  

It's not in any way a result of my schema?  My primary key is a pair of
integers A,B.  The first column in this particular use case is in the 
range A = [0, 2million) and the second is in the range B = [0, infinity).
We

insert records A=0-2million, B=0, then
insert records A=0-2million, B=1,

etc.

Could this have an impact on how many pages need to be touched on
INSERT?

 It would also help to bump the cache up...  

That works great until the db size blows through the total RAM on the 
system, at which point we're of course disk-bound again.  At the moment
I'm only inserting about 4k rows/second. :/

Eric

-- 
Eric A. Smith

I have always wished that my computer would be as easy to use as 
my telephone. My wish has come true. I no longer know how to use 
my telephone.
-- Bjarne Stroustrup
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] create index before or after many inserts?

2010-06-15 Thread Eric Smith
Let's say my app has (only) inserts followed by (only) reads.  

The reads are best served by some indices.  So I can create the indices 
before the INSERTs, or after them.  

In general, should I expect a run time perf difference between these two 
options?  

Eric 

-- 
Eric A. Smith

Louis Pasteur's theory of germs is ridiculous fiction.
-- Pierre Pachet, Professor of Physiology at Toulouse, 1872
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] create index before or after many inserts?

2010-06-15 Thread Eric Smith
Simon Slavin wrote: 

 The standard assumption about SQLite is that it's faster to do your 
 INSERTs first, then create the indices.  How much of a difference this 
 makes depends on a lot of things.  

On what things does it depend?

-- 
Eric A. Smith

Sendmail may be safely run set-user-id to root.
-- Eric Allman, Sendmail Installation Guide
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] last_insert_rowid INTEGER PRIMARY KEY columns

2010-06-07 Thread Eric Smith
The tcl interface spec says: 

 The last_insert_rowid method returns an integer which is the ROWID of 
 the most recently inserted database row.  

So if I have 

db eval {CREATE TABLE t(the_key INTEGER PRIMARY KEY, data TEXT)}

and I

db eval {INSERT INTO t VALUES(NULL, 'foo')}

then does 

db last_insert_rowid

reliably return the_key of the most recently inserted database row, 
so that the returned value may safely (across vacuums etc) be used 
as a foreign reference to t's the_key column?

Is this the best way of inserting a guaranteed non-conflicting row 
and getting the new row's key?
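
(For reference, the C-level call underneath the Tcl method is 
sqlite3_last_insert_rowid(); a sketch, assuming an open sqlite3 *pDb:)

sqlite3_exec(pDb, "INSERT INTO t VALUES(NULL, 'foo');", 0, 0, 0);
sqlite3_int64 key = sqlite3_last_insert_rowid(pDb);  /* the new the_key */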

-- 
Eric A. Smith

Money is the root of all wealth.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite file Validation

2010-05-19 Thread Eric Smith
 Devs were told to make sure they are validating their sql statements, 
 but as I have seen in the code, few of them have 

 Question: Do you think that instead of getting them to go back throughout 
 their code, it is feasible to create a function that just eliminates the ";" 
 and replaces it with a ","?  And if so, any suggested code?  

I have this programmer who keeps writing stupid code despite explicit 
guidance.  I want an algorithm that converts his bad inputs into good 
inputs.  

Make sure to let us know if you come up with a general solution.  

Eric 

-- 
Eric A. Smith

You are standing in an open field west of a white house, with a 
boarded front door.
There is a small mailbox here.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite Database in Shared Memory

2010-05-11 Thread Eric Smith
Manuj Bhatia wrote: 

 I do not have a requirement of persistence in my current design, but I 
 expect that we might extend this shared-queue solution to more areas of 
 the server and will require some sort of persistence then.  
 That is one of the main reasons I do not want to use IPC queues (there are 
 other reasons like fixed message sizes, minimal support for queue/message 
 level metadata).  

OP might consider creating a database file on a tmpfs filesystem.  
The OS tricks SQLite (and everything in user space) into thinking the 
file is a normal file with all the usual properties thereof -- but 
it's backed by RAM and not any persistent medium.  You'll get the perf 
benefits you wanted, along with the relatively easy ability to make the 
DB persistent later.  

Fedora 12 has one of these mounted at /dev/shm by default, though I 
presume any modern Linux will support this.  
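
A sketch of what that looks like -- ordinary API calls, just against a 
RAM-backed path (the file name is hypothetical):

/* assumes #include <sqlite3.h> */
sqlite3 *pDb;
sqlite3_open("/dev/shm/queue.db", &pDb);   /* RAM-backed "file" */
sqlite3_exec(pDb,
    "CREATE TABLE IF NOT EXISTS q(id INTEGER PRIMARY KEY, msg TEXT);",
    0, 0, 0);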

Caveat (1).  I ran 'make test' on SQLite 3.6.23.1 on my box (Linux ___ 
2.6.32.11-99.fc12.i686 #1 SMP Mon Apr 5 16:32:08 EDT 2010 i686 athlon 
i386 GNU/Linux) from within a tmpfs filesystem and 23 tests failed: 

shared-1.1.1 shared-1.2.1 shared-1.4.1.1 shared-1.4.1.2 shared-1.4.1.3 
shared-2.1.1 shared-2.2.1 shared-2.4.1.1 shared-2.4.1.2 shared-2.4.1.3 
stmt-1.2 stmt-1.3 stmt-1.4 stmt-1.5 stmt-1.6 stmt-1.7 stmt-2.1 stmt-2.2 
stmt-2.3 stmt-2.5 tempdb-2.2 tempdb-2.3 tkt2565-1.X 

I wanted to investigate to see why but haven't had the time -- it has 
to do with the global Tcl variable sqlite_open_file_count.  Running the 
fixture on just those test files yields passes (every time), but running 
the whole 'veryquick' suite yields failures (every time).  I see there's 
machinery to try to clear all state between test runs -- obviously this 
is not successful in my test.  

The testfixture is dynamically linked against these libraries:
linux-gate.so.1 =  (0x00511000)
libtcl8.5.so = /usr/lib/libtcl8.5.so (0x005cb000)
libdl.so.2 = /lib/libdl.so.2 (0x00d1f000)
libm.so.6 = /lib/libm.so.6 (0x00d42000)
libpthread.so.0 = /lib/libpthread.so.0 (0x00d26000)
libc.so.6 = /lib/libc.so.6 (0x00ba9000)
/lib/ld-linux.so.2 (0x00b87000)

Caveat (2).  I don't claim this is the best solution for the OP -- just 
a possibility.  

Eric 

-- 
Eric A. Smith

I think there's a world market for about five computers.
-- attr. Thomas J. Watson (Chairman of the Board, IBM), 1943
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users