Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?

2012-10-27 Thread Pavel Ivanov
> That said, I'd still welcome any quick summary of the differences between
> sqlite and mysql when it comes to fragmentation.

This is far from the main difference between sqlite and mysql that you
should consider if you want to choose between them, unless of course
your question is of purely academic interest. As you are talking
about employees I guess this is not a purely academic exercise.
In that case think more about in-process calls vs pumping through the
TCP/IP stack, a design meant to be accessible from machine-local
processes only vs one accessible to anyone on the network, plain access to
everything vs versatile and complicated authorization and
authentication mechanisms, and so on and so forth. The database file format is
never part of the decision about which DBMS you want to use.

Pavel


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?

2012-10-27 Thread David Barrett
Thanks, this is really helpful!  (And I lecture my employees about the
evils of premature optimization all the time.  In fact, I'll lecture anyone
in earshot, so frequently that it's the butt of jokes.)

That said, I'd still welcome any quick summary of the differences between
sqlite and mysql when it comes to fragmentation.  I often get in debates
about sqlite versus other databases, and I'm always eager to be informed.
Thanks!

-david



Re: [sqlite] Decimal point and localisation

2012-10-27 Thread Roger Binns

On 27/10/12 12:43, Steinar Midtskogen wrote:
> But why can't sqlite observe LC_NUMERIC for the output - no ambiguity 
> would then arise?

Because the SQLite library does not know where the output is going.  If
you ask for a number to be provided as a string there is no way it can
even guess.

If you are referring to the shell then it has a similar issue.  Is the
output going to be read by a program (where being deterministic is good)
or by a human (where local conventions matter more)?

The rule of thumb is very easy - the people directly using SQLite are
developers.  Those developers should then take whatever formatting and
similar actions are necessary for their end users as appropriate.
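As a quick sketch of that division of labour (illustrated here with Python's stdlib sqlite3 bindings; the comma-formatting step is just an assumed example of what an application might do for its users):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite itself always renders the decimal point as '.', whatever the
# process locale says.
as_text = conn.execute("SELECT CAST(3.5 AS TEXT)").fetchone()[0]
print(as_text)  # '3.5'

# Locale-aware presentation is the application's job; e.g. for a
# hypothetical user whose convention uses ',':
print(as_text.replace(".", ","))  # '3,5'
```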

Roger


[sqlite] Decimal point and localisation

2012-10-27 Thread Steinar Midtskogen
According to the documentation, "the "." character is always used as
the decimal point even if the locale setting specifies "," for this
role - the use of "," for the decimal point would result in syntactic
ambiguity".

But why can't sqlite observe LC_NUMERIC for the output - no ambiguity
would then arise?
-- 
Steinar Midtskogen


Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?

2012-10-27 Thread Richard Hipp
On Sat, Oct 27, 2012 at 6:38 AM, David Barrett wrote:

> I'm trying to learn more about MySQL versus sqlite when it comes to
> vacuuming and fragmentation, especially as it relates to SSD storage.  Can
> anybody summarize for me the differences?
>
> 1) My understanding is sqlite, in general, has no automatic
> defragmentation: there is no process to gradually and incrementally
> reposition data on disk in index order.  (Even auto-VACUUM just reclaims
> deleted data by inserting new transactions in the holes left behind by
> previous deletes, which the documentation says can actually worsen
> fragmentation.)  The only defragmentation option is a full VACUUM, which
> rebuilds the entire database from scratch in index order.  During this
> period, the whole database is locked and cannot be used.  Is this roughly
> right?
>

Roughly.  SQLite does make some effort to keep things in index order as
changes are made.  If SQLite needs a new database page because of new data
being inserted, it looks for a free page in approximately the right spot.
But SQLite never moves pages around in order to put them all in the right
order (apart from VACUUM) since that involves extra I/O and slows down the
INSERT.

In practice, though, fragmentation can still occur, depending on your
workload.

In WAL mode, a VACUUM can be ongoing while there are other readers.  This
will lead to a *-wal file that is as big or bigger than the original
database, however.
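A small sketch of that freelist behaviour and the VACUUM effect (Python's stdlib sqlite3 bindings, used purely for illustration; the row counts and payload sizes are arbitrary):

```python
import os
import sqlite3
import tempfile

# An on-disk database: the freelist is not observable with :memory:.
path = os.path.join(tempfile.mkdtemp(), "frag.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit

conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload BLOB)")
conn.execute("BEGIN")
conn.executemany("INSERT INTO t (payload) VALUES (?)",
                 [(b"x" * 1024,) for _ in range(1000)])
conn.execute("COMMIT")

# Deleting a contiguous range empties whole pages; they go onto the
# freelist for later reuse rather than shrinking the file.
conn.execute("DELETE FROM t WHERE id <= 500")
free_before = conn.execute("PRAGMA freelist_count").fetchone()[0]

# VACUUM rebuilds the database from scratch, dropping the free pages.
conn.execute("VACUUM")
free_after = conn.execute("PRAGMA freelist_count").fetchone()[0]

print(free_before > 0, free_after == 0)  # True True
```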


-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] How to differentiate between sqlite database empty char* strings and DBNULL char* string?

2012-10-27 Thread Igor Tandetnik
Frank Chang  wrote:
>   Good afternoon, Is it possible to differentiate between sqlite database
> empty char* strings and DBNULL char* strings? If so, what is the best way to
> do that? Thank you.

You might be looking for sqlite3_column_type.
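The distinction survives intact in language bindings too; for example, in Python's stdlib sqlite3 module (shown here only as an analogue of checking the column type in C):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (s TEXT)")
conn.execute("INSERT INTO t VALUES (''), (NULL)")

# An SQL NULL comes back as None; an empty string comes back as ''.
rows = [r[0] for r in conn.execute("SELECT s FROM t ORDER BY s IS NULL")]
print(rows)  # ['', None]
```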
-- 
Igor Tandetnik



Re: [sqlite] How to differentiate between sqlite database empty char* strings and DBNULL char* string?

2012-10-27 Thread Simon Slavin

On 27 Oct 2012, at 7:17pm, Frank Chang  wrote:

>   Good afternoon, Is it possible to differentiate between sqlite database
> empty char* strings and DBNULL char* strings? If so, what is the best way to
> do that?

I do not know what you mean by DBNULL.  SQLite has a NULL value type,
but a NULL holds no characters; it is simply null.

You can use the function typeof(value) to find out the type of a value:



"The typeof(X) function returns a string that indicates the datatype of the 
expression X: "null", "integer", "real", "text", or "blob"."

Simon.


[sqlite] How to differentiate between sqlite database empty char* strings and DBNULL char* string?

2012-10-27 Thread Frank Chang
   Good afternoon, Is it possible to differentiate between sqlite database
empty char* strings and DBNULL char* strings? If so, what is the best way to
do that? Thank you.


Re: [sqlite] sqlite versus MySQL automatic defragmentation on SSDs?

2012-10-27 Thread Simon Slavin

On 27 Oct 2012, at 11:38am, David Barrett  wrote:

> I'm trying to learn more about MySQL versus sqlite when it comes to
> vacuuming and fragmentation, especially as it relates to SSD storage.


Rather than answer your questions point-by-point, I'm going to give you the 
current state of play.  Your understanding of how the various DBMSes work is 
right, but your excellent question "in a world with SSDs and a tremendous 
amount of RAM, does vacuuming matter nearly as much as on a spinning disk with 
constrained RAM?" cuts to the heart of all your other points.  The following 
involves a little simplification and handwaving because otherwise it would be 
two chapters long and you'd have to do homework.

Fragmentation made a big difference to apps running on Windows, but very little 
on any other platform.  This is because Windows does something called 
'read-ahead caching' which assumes that if you read block B, you're soon going 
to want to read block B+1, so at quite a low level it helpfully pre-reads it 
for you.  Other operating systems don't make this assumption.  This is why 
Windows users talk about defragmentation so much, but Unix users don't care 
about it.

SSDs negate the whole point of defragmentation.  On a rotational disk it's 
faster to read blocks B, B+1, B+2, B+4, B+5 than it is to read five random 
blocks from the disk, because the read heads are already positioned in the 
right place, and the disk is going to rotate to show those five blocks in 
order.  SSDs are just like RAM: they're Random Access Memory.  Reading any five 
blocks in any order takes roughly the same time.  So nobody cares about 
fragmentation on an SSD.  Read whatever blocks you want in whatever order you 
want.

As to clever management of disk block alignment with respect to rows and 
columns, this is rarely worth attention these days.  The amount of programming 
and debugging time it takes to get this right, and the amount of extra 
processing and disk access you need to do, give you less return on investment 
than if you spent the same money on buying a faster hard disk.  It's "premature 
optimization" (look up the term) except for two cases: overnight runs and 
realtime 3D graphics.  If your overnight run takes more than one night, you 
have a problem.  If you're programming realtime 3D graphics and they're jerky, 
your users won't enjoy your simulation.  But you wouldn't be using a SQL engine 
for 3D graphics anyway.

The matters you mentioned were all worth attention back in the 1980s when 
storage and bandwidth were expensive.  As you pointed out near the end of your 
post, these things matter less now.

Simon.



Re: [sqlite] statement prepares OK but step returns SQLITE_NOTADB

2012-10-27 Thread Simon Slavin

On 27 Oct 2012, at 6:36am, Dan Kennedy  wrote:

> On 10/27/2012 07:06 AM, Simon Slavin wrote:
>> 
>> On 26 Oct 2012, at 11:05pm, Clemens Ladisch
>> wrote:
>> 
>>> Yes; sqlite3_finalize _always_ frees the statement.
>> 
>> And if the statement is already finalized (due to an earlier error,
>> perhaps) then it is a harmless noop.  So you can do it near the end
>> of your routine harmlessly.
> 
> That's a bit deceptive. Passing the same pointer to sqlite3_finalize()
> twice is undefined behavior. You might get an SQLITE_MISUSE error, but
> you also might get a segfault.

Oh, right.  It releases the memory the statement was using.  Sorry.

Simon.


Re: [sqlite] light weight write barriers

2012-10-27 Thread Vladislav Bolkhovitin


Theodore Ts'o, on 10/25/2012 09:50 AM wrote:

Yeah  I don't buy that.  One, flash is still too expensive.  Two,
the capital costs to build enough Silicon foundries to replace the
current production volume of HDD's is way too expensive for any
company to afford (the cloud providers are buying *huge* numbers of
HDD's) --- and that's assuming companies wouldn't chose to use those
foundries for products with larger margins --- such as, for example,
CPU/GPU chips. :-) And third and finally, if you study the long-term
trends in terms of Data Retention Time (going down), Program and Read
Disturb (going up), and Write Endurance (going down) as a function of
feature size and/or time, you'd be wise to treat flash as nothing more
than short-term cache, and not as a long term stable store.

If end users completely give up on HDD's, and store all of their
precious family pictures on flash storage, after a couple of years,
they are likely going to be very disappointed.

Speaking personally, I wouldn't want to have anything on flash for
more than a few months at *most* before I made sure I had another copy
saved on spinning rust platters for long-term retention.


Here I agree with you.

Vlad


Re: [sqlite] light weight write barriers

2012-10-27 Thread Vladislav Bolkhovitin


Theodore Ts'o, on 10/25/2012 01:14 AM wrote:

On Tue, Oct 23, 2012 at 03:53:11PM -0400, Vladislav Bolkhovitin wrote:

Yes, SCSI has full support for ordered/simple commands designed
exactly for that task: to have steady flow of commands even in case
when some of them are ordered.


SCSI does, yes --- *if* the device actually implements Tagged Command
Queuing (TCQ).  Not all devices do.

More importantly, SATA drives do *not* have this capability, and when
you compare the price of SATA drives to uber-expensive "enterprise
drives", it's not surprising that most people don't actually use
SCSI/SAS drives that have implemented TCQ.


Where our positions differ is that you consider storage to be something
you can connect to your desktop, while in my view storage is something that
stores data and serves it the best possible way, with the best performance.


Hence, for you the least common denominator of all storage features is the most
important, while for me getting the best out of what storage can offer is the most
important.


In my view storage should offload from the host system as much as possible: data
movements, ordered-operation requirements, atomic operations, deduplication,
snapshots, reliability measures (e.g. RAID), load balancing, etc.


It's the same as with 2D/3D video acceleration hardware. If you want the best
performance from your system, you should offload from it as much as possible: in
the case of video, to the video hardware; in the case of storage, to the storage.
As with video, better offload means better storage performance. At hundreds
of thousands of IOPS it's clearly visible.


Price doesn't matter here, because that's a completely different topic.


SATA's Native Command
Queuing (NCQ) is not equivalent; this allows the drive to reorder
requests (in particular read requests) so they can be serviced more
efficiently, but it does *not* allow the OS to specify a partial,
relative ordering of requests.


And so? If SATA can't do it, does that mean nobody else can do it either? I know
plenty of non-SATA devices which can meet the ordering requirements you need.


Vlad


Re: [sqlite] light weight write barriers

2012-10-27 Thread Vladislav Bolkhovitin


Nico Williams, on 10/24/2012 05:17 PM wrote:

Yes, SCSI has full support for ordered/simple commands designed exactly for
that task: [...]

[...]

But historically for some reason Linux storage developers were stuck with
"barriers" concept, which is obviously not the same as ORDERED commands,
hence had a lot troubles with their ambiguous semantic. As far as I can tell
the reason of that was some lack of sufficiently deep SCSI understanding
(how to handle errors, believe that ACA is something legacy from parallel
SCSI times, etc.).


Barriers are a very simple abstraction, so there's that.


It isn't simple at all. If you think for some time about barriers from the storage 
point of view, you will soon realize how bad and ambiguous they are.



Before that happens, people will keep returning again and again to those
simple questions: why must the queue be flushed for every ordered operation?
Isn't that obvious overkill?


That [cache flushing]


It isn't cache flushing, it's _queue_ flushing. You can call it queue draining, if 
you like.


Often there's a big difference where it's done: on the system side, or on the 
storage side.


Actually, the performance improvements from NCQ in many cases come not because it
allows the drive to reorder requests, as is commonly thought, but because it
keeps the drive's internal processing stages busy without any
idle time. Drives often have a long internal pipeline, hence the need to keep
every stage of it always busy, and hence why using ORDERED commands is important
for performance.



is not what's being asked for here. Just a
light-weight barrier.  My proposal works without having to add new
system calls: a) use a COW format, b) have background threads doing
fsync()s, c) in each transaction's root block note the last
known-committed (from a completed fsync()) transaction's root block,
d) have an array of well-known uberblocks large enough to accommodate
as many transactions as possible without having to wait for any one
fsync() to complete, e) do not reclaim space from any one past
transaction until at least one subsequent transaction is fully
committed.  This obtains ACI- transaction semantics (survives power
failures but without durability for the last N transactions at
power-failure time) without requiring changes to the OS at all, and
with support for delayed D (durability) notification.


I believe what you really want is to be able to send to the storage a sequence of 
your favorite operations (FS operations, async IO operations, etc.) like:


Write back caching disabled:

data op11, ..., data op1N, ORDERED data op1, data op21, ..., data op2M, ...

Write back caching enabled:

data op11, ..., data op1N, ORDERED sync cache, ORDERED FUA data op1, data op21, 
..., data op2M, ...


Right?

(ORDERED means that it is guaranteed that this command will never, under any
circumstances, be executed before all previous commands have completed, nor after
any subsequent command has completed.)


Vlad


Re: [sqlite] Union all writting on /var/tmp

2012-10-27 Thread Elefterios Stamatogiannakis
Thank you Simon. Your solution would work for the example I gave.
Nevertheless my problem is still more complex, because I also use SQLite
as a generic streaming engine (yes, I know, SQLite wasn't designed for
doing things like that).


Apart from input VTs (FILE) we also have output VTs like so [*]:

OUTPUT '10.0.0.1:8080' select upper(c1), c2, lower(c3) from
(
select * from file('http://www.foo.com/list1.tsv.gz')
union all
select * from file('http://www.foo.com/list2.tsv.gz')
)
where c2!=c4;

Internally this is implemented as a VT (OUTPUT) that takes as input a 
query and sends its results to another SQLite. The beauty of the concept 
is that a streaming query flow can go from machine to machine and still 
the transactional properties hold. If anything happens anywhere in the 
query's path, all of the machines will rollback due to the transactional 
properties of SQLite.


In addition to above we are developing a parallel processing engine 
(like Hadoop) where we use SQLite "chunks" for processing and data 
storage. So tables are split (or collected) into multiple SQLite DBs and 
transferred to multiple machines, where the queries will be executed in 
parallel [**].


For the above we heavily use UNION ALLs over the attached DBs to scan over
the many chunks of a table.


A UNION ALL that unnecessarily buffers to the disk its inputs (for a 
single scan over them), is very nasty, performance wise, to all of the 
above machinations.


I can guess that changing UNION ALL processing for single scans isn't
so easy; nevertheless it would be nice if the /var/tmp buffering
directory could at least be changed.
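For what it's worth, the temporary-file location can already be steered without code changes. A sketch (Python's stdlib sqlite3 bindings for illustration; SQLITE_TMPDIR is honoured by Unix builds of SQLite, and PRAGMA temp_store trades disk for RAM; whether a given UNION ALL actually spills is query-dependent):

```python
import os
import sqlite3
import tempfile

# On Unix builds, SQLITE_TMPDIR redirects where SQLite puts temporary
# files (it is consulted when the temp file is created).
os.environ["SQLITE_TMPDIR"] = tempfile.mkdtemp()

conn = sqlite3.connect(":memory:")
# Alternatively, keep temporary tables and indices entirely in RAM:
conn.execute("PRAGMA temp_store = MEMORY")

# The query behaves the same either way; only the spill location changes.
rows = conn.execute(
    "SELECT 1 AS c UNION ALL SELECT 2 ORDER BY c DESC").fetchall()
print(rows)  # [(2,), (1,)]
```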


lefteris.

[*] In practice we pipe to named pipes that point to other machines.

[**] Other projects based on the same idea are:

http://hadoopdb.sourceforge.net/

and the company that sprang from above:

http://hadapt.com/


On 27/10/2012 3:05 AM, Simon Slavin wrote:


Does this work instead ?

CREATE TABLE t ([[whatever columns you want]]);
INSERT INTO t SELECT upper(c1),c2,lower(c3) FROM 
file('http://www.foo.com/list1.tsv.gz') WHERE c2!=c4;
INSERT INTO t SELECT upper(c1),c2,lower(c3) FROM 
file('http://www.foo.com/list2.tsv.gz') WHERE c2!=c4;

Simon.





[sqlite] sqlite versus MySQL automatic defragmentation on SSDs?

2012-10-27 Thread David Barrett
I'm trying to learn more about MySQL versus sqlite when it comes to
vacuuming and fragmentation, especially as it relates to SSD storage.  Can
anybody summarize for me the differences?

1) My understanding is sqlite, in general, has no automatic
defragmentation: there is no process to gradually and incrementally
reposition data on disk in index order.  (Even auto-VACUUM just reclaims
deleted data by inserting new transactions in the holes left behind by
previous deletes, which the documentation says can actually worsen
fragmentation.)  The only defragmentation option is a full VACUUM, which
rebuilds the entire database from scratch in index order.  During this
period, the whole database is locked and cannot be used.  Is this roughly
right?

2) My understanding of MySQL is that the equivalent function is to OPTIMIZE
TABLE.  Unlike sqlite, which works on a full database, this works on a
single table, but is otherwise the same -- rebuilds the entire table (and
thus needs 2x the disk space), locks the table during the operation, is not
incremental (either works fully or not at all).  Is this about right?

3) I've heard mention that in some configurations (perhaps some database
engines?) MySQL doesn't need vacuuming for some reason -- somehow it just
doesn't fragment, or resolves fragmentation as it goes -- but I can't track
down an explanation of why.  Any ideas?

4) Does MySQL (or sqlite?) have any fancy defragmentation capabilities,
such as (for example) using a clustered primary index where no two
secondary rows are put into the same block?  (For example, imagine a table
containing historical credit card transactions, where transactions are
grouped first by "cardID" and then each card has a sequential list of rows
identified by "transactionID".  Every night, new transactions are added to
each card, meaning new data would ideally be inserted throughout the entire
table, as opposed to just adding at the very end. I could imagine a
clustered index ordering all transactions for a single card back-to-back on
disk, except skipping to the next full database block before adding
transactions for the next card.  This would intentionally leave space free
in the database for additional transactions to be added to each card in
disk-sequential-order, explicitly avoiding transactions from multiple cards
being stored in the same block.  This wouldn't be a complete solution --
the database blocks for a particular card wouldn't necessarily be
sequential -- but the transactions inside each block *would* be sequential,
and no two blocks would contain transactions from different cards.  Does
this make sense?  Does anything like this exist?)

5) My understanding is disk order matters a lot for spinning disks, as it
reduces seek times when dealing with data in neighboring primary keys.
However, am I correct in assuming that the necessity for this is
dramatically reduced by SSDs, which allow random access?  Granted,
vacuuming should always help to a degree: if your rows are smaller than the
block size, then it's better to have neighboring rows in the same block (so
you needn't read multiple blocks to get the same data).  And it's always
good to ensure your database block size is the same as the disk block size
(whether SSD or spinning) so you only need to read a single disk block per
database block.  But in a world with SSDs and a tremendous amount of RAM,
does vacuuming matter nearly as much as on a spinning disk with constrained
RAM?
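On the block-size point, SQLite at least lets you match the database page size to the device's block size up front. A quick sketch (Python's stdlib sqlite3 bindings; the 8192 is an arbitrary illustrative value, not a recommendation):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pages.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit

# page_size must be set before the database file is initialised (or be
# followed by a VACUUM) for it to take effect.
conn.execute("PRAGMA page_size = 8192")
conn.execute("CREATE TABLE t (x)")

size = conn.execute("PRAGMA page_size").fetchone()[0]
print(size)  # 8192
```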

Thanks, I really appreciate your thoughts and links to reading material!

-david