[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H

> On 10/29/15, Jason H  wrote:
> >
> > I'm open to ideas, but I was going to use this as an excuse to invent
> > something of a general tool.
> 
> Post your schema and query.  Also run ANALYZE and post the content of
> the sqlite_stat1 table.

I really appreciate the offer, but unfortunately I can't post it, and it's not 
my design. I could alter it, but I'd have to design a process to map from the 
original schema to the altered one every time we get a snapshot. There already 
exists a script which adds indexes on the columns that we join through. 

Perhaps I could write a script that would textually replace the table and 
column names. That will take a little time.

(FWIW, I'm rather proud of my own schemas, and they look nothing like this 
mess.)

Thanks again.
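For anyone following along, Richard's ANALYZE suggestion can be tried in a few lines (a sketch using Python's built-in sqlite3; the table and index names here are invented for illustration, not from the poster's schema):

```python
import sqlite3

# Run ANALYZE on a toy schema and inspect the resulting statistics.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (a, b);
    CREATE INDEX t_a ON t (a);
    INSERT INTO t VALUES (1, 'x'), (1, 'y'), (2, 'z');
    ANALYZE;
""")
# sqlite_stat1 rows are (tbl, idx, stat); the stat string begins with the
# table's row count followed by average rows per index prefix.
for tbl, idx, stat in con.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    print(tbl, idx, stat)
```

The stat strings are what the query planner uses to choose indexes, which is why Richard asked for them alongside the schema.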



[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
>
> On 10/29/15, Jason H  wrote:
> >>
> > If I were to try to work around...
> 
> Before we go any further, have you actually measured a performance
> problem?  Or are you optimizing without prior knowledge of where your
> application is spending time?

Currently, I have a SQLite database of around 10 GB, and a single query against 
it takes 25 minutes with no other activity (it's never queried while being 
written, just read). I've created the best indexes I can. To get this lower, I'm 
looking at Hadoop, or using this as an excuse to re-architect something in 
SQLite. Hadoop has many non-ideal attributes. I want to stick with SQLite 
because that's what we know, and my research indicates it's also still the 
fastest at reading. 

For this specific database, we join through about 12 tables on an average query 
(519 tables total, 319 code tables), most of which have over two dozen columns, 
some over 256 columns, max 384. The longest row is 16 KB in total.


I'm open to ideas, but I was going to use this as an excuse to invent something 
of a general tool.


[sqlite] Alexa has been banished. Was: Mailing list policy change

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 9:34pm, Nicolas Jäger  wrote:

> damn! I wanted a date!

Erm ... those pictures are of Angie Varona, a teenager from California.  Her 
name is not even close to Alexa.  They just stole her picture.  Unless, of 
course, you wanted a date with the woman who wrote the text.  Who may not be 
all that 'woman'.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 7:49pm, Jason H  wrote:

> If I were to try to work around all this excess reading, how good or bad 
> would it be to take the following approach:
> Define a set of smaller tables (column count and size) with a common key, 
> then create a view for the specific query, then query the view.
> 
> create table k (id integer primary key, ... other always used columns...);
> create table c1 (k integer, data text); -- seldomly used large column
> create table c2 (k integer, data text); -- seldomly used large column
> create table c3 (k integer, data text); -- seldomly used large column
> ...
> create table c128 (k integer, data text); -- seldomly used large column

Handling large numbers of tables is slow.  Handling large numbers of rows is 
fast.  For cases where you have "seldomly used large column" use sparse storage:

CREATE TABLE sparseColumns (id INTEGER, k INTEGER, value TEXT, PRIMARY KEY 
(id,k));
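A minimal sketch of how this sparse-storage table might be used (Python's built-in sqlite3; the slot number k=42 and the payload are made-up examples):

```python
import sqlite3

# One (id, k, value) row per populated "column slot" instead of a table
# with 128 mostly-NULL wide columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sparseColumns (id INTEGER, k INTEGER, value TEXT,"
            " PRIMARY KEY (id, k))")
# Row 1 only has a value in slot 42; the absent slots cost no storage.
con.execute("INSERT INTO sparseColumns VALUES (1, 42, 'rare payload')")
row = con.execute("SELECT value FROM sparseColumns"
                  " WHERE id = 1 AND k = 42").fetchone()
print(row[0])  # rare payload
```

The (id, k) primary key means each lookup is an index search, and slots that were never written simply don't exist as rows.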

Generally speaking, SQLite is ridiculously good at this stuff, and trying to 
work around it only makes things worse.  It's in its third version, and even 
version 3 has been worked on, including major rewrites, for ten years.  It 
doesn't do slow complicated things (row-level locking, character sets).  By all 
means tell us when you get slow results, but if you're trying to predict slow 
results just from reading the documentation, you'll probably not get useful 
data.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 6:22pm, Jason H  wrote:

> Ah, so this is what I seem to have missed. The pages... This is unfortunate 
> as the read-heavy application won't likely benefit from SQLite. Only a small 
> fraction of the row data is used in each query, and it seems like it'll have 
> to read the entire row (via the page) each time?

It will read the entire page each time.  But the page is normally the size of a 
disk block, and the read is done in one operation: seek-to-sector-head and a 
read.  It's the quickest possible thing to do and it happens very quickly 
indeed.  It's actually as fast as seek-to-sector and read-a-few-bytes, because 
reading the entire block is how disk subsystems work internally.

Also, because pages are quite big the page you want is probably already in 
memory.  If you're not doing multi-access stuff it won't need to re-read.


On 29 Oct 2015, at 6:32pm, Wade, William  wrote:

> if a particular row requires twenty blocks of storage

That happens almost never.  The size of a page is a power of two between 512 
and 65536 inclusive.  When you create a new database, if you know you're going 
to have really long rows you tend to set a big page size.
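For example (a sketch with Python's built-in sqlite3; the file name is arbitrary):

```python
import os
import sqlite3
import tempfile

# PRAGMA page_size must be issued before the database's first write;
# after that the size is baked into the file.
path = os.path.join(tempfile.mkdtemp(), "wide_rows.db")
con = sqlite3.connect(path)
con.execute("PRAGMA page_size = 8192")   # a power of two, 512..65536
con.execute("CREATE TABLE t (x)")        # first write fixes the page size
con.commit()
print(con.execute("PRAGMA page_size").fetchone()[0])  # 8192
```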

> and I
> want to read a one-byte field that is in the twentieth block, all twenty
> blocks will be read.

Which is why you generally put small commonly-used columns before long text and 
blobs.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 8:59pm, Jason H  wrote:

> Currently, I have a SQLite database of around 10gig that takes 25 minutes to 
> run a single query against no other activity (it's never queried in 
> read/write, just read). I've created indexes the best I can.

You can do better.  I have a 43 Gigabyte database and I get responses from 
queries including JOINs in milliseconds.  It has four tables and the widest 
table has five columns.

> For this specific database, we join though about 12 tables on an average 
> query (519 tables total, 319 code tables), most of which have over 2 dozen 
> columns, some over 256 columns, max 384. the longest row is 16k in total.

There's your problem.  You need to redesign your schema.  As a general rule, if 
a database has so many tables -- or a table has so many columns -- that you 
can't keep them all in your head at the same time, it's badly-designed.  256 is 
silly.

You don't need a tool.  You need a better schema.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
> On 10/29/15, Jason H  wrote:
> >
> > Ah, so this is what I seem to have missed. The pages... This is unfortunate
> > as the read-heavy application won't likely benefit from SQLite.
> 
> Your filesystem and your disk hardware work the same way.  Your
> application might only ask for 10 bytes, but the underlying OS and
> hardware are pulling a complete sector (or perhaps a complete track)
> from disk, passing the 10 bytes you asked for off to your app, and
> holding the rest in cache.
> 
> So, no, you cannot improve performance by only reading a few bytes
> here and a few bytes there.  The disk has to work just as hard.

First, thanks everyone, I've learned a lot about SQLite3!

You make a very good point here: the OS will be caching disk reads (16-64 KB 
from what I understand), so each read is on average cheaper, but overall more 
expensive if I do not need all the data. I was hoping that those very large 
tables (width in terms of column count and size, not row count) could benefit 
from a database being able to discard fields. If I could ask one more 
question:

If I were to try to work around all this excess reading, how good or bad would 
it be to take the following approach:
Define a set of smaller tables (column count and size) with a common key, then 
create a view for the specific query, then query the view.

create table k (id integer primary key, ... other always used columns...);
create table c1 (k integer, data text); -- seldomly used large column
create table c2 (k integer, data text); -- seldomly used large column
create table c3 (k integer, data text); -- seldomly used large column
...
create table c128 (k integer, data text); -- seldomly used large column

create view myquery as 
select k.id id, c1.data c1, c2.data c2, c3.data c3 from k
left join c1 on c1.k=k.id 
left join c2 on c2.k=k.id 
left join c3 on c3.k=k.id;

select * from myquery;

Shouldn't that skip the data I don't need and make good use of disk caches? 
What I don't know, though, is how efficient SQLite3 will be at putting that 
view together. Did I make a mess giving sqlite3 busy work, or did I speed it 
up? 
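One way to answer that empirically is to build a cut-down version of the split schema and ask SQLite for its query plan (a sketch using Python's built-in sqlite3; only three of the side tables are shown, and the column names are placeholders):

```python
import sqlite3

# Miniature version of the k/c1/c2 split with a view over the joins.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE k (id INTEGER PRIMARY KEY, always_used TEXT);
    CREATE TABLE c1 (k INTEGER, data TEXT);
    CREATE TABLE c2 (k INTEGER, data TEXT);
    CREATE VIEW myquery AS
      SELECT k.id AS id, c1.data AS c1, c2.data AS c2
      FROM k
      LEFT JOIN c1 ON c1.k = k.id
      LEFT JOIN c2 ON c2.k = k.id;
""")
plan = con.execute("EXPLAIN QUERY PLAN SELECT * FROM myquery").fetchall()
for row in plan:
    print(row[3])   # one line per table visited by the join
```

Note that without an index on each cN.k column the LEFT JOINs degrade to full scans of the side tables, so the split only has a chance of paying off once those indexes exist.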

"Nobody ever made them like this! I mean the architect had to be a certified 
genius or an authentic wacko!" - Dr Ray Stantz, Ghostbusters



[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 8:24 PM, Richard Hipp  wrote:

> On 10/29/15, Dominique Devienne  wrote:
> >
> > I've discussed blobs a few times already :). Especially the fact you
> can't
> > have transactional and incremental updates of blobs (i.e. the transaction
> > "copies" only the blob pages actually modified).
>
> Large blobs are transactional when they are stored in SQLite tables.
>

Even in the context of https://www.sqlite.org/c3ref/blob_write.html ?

Don't "it is not possible to increase the size of a BLOB" and
"Writes to the BLOB that occurred before the BLOB handle expired are not
rolled back"
imply that there is a loss of transactionality?


> Perhaps you are talking about storing BLOBs in separate files and then
> just storing the filename in the SQLite table?
>

No, I was talking about "in-row" blobs. Sorry, I was indeed unclear.

I wish to write blobs incrementally and transactionally *without* that
implying having SQLite rewrite the whole blob (to maintain transactionality).
Said differently, I'd like incremental blob updates to be transactional *and*
efficient, when the amount written is a fraction of the total blob size.

The way Oracle achieves that, AFAIK, is that the row contains a small blob
that holds the page indexes for the blob, and the blob itself is stored
"out-of-row" in dedicated blob pages. When doing an incremental blob update,
only the blob pages actually written are copied for the transaction, as well
as the page for the row, with the "blob-index" blob now referencing the new
blob pages (as well as the same old unmodified blob pages).


> If you do have large BLOBs or strings, SQLite handles this best if the
> large blob/string is stored in a table by itself and then linked into
> other tables using an integer primary key.  For example:
>
>CREATE TABLE BigBlobTab (
>blobid INTEGER PRIMARY KEY,
>content BLOB -- big BLOB or text field.
>);
>CREATE TABLE OtherStuff (
>field1 VARCHAR(10),
>field2 FLOAT,
>field3 BOOLEAN,
>blobid INT REFERENCES BigBlobTab
>);
>
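A sketch of the pattern Richard describes, using Python's built-in sqlite3 (the row values are invented): scans of OtherStuff never drag the megabyte blob along, and substr() hands only a slice back to the application:

```python
import sqlite3

# Big blob in its own table, referenced by integer key from OtherStuff.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE BigBlobTab (blobid INTEGER PRIMARY KEY, content BLOB);
    CREATE TABLE OtherStuff (field1 TEXT, blobid INT REFERENCES BigBlobTab);
""")
con.execute("INSERT INTO BigBlobTab VALUES (1, ?)", (b"x" * 1_000_000,))
con.execute("INSERT INTO OtherStuff VALUES ('row-a', 1)")
# Pull a 16-byte slice out of the 1 MB blob without materialising the
# whole value in the application.
chunk = con.execute("SELECT substr(content, 1001, 16) FROM BigBlobTab"
                    " WHERE blobid = 1").fetchone()[0]
print(len(chunk))  # 16
```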

But then there's a lifetime management issue. The blob is an "attribute" of
the row, and the parent-child relationship is "upside-down". If the "blob"
table references the "row" table, then the row table can exist w/o a blob
(even when it is supposed to be a non-null attribute). And conversely, the
same blob can be shared by several rows. Either way, you can't enforce the
1:1 relationship that a row attribute implicitly enjoys (and which is
required in some cases).

Plus there's the fact that foreign keys (and ON DELETE CASCADE) are optional,
so any user of the DB file can break the referential integrity of those
blobs, which it can't do if the blob is in-row.


> This technique work similarly to storing BLOBs in separate files, in
> that it does not mix the BLOB content with the rest of the row, making
> the row faster to search and update.  But it has the advantage over
> separate storage that BLOB changes are transactional and all content
> is stored in a single file on disk.
>

I don't want separate storage at all! I'd want my blobs *in* SQLite in fact.
But the above basically makes it too inefficient IMHO.


> This is how the Fossil VCS (https://www.fossil-scm.org/) stores content,
> fwiw.


Fossil stores its "blobs" by SHA-1, and sharing of blobs across
revisions/branches is in fact a feature. And all those blobs are immutable by
design. At least that's the impression I got reading the documentation.

SQLite is a wonderful piece of software, and I thank you for it. I'm a big
fan of it. But the above is one of my main pet peeves with it :). Which is
based on my limited comprehension of how SQLite works. Please do correct me
if I'm wrong.
Thanks, --DD


[sqlite] Documentation change suggestion: formatchng.html

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 6:58pm, Richard Hipp  wrote:

> That page was really old and mostly obsolete - at least the header
> text was.  I've rewritten it.  Please press "Reload"

Good.  Will it be updated for everything up to and including partial indexes?  
It's a lot of work, so I don't expect it right now.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 7:32 PM, Wade, William  wrote:

> From https://www.sqlite.org/fileformat2.html, if a row does not fit into
> a single block, its overflow is stored in a linked list. The implication
> (to me) is that if a particular row requires twenty blocks of storage, and
> I want to read a one-byte field that is in the twentieth block, all twenty
> blocks will be read.
>
> Clearly it would be possible to do better even without switching to a
> column-major layout (instead use a record-internal b-tree mapping
> column-number-for-this-record to page number of the overflow block) but I
> don't see any documented evidence that sqlite does that. For some
> applications it would certainly be nice.
>

But that would require a format change, which the SQLite team is not
willing to make IMHO.


> It would also be nice to get something closer to random access into large
> blobs (and the same technique works there).
>

I've discussed blobs a few times already :). Especially the fact you can't
have transactional and incremental updates of blobs (i.e. where the
transaction "copies" only the blob pages actually modified).
This would require a blob index, and again that's a format change. There is
a vacuum mode, I vaguely remember, that tracks a bit more info about pages,
with special pages, and that helps a little, but I don't recall the details.
--DD


[sqlite] Documentation change suggestion: formatchng.html

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 6:59 PM, Simon Slavin  wrote:

> I find nothing in error on the page
>
> 
>
> but it could perhaps be updated to mention that from version 3.9.0 of
> SQLite, file format changes can happen only when the middle number
> changes.  Or something of the sort.
>
> Irrelevant to that, I somehow expected the table to be written the other
> way up ... with the recent changes at the top.  Not sure whether this is
> just me.
>

I agree, most recent at the top is more "natural" (to me).

Since we're bikeshedding, the first two columns could be wider (and the first
split in two?) so that it doesn't wrap like this :) --DD


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 7:22 PM, Jason H  wrote:
>
> Or is that just crazy talk?
>

Mostly, yes. You seem to think that reading from disk at arbitrary offsets
and in arbitrarily small increments will be efficient. It won't be.
And as you pointed out, that ignores transactionality, WAL mode, etc... --DD


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 7:09 PM, Jason H  wrote:

> This seems to be at odds with my understanding of "2.1 Record Format" of
> the document. Given that it reads the row varint, which contains the length
> of itself and the serial types of the columns, in order, it should be
> completely able to skip the reading of unneeded columns. If I have 39,39,39
> as my header (excepting the length of header) I know I am dealing with 3
> text columns of 13 bytes each. If I want only the last column, I can then
> seek(78), read(26). At no time did I need to read the prior two columns.
>
> Or did I misunderstand something?
>

You seem to have missed what Scott and Paul already explained, which is that
SQLite is *paged*.
The IO increment is the page (from 512 bytes to 64K, with 1K, 2K, 4K, and 8K
being popular choices, I think).
An SQLite file is a 100-byte header, and then N pages of a given size,
determined at DB creation time.
The sqlite_master table is the "root" page, which lists all tables,
indexes, etc., and their starting page in that file.
Look at the diagram to the right of http://www.sqlite.org/vfs.html (which I
already sent).
All IO is done by the *pager*; as its name indicates, it reads and writes
*pages*.

What you call "reading" column c1, skipping c2, etc. is not "reading" per se.
It's decoding the in-memory block that represents a disk file page, which the
pager read from the file (actual IO of a whole page) and put in the page
cache.
It's a layered architecture with clean separation of concerns and
responsibilities. --DD
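That 100-byte header can be inspected directly (a sketch in Python; per fileformat2.html the big-endian page size lives at header offset 16, and the value 1 encodes 65536):

```python
import os
import sqlite3
import struct
import tempfile

# Create a tiny database file, then read its 100-byte header by hand.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (x)")
con.commit()
con.close()

with open(path, "rb") as f:
    header = f.read(100)
assert header[:16] == b"SQLite format 3\x00"   # the magic string
page_size = struct.unpack(">H", header[16:18])[0]
if page_size == 1:          # the format doc says 1 encodes 65536
    page_size = 65536
print(page_size)            # a power of two between 512 and 65536
```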


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
> Sent: Thursday, October 29, 2015 at 2:04 PM
> From: "Paul Sanderson" 
> To: "SQLite mailing list" 
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> It reads a complete page at a time so there is no seeking other than
> to the start of each row - in the sense of a disk seek.
> 
> Note that there may be multiple required rows on the same page if the
> row length is much less than the page length, or if rows are longer
> than a size determined by some arcane math from the page size (see the
> file format documentation), a row may overflow to one or more pages


Ah, so this is what I seem to have missed. The pages... This is unfortunate, as 
the read-heavy application won't likely benefit from SQLite. Only a small 
fraction of the row data is used in each query, and it seems like it'll have to 
read the entire row (via the page) each time?


I wonder how much work it would be to come up with a "SQLite read-only mode" 
that bypasses the page interface but still allows the SQL parser to query the 
data. It seems that pages are only needed when you're managing updates to the 
table. If you're write-once-query-forever, it seems there would be tremendous 
performance gains to be had. 

Or is that just crazy talk?


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H


> Sent: Thursday, October 29, 2015 at 1:53 PM
> From: "Simon Slavin" 
> To: "SQLite mailing list" 
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> 
> On 29 Oct 2015, at 5:20pm, Jason H  wrote:
> 
> > Thanks, this is some really great information!
> 
> You're welcome.
> 
> > If I could ask a followup question. You made the statement "SQLite reads 
> > that row of the table from storage, from the first column to the last 
> > column needed by the SELECT, but perhaps not all the way to the end of the 
> > columns in the row.", Given an example select that requires columns 2,3,5, 
> > does this mean that column 4 is also read, or is column 4 be skipped? I 
> > guess the question is I am assuming that a row is serialization of one or 
> > more strings that are a serialization of a string preceded by their lengths 
> > (i.e. [L|data] ), would SQLite do something akin to (using my 2,3,5 
> > example):
> > begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read 
> > L3*byte, read L4, seek L4, read L5, read L5*byte
> > or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?
> 
> You are missing some useful information.  Whenever SQLite has to change even 
> one variable-length column in a row it rewrites the whole row.  It has no way 
> of storing a short column in the space originally used by a longer one.  So 
> even if all you do is
> 
> UPDATE t SET c5='short string' WHERE c5='the original extremely long string'
> 
> the new copy of any of those rows has the new c6 immediately following the 
> new c5.  If you've read to the end of c5 then you're already pointing to c6 
> so there's no need to seek to it.  Contrariwise, the only way to seek c6 is 
> to read all the columns before it.  There is no pointer to the beginning of c6 
> read all the columns before it.  There is no pointer to the beginning of c6 
> and no magic mark that says "Column c6 starts here".
> 
> I'll answer your specific question (I hope) with an example.
> 
> CREATE TABLE t (c1, c2, c3, ... c8);
> CREATE INDEX t_23 ON t (c2, c3);
> SELECT c1, c3, c5, c6 FROM t WHERE c2='Smith' ORDER BY c3;
> 
> Given the above, the clauses in the SELECT can be perfectly satisfied by the 
> index, but that index doesn't supply the c1, c5 or c6 needed by the SELECT.
> 
> First SQLite figures out which rows are needed to answer the query.  And in 
> doing that it decides to use index t_23, which means that for each of those 
> rows it already has the value of c3.  So it knows which rows to read, and it 
> knows that it still needs c1, c5 and c6 from each of those rows.  The rows go 
> up to c8, but it doesn't need any column past c6.
> 
> So for each row needed, SQLite is pointed to the beginning of c1 for that 
> row.  There's no way of knowing where to find c5 without reading all the 
> preceding columns, because any of those may be any length.  So it progresses 
> through all the columns from c1 to c6 using the length-indicator to skip them 
> or read them depending on whether it needs their values.  It knows it can 
> stop after c6 so it stops once it has seen that one.
> 
> Hope that helps.  If you want the fine detail about the file format, you can 
> find it here:
> 
> 
> 
> though that won't give you the algorithm details I've described above.

This seems to be at odds with my understanding of "2.1 Record Format" of the 
document. Given that it reads the record header varint, which contains the 
length of itself and the serial types of the columns, in order, it should be 
completely able to skip the reading of unneeded columns. If I have 39,39,39 as 
my header (excepting the length of the header) I know I am dealing with 3 text 
columns of 13 bytes each. If I want only the last column, I can then seek(26), 
read(13). At no time did I need to read the prior two columns.

Or did I misunderstand something?

Many thanks, yet again.




[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H


> Sent: Thursday, October 29, 2015 at 1:34 PM
> From: "Scott Hess" 
> To: "SQLite mailing list" 
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> On Thu, Oct 29, 2015 at 10:20 AM, Jason H  wrote:
> >
> > If I could ask a followup question. You made the statement "SQLite reads
> > that row of the table from storage, from the first column to the last
> > column needed by the SELECT, but perhaps not all the way to the end of the
> > columns in the row.", Given an example select that requires columns 2,3,5,
> > does this mean that column 4 is also read, or is column 4 be skipped? I
> > guess the question is I am assuming that a row is serialization of one or
> > more strings that are a serialization of a string preceded by their lengths
> > (i.e. [L|data] ), would SQLite do something akin to (using my 2,3,5
> > example):
> > begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read
> > L3*byte, read L4, seek L4, read L5, read L5*byte
> > or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?
> 
> 
> You should consider reading https://www.sqlite.org/fileformat2.html ,
> especially sections 2.3 "Representation Of SQL Tables" and 1.5 "B-tree
> Pages".  If your project _really_ needs to know this level of detail, then
> you really should read up on the underlying system.  Also maybe throw in
> https://www.sqlite.org/arch.html to get a broad feel of how things break
> down.


Thanks Scott. I had actually already linked to the fileformat2 url, but I did 
not see a discussion at the level of detail of row reading that I am after. I 
did however re-read it in the context of your previous statement and gained 
some insight. I should have stated it more like this:
begin row: read header_varint, serial_types*N, seek, read, seek, read or
begin row: read header_varint, serial_types*N, seek, read, read, read

The documentation does not go into whether the engine is able to skip the 
reading of unneeded interior columns. In theory, it can, because the length is 
contained in the header. So instead of read() on every column in the row, it 
can call seek() if it knows it doesn't need that column. My question is now 
simply: does it seek past unneeded columns, or does everything get sent through 
read once data has started being read? It's a minor detail with big performance 
implications. 

Many thanks again.



[sqlite] Alexa has been banished. Was: Mailing list policy change

2015-10-29 Thread Nicolas Jäger

> Erm ... those pictures are of Angie Varona, a teenager from California.
ok, you even know the name... her parents should be punished for letting their 
daughter post such pictures... stupid parents who don't take care of their 
children... sorry about writing that, but that's exactly how I think. An 
electronic device can destroy a life like a weapon does if you don't handle it 
with care...

>  Her name is not even
> like close to Alexa.  They just stole her picture.  Unless, of course, you 
> wanted a date with the
> woman who wrote the text.  Who may not be all that 'woman'.

my wish was just as real as "alexa" ;)

I guess everyone has heard about such scams... like the African prince who 
needs $5k and promises to send you back $100k... or the Microsoft lottery...

I can't believe that such scams can work... I laugh imagining some wife finding 
that kind of bullshit (alexa) in her spouse's mailbox; that could be the root 
of some "little" trouble...

anyway let's go back to work!


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Wade, William


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
> Sent: Thursday, October 29, 2015 at 12:10 PM
> From: "Simon Slavin" 
> To: "SQLite mailing list" 
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> 
> On 29 Oct 2015, at 2:29pm, Jason H  wrote:
> 
> > In college databases, we calculated the estimated number of blocks 
> > (512-byte device blocks) read given schema and statistics. Now, I'm asking 
> > how would SQLite actually do that?
> 
> SQLite uses a block structure in its database files, but one file block is 
> not always the same size as a storage-subsystem block.  If you're okay with 
> that, then the answer is to do it at the VFS level as Dominique described.
> 
> > I am particularly interested in knowing if SQLite can skip the reading of 
> > columns (and their disk blocks) not used in the query. that is, would Q1 
> > read on order of 128 blocks, or 16384 blocks (the whole table). And in Q2, 
> > 256 blocks, or 16384 blocks (the whole table again, irrespective of the 
> > previous statement and caches). Under what scenarios does the behavior 
> > change (indexes?)
> 
> From your question it seems that you believe that SQLite stores the data for 
> many rows in a column in the same block.  This is not the case.  SQLite 
> stores data in tables in one contiguous section per row of the table.  So the 
> data for one row, not one column, is stored in one chunk (possibly all in the 
> same block), with column 1 first, then column 2, etc..  Like this
> 
> A table
>   A row
>   column 1
>   column 2 ...
>   A different row 
>   column 1
>   column 2 ...
>   A different row ...
> 
> Let's assume that your SELECT's WHERE and ORDER BY clauses can be perfectly 
> served using an index -- either the primary key index for the table or 
> another index created using CREATE INDEX.  When this is the case, data from 
> the table is needed only once the right row has been found, and the columns 
> needed are detailed in the part between SELECT and FROM.  SQLite reads that 
> row of the table from storage, from the first column to the last column 
> needed by the SELECT, but perhaps not all the way to the end of the columns 
> in the row. [1]
> 
> So when creating a table you would theoretically get an improvement in 
> efficiency if you put the columns in decreasing order of use.  In real life 
> situations this tends to have significant effect only when you put long text 
> columns at the end of the rows.
> 
> Something similar is done when the SELECT's WHERE and ORDER BY clauses can't 
> be perfectly served using an index.  In that case the columns needed for 
> those clauses needs to be read from the table as well as the columns needed 
> by the part between SELECT and FROM.
> 
> Simon Slavin.
> 
> [1] Columns whose values are mentioned in the index used do not need to be 
> read from the table since those values are already known.  And there are some 
> details I've left out for brevity.


Thanks, this is some really great information!

If I could ask a followup question. You made the statement "SQLite reads that 
row of the table from storage, from the first column to the last column needed 
by the SELECT, but perhaps not all the way to the end of the columns in the 
row." Given an example select that requires columns 2, 3, and 5, does this mean 
that column 4 is also read, or is column 4 skipped? I guess the question is: I 
am assuming that a row is a serialization of one or more values, each preceded 
by its length (i.e. [L|data]). Would SQLite do something akin to (using my 
2,3,5 example):
begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read L3*byte, 
read L4, seek L4, read L5, read L5*byte
or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte'?

Many thanks again!
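Simon's walkthrough above can be checked with EXPLAIN QUERY PLAN (a sketch using Python's built-in sqlite3; the schema mirrors his t/t_23 example):

```python
import sqlite3

# With an index on (c2, c3), the WHERE and ORDER BY are served by t_23,
# and only c1, c5, c6 have to come from the table row itself.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (c1, c2, c3, c4, c5, c6, c7, c8);
    CREATE INDEX t_23 ON t (c2, c3);
""")
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT c1, c3, c5, c6 FROM t WHERE c2 = 'Smith' ORDER BY c3"
).fetchall()
for row in plan:
    print(row[3])   # the plan detail mentions index t_23
```

The plan shows a SEARCH using t_23 rather than a full table SCAN, which is the behaviour Simon describes: the index picks the rows, and the table is consulted only for the remaining columns.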




[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 6:09pm, Jason H  wrote:

> This seems to be at odds with my understanding of "2.1 Record Format" of the 
> document. Given that it reads the row varint, which contains the length of 
> itself and the serial types of the columns, in order, it should be completely 
> able to skip the reading of unneeded columns. If I have 39,39,39 as my header 
> (excepting the length of header) I know I am dealing with 3 text columns of 
> 13 bytes each. If I want only the last column, I can then seek(26), read(13). 
> At no time did I need to read the prior two columns.

You're right.  I explained it wrong.  You only need to read the serial type of 
each preceding column.  You don't need to read past the contents.  But you do 
need to read and interpret the serial types for all columns 1 to 4 in order to 
know where column 5 starts.  There's no indication that says "column 5 starts 
at byte 1828".

Thanks for checking and not just believing me.

Simon.
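The serial-type arithmetic behind this exchange can be sketched in a few lines (per section 2.1 of fileformat2.html; this is an illustration of the rules, not SQLite's actual code):

```python
def serial_type_size(t: int) -> int:
    """Content size in bytes for a record-header serial type."""
    # Types 0..9 have fixed sizes; 12+ even is a blob, 13+ odd is text.
    fixed = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 8, 7: 8, 8: 0, 9: 0}
    if t in fixed:
        return fixed[t]
    if t >= 12:
        return (t - 12) // 2 if t % 2 == 0 else (t - 13) // 2
    raise ValueError(f"reserved serial type {t}")

# The 39,39,39 header: three 13-byte text columns, so the third column
# starts 26 bytes into the record body and is 13 bytes long.
sizes = [serial_type_size(t) for t in (39, 39, 39)]
print(sizes)           # [13, 13, 13]
print(sum(sizes[:2]))  # 26 -- body offset where the third column starts
```

This is exactly Simon's point: the sizes come only from the serial types, so a reader must decode the header entries for columns 1..N-1 before it knows where column N begins, but it never has to read those columns' contents.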


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Paul Sanderson
It reads a complete page at a time, so there is no seeking other than
to the start of each row - in the sense of a disk seek.

Note that there may be multiple required rows on the same page if the
row length is much less than the page length; or, if rows are longer
than a size determined by some arcane math from the page size (see the
file format documentation), a row may overflow to one or more pages.
Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence


On 29 October 2015 at 17:59, Jason H  wrote:
>
>
>> Sent: Thursday, October 29, 2015 at 1:34 PM
>> From: "Scott Hess" 
>> To: "SQLite mailing list" 
>> Subject: Re: [sqlite] How would sqlite read this from disk?
>>
>> On Thu, Oct 29, 2015 at 10:20 AM, Jason H  wrote:
>> >
>> > If I could ask a followup question. You made the statement "SQLite reads
>> > that row of the table from storage, from the first column to the last
>> > column needed by the SELECT, but perhaps not all the way to the end of the
>> > columns in the row." Given an example select that requires columns 2,3,5,
>> > does this mean that column 4 is also read, or is column 4 skipped? I
>> > guess the question is: I am assuming that a row is a serialization of one
>> > or more columns, each preceded by its length (i.e. [L|data]). Would SQLite
>> > do something akin to (using my 2,3,5 example):
>> > begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read
>> > L3*byte, read L4, seek L4, read L5, read L5*byte
>> > or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?
>>
>>
>> You should consider reading https://www.sqlite.org/fileformat2.html ,
>> especially sections 2.3 "Representation Of SQL Tables" and 1.5 "B-tree
>> Pages".  If your project _really_ needs to know this level of detail, then
>> you really should read up on the underlying system.  Also maybe throw in
>> https://www.sqlite.org/arch.html to get a broad feel of how things break
>> down.
>
>
> Thanks Scott. I had actually already linked to the fileformat2 url, but I did 
> not see a discussion at the level of detail of row reading that I am after. 
> I did however re-read it in the context of your previous statement and 
> gained some insight. I should have stated it more like this:
> begin row read header_varint, serial_types*N, seek, read, seek, read or
> begin row read header_varint, serial_types*N, seek, read, read, read
>
> The documentation does not go into detail about whether the engine is able 
> to skip the reading of unneeded interior columns. In theory, it can, because 
> the length is contained in the header. So instead of read() on every column 
> in the row, it can call seek() if it knows it doesn't need that column. My 
> question is now simply: does it seek past unneeded columns, or does 
> everything get sent through read once data has started being read? It's a 
> minor detail with big performance implications.
>
> Many thanks again.
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Documentation change suggestion: formatchng.html

2015-10-29 Thread Simon Slavin
I find nothing in error on the page



but it could perhaps be updated to mention that from version 3.9.0 of SQLite, 
file format changes can happen only when the middle number changes.  Or 
something of the sort.

Irrelevant to that, I somehow expected the table to be written the other way up 
... with the recent changes at the top.  Not sure whether this is just me.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 5:20pm, Jason H  wrote:

> Thanks, this is some really great information!

You're welcome.

> If I could ask a followup question. You made the statement "SQLite reads that 
> row of the table from storage, from the first column to the last column 
> needed by the SELECT, but perhaps not all the way to the end of the columns 
> in the row." Given an example select that requires columns 2,3,5, does this 
> mean that column 4 is also read, or is column 4 skipped? I guess the question 
> is: I am assuming that a row is a serialization of one or more columns, each 
> preceded by its length (i.e. [L|data]). Would SQLite do something akin to 
> (using my 2,3,5 example):
> begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read 
> L3*byte, read L4, seek L4, read L5, read L5*byte
> or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?

You are missing some useful information.  Whenever SQLite has to change even 
one variable-length column in a row it rewrites the whole row.  It has no way 
of storing a short column in the space originally used by a longer one.  So 
even if all you do is

UPDATE t SET c5='short string' WHERE c5='the original extremely long string'

the new copy of any of those rows has the new c6 immediately following the new 
c5.  If you've read to the end of c5 then you're already pointing to c6 so 
there's no need to seek it.  Contrariwise, the only way to seek c6 is to read all 
the columns before it.  There is no pointer to the beginning of c6 and no magic 
mark that says "Column c6 starts here".

I'll answer your specific question (I hope) with an example.

CREATE TABLE t (c1, c2, c3, ... c8);
CREATE INDEX t_23 ON t (c2, c3);
SELECT c1, c3, c5, c6 FROM t WHERE c2='Smith' ORDER BY c3;

Given the above, the WHERE and ORDER BY clauses can be perfectly satisfied by 
the index, but the index doesn't supply the c1, c5 or c6 needed by the SELECT.

First SQLite figures out which rows are needed to answer the query.  And in 
doing that it decides to use index t_23, which means that for each of those rows 
it already has the value of c3.  So it knows which rows to read, and it knows that 
it still needs c1, c5 and c6 from each of those rows.  The rows go up to c8, 
but it doesn't need any column past c6.

So for each row needed, SQLite is pointed to the beginning of c1 for that row.  
There's no way of knowing where to find c5 without reading all the preceding 
columns, because any of those may be any length.  So it progresses through all 
the columns from c1 to c6 using the length-indicator to skip them or read them 
depending on whether it needs their values.  It knows it can stop after c6 so 
it stops once it has seen that one.
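The same behaviour can be watched from outside with EXPLAIN QUERY PLAN. A minimal sketch using Python's built-in sqlite3 module (the sample data is invented, and the plan's exact wording varies between SQLite versions):

```python
import sqlite3

# Simon's example: index t_23 on (c2, c3) drives the row lookup, but
# c1, c5 and c6 still have to come from the table itself.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (c1, c2, c3, c4, c5, c6, c7, c8)")
db.execute("CREATE INDEX t_23 ON t (c2, c3)")
db.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [(i, "Smith" if i % 2 else "Jones", i * 10, 0, "x", "y", 0, 0)
     for i in range(6)],
)

query = "SELECT c1, c3, c5, c6 FROM t WHERE c2='Smith' ORDER BY c3"

# The plan should report a search of t using index t_23.
plan = db.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan)

rows = db.execute(query).fetchall()
print(rows)  # [(1, 10, 'x', 'y'), (3, 30, 'x', 'y'), (5, 50, 'x', 'y')]
```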

Hope that helps.  If you want the fine detail about the file format, you can 
find it here:



though that won't give you the algorithm details I've described above.

Simon.


[sqlite] Alexa has been banished. Was: Mailing list policy change

2015-10-29 Thread Nicolas Jäger
damn! I wanted a date!

> On 10/29/15, Simon Slavin  wrote:
> >
> > You don't even need to mess with the genuine list server.  Just grab all the
> > addresses it sends to and use another computer to send messages directly to
> > those addresses using the binary-chop thing.  
> 
> Dan wrote a script to do this, which identified the offender.  That
> individual has been banished.  The mailing list has been reverted
> to show the sender's email address.
> 
> Please let me know if you get any further emails from Alexa.



[sqlite] SQLite build dependency on tclsh

2015-10-29 Thread Jaromir Capik
Hello everyone.

> > Hi,
> > with the recent replacement of awk build scripts with tclsh ones, a
> > problem has risen. The tcl is not commonly available when building the
> > environment/distribution from ground up (bootstrap), and as a
> > consequence, SQLite can no longer be built in these early stages.
> >
> > Would it be possible to bring back the awk-based build scripts?
> >
> 
> No tclsh dependency has been added as there has always been a tclsh
> dependency.

Unfortunately that is not true in our case. The distribution bootstrap
always configured the sqlite sources with the --disable-tcl switch.
There was never a need for tcl during the first 2 stages of bootstrap.


>  What we have done is remove the awk dependency by
> converting all awk scripts into tclsh scripts.
> 
> Dependencies before the change:  C-compiler, make/nmake, awk, tclsh

Not true for the distro bootstrap (we only have C, make and gawk).
Awk is one of the basic tools that is available since the first
stage of bootstrap and it's integrated in busybox. Tcl is not
and never was.


> Dependencies after the change:  C-compiler, make/nmake, tclsh
> 
> The driver for this change was Windows users, who now only have to
> install VC++ and ActiveState TCL in order to compile SQLite from
> canonical sources, whereas before they also had to locate and install
> awk.exe.

Getting free awk is not that difficult and depending on the license
you could even bundle the latest binary with your sources. But we
don't want to force you to switch from tcl back to awk. We'd like
to ask you to support both ways of building if that doesn't mean
a lot of extra maintenance work. The tcl and awk scripts could
co-exist, couldn't they?

> The fact that SQLite is a TCL extension that has "escaped
> into the wild" is also a consideration.
> 
> Hmmm...  It appears that there used to be some build targets that
> only used awk and not tclsh in the older makefiles for unix.  So if
> you were only using the build targets that constructed SQLite from
> individual source files, and not the amalgamation, you could get by
> with just awk and without tclsh.   I see your point.  However, the
> amalgamation builds are faster (faster in the sense that the resulting
> libsqlite3.a file uses about 7% fewer CPU cycles).
> 
> I think it is better for us to try to make life easier for the
> millions and millions of Windows users than for the handful of people
> who are bootstrapping Linux.  Is it really that difficult to build
> tclsh first?

This is not about being difficult. It just goes against the trend
of stripping unnecessary dependencies from the first stages.
The thing is that, as time goes on, upstream developers of critical
components often create hardcoded loops in the dependencies,
or introduce deps pulling huge trees of other deps, and this happens
so often that we end up patching the sources of basic components
with each new release in order to be able to build the distro
from scratch. It's nasty and dirty work. Developers always forget
about bootstrap and that causes complications. What we're trying
to do is to raise the alarm early rather than too late. The second
stage runs natively on the hardware with no ssh connection, and it's
usually very difficult to troubleshoot anything during that stage.
That's why we're trying to minimize the number of components to be
built in stage2. Since sqlite is required for rpm and rpmbuild,
we would have to build tcl twice in order to get a sane build.
And of course we're scared of the future. One day someone could
introduce a nasty dependency in tcl and we would end up
building a ton of unnecessary stuff in order to satisfy that.

So, if there's still just a small chance of the awk support
resurrection, I would say 'please, help us to keep the
bootstrap procedure simple and reliable'.

Thanks in advance.

Regards,
Jaromir.

--
Jaromir Capik
Red Hat Czech, s.r.o.
Software Engineer / Secondary Arch

Email: jcapik at redhat.com
Web: www.cz.redhat.com
Red Hat Czech s.r.o., Purkynova 99/71, 612 45, Brno, Czech Republic
IC: 27690016 


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Jason H  wrote:
>
> I'm open to ideas, but I was going to use this as an excuse to invent
> something of a general tool.

Post your schema and query.  Also run ANALYZE and post the content of
the sqlite_stat1 table.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Documentation change suggestion: formatchng.html

2015-10-29 Thread Richard Hipp
On 10/29/15, Simon Slavin  wrote:
>
> On 29 Oct 2015, at 6:58pm, Richard Hipp  wrote:
>
>> That page was really old and mostly obsolete - at least the header
>> text was.  I've rewritten it.  Please press "Reload"
>
> Good.  Will it be updated for everything up to and including partial indexes
> ?  It's a lot of work so I don't expect it right now.

I don't feel a pressing need to document forwards compatibility breaks.

>
> Simon.


-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Backing up SQLite DB with unbuffered I/O

2015-10-29 Thread Clemens Ladisch
Eric Grange wrote:
> Is there a way to perform an online backup with unbuffered I/O?

A tool like ionice changes the I/O priority, which is something different.

You would have to modify SQLite, or write your own VFS.


Regards,
Clemens


[sqlite] Alexa has been banished. Was: Mailing list policy change

2015-10-29 Thread Marc L. Allen
I saved one. I can forward it to you, if you'd like.

> On Oct 29, 2015, at 5:37 PM, Nicolas Jäger  wrote:
> 
> damn! I wanted a date!
> 
>>> On 10/29/15, Simon Slavin  wrote:
>>> 
>>> You don't even need to mess with the genuine list server.  Just grab all the
>>> addresses it sends to and use another computer to send messages directly to
>>> those addresses using the binary-chop thing.  
>> 
>> Dan wrote a script to do this, which identified the offender.  That
>> individual has been banished.  The mailing list has been reverted
>> to show the sender's email address.
>> 
>> Please let me know if you get any further emails from Alexa.
> 



This email and any attachments are only for use by the intended recipient(s) 
and may contain legally privileged, confidential, proprietary or otherwise 
private information. Any unauthorized use, reproduction, dissemination, 
distribution or other disclosure of the contents of this e-mail or its 
attachments is strictly prohibited. If you have received this email in error, 
please notify the sender immediately and delete the original.


[sqlite] Backing up SQLite DB with unbuffered I/O

2015-10-29 Thread Eric Grange
Is there a way to perform an online backup with unbuffered I/O?

I am currently performing backups with the command-line tool and ".backup",
but this appears to wreak havoc on the OS disk cache.

I am able to mitigate this by disabling buffering on the destination drive,
but that cannot be used for the source drive.

Eric


[sqlite] Alexa has been banished. Was: Mailing list policy change

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 3:28pm, Richard Hipp  wrote:

> Dan wrote a scribe to do this, which identified the offender.

Woo hoo !  Thanks to Dan for this and everything else he does for the list.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 2:29pm, Jason H  wrote:

> In college databases, we calculated the estimated number of blocks (512-byte 
> device blocks) read given schema and statistics. Now, I'm asking how would 
> SQLite actually do that?

SQLite uses a block structure in its database files, but one file block is not 
always the same size as a storage-subsystem block.  If you're okay with that, 
then the answer is to do it at the VFS level as Dominique described.

> I am particularly interested in knowing if SQLite can skip the reading of 
> columns (and their disk blocks) not used in the query. that is, would Q1 read 
> on order of 128 blocks, or 16384 blocks (the whole table). And in Q2, 256 
> blocks, or 16384 blocks (the whole table again, irrespective of the previous 
> statement and caches). Under what scenarios does the behavior change 
> (indexes?)


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Jason H  wrote:
>>
> If I were to try to work around...

Before we go any further, have you actually measured a performance
problem?  Or are you optimizing without prior knowledge of where your
application is spending time?

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 3:29 PM, Jason H  wrote:

> I'm trying to figure out how SQLite3 would read the schema and data off
> the disk below. I read https://www.sqlite.org/fileformat2.html but didn't
> find what I was looking for.
> In college databases, we calculated the estimated number of blocks
> (512-byte device blocks) read given schema and statistics. Now, I'm asking
> how would SQLite actually do that?
>
> Given this pseudo code:
> create table aTable(c1 text, ..., c128 text);
> for 1 to 100,000:
>  insert into aTable (data1, ..., data128); -- where each dataN is a random
> length from 0 to 128k bytes, with an average of 64kb
>
> Now for the magic:
> select data64 from aTable; -- Q1
> select data1, data128 from aTable; -- Q2
>
> I am particularly interested in knowing if SQLite can skip the reading of
> columns (and their disk blocks) not used in the query. that is, would Q1
> read on order of 128 blocks, or 16384 blocks (the whole table). And in Q2,
> 256 blocks, or 16384 blocks (the whole table again, irrespective of the
> previous statement and caches). Under what scenarios does the behavior
> change (indexes?)
>
> Many, many thanks in advance for this info.
>

Depends on the page size, page cache size, vacuum options, etc...

If you really want to know at that level of detail, then write your own
shim VFS (http://www.sqlite.org/vfs.html) to know when/if SQLite reads and
writes to disk.

I think I remember seeing one of the graphical SQLite tools showing a "heat
map" of DB pages, which at the time I thought must be using a VFS to show
that. --DD


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
I'm trying to figure out how SQLite3 would read the schema and data off the 
disk below. I read https://www.sqlite.org/fileformat2.html but didn't find what 
I was looking for.
In college databases, we calculated the estimated number of blocks (512-byte 
device blocks) read given schema and statistics. Now, I'm asking how would 
SQLite actually do that?

Given this pseudo code:
create table aTable(c1 text, ..., c128 text);
for 1 to 100,000:
 insert into aTable (data1, ..., data128); -- where each dataN is a random 
length from 0 to 128k bytes, with an average of 64kb

Now for the magic:
select data64 from aTable; -- Q1
select data1, data128 from aTable; -- Q2

I am particularly interested in knowing if SQLite can skip the reading of 
columns (and their disk blocks) not used in the query. that is, would Q1 read 
on order of 128 blocks, or 16384 blocks (the whole table). And in Q2, 256 
blocks, or 16384 blocks (the whole table again, irrespective of the previous 
statement and caches). Under what scenarios does the behavior change (indexes?)

Many, many thanks in advance for this info. 




[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Dominique Devienne  wrote:
>
> I've discussed blobs a few times already :). Especially the fact you can't
> have transactional and incremental updates of blobs (i.e. the transaction
> "copies" only the blob pages actually modified).

Large blobs are transactional when they are stored in SQLite tables.

Perhaps you are talking about storing BLOBs in separate files and then
just storing the filename in the SQLite table?

If you do have large BLOBs or strings, SQLite handles this best if the
large blob/string is stored in a table by itself and then linked into
other tables using an integer primary key.  For example:

   CREATE TABLE BigBlobTab (
   blobid INTEGER PRIMARY KEY,
   content BLOB -- big BLOB or text field.
   );
   CREATE TABLE OtherStuff (
   field1 VARCHAR(10),
   field2 FLOAT,
   field3 BOOLEAN,
   blobid INT REFERENCES BigBlobTab
   );

This technique works similarly to storing BLOBs in separate files, in
that it does not mix the BLOB content with the rest of the row, making
the row faster to search and update.  But it has the advantage over
separate storage that BLOB changes are transactional and all content
is stored in a single file on disk.

This is how the Fossil VCS (https://www.fossil-scm.org/) stores content, fwiw.
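As a sketch of how this two-table layout is used in practice (Python's built-in sqlite3 module; the table names come from the message above, the sample data is invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE BigBlobTab (
        blobid INTEGER PRIMARY KEY,
        content BLOB
    );
    CREATE TABLE OtherStuff (
        field1 VARCHAR(10),
        field2 FLOAT,
        field3 BOOLEAN,
        blobid INT REFERENCES BigBlobTab
    );
""")

payload = b"x" * 100_000  # stand-in for a large blob
cur = db.execute("INSERT INTO BigBlobTab (content) VALUES (?)", (payload,))
db.execute("INSERT INTO OtherStuff VALUES (?, ?, ?, ?)",
           ("alpha", 1.5, 1, cur.lastrowid))
db.commit()

# Searches on OtherStuff never touch the blob pages; a join pulls the
# blob only for the rows that actually need it.
row = db.execute("""
    SELECT o.field1, length(b.content)
    FROM OtherStuff AS o JOIN BigBlobTab AS b USING (blobid)
    WHERE o.field1 = 'alpha'
""").fetchone()
print(row)  # ('alpha', 100000)
```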

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Mailing list policy change

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 2:21pm, Adam Devita  wrote:

> Question:  Is it possible for the admin to easily backup the list,
> bisect it, and test for spam?
> That technique should identify the offending address in
> log2(N-Users-Subscribed) attempts.

You don't even need to mess with the genuine list server.  Just grab all the 
addresses it sends to and use another computer to send messages directly to 
those addresses using the binary-chop thing.  You only need to send the second 
message to half the subscribers and the third to a quarter.

I'm not saying that this should be done, just that there's an easier way to do 
it.
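The binary-chop idea sketched in code (a simulation only; the subscriber list and the `leaks` probe are invented for illustration):

```python
def find_offender(subscribers, leaks):
    """Narrow down the one subscriber whose inclusion triggers spam.

    `leaks(group)` returns True if mailing that group provokes the
    spammer; each call stands for one probe message.
    """
    lo, hi = 0, len(subscribers)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if leaks(subscribers[lo:mid]):
            hi = mid       # spammer is in the first half
        else:
            lo = mid       # spammer is in the second half
    return subscribers[lo]

subscribers = [f"user{i}@example.org" for i in range(1000)]
offender = "user617@example.org"
probes = []

def leaks(group):
    probes.append(group)
    return offender in group

found = find_offender(subscribers, leaks)
print(found, len(probes))  # about log2(1000), i.e. ~10 probes
```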

Simon.


[sqlite] Security vulnerability in the SQLite Encryption Extension

2015-10-29 Thread Richard Hipp
Part of the interface design to the SQLite Encryption Extension makes
it possible for an application to accidentally supply weak keys.  This
only happens if the keying interface is misused.  Though to be fair,
the documentation could be clearer about how to avoid misusing the
interface.

If an application does misuse the keying interface and supplies a weak
key, then SEE offers no warning.  The database appears to be fully
encrypted.  But an attacker will be able to easily guess the
encryption key.

I will publish patches to SEE in about two weeks that better document
how to avoid misusing the keying interface and perhaps also to provide
feedback (errors) in the case where the keying interface is misused in
a way that results in weak keys.

In the meantime, if you are an SEE licensee and are concerned that you
might be using weak keys in your application, you can contact me
privately (via email to drh at sqlite.org or by phone at +1.704.948.4565)
to learn more about the problem and how you can mitigate the problem
before it is disclosed.

Note that the details of the problem will only be disclosed to other
SEE licensees and not to the general public.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Documentation change suggestion: formatchng.html

2015-10-29 Thread Richard Hipp
On 10/29/15, Dominique Devienne  wrote:
> On Thu, Oct 29, 2015 at 6:59 PM, Simon Slavin  wrote:
>
>> I find nothing in error on the page
>>
>> 
>>

That page was really old and mostly obsolete - at least the header
text was.  I've rewritten it.  Please press "Reload"
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Wade, William  wrote:
> From https://www.sqlite.org/fileformat2.html, if a row does not fit into a
> single block, its overflow is stored in a linked list. The implication (to
> me) is that if a particular row requires twenty blocks of storage, and I
> want to read a one-byte field that is in the twentieth block, all twenty
> blocks will be read.

(1) We encourage developers to put small boolean and integer value
fields, and frequently accessed fields, first in the CREATE TABLE
statement, and put huge strings and blobs at the end, for exactly this
reason.

(2) SQLite is able to skip intermediate blocks on the linked list
sometimes, when the database is in incremental- or auto-vacuum mode.
It's not 100%* but it works pretty well.  Even so, solution (1) works
better so that is what you should do.

* In the previous paragraph "It's not 100%" means that SQLite is not
always successful in skipping intermediate blocks.  It does always get
the correct answer, 100% of the time.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Jason H  wrote:
>
> Ah, so this is what I seem to have missed. The pages... This is unfortunate
> as the read-heavy application won't likely benefit from SQLite.

Your filesystem and your disk hardware work the same way.  Your
application might only ask for 10 bytes, but the underlying OS and
hardware are pulling a complete sector (or perhaps a complete track)
from disk, pass the 10 bytes you asked for off to your app, and
holding the rest in cache.

So, no, you cannot improve performance by only reading a few bytes
here and a few bytes there.  The disk has to work just as hard.
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Jason H  wrote:
>
> This seems to be at odds with my understanding of "2.1 Record Format" of the
> document. Given that it reads the record header varint, which gives the
> length of the header itself, followed by the serial types of the columns in
> order, it should be completely able to skip the reading of unneeded columns.
> If I have 39,39,39 as my header (excepting the length of header) I know I am
> dealing with 3 text columns of 13 bytes each. If I want only the last
> column, I can then seek(26), read(13). At no time did I need to read the
> prior two columns.
>

I think you are confusing "reading" with "decoding".

SQLite always reads content from disk in page-size chunks.  Each page
is typically 1024 or 4096 bytes, but can be any power of two between
512 and 64K.

Most of the time, a complete row is contained on a single page.  And
that whole page is read off of disk and into memory all in one go.  If
the row has 12 columns and you only ask for the 5th one, then only the
5th column is decoded.  And only the first 5 "serial type" integers
are decoded.  But since all 12 columns are on the same page, they all
get read into memory.

If a single row overflows onto multiple pages, the overflow pages are
only read if needed.
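The page-size bounds above are easy to confirm from Python's built-in sqlite3 module (the printed value is just whatever the local build defaults to):

```python
import sqlite3

db = sqlite3.connect(":memory:")
page_size = db.execute("PRAGMA page_size").fetchone()[0]
print(page_size)

# A power of two between 512 and 64K, as the file format requires.
is_valid = 512 <= page_size <= 65536 and page_size & (page_size - 1) == 0
print(is_valid)  # True
```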

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Alexa has been banished. Was: Mailing list policy change

2015-10-29 Thread Richard Hipp
On 10/29/15, Simon Slavin  wrote:
>
> You don't even need to mess with the genuine list server.  Just grab all the
> addresses it sends to and use another computer to send messages directly to
> those addresses using the binary-chop thing.

Dan wrote a script to do this, which identified the offender.  That
individual has been banished.  The mailing list has been reverted
to show the sender's email address.

Please let me know if you get any further emails from Alexa.
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Scott Hess
On Thu, Oct 29, 2015 at 10:59 AM, Jason H  wrote:
>
> The documentation does not go into detail about whether the engine is able
> to skip the reading of unneeded interior columns. In theory, it can, because
> the length is contained in the header. So instead of read() on every column
> in the row, it can call seek() if it knows it doesn't need that column. My
> question is now simply: does it seek past unneeded columns, or does
> everything get sent through read once data has started being read? It's a
> minor detail with big performance implications.


Everything is page-structured, so you can't skip anything smaller than a
page (even if you could, there may be no benefit, since seeks cost way more
than reads).  Rows which are larger than fit in a page use a singly-linked
list of overflow pages, so you can't get to page N+1 without reading page N
first.

-scott


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Scott Hess
On Thu, Oct 29, 2015 at 10:20 AM, Jason H  wrote:
>
> If I could ask a followup question. You made the statement "SQLite reads
> that row of the table from storage, from the first column to the last
> column needed by the SELECT, but perhaps not all the way to the end of the
> columns in the row." Given an example select that requires columns 2,3,5,
> does this mean that column 4 is also read, or is column 4 skipped? I
> guess the question is: I am assuming that a row is a serialization of one
> or more columns, each preceded by its length (i.e. [L|data]). Would SQLite
> do something akin to (using my 2,3,5 example):
> begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read
> L3*byte, read L4, seek L4, read L5, read L5*byte
> or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?


You should consider reading https://www.sqlite.org/fileformat2.html ,
especially sections 2.3 "Representation Of SQL Tables" and 1.5 "B-tree
Pages".  If your project _really_ needs to know this level of detail, then
you really should read up on the underlying system.  Also maybe throw in
https://www.sqlite.org/arch.html to get a broad feel of how things break
down.

-scott


[sqlite] Mailing list policy change

2015-10-29 Thread Adam Devita
Assuming the A*spammer is a basic algo subscribed to the list, and
sending to any sender in a reasonably short time after posting;

Question:  Is it possible for the admin to easily backup the list,
bisect it, and test for spam?
That technique should identify the offending address in
log2(N-Users-Subscribed) attempts.

If unfeasible, I'd prefer mangling / salting the displayed e-mail
addresses of users (if this can be done easily) to thwart bot spammers
while still showing the names of the posters.  I follow interesting
'topics' rather than people, but I think seeing the names of posters
up-front is part of the community dynamic that has been built, and
makes the exchanges easier to follow, particularly for longer
threads.

Adam D



On Thu, Oct 29, 2015 at 4:01 AM, SQLite mailing list
 wrote:
> On Wed, Oct 28, 2015 at 6:52 PM, General Discussion of SQLite Database <
> sqlite-users at mailinglists.sqlite.org> wrote:
>
>> Effective immediately, the sender email address for mailing list posts
>> will be elided.  All replies must go back to the mailing list itself.
>>
>
> Please reconsider. Not knowing who's talking is untenable.
>
> Let each and everyone's SPAM filter take care of it.
>
> As someone already mentioned, there are tons of ways to harvest past email
> addresses from archives anyway.
>
> --DD



-- 
--
VerifEye Technologies Inc.
151 Whitehall Dr. Unit 2
Markham, ON
L3R 9T1


[sqlite] Simple Math Question

2015-10-29 Thread SQLite mailing list
On 29 October 2015 at 09:46, SQLite mailing list <
sqlite-users at mailinglists.sqlite.org> wrote:
>
> which I understood to mean, "if you can represent it in decimal, you
> can represent it in binary".  I didn't think that was true, but there
> seemed to be concensus that it was.
>

The consensus was the other way: "If you can represent it in binary, you
can represent it in decimal."

-Rowan


[sqlite] Mailing list policy change

2015-10-29 Thread SQLite mailing list
On Wed, Oct 28, 2015 at 6:52 PM, General Discussion of SQLite Database <
sqlite-users at mailinglists.sqlite.org> wrote:

> Effective immediately, the sender email address for mailing list posts
> will be elided.  All replies must go back to the mailing list itself.
>

Please reconsider. Not knowing who's talking is untenable.

Let each and everyone's SPAM filter take care of it.

As someone already mentioned, there are tons of ways to harvest past email
addresses from archives anyway.

--DD


[sqlite] Casting ctype functions' arguments

2015-10-29 Thread Richard Hipp
On 10/28/15, SQLite mailing list  
wrote:
> Hi, everyone.
>
> I've been auditing the OpenBSD codebase for calls to ctype functions
> with potentially signed chars. This is undefined on some platforms. I
> found a number of instances in Sqlite, so I cloned your repo and ran my
> script on it.

Thank you for the audit.

Please note that all instances you found are either in obscure or
deprecated extensions or in test code or in build infrastructure.
None of the identified problems are in the SQLite core - the part that
actually gets used.  I don't know if this makes any difference to you
or not.

Some of the places you identified as problems really are not problems.  Example:

>  static int safe_isspace(char c){
> -  return (c&0x80)==0 ? isspace(c) : 0;
> +  return (c&0x80)==0 ? isspace((unsigned char)c) : 0;
>  }

The original (c&0x80)==0 test makes isspace(c) safe.  Indeed, that is
the whole point of the safe_isspace() routine.

For this example, note also that FTS1 is deprecated code that is kept
for historical reference only, and not actually used for anything.

I will go through and make lots of little changes to the code to
pacify your analysis script.  Let's hope that I don't break anything
in the process!

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Simple Math Question

2015-10-29 Thread SQLite mailing list

On 29 Oct 2015, at 2:09am, SQLite mailing list  wrote:

> The consensus was the other way: "If you can represent it in binary, you
> can represent it in decimal."

Well that one is actually true.  If a fraction has a non-recurring 
(terminating) representation in binary, then its decimal representation 
is also non-recurring, and ends in a 5.

Simon.


[sqlite] Simple Math Question

2015-10-29 Thread SQLite mailing list
At 23:34 28/10/2015, you wrote:

> > Those binary representations can be converted back into precise decimal
> > representations, but those decimal representations will not be the 
> original
> > decimal values, because they were translated from decimal strings into
> > binary floating-point values and back into decimal strings.
>
> > -scott
>
>This explains the deficiency in the SQLite print function, but it doesn't
>have to be that way.
>
>See: Steele, Jr., Guy L., and White, Jon L. How to print floating-point
>numbers accurately. In Proc. ACM SIGPLAN '90 Conf. Prog. Lang. Design
>and Implementation. ACM (White Plains, NY, June 1990), 112-126. ACM
>SIGPLAN Notices 25, 6 (June 1990).
>
>A retrospective by Steele & White is here:
>
>http://grouper.ieee.org/groups/754/email/pdfq3pavhBfih.pdf
>
>I'm not advocating that SQLite add Steele & White's Dragon algorithm, just
>pointing out that there are ways to fix the deficiency.
>
>--
>Doug Currie

While it's possible to (somehow) minimize the issues involved with 
printing a floating-point value (albeit at high cost), the issue of 
comparing values as is done in the OP is a pretty different beast. There 
you have to convert a decimal FP constant to a binary value stored in 
an FP register or in memory, and then perform the comparison.

And contrary to Simon, I don't think that:

>sqlite> CREATE TABLE t(r REAL PRIMARY KEY,t TEXT);
>sqlite> INSERT INTO t VALUES (21.0,'twenty one point zero');
>sqlite> INSERT INTO t VALUES (9.2+7.9+0+1.0+1.3+1.6, 'calculation');

should bark for duplicate PK, since the values are hardly equal in 
practice.
(Else SQLite would indeed raise a dup PK error!)


BTW and following an entirely distinct thread: I'd rather filter Alexa 
out myself using my mail client features.


jcd at antichoc.net



[sqlite] Mailing list policy change

2015-10-29 Thread SQLite mailing list
I think I received about four, which I removed in a couple of seconds. 
Obviously it is a problem, but I don't think it calls for a change that 
makes it impossible to see the sender of each message. I always open 
messages from the SQLite developers sort of by default, for instance, 
which I can no longer do.

Kind regards,

Philip Bennefall

On 10/28/2015 11:49 PM, SQLite mailing list wrote:
>   
>> Has anybody received email from Alexa since the policy change?  I have
>> not
> I have never received any ... presumably Alexa's MTA (or MTAs, if more than one) is 
> blacklisted ...
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> .
>