Re: [sqlite] Building multiple indexes in one scan over a table

2014-06-11 Thread Paul
> 
> 
> CREATE INDEX samples_by_time ON samples(time);
> CREATE INDEX samples_by_name ON samples(name_id);
> CREATE INDEX samples_by_type ON samples(type_id);
> 
> 

Maybe this is dumb idea (sorry for that), but I'll try to write it down:

What if to use an index for an index?
You create N additional talbes as:

CREATE TABLE index_field_N_table (index_field INT PRIMARY KEY, main_table_id 
INT);
...


Now you can use those aux tables as indices, ie

SELECT * FROM index_field_N_table LEFT JOIN main_table ON main_table.id = 
index_field_N_table.main_table_id WHERE index_field_N_table.index_field = ;

OR if you need it just for sorting:

SELECT * FROM index_field_N_table LEFT JOIN main_table ON main_table.id = 
index_field_N_table.main_table_id WHERE ORDER BY 
index_field_N_table.index_field;

SqLite is very smart, so selects will work properly:
sqlite> create temp table xxx (id int PRIMARY key, some_val int);
sqlite> create temp table yyy (id int PRIMARY key, some_val int);
sqlite> explain query plan SELECT * FROM yyy LEFT JOIN xxx ON xxx.id = 
yyy.some_val WHERE yyy.id = 1;
selectidorder   fromdetail  

--  --  --  

0   0   0   SEARCH TABLE yyy USING INDEX 
sqlite_autoindex_yyy_1 (id=?) (~1 rows)
0   1   1   SEARCH TABLE xxx USING INDEX 
sqlite_autoindex_xxx_1 (id=?) (~1 rows)
sqlite> explain query plan SELECT * FROM yyy LEFT JOIN xxx ON xxx.id = 
yyy.some_val ORDER BY yyy.id ;
selectidorder   fromdetail  
 
--  --  --  
-
0   0   0   SCAN TABLE yyy USING INDEX 
sqlite_autoindex_yyy_1 (~100 rows)
0   1   1   SEARCH TABLE xxx USING INDEX 
sqlite_autoindex_xxx_1 (id=?) (~1 ro



AFAIK you can insert data into several tables simultaneously (we can call this 
an 'index creation').
If I'm wrong and SQLite does not support simultaneous inserts into separate 
tables, you can use separate database for each table.

To make a creation of 'indices' faster, you can aggregate their data and make 
transaction batches much bigger.
Or better yet, if possible memory-wise, dump the data into aux tables at the end


Sorry, I did zero research and I'm not an expert in this field.
I have no clue, how much bigger will your database become and how will it 
affect the performance.
I'm not even sure if 'index creation' be any faster... Make your own research 
if you like.
Just wanted to share what came to my crazy mind :)

Good luck!

Paul
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Building multiple indexes in one scan over a table

2014-06-11 Thread Markus Schaber
Hi,

> Von: Fabian Giesen
> I'm working on a profiler that uses SQLite to store and index the
> results.
> 
> Due to the nature of the task, the workload ends up splitting into
> essentially three phases:
> 
> 1. Bulk insertion of relatively large amounts of data (low hundreds
> of megabytes to tens of gigabytes); a typical profile generates on
> the order of 10MB of sampling data per second.
> 
> 2. Indexing. The first phase normally runs without any indexes on the
> sample tables, to maximize insertion throughput. In the second phase
> we create the indexes we need to retrieve the samples efficiently.
> (We found "batch-indexing" after the fact to generally lead to
> shorter overall processing times.)
> 
> 3. Browse the results. Once the samples are indexed, the user can
> look around, analyze the results using queries, and so forth. This is
> essentially read-only.
> 
> Phase 2 is crucial for quick retrieval, but at the same time it
> essentially amounts to wait time for the user, so we'd like to get it
> over with as quickly as possible. Now, looking at what happens in
> there, I noticed that we often end up creating several indexes on the
> same table back-to-back:
> 
>CREATE INDEX samples_by_time ON samples(time);
>CREATE INDEX samples_by_name ON samples(name_id);
>CREATE INDEX samples_by_type ON samples(type_id);
> 
> Each of these statements ends up scanning over the entire table once.
> Since the tables in question are quite large, I would love to have
> some way to create multiple indexes in one pass. A cursory
> examination of the VDBE code produced for the CREATE INDEX statements
> indicates that SQLite's backend should be capable of doing so (though
> I'm no expert), but I don't see a way to express what I want in
> standard SQL.
> 
> So, is there any way to create multiple indexes for the same table in
> one operation in SQLite? If not, how hard would it be to add?

If one of your columns is unique, you could declare this column as primary key 
and use "WITHOUT ROWID".

If the column is also of type int, you could use the "INTEGER PRIMARY KEY" 
optimization which has a similar effect.

This saves the time of creating the index altogether, saves a few bytes for the 
row id column, and the table will be pre-ordered by said value.

But be sure you read the caveats in the documentation before proceeding.
https://www.sqlite.org/withoutrowid.html
https://www.sqlite.org/lang_createtable.html#rowid
 
I'm also not sure how this affects insertion time, but if the value is growing 
monotoneously, it should not differ too much from the standard auto-generated 
ROWID behavior which has the same properties.


Best regards

Markus Schaber

CODESYS(r) a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com | CODESYS store: 
http://store.codesys.com
CODESYS forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Building multiple indexes in one scan over a table

2014-06-10 Thread Dominique Pellé
Fabian Giesen wrote:

> Simon Slavin wrote:
...
>> Also, store your time as a numeric value (Unix epoch or Julian day) rather
>> than a string.  Numbers are faster to handle than strings.  INTEGERs are
>> faster to handle than REALs.
>
>
> :-)
>
> The time fields are 64-bit ints. The exact meaning depends on the
> capture target platform (we support several) but usually it's either
> a system-synchronized CPU clock cycle counter (when available) or
> ticks for a high-frequency bus/event timer.


It's worth adding that SQLite stores integers in variable length
(from 1 to 9 bytes). Small integers are more compact to store
and are thus more efficient  (both in the table and in its indexes).
So if you can store numbers relatively to a known minimum
values, you can make records smaller.

See: http://sqlite.org/src4/doc/trunk/www/varint.wiki

Multiple indexes are independent from each other.
So it would be nice if creation of multiple indexes could
be done in parallel (one thread per index).  That's
assuming that creation of indexes is CPU bound and
not I/O bound.

Regards
Dominique
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Building multiple indexes in one scan over a table

2014-06-10 Thread Simon Slavin

On 11 Jun 2014, at 12:21am, Fabian Giesen  wrote:

> I just wanted
> to ask to make sure there wasn't a way I was missing!

Nothing obvious from your clear description.  Good luck with it.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Building multiple indexes in one scan over a table

2014-06-10 Thread Fabian Giesen

On 6/10/2014 2:25 PM, Simon Slavin wrote:


On 10 Jun 2014, at 7:22pm, Fabian Giesen  wrote:


  1. Bulk insertion of relatively large amounts of data (low hundreds of 
megabytes to tens of gigabytes); a typical profile generates on the order of 
10MB of sampling data per second.


I presume you do your INSERTs inside a transaction.  If not, you should.


Yes, it's all wrapped in transactions; typically on the order of 10-100
a second. We're talking (order of magnitude) about 100k rows inserted 
per second of capture here; it would be nowhere near fast enough if

every INSERT was its own transaction.


Each of these statements ends up scanning over the entire table once. Since the 
tables in question are quite large, I would love to have some way to create 
multiple indexes in one pass.


Not possible in current SQLite.  No standard syntax for doing it in SQL.  A 
quick google suggests that no other popular DBMS implements it.


I beg to differ.

MySQL's InnoDB has it, for example:

  http://dev.mysql.com/doc/refman/5.5/en/innodb-create-index-examples.html


And ... I'm not sure it would be that much faster.  The majority of the work 
it's doing is writing.  Reading the existing table data is fast and easy, but 
writing new data and figuring out the pointers takes the majority of the time.  
Because of this I don't think it likely that this feature will be added to 
standard SQLite.  Of course, if you really care about this you could add it 
yourself.


The majority of the work is sorting, as Richard correctly points out,
not writing. The indexes in question are all on one or two integer keys
so they don't end up particularly big.

The average sample record comes out at ~100 bytes. All the records are
integer values; anything more complicated associated with a sample is
stored as unique records in separate tables and referenced via foreign
keys.


To optimize this, when you create the table put the columns you're going to 
index near the beginning of the rows.  SQLite stops reading the data of a table 
row once it has all the columns it needs.  I don't know whether this will make 
a lot of difference but it won't do any harm.


When the dataset is small enough to fit in memory (this is on
workstation PCs, so this means "less than a few gigabytes") we're CPU
bound (doing sorting, mostly, as Richard correctly suggested).

However, looking at the sorting time vs. number of records, sorting (in
that scenario, for our indices) typically proceeds at well over 1M
records/s (depends on the machine, of course; on my work PC it's
closer to 1.4Mrecs/s).

At that rate, in an out-of-core situation, we would need to be reading
~140MB/s of records, sustained, during index build time, plus the write
bandwidth for the resulting index (a small fraction; the indexes on the
large tables are all on one or two integer keys), to not become IO 
bound. That's possible with a good hard drive (and most SSDs), but it's

closer than I'm fully comfortable with.

The reason I'm asking is because we're seeing customers running into
disproportionately slow indexing times at the end of long (~1h!)
capture runs; since the dataset size is the only thing that's
fundamentally different for longer runs, I started looking into it. I'm
well aware this is an extreme case in various ways though. I just wanted
to ask to make sure there wasn't a way I was missing!


Also, store your time as a numeric value (Unix epoch or Julian day) rather than 
a string.  Numbers are faster to handle than strings.  INTEGERs are faster to 
handle than REALs.


:-)

The time fields are 64-bit ints. The exact meaning depends on the
capture target platform (we support several) but usually it's either
a system-synchronized CPU clock cycle counter (when available) or
ticks for a high-frequency bus/event timer.

-Fabian
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Building multiple indexes in one scan over a table

2014-06-10 Thread Richard Hipp
On Tue, Jun 10, 2014 at 2:22 PM, Fabian Giesen 
wrote:

>
> So, is there any way to create multiple indexes for the same table in one
> operation in SQLite? If not, how hard would it be to add?
>

Creating an index is essentially the same thing as sorting the whole table
- sorting in index order.  So you are going to have to sort the whole table
three times, once for each index.  The time to do the sort dominates.  The
time needed to scan the original table in order to pull out the elements is
usually negligible compared to the sorting time.


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Building multiple indexes in one scan over a table

2014-06-10 Thread Nico Williams
On Tue, Jun 10, 2014 at 4:25 PM, Simon Slavin  wrote:
> On 10 Jun 2014, at 7:22pm, Fabian Giesen  wrote:
> Not possible in current SQLite.  No standard syntax for doing it in SQL.  A 
> quick google suggests that no other popular DBMS implements it.

If the CREATE INDEX statements are in a transaction then the RDBMS
could defer index creation until they are needed or a COMMIT is done,
then it could coalesce their execution plans to do a single scan.
Otherwise it can't be done at all indeed (since if not in a
transaction it's as if each CREATE INDEX were in its own transaction,
and that has to be durable (synchronous) when it completes.

> And ... I'm not sure it would be that much faster.  The majority of the work 
> it's doing is writing.  Reading the existing table data is fast and easy, but 
> writing new data and figuring out the pointers takes the majority of the 
> time.  Because of this I don't think it likely that this feature will be 
> added to standard SQLite.  Of course, if you really care about this you could 
> add it yourself.

How much faster it is depends on whether the table to be scanned fits
in memory, how much memory pressure there is, and so on.  So, yes, it
might improve performance, but probably not that much because yeah,
the limiting factor will be write bandwidth, and that will surely be
less than read bandwidth, so at most we're talking about an
improvement of 33%.  If you were creating 10 indexes then it'd be at
most 9/20s improvement.  Anything that approaches a 50% improvement is
not to be sneezed at, but given what a special case this is, I'd not
blame anyone for focusing on something better.

> To optimize this, when you create the table put the columns you're going to 
> index near the beginning of the rows.  SQLite stops reading the data of a 
> table row once it has all the columns it needs.  I don't know whether this 
> will make a lot of difference but it won't do any harm.

Well, no, SQLite reads page-by-page.  It may stop decoding data for
columns past the last one it needs, but that's different.

Also, let's look at the OP's indexes:

  CREATE INDEX samples_by_time ON samples(time);
  CREATE INDEX samples_by_name ON samples(name_id);
  CREATE INDEX samples_by_type ON samples(type_id);

I'm betting that type_id is a low-cardinality column.  It might be
best to try the skip-scan index optimization in recent releases and go
for just one or two indexes.

Assuming these are something like a time series, and assuming
skip-scan indexing, this might work:

CREATE INDEX samples_idx ON samples (time, name_id, type_id);

or maybe two:

CREATE INDEX samples_idx ON samples (time);
CREATE INDEX samples_idx ON samples (name_id, type_id);

> Also, store your time as a numeric value (Unix epoch or Julian day) rather 
> than a string.  Numbers are faster to handle than strings.  INTEGERs are 
> faster to handle than REALs.

I'm not so sure.  Number parsing involves multiplications.  This sort
of thing needs to be benchmarked in each use case.  Storing dates in
ISO 8601 format has its benefits anyways.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Building multiple indexes in one scan over a table

2014-06-10 Thread Simon Slavin

On 10 Jun 2014, at 7:22pm, Fabian Giesen  wrote:

>  1. Bulk insertion of relatively large amounts of data (low hundreds of 
> megabytes to tens of gigabytes); a typical profile generates on the order of 
> 10MB of sampling data per second.CREATE INDEX samples_by_time ON 
> samples(time);

I presume you do your INSERTs inside a transaction.  If not, you should.

>  CREATE INDEX samples_by_name ON samples(time);
>  CREATE INDEX samples_by_name ON samples(name_id);
>  CREATE INDEX samples_by_type ON samples(type_id);
> 
> Each of these statements ends up scanning over the entire table once. Since 
> the tables in question are quite large, I would love to have some way to 
> create multiple indexes in one pass.

Not possible in current SQLite.  No standard syntax for doing it in SQL.  A 
quick google suggests that no other popular DBMS implements it.

And ... I'm not sure it would be that much faster.  The majority of the work 
it's doing is writing.  Reading the existing table data is fast and easy, but 
writing new data and figuring out the pointers takes the majority of the time.  
Because of this I don't think it likely that this feature will be added to 
standard SQLite.  Of course, if you really care about this you could add it 
yourself.

To optimize this, when you create the table put the columns you're going to 
index near the beginning of the rows.  SQLite stops reading the data of a table 
row once it has all the columns it needs.  I don't know whether this will make 
a lot of difference but it won't do any harm.

Also, store your time as a numeric value (Unix epoch or Julian day) rather than 
a string.  Numbers are faster to handle than strings.  INTEGERs are faster to 
handle than REALs.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users