difference in scalability.
I'm not sure if this will affect page cache throughput specifically, but
these two things are both fairly easy to try.
-Fabian
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo
in SQLite? If not, how hard would it be to add?
Thanks,
-Fabian
On 6/10/2014 2:25 PM, Simon Slavin wrote:
On 10 Jun 2014, at 7:22pm, Fabian Giesen fabi...@radgametools.com wrote:
1. Bulk insertion of relatively large amounts of data (low hundreds of
megabytes to tens of gigabytes); a typical profile generates on the order of
10MB of sampling data per
Regards
Fabian
Thanks. I think using GROUP BY without aggregates is a strange way to
remove duplicates, anyway.
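As an aside, a quick sketch (Python's sqlite3 module, with a hypothetical one-column table t) confirms that SELECT DISTINCT and GROUP BY without aggregates return the same de-duplicated rows:

```python
import sqlite3

# In-memory database with a small table containing duplicates (names are illustrative).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,), (3,), (3,)])

distinct = con.execute("SELECT DISTINCT x FROM t ORDER BY x").fetchall()
grouped = con.execute("SELECT x FROM t GROUP BY x ORDER BY x").fetchall()

print(distinct)  # [(1,), (2,), (3,)]
print(distinct == grouped)  # True
con.close()
```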
Not intentional. SQLite simply fails to recognize that by using the GROUP
BY in descending order it could avoid the ORDER BY clause. This is an
optimization that we have never considered because it
that the query planner has accurate
information about your index.
Fabian
sqlite> select * from sqlite_stat1;
t||3
sqlite> analyze t;
sqlite> select * from sqlite_stat1;
t|i|3 1 1
Fabian
Wouldn't
SELECT * FROM table1
EXCEPT
SELECT * FROM table2
solve this problem?
I think it does for the example provided. Not sure if it would work in a
real-world environment.
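A minimal sketch of the EXCEPT approach (Python's sqlite3 module, hypothetical two-column tables); one real-world caveat is that EXCEPT also removes duplicate rows within table1 itself:

```python
import sqlite3

# Two tables with identical schemas; EXCEPT returns rows present
# in table1 but absent from table2.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (a, b)")
con.execute("CREATE TABLE table2 (a, b)")
con.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])
con.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, "x"), (3, "z")])

rows = con.execute("SELECT * FROM table1 EXCEPT SELECT * FROM table2").fetchall()
print(rows)  # [(2, 'y')]
con.close()
```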
You might want to check the following:
SELECT word FROM fts WHERE fts MATCH '^token'
Beginning with 3.7.9 this should only return records that have 'token' at the
beginning of the record.
See changelog of 3.7.9:
If a search token (on the right-hand side of the MATCH operator) in FTS4 begins
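A small sketch of the ^token anchor, using Python's sqlite3 module and assuming the SQLite build includes FTS4 (3.7.9 or newer); the table and column names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE fts USING fts4(word)")
con.executemany("INSERT INTO fts VALUES (?)",
                [("token ring",), ("ring token",), ("tokenizer demo",)])

# '^token' only matches records whose first token is exactly 'token';
# a plain 'token' query would also match 'ring token'.
rows = con.execute("SELECT word FROM fts WHERE fts MATCH '^token'").fetchall()
print(rows)  # [('token ring',)]
con.close()
```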
Ever since I started using FTS extensively, I have frequently run into this
limitation:
** TODO: Strangely, it is not possible to associate a column specifier
** with a quoted phrase, only with a single token. Not sure if this was
** an implementation artifact or an intentional decision when
2011/11/14 nobre rafael.ro...@novaprolink.com.br
Comment from the source:
** TODO: Strangely, it is not possible to associate a column specifier
** with a quoted phrase, only with a single token. Not sure if this was
** an implementation artifact or an intentional decision when fts3 was
When I have a basic FTS query that needs to be restricted to a column, I
can write it in two ways:
1.) WHERE column MATCH 'apple'
2.) WHERE table MATCH 'column:apple'
But when I have a phrase query, I can only write it in one way:
1.) WHERE column MATCH '"apple juice"'
The problem is that when I
Suppose I have two tables, and I want to look up a value in the first
table, but display the columns from the second table. The most obvious way
would be joining them on rowid. But I don't need to SELECT any columns from
the first table, and it's a FTS4 table (which always joins a bit slower
I'm trying to understand the following: when I create a new database and
fill it with rows, why can it not be committed to disk the same way VACUUM
would write it? Currently I'm trying to manually vacuum the database by
inserting rows sorted by table and by rowid, but as soon as the
2011/11/10 Richard Hipp d...@sqlite.org
Because when you are inserting the Nth row, SQLite has no idea of how many
more rows will follow or how big the subsequent rows will be, so it has no
way to reserve contiguous space sufficient to hold them all. The result is
that parts of the table
I'm having an issue where inserts on an un-cached database are very slow.
The reason probably is that a large part of the existing index needs to be
read from disk to be able to insert new rows into the index. The length of
the values in the indexed column is around 60 bytes, so I'm thinking about
2011/11/9 Black, Michael (IS) michael.bla...@ngc.com
Hmmm...appears to be the same for this case which, I must say, I find
surprising.
Thanks for actually benchmarking it. I'm also a bit surprised, because I
always thought SQLite handled INTEGER more efficiently than TEXT.
I also did some
2011/11/9 Simon Slavin slav...@bigfraud.org
Didn't someone recently note that entering the first million records was
fast, but if he then closed and reopened the database, entering the next
100,000 records was slow ?
Yes, and there is still no real explanation for it, other than slow disk
2011/11/9 Black, Michael (IS) michael.bla...@ngc.com
Are you sure you're using BEGIN/COMMIT on your transactions?
Yes
I just used my benchmark data and inserted another 100,000 rows into the
database in 2.3 seconds.
That is because you immediately insert those additional rows, after
2011/11/9 Luuk luu...@gmail.com
On 09-11-2011 17:23, Black, Michael (IS) wrote:
time sqlite3 index1.sql
19.307u 0.082s 0:19.40 99.8% 0+0k 0+0io 0pf+0w
time sqlite3 index2.sql
19.266u 0.092s 0:19.37 99.8% 0+0k 0+0io 0pf+0w
$ time sqlite3 gendat1.sql
real 0m21.094s
user
I'm in the process of converting a normal table to a FTS virtual table, and
I'm noticing different behaviour if I try to insert duplicate rowid's. I'm
using the 'INSERT OR IGNORE' statement, and on the normal table I can try
to insert millions of duplicates very quickly, but on the FTS table it's
2011/11/9 Nico Williams n...@cryptonector.com
What's wrong with reading the whole file into memory at boot time as a
way to prime the cache? Rebooting always takes some time, mostly the
time to read all sorts of files.
It's a desktop application, I cannot pre-cache anything before the user
2011/11/9 Nico Williams n...@cryptonector.com
I don't get it. You're reading practically the whole file in a random
manner, which is painfully slow, so why can't you read the file in one
fell swoop (i.e., sequential reads)??
I'm only reading the whole file when the number of additional
2011/11/9 Black, Michael (IS) michael.bla...@ngc.com
OK...you're right...a reboot kills it.
I'm glad someone was able to reproduce this on Linux, ruling out the
possibility that it's a Windows issue.
However, reboot again and add "select count(*) from a;" as the first line
of gendat2.sql
So if a
2011/11/9 GB gbi...@web.de
Maybe you could try a page size that matches the size of a disk
allocation unit or memory page. On Windows, since a typical NTFS partition
has a cluster size of 4 KiB - which happens to also be the size of a memory
page - a page size of 4096 bytes seems to be a
Ever since I started using FTS, I'm always confronted with the problem that
I need two tables: one FTS table with the TEXT columns, and one normal
table with the INTEGER columns for numerical values. This causes all kinds
of problems (keeping the rowid's in sync, complex queries, etc.).
From a
It seems I finally have some sort of a solution:
As of SQLite version 3.7.9, FTS4 supports a new option - "content" -
designed to extend FTS4 to support the creation of full-text indexes where:
* The indexed documents are not stored within the SQLite database
at all (a contentless
, when committing your transactions, thus slowing
down disk writes.
Fabian-40 wrote:
2011/11/2 Black, Michael (IS)
I do not use WAL, since I have turned 'journal_mode' off (to improve
insert performance), and as far as I know WAL is only useful when you need
to keep a journal?
I
I asked a similar question before, but received no response, so maybe it's
a very stupid question, but if so, feel free to say so.
I create a database, create one table with a TEXT column, insert 1 million
rows in 10 secs, create an index in 10 secs, VACUUM the database, and close
the database.
2011/11/2 Mr. Puneet Kishor punk.k...@gmail.com
Others will have better answers, but methinks that when you reboot the
computer, the operating system's caches are flushed out, which slows the
operation. Try working with the db for a bit (SELECT, repeat INSERTs, etc.)
and notice if the speed
2011/11/2 Mr. Puneet Kishor punk.k...@gmail.com
ahh, so you *are* getting expected behavior, just not what *you* expected.
Did you have a different number in mind instead of a factor of 300? And, if
so, why?
To read an (un-cached) 150 MB file from disk doesn't take 30 seconds. And
SQLite
2011/11/2 Paul Corke paul.co...@datatote.co.uk
A stateful antivirus that does lots of heavy processing when you first
open the file?
I'm running these tests on a completely clean Win7 install, without any
anti-virus (or other software) installed. I should note that it runs
virtualized (in
2011/11/2 Simon Slavin slav...@bigfraud.org
So compare your 'restart-and-INSERT' test with one where you restart, log
in, then do a few random operations for a minute: start your web browser
and load a page. Open and close a few documents or control panels. Sit
and do nothing for a minute.
2011/11/2 Black, Michael (IS)
Then, when you restart, the file is on the system disk and it is flushing
each insert to the system disk on the WAL file, slowing things down dramatically.
I do not use WAL, since I have turned 'journal_mode' off (to improve insert
performance), and as far as I know WAL
2011/11/2 Nico Williams n...@cryptonector.com
But that's NOT what SQLite3 is doing. SQLite3 is doing random I/O.
And the OS sees the random I/O pattern and concludes it's better to
not read the whole file in. So for those 10K inserts you pay -worst
case- 10K I/Os. At ~12ms per random I/O
2011/11/2 Nico Williams n...@cryptonector.com
Incidentally, it pays to make the SQLite3 page size match the
filesystem preferred block size.
I already have the page_size set to 4096, which should match the default
NTFS cluster size.
But note that this can still fail you when the file is
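For what it's worth, a sketch of setting the page size (Python's sqlite3 module; the PRAGMA only takes effect before the first table is created, or on the next VACUUM):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA page_size = 4096")  # must run before any tables exist
con.execute("CREATE TABLE t (x)")       # pages are allocated from here on
size = con.execute("PRAGMA page_size").fetchone()[0]
print(size)  # 4096
con.close()
```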
2011/10/30 Black, Michael (IS) michael.bla...@ngc.com
#1 What version?
3.7.8, using System.Data.Sqlite wrapper
#2 How long to insert the 1M?
10 seconds
#3 What's the average string size?
55 characters
#5 How long to create the index?
10 seconds
#6 How long to insert the
I have a table with one TEXT column. I insert 1 million rows of short
strings, and then create a UNIQUE INDEX. The speed is very acceptable. Then
I insert another 10,000 short strings, and the performance is very bad; it
almost takes longer than inserting the initial million(!) rows to fill the
2011/10/29 Simon Slavin slav...@bigfraud.org
When you insert the 10,000 strings are you doing it inside a transaction ?
BEGIN TRANSACTION;
INSERT ...
INSERT ...
INSERT ...
COMMIT;
Yes, I use transactions, prepared statements, and the cache_size,
journal_mode and synchronous PRAGMAs, almost
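For anyone following along, a minimal sketch of the batched-insert pattern (Python's sqlite3 module; the table name is illustrative) - one transaction and one prepared statement for the whole batch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (s TEXT)")

# executemany reuses one prepared statement; the module wraps the whole
# batch in a single implicit transaction, committed once at the end.
rows = [("row%06d" % i,) for i in range(100_000)]
con.executemany("INSERT INTO t VALUES (?)", rows)
con.commit()

n = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(n)  # 100000
con.close()
```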
I have a column with a normal INDEX, and I would like to turn it into a
UNIQUE index, but I'm a bit worried about the performance implications for
inserts. Can someone give some insight into how UNIQUE is implemented in
SQLite? Does it create extra tables compared to a normal index, are there
I have two tables, both containing 1 million rows, which frequently need to
be joined by rowid. Right now, the insert loop is like this:
For I = 1 to 1000
INSERT INTO TABLE1 ...
INSERT INTO TABLE2 ...
Next
When I look at the structure of the created database-file, the rows for the
No, a UNIQUE index and a regular index are implemented the exact same way.
It's just that, at INSERT and UPDATE time, after finding a proper place to
insert the new value, an additional check is made that the place isn't
already occupied.
So the only overhead for UNIQUE is that extra
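A sketch of that behaviour (Python's sqlite3 module, hypothetical table): the duplicate INSERT fails the occupancy check and, with OR IGNORE, becomes a no-op:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (s TEXT)")
con.execute("CREATE UNIQUE INDEX t_s ON t (s)")  # same b-tree as a normal index

con.execute("INSERT OR IGNORE INTO t VALUES ('a')")
con.execute("INSERT OR IGNORE INTO t VALUES ('a')")  # place already occupied: ignored
con.execute("INSERT OR IGNORE INTO t VALUES ('b')")

n = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(n)  # 2
con.close()
```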
2011/10/23 Simon Slavin slav...@bigfraud.org
My immediate question is why this is two rows in two separate tables rather
than one row in one table. After all, if tables always have the same rows
in, they might as well be the same row in one table.
I would love to have those rows into a
2011/10/23 Simon Slavin slav...@bigfraud.org
In this example, the indexed column is a text column. The text fields
could have been very long, and checking long text fields for uniqueness can
involve comparing every byte. Nevertheless, I do not understand the results
you quoted. I wonder
2011/10/23 Simon Slavin slav...@bigfraud.org
In that case, try defragging your file sometime. May make a big
difference.
If you mean Windows defrag, it would be pointless, since it doesn't change
the database structure? If you mean VACUUM, it will generate the exact same
structure as
I have a very simple table: it just contains one single (text) column with
an index. This table contains millions of hash values, and because of the
index, SQLite is storing all the data twice. Behind the scenes it creates a
second table, containing all the same values in sorted order, causing the
Did anyone do any benchmarks on how the insert speed of FTS compares to a TEXT
INDEX column? I don't need many of the extra features of FTS, because I
always need to look up rows by prefix or exact match, and both can be
implemented efficiently via a TEXT INDEX too. But if the overhead is
comparable,
2011/10/19 Alexey Pechnikov pechni...@mobigroup.ru
FTS uses a multi-tree index and de facto has _no_ insert speed degradation.
Thanks, that's good to hear! It makes me wonder why SQLite doesn't use that
same multi-tree mechanism for regular indexes, but that's a whole different
question.
2011/10/19 Scott Hess sh...@google.com
To be clear, how it works is that new insertions are batched into a
new index tree, with index trees periodically aggregated to keep
selection efficient and to keep the size contained. So while the
speed per insert should remain pretty stable,
Using the default tokenizer, everything that is not an alphanumeric
character or an underscore will start a new token.
I have a lot of columns that contains e-mail addresses or URL's, and most of
them have characters like '.', '@' and '/'. Is there a simple way to make
FTS see them as one
I'm working on a pagination system where two tables need to be joined. At
first my query looked like this:
SELECT table1.data1, table2.data2
FROM table1
JOIN table2 ON table1.rowid = table2.rowid
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
OFFSET 0 LIMIT 250
I got very good advice on this
Assuming that is a typo, does the following not do what you want?
SELECT table1.data1, table2.data2
FROM
(
SELECT table1.rowid, table1.data1
FROM table1
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250 OFFSET 0
) AS table1
JOIN table2
ON table1.rowid = table2.rowid
ORDER BY
2011/10/19 Fabian fabianpi...@gmail.com
Maybe there is just no easy way to do what I want, I will take a good night
sleep about it ;)
I think the correct query would be:
SELECT table1.data1, table2.data2
FROM
(
SELECT table1.rowid, table1.data1
FROM table1
WHERE table1.data1 = 10
) AS table1
JOIN
JOIN
2011/10/16 Frank Missel i...@missel.sg
But it sounds a bit like Fabian both wants to have the total number of
records available and at the same time limit the count.
No, I only want to have a capped total available.
If I went with Simon's solution, I would have to read the rows for the first
How can you limit a count-query? I tried:
SELECT COUNT(*) FROM table LIMIT 5000
But it ignores the LIMIT clause. I think the workaround would be counting
the results of a sub-query, but I'm trying to understand what's wrong with
the syntax above. The goal is to make the engine stop iterating
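A sketch of the sub-query workaround (Python's sqlite3 module, hypothetical table): a LIMIT inside the sub-query caps how many rows the count can see, while a LIMIT on the COUNT(*) query itself does nothing because that query only ever produces one row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])

# LIMIT on the outer query is a no-op: COUNT(*) returns a single row anyway.
uncapped = con.execute("SELECT COUNT(*) FROM t LIMIT 5000").fetchone()[0]
# Counting a LIMITed sub-query stops iterating after 5000 rows.
capped = con.execute(
    "SELECT COUNT(*) FROM (SELECT 1 FROM t LIMIT 5000)").fetchone()[0]

print(uncapped, capped)  # 10000 5000
con.close()
```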
2011/10/16 Frank Missel i...@missel.sg
What do you want to attain with the count?
I want to allow users to paginate through a result set. The pages are
retrieved through LIMIT/OFFSET, but to calculate the total number of pages,
I have to execute a separate COUNT() query (without LIMIT) once.
Exactly.
I still don't have optimal performance in the query (although it's much
better now), and it seems to be related to ORDER BY.
When I execute:
SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50
It's very fast, but it gets much slower (10 times) when I add an ORDER BY
2011/10/14 Petite Abeille petite.abei...@gmail.com
Much? Really? I get broadly the same execution time for either variant:
0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~2192503 rows)
0|0|0|SCAN TABLE mail_header (~2192503 rows)
I get
SELECT mail_header.rowid
FROM mail_header
2011/10/14 Simon Slavin slav...@bigfraud.org
If you have an index on the data column then that's the index SQLite would
use for that query. Once it has picked that index it no longer has access
to the rowid index.
CREATE INDEX tdr ON table (data, rowid)
Thanks! That explains everything!
2011/10/14 Petite Abeille petite.abei...@gmail.com
Hurray! Now you must have the finest query ever to grace the intraweb! A
true work of beauty :))
Here in Belgium we have a saying: Who doesn't honor the petite, is not worth
the big ;)
2011/10/14 Dan Kennedy danielk1...@gmail.com
If SQLite cannot use an index for an ORDER BY in a SELECT
query and has to do an external sort, the EXPLAIN QUERY PLAN
output will have something like this in it:
0|0|0|USE TEMP B-TREE FOR ORDER BY
Since my EXPLAIN does not show this, what
2011/10/14 Dan Kennedy danielk1...@gmail.com
Good question. Can you enter the following commands into
the shell tool and post the complete output (no QUERY PLAN
this time):
.version
.schema
.explain
EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET
50;
2011/10/14 Dan Kennedy danielk1...@gmail.com
Your EXPLAIN output shows that it is doing a linear scan of
table1. Which is different from what I get here with the same
schema and query. When I run them here, both queries (with and
without the ORDER BY rowid) use the same query plan.
Do you
Do you have an sqlite_stat1 table in the database (created by
running ANALYZE)? What is the output of the shell command
How about the contents of the sqlite_stat1 table? What does
the shell command .dump sqlite_stat1 show?
This is the output with a fresh database, where ANALYZE hasn't been
2011/10/14 Simon Slavin slav...@bigfraud.org
So that should never happen, right ? ANALYZE is meant to make things
faster, not slower. So is that an actual fixable bug or is it one of those
extremely unlikely situations that is hard to fix ?
So, OP, try issuing these commands
DROP TABLE
2011/10/12 Petite Abeille petite.abei...@gmail.com
Now the join is performed only 250 times, adding just a small overhead
compared to the bare-bones query without the join.
The short of it: minimize the amount of work upfront :)
Thank you very much! This approach solved the problem.
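A sketch of the two-step refactoring (Python's sqlite3 module, illustrative schema): LIMIT/OFFSET is applied in a sub-query on table1 first, so at most 250 rows are joined to table2:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (data1 INTEGER)")
con.execute("CREATE TABLE table2 (data2 TEXT)")
# rowids stay in sync because both tables are filled in the same order.
con.executemany("INSERT INTO table1 VALUES (?)", [(i % 50,) for i in range(5000)])
con.executemany("INSERT INTO table2 VALUES (?)", [("v%d" % i,) for i in range(5000)])

rows = con.execute("""
    SELECT page.data1, table2.data2
    FROM (SELECT rowid, data1 FROM table1
          WHERE data1 = 10
          ORDER BY rowid DESC
          LIMIT 250 OFFSET 0) AS page
    JOIN table2 ON page.rowid = table2.rowid
    ORDER BY page.rowid DESC""").fetchall()

print(len(rows))  # 100 rows have data1 = 10 in this toy data, all on the first page
con.close()
```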
I have two tables, one normal table (table1), and one FTS4 table (table2).
The ROWID's between the tables are in sync. I often run queries on table1,
but every row returned must also include a value from table2 (data2), so I
create a query like this:
SELECT table1.data1, table2.data2
FROM table1
I have three types of columns in my table, which I want to move to a FTS4
table, but I'm not sure whether that would make sense.
1.) This column contains hash-values. I always need to lookup 100% exact
matches. Normally you would store them with 'TEXT INDEX' in a standard
table. Would there be any
How would I optimize the above queries, to take advantage of the
LIMIT/OFFSET values, making them suitable for fast pagination?
Are you sure the issue is with the fts table? Or is it the order by desc?
If I omit the FTS table in the query, like so:
SELECT table1.data1
FROM table1
WHERE
2011/10/12 Petite Abeille petite.abei...@gmail.com
In any case, you can easily refactor the query into two steps:
(1) First get your pagination
(2) Then join to your fts
I tried implementing your refactoring, but immediately ran into a problem.
The user must be able to sort the data on any
2011/10/12 Dan Kennedy danielk1...@gmail.com
Are you using 3.7.7 or newer? And specifying the order=DESC
option when creating the FTS4 table? If not, it might be a big
help in this case.
http://www.sqlite.org/fts3.html#fts4order
Yes I'm using
2011/10/12 Igor Tandetnik itandet...@mvps.org
See if this helps: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor
Thanks! Very interesting! I already was aware that using a large OFFSET
could potentially be slow, because SQLite internally reads all preceding
rows, and just discards them.
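A sketch of the scrolling-cursor alternative (Python's sqlite3 module, illustrative table): each page remembers the smallest rowid it returned, and the next page seeks below it, so no preceding rows are read and discarded:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

def next_page(last_rowid, page_size=250):
    # Descending pagination: fetch rows strictly below the last rowid seen.
    return con.execute(
        "SELECT rowid, x FROM t WHERE rowid < ? ORDER BY rowid DESC LIMIT ?",
        (last_rowid, page_size)).fetchall()

page1 = next_page(10**9)         # first page: any bound above the max rowid
page2 = next_page(page1[-1][0])  # continue below where page 1 ended
print(page1[0][0], page2[0][0])  # 1000 750
con.close()
```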
2011/10/12 Igor Tandetnik itandet...@mvps.org
See if this helps: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor
I tried to implement the method as suggested in the article, but it will
only work for pagination where the user is allowed to go 1 page back or
1 page forward (since you
2011/10/12 Dan Kennedy danielk1...@gmail.com
Can we see the output of EXPLAIN for this query?
Without selecting table2.data2:
0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
Including selecting table2.data2:
0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3
2011/10/12 Dan Kennedy danielk1...@gmail.com
(INDEX 1) == lookup by rowid.
Okay! Then maybe my problem is unrelated to FTS, I will create a new
question.
Why is this very fast (20 ms):
SELECT table1.data1, table1.data2
FROM table1
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250
OFFSET 50
0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
And this very slow (3500ms):
SELECT
that is
that my application executes some queries on the table
from time to time.
Please help,
Fabian