Re: [sqlite] Partial Index on "~" malfunctions when used with likely/unlikely
Thanks for the explanation and the quick fix! Best, Manuel On Sat, May 4, 2019 at 7:41 PM Richard Hipp wrote: > Here is another case: > > CREATE TABLE t1(a,b,c); > INSERT INTO t1 VALUES(NULL,8,'yes'); > CREATE INDEX t1b ON t1(b) WHERE a IS NOT NULL; > SELECT c FROM t1 WHERE b=8 AND (a OR 1); > > The problem was in the theorem prover that determines when a partial > index can be used. The problem goes all the way back to the initial > introduction of partial indexes in SQLite version 3.8.0 (2013-08-26). > The theorem prover was (incorrectly) assuming that if the expression > "a OR 1" is true, then "a IS NOT NULL" must also be true. And that > assumption is correct for most binary operators - just not for OR. > Fixed now. > > On 5/4/19, Manuel Rigger wrote: > > This similar test case, that I just found now, demonstrates that this > could > > be a pattern that is used in practice (TRUE can also be computed): > > > > CREATE TABLE t0 (c0); > > CREATE INDEX index_0 ON t0(c0) WHERE c0 NOTNULL; > > INSERT INTO t0(c0) VALUES (NULL); > > SELECT * FROM t0 WHERE (c0 OR TRUE); > > > > Also here, the row is not fetched. > > > > Best, > > Manuel > > > > On Sat, May 4, 2019 at 3:45 PM Manuel Rigger > > wrote: > > > >> Hi, > >> > >> I discovered a bug, which is demonstrated through the following test > case: > >> > >> CREATE TABLE t0(c0); > >> CREATE INDEX index_0 ON t0(c0) WHERE (~c0) NOT NULL; > >> INSERT INTO t0(c0) VALUES (NULL); > >> SELECT * FROM t0 WHERE (LIKELY(~c0) OR TRUE); > >> > >> No row is fetched, although the WHERE clause is always TRUE. I could > >> reproduce this bug only when creating a partial index, and when using > >> either LIKELY or UNLIKELY. The datatype of the c0 column seems to > >> irrelevant. PRAGMA integrity_check; and REINDEX could not detect this > >> error. > >> > >> Best, > >> Manuel > >> > > ___ > > sqlite-users mailing list > > sqlite-users@mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > > -- > D. Richard Hipp > d...@sqlite.org > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Partial Index on "~" malfunctions when used with likely/unlikely
Here is another case: CREATE TABLE t1(a,b,c); INSERT INTO t1 VALUES(NULL,8,'yes'); CREATE INDEX t1b ON t1(b) WHERE a IS NOT NULL; SELECT c FROM t1 WHERE b=8 AND (a OR 1); The problem was in the theorem prover that determines when a partial index can be used. The problem goes all the way back to the initial introduction of partial indexes in SQLite version 3.8.0 (2013-08-26). The theorem prover was (incorrectly) assuming that if the expression "a OR 1" is true, then "a IS NOT NULL" must also be true. And that assumption is correct for most binary operators - just not for OR. Fixed now. On 5/4/19, Manuel Rigger wrote: > This similar test case, that I just found now, demonstrates that this could > be a pattern that is used in practice (TRUE can also be computed): > > CREATE TABLE t0 (c0); > CREATE INDEX index_0 ON t0(c0) WHERE c0 NOTNULL; > INSERT INTO t0(c0) VALUES (NULL); > SELECT * FROM t0 WHERE (c0 OR TRUE); > > Also here, the row is not fetched. > > Best, > Manuel > > On Sat, May 4, 2019 at 3:45 PM Manuel Rigger > wrote: > >> Hi, >> >> I discovered a bug, which is demonstrated through the following test case: >> >> CREATE TABLE t0(c0); >> CREATE INDEX index_0 ON t0(c0) WHERE (~c0) NOT NULL; >> INSERT INTO t0(c0) VALUES (NULL); >> SELECT * FROM t0 WHERE (LIKELY(~c0) OR TRUE); >> >> No row is fetched, although the WHERE clause is always TRUE. I could >> reproduce this bug only when creating a partial index, and when using >> either LIKELY or UNLIKELY. The datatype of the c0 column seems to >> irrelevant. PRAGMA integrity_check; and REINDEX could not detect this >> error. >> >> Best, >> Manuel >> > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Partial Index on "~" malfunctions when used with likely/unlikely
This similar test case, that I just found now, demonstrates that this could be a pattern that is used in practice (TRUE can also be computed): CREATE TABLE t0 (c0); CREATE INDEX index_0 ON t0(c0) WHERE c0 NOTNULL; INSERT INTO t0(c0) VALUES (NULL); SELECT * FROM t0 WHERE (c0 OR TRUE); Also here, the row is not fetched. Best, Manuel On Sat, May 4, 2019 at 3:45 PM Manuel Rigger wrote: > Hi, > > I discovered a bug, which is demonstrated through the following test case: > > CREATE TABLE t0(c0); > CREATE INDEX index_0 ON t0(c0) WHERE (~c0) NOT NULL; > INSERT INTO t0(c0) VALUES (NULL); > SELECT * FROM t0 WHERE (LIKELY(~c0) OR TRUE); > > No row is fetched, although the WHERE clause is always TRUE. I could > reproduce this bug only when creating a partial index, and when using > either LIKELY or UNLIKELY. The datatype of the c0 column seems to > irrelevant. PRAGMA integrity_check; and REINDEX could not detect this error. > > Best, > Manuel > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Partial Index and Query Planner
On 8/5/16, Detlef Golze wrote: > As documented, the query planner only uses the index if the SELECT contains > that exact same condition. > > According to .eqp with 3.12.1 the following statement indeed uses the index: > > SELECT Value1 FROM MyTable WHERE Value1<>0 AND Value1=7; > --EQP-- 0,0,0,SEARCH TABLE MyTable USING COVERING INDEX MyIndex1 (Value1=?) > > Can I safely assume that this 'workaround' works with all future versions? Yes. In a future release of SQLite, the theorem prover inside the query planner that determines when it is acceptable to use a partial index might grow smarter, and realize that value1=7 implies value1<>0. At that point the extra value1<>0 term in the WHERE clause will become superfluous. But it will not cause problems. The partial index will continue to be used. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Partial index to find maximum
Hick Gunter wrote: >create the primary key index ordered properly > >CREATE TABLE t (..., PRIMARY KEY ( a ASC, b DESC)...); DESC is not necessary here; SQLite has no problem reading the index in reverse order, if needed. (DESC in an index is useful only when you want to optimize multiple ORDER BYs in a single query.) >SELECT b FROM t WHERE a = ? LIMIT 1; This is wrong with or without a DESC index: there is no guarantee that the returned row is the first in the index unless you're using ORDER BY or MAX(). Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Partial index to find maximum
create the primary key index ordered properly CREATE TABLE t (..., PRIMARY KEY ( a ASC, b DESC)...); SELECT b FROM t WHERE a = ? LIMIT 1; If you insist on using a partial index for this (for example if each a has a lot of b entries) you could add a field b_is_max and keep it current using triggers. CREATE INDEX t_max ON t (a,b) WHERE b_is_max; SELECT b FROM t WHERE a=? AND b_is_max; CREATE TRIGGER t_ins AFTER INSERT ON t WHEN new.b > (SELECT b FROM t WHERE a = new.a AND b_is_max) BEGIN UPDATE t SET b_is_max = 0 WHERE a = new.a AND b_is_max; UPDATE t SET b_is_max = 1 WHERE a = new.a AND b = new.b; END; CREATE TRIGGER t_del BEFORE DELETE ON t WHEN old. b_is_max) BEGIN UPDATE t SET b_is_max = 0 WHERE a = old.a AND b = old.b); UPDATE t SET b_is_max = 1 WHERE a = old.a AND b = (SELECT max(b) FROM t WHERE a = old.a and b < old.b); END; CREATE TRIGGER t_upd_new AFTER UPDATE ON t WHEN new.b > (SELECT b FROM t WHERE a = new.a AND b_is_max) BEGIN UPDATE t SET b_is_max = 0 WHERE a = new.a AND b_is_max; UPDATE t SET b_is_max = 1 WHERE a = new.a AND b = new.b; END; CREATE TRIGGER t_upd_old BEFORE UPDATE ON t WHEN old.b_is_max) BEGIN UPDATE t SET b_is_max = 0 WHERE a = old.a AND b = old.b); UPDATE t SET b_is_max = 1 WHERE a = old.a AND b =(SELECT max(b) FROM t WHERE a = old.a and b < old.b); END; Triggers shown are untested. -Ursprüngliche Nachricht- Von: Baruch Burstein [mailto:bmburst...@gmail.com] Gesendet: Montag, 29. Dezember 2014 09:34 An: General Discussion of SQLite Database Betreff: [sqlite] Partial index to find maximum Hi, I have a table with a 2 column PK, say 'a' and 'b'. I need to find, for a given value of 'a', the highest matching 'b'. The query itself it simple: SELECT max(b) FROM t WHERE a=:whatever To speed this up, I would add an index on 'a'. Now, the question is is there some way to tell the index that I am only interested in the maximum value of b? For example, for the following table: a|b 1|1 1|2 2|2 2|3 I only need the index to contain the rows (1,2) and (2,3). The docs for partial indexes say that they can't contain functions (like max()). Any suggestions? -- ˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Partial index to find maximum
You should include both a and b in the index to be most helpful. CREATE INDEX whatever ON t (a, b); However, you say that (a, b) is already the primary key and therefore this index already exists and you do not need to create another one. Although the index will contain all rows, finding the max(b) given an "a" value should take just 1 index lookup. If the optimizer does not do this for you (I believe it does) then you can just do: select b from t where a=:whatever order by b desc limit 1; to find the answer you seek with nothing more than a single lookup in the index. So lets test what SQLite is going to do: On SQLite 3.8.8 (head of trunk) we see that: SQLite version 3.8.8 2014-12-25 12:19:56 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> create table t(a INTEGER, b INTEGER, primary key (a, b)); sqlite> insert into t values (1,1),(1,5),(2,3),(1,8),(2,7); sqlite> .explain sqlite> explain select max(b) from t where a=1; addr opcode p1p2p3p4 p5 comment - - -- - 0 Init 0 21000 Start at 21 1 Null 0 1 200 r[1..2]=NULL 2 OpenRead 1 3 0 k(3,nil,nil,nil) 00 root=3 iDb=0; sqlite_autoindex_t_1 3 Explain0 0 0 SEARCH TABLE t USING COVERING INDEX sqlite_autoindex_t_1 (a=?) 00 4 Noop 0 0 000 Begin WHERE-loop0: t 5 Integer1 3 000 r[3]=1 6 SeekLE 1 153 1 00 key=r[3] 7 IdxLT 1 153 1 00 key=r[3] 8 Noop 0 0 000 Begin WHERE-core 9 Column 1 1 400 r[4]=t.b 10 CollSeq0 0 0 (BINARY) 00 11 AggStep0 4 1 max(1) 01 accum=r[1] step(r[4]) 12 Goto 0 16000 max() by index 13 Noop 0 0 000 End WHERE-core 14Prev 1 7 000 15Noop 0 0 000 End WHERE-loop0: t 16Close 1 0 000 17AggFinal 1 1 0 max(1) 00 accum=r[1] N=1 18Copy 1 5 000 r[5]=r[1] 19ResultRow 5 1 000 output=r[5] 20Halt 0 0 000 21Transaction0 0 1 0 01 22TableLock 0 2 0 t 00 iDb=0 root=2 write=0 23Goto 0 1 000 sqlite> explain select b from t where a=1 order by b desc limit 1; addr opcode p1p2p3p4 p5 comment - - -- - 0 Init 0 18000 Start at 18 1 Noop 0 0 000 2 Integer1 1 000 r[1]=1; LIMIT counter 3 OpenRead 2 3 0 k(3,nil,nil,nil) 00 root=3 iDb=0; sqlite_autoindex_t_1 4 Explain0 0 0 SEARCH TABLE t USING COVERING INDEX sqlite_autoindex_t_1 (a=?) 00 5 Noop 0 0 000 Begin WHERE-loop0: t 6 Integer1 2 000 r[2]=1 7 SeekLE 2 152 1 00 key=r[2] 8 IdxLT 2 152 1 00 key=r[2] 9 Noop 0 0 000 Begin WHERE-core 10 Column 2 1 300 r[3]=t.b 11 ResultRow 3 1 000 output=r[3] 12 IfZero 1 16-1 00 r[1]+=-1, if r[1]==0 goto 16 13 Noop 0 0 000 End WHERE-core 14Prev 2 8 000 15Noop 0 0 000 End WHERE-loop0: t 16Close 2 0 000 17Halt 0 0 000 18Transaction0 0 1 0 01 19TableLock 0 2 0 t 00 iDb=0 root=2 write=0 20Goto 0 1 000 sqlite> So both forms of the query will run mostly the same plan and retrieve the result you desire in a single index lookup. --- Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. Sometimes theory and practice are combined: nothing works and no one knows why. >-Original Message- >From: sqlite-users-boun...@
Re: [sqlite] Partial index to find maximum
On 29/12/2014 4:33 PM, Baruch Burstein wrote: Hi, I have a table with a 2 column PK, say 'a' and 'b'. I need to find, for a given value of 'a', the highest matching 'b'. The query itself it simple: SELECT max(b) FROM t WHERE a=:whatever To speed this up, I would add an index on 'a'. Now, the question is is there some way to tell the index that I am only interested in the maximum value of b? For example, for the following table: a|b 1|1 1|2 2|2 2|3 I only need the index to contain the rows (1,2) and (2,3). The docs for partial indexes say that they can't contain functions (like max()). Any suggestions? I don't know if you can do a partial index for this, but if this is a use case that is slow for you and is very critical, I would use a trigger to update a separate table that stores just the value of "a" and its corresponding maximum. It's quite easy to manage this for the create and insert cases. You may have to run that whole query in the trigger again if there a deletion done. So, in a way this table is the cache you want. Not sure if that helps your case. Best Regards, Mohit. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On 20.08.2010 00:18, Eric Smith wrote: > Our app is caching what are basically csv files. Hundreds of files, > about 2m records per file. Sometimes we want to delete all the cache > rows for one of the files. We know ahead of time which file it will > be -- let's say it's file 7. > > The schema is roughly > > create table records(__recno INTEGER PRIMARY KEY, fileId, data); > > So sometimes we would *like* to say "delete from records where > fileId=7". > > But that is bad because does a full table scan. > > So the next cut is to say "create index recordsIdxFileId on > records(fileId)". > > But that is bad because it is a huge burden during INSERT and is not > used often enough (or with enough values) to justify its existence. > > What I really want is to be able to say "create index > recordsIdxFileIdOnFile3 on records(fileId) where fileId=7". But > sqlite doesn't do that. I believe you can do this with an index helper table managed by triggers on your main table. Use the insert trigger's when clause to control which records to "index". The delete statement is obviously a little more complex than an ordinary delete, but "explain query plan" shows that it uses indexes instead of a full table scan so it should run quite fast. Example SQL follows below. Ralf drop table if exists records; create table records (id INTEGER PRIMARY KEY, fileid, value); drop table if exists records_idx; create table records_idx (id INTEGER PRIMARY KEY, fileid); create index records_idx_idx on records_idx (fileid); create trigger records_insert_after after insert on records when new.fileid = 7 begin insert into records_idx values (new.rowid, new.fileid); end; create trigger records_delete after delete on records begin delete from records_idx where id = old.rowid; end; insert into records (fileid, value) values (1, 'one'); insert into records (fileid, value) values (7, 'seven 1'); insert into records (fileid, value) values (7, 'seven 2'); insert into records (fileid, value) values (8, 'eight'); select * from records_idx; delete from records where rowid in (select rowid from records_idx where fileid = 7); select * from records; ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
OK. Thanks for the clarification, Igor and Filip. I was misunderstanding the partial index to work, in effect, like a standard index on a virtual column based on a function that returns null for the "irrelevant" values, with the index defined to ignore nulls. I see now from the "inclusion" test described in the paper that the partial index will not be used if the query itself does not contain the same set of conditions that were used to define the index. That makes the partial index safe, not the trouble I was envisioning. Regards Tim Romano On Fri, Aug 20, 2010 at 8:01 AM, Igor Tandetnik wrote: > Tim Romano wrote: > > Igor, > > Here's the example where a partial index can "hide" rows. > > > > From the wikipedia article cited by the OP: > > > > > > It is not necessary that the condition be the same as the index > criterion; > > Stonebraker's paper below presents a number of examples with indexes > similar > > to the following: > > > > create index partial_salary on employee(age) where salary > 2100; > > > > > > > > What would happen if you issued these queries? > > > > select max(age) from employee > > select avg(age) from employee > > > > Would the ages of employees earning <= 2100 be included? > > Of course. The presence or absence of an index never affect the meaning of > a query - just its performance. > > > Is the > > partial-index used under those circumstances? > > No, I don't see how it could be beneficial for these queries. > -- > Igor Tandetnik > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
> Maybe it'll be clearer if I describe my (quite simple) use case. Our > app is caching what are basically csv files. Hundreds of files, about 2m > records per file. Sometimes we want to delete all the cache rows for one > of the files. We know ahead of time which file it will be -- let's say > it's file 7. > > The schema is roughly > > create table records(__recno INTEGER PRIMARY KEY, fileId, data); > > So sometimes we would *like* to say "delete from records where > fileId=7". > > But that is bad because does a full table scan. > > If your csv inserts are made as a whole, what about keeping track of rowid ranges (rowidfrom..rowitto) for every csv insert in a separate table? You can be sure it will be a new range not intersecting with any previous if rowid is AUTOINCREMENT (or you can provide your own id starting current max(id) to be absolutely sure). In this case you will get fast deletes and you no longer need an extra indexed field. Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On 19 Aug 2010, at 11:18pm, Eric Smith wrote: > The schema is roughly > > create table records(__recno INTEGER PRIMARY KEY, fileId, data); > > So sometimes we would *like* to say "delete from records where > fileId=7". > > But that is bad because does a full table scan. > > So the next cut is to say "create index recordsIdxFileId on > records(fileId)". > > But that is bad because it is a huge burden during INSERT and is not > used often enough (or with enough values) to justify its existence. > > What I really want is to be able to say "create index > recordsIdxFileIdOnFile3 on records(fileId) where fileId=7". > But sqlite doesn't do that. But in order to create that partial index, SQLite would have to scan the entire table anyway. Otherwise how would it know which records had fileId=7 ? Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Simon Slavin wrote: > http://www.sqlite.org/lang_createview.html > > This is the SQL standard way to reduce your view of a table to just > certain rows. If I understand your request, this feature should provide > exactly what you want. Appropriate indexes will be used when consulting > any VIEW you've defined. I don't think that helps either. A view in sqlite is just syntactic sugar for a select statement. I don't want to define any real indices -- they are a performance burden. > > Something like DELETE FROM records WHERE __recno IN (SELECT __recno > > FROM idxTable), where __recno is the INTEGER PRIMARY KEY on records. > > I don't understand what you're looking up here. If you have some > method of recognising which rows of a table should be deleted just use the > appropriate > > DELETE FROM ... WHERE ... > > command. No need for any sub-SELECT clause. Maybe it'll be clearer if I describe my (quite simple) use case. Our app is caching what are basically csv files. Hundreds of files, about 2m records per file. Sometimes we want to delete all the cache rows for one of the files. We know ahead of time which file it will be -- let's say it's file 7. The schema is roughly create table records(__recno INTEGER PRIMARY KEY, fileId, data); So sometimes we would *like* to say "delete from records where fileId=7". But that is bad because does a full table scan. So the next cut is to say "create index recordsIdxFileId on records(fileId)". But that is bad because it is a huge burden during INSERT and is not used often enough (or with enough values) to justify its existence. What I really want is to be able to say "create index recordsIdxFileIdOnFile3 on records(fileId) where fileId=7". But sqlite doesn't do that. So let's assume SQLite is much faster at deleting rows by the INTEGER PRIMARY KEY than it is by deleting rows by some other value. Then we can optimize by keeping track of which __recnos we will want to delete. Hence my idea for having a separate table that maps fileId --> __recno, but only for the fileId we care about. Eric -- Eric A. Smith Keeping Young #6: Don't look back. Something might be gaining on you. -- Satchel Paige ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Kees Nuyt wrote: > You could add a "deleted" column with value range (0,1) and > create an index on it if benchmarks show that makes it > faster. As a bonus it is easier to code and maintain than a > separate table with references and triggers. > > Alternatively, you can create an composite index with the > "deleted" column as one of the components. > > From a theoretical view, if you care about the visibility of > a row, you should express it as an attribute of the entity. > The solutions above comply with that notion. I think you misunderstand what I want. I don't care about keeping the row around after it is deleted. I don't care about visibility. I only want to find rows quickly in order to delete them. -- Eric A. Smith Bowlikinetics (boh lih kih neh' tiks), n.: The act of trying to control a released bowling ball by twisting one's body in the direction one wants it to go. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Sorry, let me amend that: > The schema is roughly > > create table records(__recno INTEGER PRIMARY KEY, fileId, data); Forget the INTEGER PRIMARY KEY. My partial index would reference the _rowid_. I don't permit vacuums on the database so, if I'm not mistaken, this shouldn't be an issue. -- Eric A. Smith Parkinson's Fourth Law: The number of people in any working group tends to increase regardless of the amount of work to be done. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Stephen Oberholtzer wrote: > I believe what he's getting at is this: {snip explanation} You exactly understand what I'm going for and my use case. Is there a better way to implement it in sql itself than what I outlined? I.e. create my own index table that points to the proper rows and keep it updated via triggers? Eric -- Eric A. Smith Gnagloot, n.: A person who leaves all his ski passes on his jacket just to impress people. -- Rich Hall, "Sniglets" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Igor Tandetnik wrote: > > How would you find a row whose column X contained value Y if the > > "partial" index on column X specified that rows containing value Y > > in column X should never be returned? > > No one suggests partial index should be capable of hiding anything. The > idea is that, when the query can be proven to only involve rows covered > by the partial index, the index can be used to speed up the query. > Otherwise, it simply won't be used. Right. -- Eric A. Smith Well then, let's go on. Sorry, there's nothing to go on to. Let's digress. -- Tim Pulju, Winter 2005 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On Fri, Aug 20, 2010 at 06:52:40AM -0700, Cory Nelson scratched on the wall: > On Thu, Aug 19, 2010 at 1:30 PM, Eric Smith wrote: > > Afaict sqlite doesn't support indices on subsets of rows in a table, ?? > > la http://en.wikipedia.org/wiki/Partial_index -- right? > > > > Any plans to implement that? > > +1 for this feature request. They've got a very specific and fairly > rare use case, but when opportunity strikes partial indexes are much > more convenient, straightforward, and efficient than the alternative. This strikes me as more trouble than it is worth. You admit it has a very specific and rare use case, yet implementing it is going to add complexity and size to the query processor and optimizer. This is not a "low cost" feature. Any time a partial index comes into play, the optimizer needs to determine if the index can be used or not by verifying that the query domain is a subset of the index domain, based off the conditions on the query and on the partial index. This may or may not involve some significant logic (or it will tend to be rather dumb). It requires noticeable amounts of development and testing time-- especially testing, as it introduces the possibility of returning a looks-correct-but-is-wrong answer. And, because you're talking about modifications to a very central part of the database engine, adding complexity adds upkeep costs from here on out. It is also likely you would need to bump the file format, since you now need the ability to attach conditions to an index and flag it as partial and unusable for the general case. I have no doubt that it is useful for those cases where it applies, but that seems like a high costs for such a narrow gain. > > Are there any known hacks to implement something similar? > > There's not really any good solution. You can create a separate table > with the subset in it, that's pretty much it. Right. Create a shadow table that has the fields you want to partially index. Map the ROWID values one-to-one. Index the shadow table (or not, depending on your needs). Put triggers on the main table to keep the shadow table up to date on INSERT/UPDATE/DELETEs. If you index the shadow table, you now have a partial index. You have to make the call with each query about using the partial index or not, and you have to manually add an extra JOIN into your query, but for most of the situations described in the tread so far, that doesn't seem that difficult. Performance-wise, queries should be the same as a native partial index, since the ROWID value can be pulled directly from the index and mapped back to the main table, exactly like a native index. > But that has some overhead too, and can complicate your queries. The complexity is there no matter what. The difference is if the person writing the query sees it or not. My personal opinion is that this is a unique enough feature, and workarounds exist (even if they aren't exactly pretty), that it does not justify the long-term testing and upkeep costs. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On 8/20/10, Cory Nelson wrote: > On Fri, Aug 20, 2010 at 7:39 AM, Jim Wilcoxson wrote: >> ... >> The best I could come up with is a separate table. The problem is, >> indexing the SHA1 normally means there is a copy in the row and a copy >> in the index. Using a separate table, which still has to be indexed, >> means there is a copy in the row of the main table, a copy in the >> separate table, and a copy in the separate table's index. >> > > You might want to index the "sha" column on the primary table, then > the secondary table can just be indexedblocks(blockid INTEGER PRIMARY > KEY). No wasted space that way, but will require more I/Os to JOIN > the tables. In my case, the point of a partial index would be to save space. If the sha column were indexed completely in the primary table, no other tables or indexes are necessary, but then no space is saved. If I only want to index 80% of the rows, I'd save 20*.2n bytes with a partial index, where n is the number of rows. (SHA1 is 20 bytes) Using a separate table, I'd need 40 bytes (roughly) for each sha indexed: 20 bytes for the main table, 20 for the index; or 40m for m entries. This saves space if half or fewer of all blocks are indexed. If more than than half are indexed, indexing all rows in the main table uses less space. Jim -- HashBackup: easy onsite and offsite Unix backup http://sites.google.com/site/hashbackup ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On Fri, Aug 20, 2010 at 7:39 AM, Jim Wilcoxson wrote: > ... > The best I could come up with is a separate table. The problem is, > indexing the SHA1 normally means there is a copy in the row and a copy > in the index. Using a separate table, which still has to be indexed, > means there is a copy in the row of the main table, a copy in the > separate table, and a copy in the separate table's index. > You might want to index the "sha" column on the primary table, then the secondary table can just be indexedblocks(blockid INTEGER PRIMARY KEY). No wasted space that way, but will require more I/Os to JOIN the tables. > > I guess one way to do this using the SQL partial index feature being > discussed would be to have an extra column in the row called > "isindexed", setting that to 1 or 0 depending on whether the SHA > should be included in the index, and using create index ... where > isindexed=1. Then on the queries, say "select blockid from blocks > where isindexed=1 and sha=X". And any query that didn't have > isindexed=1 wouldn't be able to use the index at all. > > Is that how it would work? Yes, that's exactly how it works. Partial indexes are best used when: - You only need to search a specific subset of your rows. "isindexed" is a good candidate for this. - You have no other indexes that can already narrow down searches efficiently. - You have existing queries that use rows from both inside and outside of the partial index. If you don't, then there's no reason to group them together and you might as well put the subset in a separate table if it's not too inconvenient. - The column is used outside of the index, or the storage used by it is acceptable. Ie. if 99% of your rows have "isindexed=1", you've got a lot of redundant columns taking up space and might investigate JOINing with another smaller table instead. -- Cory Nelson http://int64.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On 8/20/10, Cory Nelson wrote: > +1 for this feature request. They've got a very specific and fairly > rare use case, but when opportunity strikes partial indexes are much > more convenient, straightforward, and efficient than the alternative. > > - If a table has 100,000,000 rows and each index page for it holds > 100 rows, a full index will need 1,000,000 pages of storage and 4 page > reads to search. > - With a partial index covering only 10,000 rows of the same table, > it will only need 100 pages of storage and 2 page reads to search. > > Big improvement while keeping query syntax and results exactly the same! > >> Are there any known hacks to implement something similar? > > There's not really any good solution. You can create a separate table > with the subset in it, that's pretty much it. But that has some > overhead too, and can complicate your queries. A while back I needed something like a partial index. The backup program I'm working on does block dedup, so I made an index on each block's SHA1. But then I wanted to only dedup some blocks, not all. I needed to say "insert this block row and also index the sha in the sha index", or "insert this block row but don't index the sha in the sha index". Then I wanted to be able to say "is X in the sha index?", ie, only look in the index. SQL doesn't really have a way to do that. It's a simple concept, but it doesn't seem to fit within the SQL framework. The best I could come up with is a separate table. The problem is, indexing the SHA1 normally means there is a copy in the row and a copy in the index. Using a separate table, which still has to be indexed, means there is a copy in the row of the main table, a copy in the separate table, and a copy in the separate table's index. I guess one way to do this using the SQL partial index feature being discussed would be to have an extra column in the row called "isindexed", setting that to 1 or 0 depending on whether the SHA should be included in the index, and using create index ... where isindexed=1. Then on the queries, say "select blockid from blocks where isindexed=1 and sha=X". And any query that didn't have isindexed=1 wouldn't be able to use the index at all. Is that how it would work? Jim -- HashBackup: easy onsite and offsite Unix backup http://sites.google.com/site/hashbackup ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On Thu, Aug 19, 2010 at 1:30 PM, Eric Smith wrote: > Afaict sqlite doesn't support indices on subsets of rows in a table, Ю > la http://en.wikipedia.org/wiki/Partial_index -- right? > > Any plans to implement that? +1 for this feature request. They've got a very specific and fairly rare use case, but when opportunity strikes partial indexes are much more convenient, straightforward, and efficient than the alternative. - If a table has 100,000,000 rows and each index page for it holds 100 rows, a full index will need 1,000,000 pages of storage and 4 page reads to search. - With a partial index covering only 10,000 rows of the same table, it will only need 100 pages of storage and 2 page reads to search. Big improvement while keeping query syntax and results exactly the same! > Are there any known hacks to implement something similar? There's not really any good solution. You can create a separate table with the subset in it, that's pretty much it. But that has some overhead too, and can complicate your queries. -- Cory Nelson http://int64.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Tim Romano wrote: > Igor, > Here's the example where a partial index can "hide" rows. > > From the wikipedia article cited by the OP: > > > It is not necessary that the condition be the same as the index criterion; > Stonebraker's paper below presents a number of examples with indexes similar > to the following: > > create index partial_salary on employee(age) where salary > 2100; > > > > What would happen if you issued these queries? > > select max(age) from employee > select avg(age) from employee > > Would the ages of employees earning <= 2100 be included? Of course. The presence or absence of an index never affect the meaning of a query - just its performance. > Is the > partial-index used under those circumstances? No, I don't see how it could be beneficial for these queries. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On Fri, Aug 20, 2010 at 1:47 PM, Tim Romano wrote: > Igor, > Here's the example where a partial index can "hide" rows. > > From the wikipedia article cited by the OP: > > > It is not necessary that the condition be the same as the index criterion; > Stonebraker's paper below presents a number of examples with indexes similar > to the following: > > create index partial_salary on employee(age) where salary > 2100; > > > > What would happen if you issued these queries? > > select max(age) from employee > select avg(age) from employee > > Would the ages of employees earning <= 2100 be included? Yes > Is the partial-index used under those circumstances? No, it would change outcome of the query. The partial index is used only for optimizing queries that satisfy the index condition. F. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Igor, Here's the example where a partial index can "hide" rows. >From the wikipedia article cited by the OP: It is not necessary that the condition be the same as the index criterion; Stonebraker's paper below presents a number of examples with indexes similar to the following: create index partial_salary on employee(age) where salary > 2100; What would happen if you issued these queries? select max(age) from employee select avg(age) from employee Would the ages of employees earning <= 2100 be included? Is the partial-index used under those circumstances? -- Tim Romano On Thu, Aug 19, 2010 at 9:16 PM, Igor Tandetnik wrote: > Tim Romano wrote: > > How would you find a row whose column X contained value Y if the > "partial" > > index on column X specified that rows containing value Y in column X > should > > never be returned? > > No one suggests partial index should be capable of hiding anything. The > idea is that, when the query can be proven to only involve rows covered by > the partial index, the index can be used to speed up the query. Otherwise, > it simply won't be used. > -- > Igor Tandetnik > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
> Tim Romano wrote: >> How would you find a row whose column X contained value Y if the >> "partial" >> index on column X specified that rows containing value Y in column X >> should >> never be returned? > > No one suggests partial index should be capable of hiding anything. The > idea is that, when the query can be proven to only involve rows covered by > the partial index, the index can be used to speed up the query. Otherwise, > it simply won't be used. > -- > Igor Tandetnik > > The thing, that I don't get is: Why should the rows be hidden? If the index ist just used to cover some special values, it didn't hide others. If you create an index on some columns, it didn't hide the other columns, or? Artur Reilin sqlite.yuedream.de ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Tim Romano wrote: > How would you find a row whose column X contained value Y if the "partial" > index on column X specified that rows containing value Y in column X should > never be returned? No one suggests partial index should be capable of hiding anything. The idea is that, when the query can be proven to only involve rows covered by the partial index, the index can be used to speed up the query. Otherwise, it simply won't be used. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Typo: "... more performant than partial query" should read "more performant than a partial index". Tim Romano > >> > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Eric, How would you find a row whose column X contained value Y if the "partial" index on column X specified that rows containing value Y in column X should never be returned? If the index hides the row, how do you cause the row to become visible to a query? You have to drop the index. However, I would be willing to accept an index on a *virtual* column whose set of discrete possible values was a subset of the values in the actual underlying table, or some translated form of those values, for example a column that was the result of a function that converted a date to 'Q1', 'Q2', 'Q3', or 'Q4'. Compare: http://www.oracle-base.com/articles/11g/VirtualColumns_11gR1.php If your goal is performance, moving rows out of the table when they cease to meet your business rule's definition of relevance will be more performant than partial query: not only will the index contain just as few nodes, but the table itself will contain fewer rows than the table when using a partial index. And programming would not be more difficult: you'd simply substitute a trigger for the partial index declaration. Moreover, this technique would be highly portable. Partial indexes, not. Regards Tim Romano Swarthmore PA > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On Thu, Aug 19, 2010 at 5:53 PM, Kees Nuyt wrote: > On Thu, 19 Aug 2010 17:39:14 -0400, Eric Smith > wrote: > >>Am I missing something? > > You could add a "deleted" column with value range (0,1) and > create an index on it if benchmarks show that makes it > faster. As a bonus it is easier to code and maintain than a > separate table with references and triggers. > > Alternatively, you can create an composite index with the > "deleted" column as one of the components. > > From a theoretical view, if you care about the visibility of > a row, you should express it as an attribute of the entity. > The solutions above comply with that notion. > -- > ( Kees Nuyt I think you've missed the point. I believe what he's getting at is this: >> CREATE INDEX foo ON bar (to_be_deleted) << Imagine if he had 100 million rows in his table, and 100 of them were marked "to_be_deleted". His index will have 100 million rows, probably 500MB or 900MB (not sure if rowid is 32- or 64-bit), consisting of 99,999,900 "0"s and 100 "1"s. If he could create what MSSQL calls a "filtered index", using a syntax like this: >> CREATE INDEX foo_filtered ON bar (to_be_deleted) WHERE to_be_deleted = 1 << he could speed up the statement >> DELETE FROM bar WHERE to_be_deleted = 1 << using that index, just like he could with the unfiltered "foo" index. The only difference is that where foo has 100 million rows, foo_filtered only contains 100 rows, taking up only 500-900 bytes (thus actually having like 300% overhead due to page sizes!) Now, in order to implement this, the following changes would have to be made: 1. Conditional logic would have to be generated inside the VDBE programs for INSERT statements. This is pretty straightforward. 2. Conditional logic would have to be generated inside the VDBE programs for UPDATE statements. Care must be taken to make sure that the index is updated properly when the column(s) referenced in the WHERE clause are updated, but other than that, it's probably pretty straightforward. 3. Depending on how the IdxDelete operator handles "key not found in index" errors, the VDBE code generated for DELETE statements may also need to be updated. 4. The statement parser needs to be modified to parse this syntax. 5. The schema parser needs to be modified to decode this syntax. 6. The optimizer needs to flatten and check that every possible branch of the WHERE clause on a SELECT/DML statement is compatible with the WHERE clause of the index, before it can use that index. Now, I personally could do #1-3, because they're pretty easy. I could probably even manage #4 and #5 if I spent a week familiarizing myself with the code. But #6, as far as I can tell, is a LOT harder. Consider the following examples: create index ix1 on Bar (baz) where quux between 30 and 95; select * from baz where quux = 35; -- Index is viable select * from baz where quux between 31 and 94; -- Index is viable select * from baz where quux = 38 or quux between 80 and 90; -- Index is viable select * from baz where quux in (40,50,60,70); -- again, index is viable select * from baz where quux between 25 and 35; -- index is NOT viable select * from baz where quux = 38 or baz = 5; -- index is NOT viable -- -- Stevie-O Real programmers use COPY CON PROGRAM.EXE ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On 19 Aug 2010, at 10:39pm, Eric Smith wrote: > I want an index that only can be used to find rows with a particular > value or set of values. Take a look at VIEWs: http://www.sqlite.org/lang_createview.html This is the SQL standard way to reduce your view of a table to just certain rows. If I understand your request, this feature should provide exactly what you want. Appropriate indexes will be used when consulting any VIEW you've defined. > Since SQLite doesn't support partial indices directly, I'm > thinking about making my own index as a separate table and > populating/depopulating it using triggers on the main table. I only > need it for fast lookups during deletion of the relevant rows, so I'll > hijack the app logic that wants to delete those rows and instead use > the secondary table to get the row ids, and delete those directly. > > Something like DELETE FROM records WHERE __recno IN (SELECT __recno > FROM idxTable), where __recno is the INTEGER PRIMARY KEY on records. I don't understand what you're looking up here. If you have some method of recognising which rows of a table should be deleted just use the appropriate DELETE FROM ... WHERE ... command. No need for any sub-SELECT clause. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On Thu, 19 Aug 2010 17:39:14 -0400, Eric Smith wrote: >Am I missing something? You could add a "deleted" column with value range (0,1) and create an index on it if benchmarks show that makes it faster. As a bonus it is easier to code and maintain than a separate table with references and triggers. Alternatively, you can create an composite index with the "deleted" column as one of the components. >From a theoretical view, if you care about the visibility of a row, you should express it as an attribute of the entity. The solutions above comply with that notion. -- ( Kees Nuyt ) c[_] ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Tim Romano wrote: > The partial index is one very messy thing, fraught with ambiguities, > something to avoid. I want an index that only can be used to find rows with a particular value or set of values. In what way is that ambiguous? Other databases (e.g. postgres) seem to support this kind of thing. > I can imagine other business rules being really > bollixed up by the sudden reappearance of zombie rows. This isn't a 'business rule', this is an optimization. No high level logic will change. Just like when we use other sql indices. > Under the partial index method, how would > you ever find a row again once it has become invisible, unless you were > perhaps to change or suspend the partial index rule, and cause the missing > rows to reappear? "Become invisible", meaning it no longer contains data that I care about? I don't need to find it quickly because it no longer contains data that I care about. So, I'm not sure I understand your concerns. Since SQLite doesn't support partial indices directly, I'm thinking about making my own index as a separate table and populating/depopulating it using triggers on the main table. I only need it for fast lookups during deletion of the relevant rows, so I'll hijack the app logic that wants to delete those rows and instead use the secondary table to get the row ids, and delete those directly. Something like DELETE FROM records WHERE __recno IN (SELECT __recno FROM idxTable), where __recno is the INTEGER PRIMARY KEY on records. Am I missing something? Eric -- Eric A. Smith Furbling, v.: Having to wander through a maze of ropes at an airport or bank even when you are the only person in line. -- Rich Hall, "Sniglets" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
On Thu, 19 Aug 2010 17:15:40 -0400, Tim Romano wrote: >Ah, an opportunity for another purist tirade presents itself. > >I don't have a hack for SQLite but something I consider to be a much better >practice that accomplishes the same goal. If your business rules would >declare that rows with value X in column Y no longer belong to the set, the >most straightforward way to implement such a rule is to move those rows to >another table where they do belong. Use an after update/insert trigger to do >this. [...] >The partial index is one very messy thing, fraught with ambiguities, >something to avoid. I can imagine other business rules being really >bollixed up by the sudden reappearance of zombie rows. +1 My 2 cents: Pure SQL doesn't mind about indexes, in RDBMS implementations they are just an optimization feature and a way to implement unique constraints. Anything more is a can of worms indeed. Optional indexes are a codasyl hierarchical or network database feature, where indexes are exposed to the DML. -- ( Kees Nuyt ) c[_] ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] partial index?
Ah, an opportunity for another purist tirade presents itself. I don't have a hack for SQLite but something I consider to be a much better practice that accomplishes the same goal. If your business rules would declare that rows with value X in column Y no longer belong to the set, the most straightforward way to implement such a rule is to move those rows to another table where they do belong. Use an after update/insert trigger to do this Splitting the rows into separate tables In that manner, you could move an inactive|invisible row back into active|visible status if the need should ever arise, simply by changing the column value and moving the row back into the active table. Under the partial index method, how would you ever find a row again once it has become invisible, unless you were perhaps to change or suspend the partial index rule, and cause the missing rows to reappear? The partial index is one very messy thing, fraught with ambiguities, something to avoid. I can imagine other business rules being really bollixed up by the sudden reappearance of zombie rows. Regards Tim Romano Swarthmore PA on the Gender column. On Thu, Aug 19, 2010 at 4:30 PM, Eric Smith wrote: > Afaict sqlite doesn't support indices on subsets of rows in a table, Ю > la http://en.wikipedia.org/wiki/Partial_index -- right? > > Any plans to implement that? > > Are there any known hacks to implement something similar? > > -- > Eric A. Smith > > Keeping Young #3: > Keep the juices flowing by janglin round gently as you move. >-- Satchel Paige > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users