[sqlite] UNIQUE constraint
I have a column with a normal INDEX, and I would like to turn it into an UNIQUE index, but I'm a bit worried about the performance implications for inserts. Can someone give some insight into how UNIQUE is implemented in SQLite, does it create extra tables compared to a normale index, are there many extra checks? And a related question: I need to insert 1 million rows, and currently I create the INDEX afterwards, because that should be faster. I cannot create the UNIQUE INDEX afterwards, because there's a slight possibility there will be 1 or 2 duplicates, which will make the creation fail. I now have the possibility to specify UNIQUE upfront (in the TABLE definition) or manually filter out any doubles before the insert (which will also take CPU time). Would there be any advantage doing this manually, or will SQLite do it just as efficiently? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UNIQUE constraint
Fabian fabianpi...@gmail.com wrote: I have a column with a normal INDEX, and I would like to turn it into an UNIQUE index, but I'm a bit worried about the performance implications for inserts. Can someone give some insight into how UNIQUE is implemented in SQLite, does it create extra tables compared to a normale index, are there many extra checks? No, a UNIQUE index and a regular index are implemented the exact same way. It's just that, at INSERT and UPDATE time, after finding a proper place to insert the new value, an additional check is made that the place isn't already occupied. And a related question: I need to insert 1 million rows, and currently I create the INDEX afterwards, because that should be faster. I cannot create the UNIQUE INDEX afterwards, because there's a slight possibility there will be 1 or 2 duplicates, which will make the creation fail. I now have the possibility to specify UNIQUE upfront (in the TABLE definition) or manually filter out any doubles before the insert (which will also take CPU time). Would there be any advantage doing this manually, or will SQLite do it just as efficiently? I suspect the way you are going to manually filter duplicates will involve inserting them into a moral equivalent of a UNIQUE index. The performace is probably going to be similar to that of just creating a UNIQUE index up front. But only measurements with your particular data can tell for sure. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UNIQUE constraint
No, a UNIQUE index and a regular index are implemented the exact same way. It's just that, at INSERT and UPDATE time, after finding a proper place to insert the new value, an additional check is made that the place isn't already occupied. So the only overhead for UNIQUE is that extra check? I suspect the way you are going to manually filter duplicates will involve inserting them into a moral equivalent of a UNIQUE index. The performace is probably going to be similar to that of just creating a UNIQUE index up front. But only measurements with your particular data can tell for sure. I was planning using the HashList from the .Net framework, which should be the fastest way to do it. If SQLite only has to check if a certain position is occupied, it should outperform creating a hash for each value. What got me worried was the post Slow insertion for Unique Text Column to this mailinglist, I quote: --- I was creating a new table and populating it with 100,000 rows of data (as a test case; I really wanted to populate it with over a million rows). [Insertion A] When a Text Column was NOT Unique it would take: 8875 ms = ~9 seconds [Insertion B] When a Text Column was Unique it would take: 155781 ms = ~156 seconds - The difference between 9 and 156 seconds is too large for me. The original author solved this by pre-sorting the data for the indexed column, which made the difference 9 to 12 seconds (including the pre-sort), which is very acceptable. So why does SQLite not pre-sort the data itself, when doing such large batch inserts inside a transaction? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UNIQUE constraint
On 23 Oct 2011, at 3:49pm, Fabian wrote: So the only overhead for UNIQUE is that extra check? [snip] Right. When doing an INSERT or UPDATE, it checks to see whether the value it's trying to add to the index already exists in the index. If it does, the result is an error. There is no difference in the file format used to store the index. The difference between 9 and 156 seconds is too large for me. The original author solved this by pre-sorting the data for the indexed column, which made the difference 9 to 12 seconds (including the pre-sort), which is very acceptable. So why does SQLite not pre-sort the data itself, when doing such large batch inserts inside a transaction? In this example, the indexed column is a text column. The text fields could have been very long, and checking long text fields for uniqueness can involve comparing every byte. Nevertheless, I do not understand the results you quoted. I wonder whether some use of transactions would have vastly reduced the problem. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UNIQUE constraint
2011/10/23 Simon Slavin slav...@bigfraud.org In this example, the indexed column is a text column. The text fields could have been very long, and checking long text fields for uniqueness can involve comparing every byte. Nevertheless, I do not understand the results you quoted. I wonder whether some use of transactions would have vastly reduced the problem. In my case, the indexed column is a text column too, and the original author already stated he batched the inserts into transactions, and made use of PRAGMA statements. Only after pre-sorting the data, the problem disappeared, but I'd rather avoid having to pre-sort everything (because SQLite's C code should be able to do it faster, than my C# code). ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UNIQUE constraint on column
A. Pagaltzis [EMAIL PROTECTED] wrote: It's a pity that INSERT OR IGNORE (apparently?) does not set last_insert_id properly regardless of outcome, Consider this case: CREATE TABLE t1(x INTEGER PRIMARY KEY, y UNIQUE, z UNIQUE); INSERT INTO t1 VALUES(1,'a','b'); INSERT INTO t1 VALUES(2,'x','y'); INSERT OR IGNORE INTO t1 VALUES(NULL, 'a','y'); For the INSERT OR IGNORE statement, does the last_insert_rowid get set to 1 or 2? -- D. Richard Hipp [EMAIL PROTECTED] - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] UNIQUE constraint on column
Shane Harrelson [EMAIL PROTECTED] wrote: I have two tables, an Objects table with a foreign key into a second Strings table which is composed of unique values. It is a many to one relationship, that is, several Objects may reference the same String. When an Object is added, its associated String is added to the Strings table. If the String already exists in the Strings table, I'd like the new Object to reference the existing copy. Currently, I've implemented it as so (leaving out error handling, etc.): begin transaction insert into Strings (value) VALUES ( 'foo') if string insert result is SQLITE_OK get rowid of last insert (sqlite3_last_insert_rowid) else if result is SQLITE_CONSTRAINT select rowid from Strings where value = 'foo' end if if rowid insert into Objects (string_id) VALUES (rowid) end if if no error commit transaction else rollback transaction end if With my dataset, there is about a 10% chance of the string being a duplicate -- that is about 1 in 10 string inserts hit the UNIQUE constraint violation. I've tested viloating the internals of the VBDE and pulling the rowid of the duplicate off the VBDE stack when the SQLITE_CONSTRAINT result is returned and it is measurably (5-10%) faster then doing the subsequent SELECT. What you are doing is the most efficient way that I can think of right off hand. If violating the internals is something you want to do that's fine - just remember that it is likely to break in irrepaiable ways in some future point release. No tears. Notice that an sqlite3_last_constraint_rowid() function doesn't really work because an insert might fail due to multiple constraint violations all on different rows. In your schema, perhaps, there can be no more than one constraint violated at a time, but it is easy enough to construct a schema where multiple rows can violate a constraint, so the sqlite3_last_constraint_rowid() idea does not generalize well. -- D. Richard Hipp [EMAIL PROTECTED] Thanks all for the feedback. I was able to get comparable speed-ups to my vbde hack on the string insertion by using a small cache of rowids for previously inserted strings. This reduced the duplicate inserts to about 1 in 100. I still think someway of getting the rowid of the record that forced the query to end with a constraint violation (even if it isn't the only one) would be useful. Perhaps it could be added to the current error string which already includes the column name(s). - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] UNIQUE constraint on column
when i try to insert a row into a table that has a UNIQUE constraint on a column, and I get the SQLITE_CONSTRAINT result code because i'm inserting a duplicate value, is there anyway to determine the rowid of the conflict? looking at the internals of the VDBE, i found that the rowid of the conflicting row is pushed on top of the VDBE stack. if i'm willing to violate the interface, i can dereference the rowid from the internals of the VDBE struct. i'd rather not do this... is there a more formal mechanism for getting this value without having do to do a separate select query? something like sqlite3_last_insert_rowid() -- ie. sqlite3_last_conflict_rowid()? thanks. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] UNIQUE constraint on column
Shane Harrelson [EMAIL PROTECTED] wrote: when i try to insert a row into a table that has a UNIQUE constraint on a column, and I get the SQLITE_CONSTRAINT result code because i'm inserting a duplicate value, is there anyway to determine the rowid of the conflict? looking at the internals of the VDBE, i found that the rowid of the conflicting row is pushed on top of the VDBE stack. if i'm willing to violate the interface, i can dereference the rowid from the internals of the VDBE struct. i'd rather not do this... is there a more formal mechanism for getting this value without having do to do a separate select query? something like sqlite3_last_insert_rowid() -- ie. sqlite3_last_conflict_rowid()? The official way to find the conflicting entry is to do a query. SELECT rowid FROM table WHERE uniquecolumn=?; -- D. Richard Hipp [EMAIL PROTECTED] - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] UNIQUE constraint on column
On 1/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Shane Harrelson [EMAIL PROTECTED] wrote: when i try to insert a row into a table that has a UNIQUE constraint on a column, and I get the SQLITE_CONSTRAINT result code because i'm inserting a duplicate value, is there anyway to determine the rowid of the conflict? looking at the internals of the VDBE, i found that the rowid of the conflicting row is pushed on top of the VDBE stack. if i'm willing to violate the interface, i can dereference the rowid from the internals of the VDBE struct. i'd rather not do this... is there a more formal mechanism for getting this value without having do to do a separate select query? something like sqlite3_last_insert_rowid() -- ie. sqlite3_last_conflict_rowid()? The official way to find the conflicting entry is to do a query. SELECT rowid FROM table WHERE uniquecolumn=?; -- D. Richard Hipp [EMAIL PROTECTED] Thank you for the reply. I assumed this was most likely the case, and as I said in my original email, I was hoping to avoid having to do a seperate select query for the sake of speed... especially since the value I needed was so tantalizing close in the VDBE struct. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] UNIQUE constraint on column
Shane Harrelson wrote: On 1/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: The official way to find the conflicting entry is to do a query. SELECT rowid FROM table WHERE uniquecolumn=?; Thank you for the reply. I assumed this was most likely the case, and as I said in my original email, I was hoping to avoid having to do a seperate select query for the sake of speed... especially since the value I needed was so tantalizing close in the VDBE struct. Shane, Why do you want the rowid of the conflicting row? I only ask because I suspect there may be a better way to do what you want to do. Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] UNIQUE constraint on column
Hi Shane, On Wed, 31 Jan 2007 09:29:24 -0500, you wrote: On 1/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Shane Harrelson [EMAIL PROTECTED] wrote: when i try to insert a row into a table that has a UNIQUE constraint on a column, and I get the SQLITE_CONSTRAINT result code because i'm inserting a duplicate value, is there anyway to determine the rowid of the conflict? looking at the internals of the VDBE, i found that the rowid of the conflicting row is pushed on top of the VDBE stack. if i'm willing to violate the interface, i can dereference the rowid from the internals of the VDBE struct. i'd rather not do this... is there a more formal mechanism for getting this value without having do to do a separate select query? something like sqlite3_last_insert_rowid() -- ie. sqlite3_last_conflict_rowid()? The official way to find the conflicting entry is to do a query. SELECT rowid FROM table WHERE uniquecolumn=?; -- D. Richard Hipp [EMAIL PROTECTED] Thank you for the reply. I assumed this was most likely the case, and as I said in my original email, I was hoping to avoid having to do a seperate select query for the sake of speed... especially since the value I needed was so tantalizing close in the VDBE struct. The SELECT will be quite fast, chances are the required pages of the unique index will still be in the cache. In my opinion it is always better to write portable code, so I would prefer to rely on generic SQL than on yet another implementation specific API. Just my 2 cents. -- ( Kees Nuyt ) c[_] - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] UNIQUE constraint on column
On 1/31/07, Dennis Cote [EMAIL PROTECTED] wrote: Shane Harrelson wrote: On 1/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: The official way to find the conflicting entry is to do a query. SELECT rowid FROM table WHERE uniquecolumn=?; Thank you for the reply. I assumed this was most likely the case, and as I said in my original email, I was hoping to avoid having to do a seperate select query for the sake of speed... especially since the value I needed was so tantalizing close in the VDBE struct. Shane, Why do you want the rowid of the conflicting row? I only ask because I suspect there may be a better way to do what you want to do. Dennis Cote I have two tables, an Objects table with a foreign key into a second Strings table which is composed of unique values. It is a many to one relationship, that is, several Objects may reference the same String. When an Object is added, its associated String is added to the Strings table. If the String already exists in the Strings table, I'd like the new Object to reference the existing copy. Currently, I've implemented it as so (leaving out error handling, etc.): begin transaction insert into Strings (value) VALUES ( 'foo') if string insert result is SQLITE_OK get rowid of last insert (sqlite3_last_insert_rowid) else if result is SQLITE_CONSTRAINT select rowid from Strings where value = 'foo' end if if rowid insert into Objects (string_id) VALUES (rowid) end if if no error commit transaction else rollback transaction end if With my dataset, there is about a 10% chance of the string being a duplicate -- that is about 1 in 10 string inserts hit the UNIQUE constraint violation. I've tested viloating the internals of the VBDE and pulling the rowid of the duplicate off the VBDE stack when the SQLITE_CONSTRAINT result is returned and it is measurably (5-10%) faster then doing the subsequent SELECT. Any help or suggestions with how to do this better would be appreciated. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] UNIQUE constraint on column
On Wed, Jan 31, 2007 at 05:23:29PM -0500, Shane Harrelson wrote: I have two tables, an Objects table with a foreign key into a second Strings table which is composed of unique values. It is a many to one relationship, that is, several Objects may reference the same String. When an Object is added, its associated String is added to the Strings table. If the String already exists in the Strings table, I'd like the new Object to reference the existing copy. Currently, I've implemented it as so (leaving out error handling, etc.): begin transaction insert into Strings (value) VALUES ( 'foo') if string insert result is SQLITE_OK Sounds like you should want to use INSERT OR IGNORE ... INTO Strings and then SELECT the rowid of the string for use in INSERTing INTO Object. Nico -- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] UNIQUE constraint on column
Shane Harrelson [EMAIL PROTECTED] wrote: I have two tables, an Objects table with a foreign key into a second Strings table which is composed of unique values. It is a many to one relationship, that is, several Objects may reference the same String. When an Object is added, its associated String is added to the Strings table. If the String already exists in the Strings table, I'd like the new Object to reference the existing copy. Currently, I've implemented it as so (leaving out error handling, etc.): begin transaction insert into Strings (value) VALUES ( 'foo') if string insert result is SQLITE_OK get rowid of last insert (sqlite3_last_insert_rowid) else if result is SQLITE_CONSTRAINT select rowid from Strings where value = 'foo' end if if rowid insert into Objects (string_id) VALUES (rowid) end if if no error commit transaction else rollback transaction end if With my dataset, there is about a 10% chance of the string being a duplicate -- that is about 1 in 10 string inserts hit the UNIQUE constraint violation. I've tested viloating the internals of the VBDE and pulling the rowid of the duplicate off the VBDE stack when the SQLITE_CONSTRAINT result is returned and it is measurably (5-10%) faster then doing the subsequent SELECT. What you are doing is the most efficient way that I can think of right off hand. If violating the internals is something you want to do that's fine - just remember that it is likely to break in irrepaiable ways in some future point release. No tears. Notice that an sqlite3_last_constraint_rowid() function doesn't really work because an insert might fail due to multiple constraint violations all on different rows. In your schema, perhaps, there can be no more than one constraint violated at a time, but it is easy enough to construct a schema where multiple rows can violate a constraint, so the sqlite3_last_constraint_rowid() idea does not generalize well. -- D. Richard Hipp [EMAIL PROTECTED] - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] UNIQUE constraint on column
when i try to insert a row into a table that has a UNIQUE constraint on a column, and I get the SQLITE_CONSTRAINT result code because i'm inserting a duplicate value, is there anyway to determine the rowid of the conflict? looking at the internals of the VDBE, i found that the rowid of the conflicting row is pushed on top of the VDBE stack. if i'm willing to violate the interface, i can dereference the rowid from the internals of the VDBE struct. i'd rather not do this... is there a more formal mechanism for getting this value without having do to do a separate select query? something like sqlite3_last_insert_rowid() -- ie. sqlite3_last_conflict_rowid()? thanks. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] UNIQUE Constraint but case sensitive
Hi, I need to create a unique constraint on a column of type varchar, but it is not case sensitive by default. Does any one know how to make a unique constraint case sensitive? CREATE TABLE abc(a UNIQUE COLLATE NOCASE); Bear in mind that the NOCASE collation sequence only knows about the 26 letters used in the English language. __ Yahoo! Mail Stay connected, organized, and protected. Take the tour: http://tour.mail.yahoo.com/mailtour.html
[sqlite] UNIQUE Constraint but case sensitive
Hi, I need to create a unique constraint on a column of type varchar, but it is not case sensitive by default. Does any one know how to make a unique constraint case sensitive? Thanks and best regards, Ajay Sonawane