[sqlite] UNIQUE constraint

2011-10-23 Thread Fabian
I have a column with a normal INDEX, and I would like to turn it into an
UNIQUE index, but I'm a bit worried about the performance implications for
inserts. Can someone give some insight into how UNIQUE is implemented in
SQLite, does it create extra tables compared to a normale index, are there
many extra checks?

And a related question: I need to insert 1 million rows, and currently I
create the INDEX afterwards, because that should be faster. I cannot create
the UNIQUE INDEX afterwards, because there's a slight possibility there will
be 1 or 2 duplicates, which will make the creation fail. I now have the
possibility to specify UNIQUE upfront (in the TABLE definition) or manually
filter out any doubles before the insert (which will also take CPU time).
Would there be any advantage doing this manually, or will SQLite do it just
as efficiently?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UNIQUE constraint

2011-10-23 Thread Igor Tandetnik
Fabian fabianpi...@gmail.com wrote:
 I have a column with a normal INDEX, and I would like to turn it into an
 UNIQUE index, but I'm a bit worried about the performance implications for
 inserts. Can someone give some insight into how UNIQUE is implemented in
 SQLite, does it create extra tables compared to a normale index, are there
 many extra checks?

No, a UNIQUE index and a regular index are implemented the exact same way. It's 
just that, at INSERT and UPDATE time, after finding a proper place to insert 
the new value, an additional check is made that the place isn't already 
occupied.

 And a related question: I need to insert 1 million rows, and currently I
 create the INDEX afterwards, because that should be faster. I cannot create
 the UNIQUE INDEX afterwards, because there's a slight possibility there will
 be 1 or 2 duplicates, which will make the creation fail. I now have the
 possibility to specify UNIQUE upfront (in the TABLE definition) or manually
 filter out any doubles before the insert (which will also take CPU time).
 Would there be any advantage doing this manually, or will SQLite do it just
 as efficiently?

I suspect the way you are going to manually filter duplicates will involve 
inserting them into a moral equivalent of a UNIQUE index. The performace is 
probably going to be similar to that of just creating a UNIQUE index up front. 
But only measurements with your particular data can tell for sure.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UNIQUE constraint

2011-10-23 Thread Fabian


 No, a UNIQUE index and a regular index are implemented the exact same way.
 It's just that, at INSERT and UPDATE time, after finding a proper place to
 insert the new value, an additional check is made that the place isn't
 already occupied.


So the only overhead for UNIQUE is that extra check?


 I suspect the way you are going to manually filter duplicates will involve
 inserting them into a moral equivalent of a UNIQUE index. The performace is
 probably going to be similar to that of just creating a UNIQUE index up
 front. But only measurements with your particular data can tell for sure.


I was planning using the HashList from the .Net framework, which should be
the fastest way to do it.

If SQLite only has to check if a certain position is occupied, it should
outperform creating a hash for each value.

What got me worried was the post Slow insertion for Unique Text Column to
this mailinglist, I quote:

---

I was creating a new table and populating it with 100,000 rows of data (as
a
test case; I really wanted to populate it with over a million rows).

[Insertion A] When a Text Column was NOT Unique it would take:
8875 ms = ~9 seconds

[Insertion B] When a Text Column was Unique it would take:
155781 ms = ~156 seconds

-

The difference between 9 and 156 seconds is too large for me. The original
author solved this by pre-sorting the data for the indexed column, which
made the difference 9 to 12 seconds (including the pre-sort), which is very
acceptable.

So why does SQLite not pre-sort the data itself, when doing such large batch
inserts inside a transaction?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UNIQUE constraint

2011-10-23 Thread Simon Slavin

On 23 Oct 2011, at 3:49pm, Fabian wrote:

 So the only overhead for UNIQUE is that extra check? [snip]

Right.  When doing an INSERT or UPDATE, it checks to see whether the value it's 
trying to add to the index already exists in the index.  If it does, the result 
is an error.  There is no difference in the file format used to store the index.

 The difference between 9 and 156 seconds is too large for me. The original
 author solved this by pre-sorting the data for the indexed column, which
 made the difference 9 to 12 seconds (including the pre-sort), which is very
 acceptable.
 
 So why does SQLite not pre-sort the data itself, when doing such large batch
 inserts inside a transaction?

In this example, the indexed column is a text column.  The text fields could 
have been very long, and checking long text fields for uniqueness can involve 
comparing every byte.  Nevertheless, I do not understand the results you 
quoted.  I wonder whether some use of transactions would have vastly reduced 
the problem.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UNIQUE constraint

2011-10-23 Thread Fabian
2011/10/23 Simon Slavin slav...@bigfraud.org


 In this example, the indexed column is a text column.  The text fields
 could have been very long, and checking long text fields for uniqueness can
 involve comparing every byte.  Nevertheless, I do not understand the results
 you quoted.  I wonder whether some use of transactions would have vastly
 reduced the problem.


In my case, the indexed column is a text column too, and the original author
already stated he batched the inserts into transactions, and made use of
PRAGMA statements. Only after pre-sorting the data, the problem disappeared,
but I'd rather avoid having to pre-sort everything (because SQLite's C code
should be able to do it faster, than my C# code).
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UNIQUE constraint on column

2007-02-04 Thread drh
A. Pagaltzis [EMAIL PROTECTED] wrote:

 It's a pity that INSERT OR IGNORE (apparently?) does not set
 last_insert_id properly regardless of outcome,

Consider this case:

  CREATE TABLE t1(x INTEGER PRIMARY KEY, y UNIQUE, z UNIQUE);
  INSERT INTO t1 VALUES(1,'a','b');
  INSERT INTO t1 VALUES(2,'x','y');

  INSERT OR IGNORE INTO t1 VALUES(NULL, 'a','y');

For the INSERT OR IGNORE statement, does the last_insert_rowid
get set to 1 or 2?

--
D. Richard Hipp  [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] UNIQUE constraint on column

2007-02-01 Thread Shane Harrelson

Shane Harrelson [EMAIL PROTECTED] wrote:

 I have two tables, an Objects table with a foreign key into a second
 Strings table which is composed of unique values.  It is a many to
 one relationship, that is, several Objects may reference the same
 String.   When an Object is added, its associated String is added to
 the Strings table.   If the String already exists in the Strings
 table, I'd like the new Object to reference the existing copy.

 Currently, I've implemented it as so (leaving out error handling, etc.):

 begin transaction
 insert into Strings (value) VALUES ( 'foo')
 if string insert result is SQLITE_OK
get rowid of last insert (sqlite3_last_insert_rowid)
 else if result is SQLITE_CONSTRAINT
select rowid from Strings where value = 'foo'
 end if
 if rowid
insert into Objects (string_id) VALUES (rowid)
 end if
 if no error
commit transaction
 else
rollback transaction
 end if

 With my dataset, there is about a 10% chance of the string being a
 duplicate -- that is about 1 in 10 string inserts hit the UNIQUE
 constraint violation.

 I've tested viloating the internals of the VBDE and pulling the
 rowid of the duplicate off the VBDE stack when the SQLITE_CONSTRAINT
 result is returned and it is measurably (5-10%) faster then doing the
 subsequent SELECT.


What you are doing is the most efficient way that I can think of
right off hand.  If violating the internals is something you want
to do that's fine - just remember that it is likely to break in
irrepaiable ways in some future point release.  No tears.

Notice that an sqlite3_last_constraint_rowid() function doesn't really
work because an insert might fail due to multiple constraint violations
all on different rows.  In your schema, perhaps, there can be no more
than one constraint violated at a time, but it is easy enough to
construct a schema where multiple rows can violate a constraint, so the
sqlite3_last_constraint_rowid() idea does not generalize well.

--
D. Richard Hipp  [EMAIL PROTECTED]



Thanks all for the feedback.   I was able to get comparable speed-ups
to my vbde hack on the string insertion by using a small cache of
rowids for previously inserted strings.  This reduced the duplicate
inserts to about 1 in 100.

I still think someway of getting the rowid of the record that forced
the query to end with a constraint violation (even if it isn't the
only one) would be useful.  Perhaps it could be added to the current
error string which already includes the column name(s).

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] UNIQUE constraint on column

2007-01-31 Thread Shane Harrelson

when i try to insert a row into a table that has a UNIQUE constraint
on a column, and I get the SQLITE_CONSTRAINT result code because i'm
inserting a duplicate value, is there anyway to determine the rowid of
the conflict?

looking at the internals of the VDBE, i found that the rowid of the
conflicting row is pushed on top of the VDBE stack.

if i'm willing to violate the interface, i can dereference the rowid
from the internals of the VDBE struct.  i'd rather not do this... is
there a more formal mechanism for getting this value without having do
to do a separate select query?  something like
sqlite3_last_insert_rowid() -- ie.  sqlite3_last_conflict_rowid()?

thanks.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] UNIQUE constraint on column

2007-01-31 Thread drh
Shane Harrelson [EMAIL PROTECTED] wrote:
 when i try to insert a row into a table that has a UNIQUE constraint
 on a column, and I get the SQLITE_CONSTRAINT result code because i'm
 inserting a duplicate value, is there anyway to determine the rowid of
 the conflict?
 
 looking at the internals of the VDBE, i found that the rowid of the
 conflicting row is pushed on top of the VDBE stack.
 
 if i'm willing to violate the interface, i can dereference the rowid
 from the internals of the VDBE struct.  i'd rather not do this... is
 there a more formal mechanism for getting this value without having do
 to do a separate select query?  something like
 sqlite3_last_insert_rowid() -- ie.  sqlite3_last_conflict_rowid()?
 

The official way to find the conflicting entry is to do a query.

   SELECT rowid FROM table WHERE uniquecolumn=?;

--
D. Richard Hipp  [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] UNIQUE constraint on column

2007-01-31 Thread Shane Harrelson

On 1/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Shane Harrelson [EMAIL PROTECTED] wrote:
 when i try to insert a row into a table that has a UNIQUE constraint
 on a column, and I get the SQLITE_CONSTRAINT result code because i'm
 inserting a duplicate value, is there anyway to determine the rowid of
 the conflict?

 looking at the internals of the VDBE, i found that the rowid of the
 conflicting row is pushed on top of the VDBE stack.

 if i'm willing to violate the interface, i can dereference the rowid
 from the internals of the VDBE struct.  i'd rather not do this... is
 there a more formal mechanism for getting this value without having do
 to do a separate select query?  something like
 sqlite3_last_insert_rowid() -- ie.  sqlite3_last_conflict_rowid()?


The official way to find the conflicting entry is to do a query.

  SELECT rowid FROM table WHERE uniquecolumn=?;

--
D. Richard Hipp  [EMAIL PROTECTED]



Thank you for the reply.  I assumed this was most likely the case, and
as I said in my original email, I was hoping to avoid having to do a
seperate select query for the sake of speed... especially since the
value I needed was so tantalizing close in the VDBE struct.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] UNIQUE constraint on column

2007-01-31 Thread Dennis Cote

Shane Harrelson wrote:

On 1/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


The official way to find the conflicting entry is to do a query.

  SELECT rowid FROM table WHERE uniquecolumn=?;




Thank you for the reply.  I assumed this was most likely the case, and
as I said in my original email, I was hoping to avoid having to do a
seperate select query for the sake of speed... especially since the
value I needed was so tantalizing close in the VDBE struct.


Shane,

Why do you want the rowid of the conflicting row?

I only ask because I suspect there may be a better way to do what you 
want to do.


Dennis Cote

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] UNIQUE constraint on column

2007-01-31 Thread Kees Nuyt

Hi Shane,

On Wed, 31 Jan 2007 09:29:24 -0500, you wrote:

On 1/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Shane Harrelson [EMAIL PROTECTED] wrote:
  when i try to insert a row into a table that has a UNIQUE constraint
  on a column, and I get the SQLITE_CONSTRAINT result code because i'm
  inserting a duplicate value, is there anyway to determine the rowid of
  the conflict?
 
  looking at the internals of the VDBE, i found that the rowid of the
  conflicting row is pushed on top of the VDBE stack.
 
  if i'm willing to violate the interface, i can dereference the rowid
  from the internals of the VDBE struct.  i'd rather not do this... is
  there a more formal mechanism for getting this value without having do
  to do a separate select query?  something like
  sqlite3_last_insert_rowid() -- ie.  sqlite3_last_conflict_rowid()?
 

 The official way to find the conflicting entry is to do a query.

   SELECT rowid FROM table WHERE uniquecolumn=?;

 --
 D. Richard Hipp  [EMAIL PROTECTED]


Thank you for the reply.  I assumed this was most likely the case, and
as I said in my original email, I was hoping to avoid having to do a
seperate select query for the sake of speed... especially since the
value I needed was so tantalizing close in the VDBE struct.

The SELECT will be quite fast, chances are the required pages of
the unique index will still be in the cache.
In my opinion it is always better to write portable code, so I
would prefer to rely on generic SQL than on yet another
implementation specific API.

Just my 2 cents.
-- 
  (  Kees Nuyt
  )
c[_]

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] UNIQUE constraint on column

2007-01-31 Thread Shane Harrelson

On 1/31/07, Dennis Cote [EMAIL PROTECTED] wrote:

Shane Harrelson wrote:
 On 1/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 The official way to find the conflicting entry is to do a query.

   SELECT rowid FROM table WHERE uniquecolumn=?;



 Thank you for the reply.  I assumed this was most likely the case, and
 as I said in my original email, I was hoping to avoid having to do a
 seperate select query for the sake of speed... especially since the
 value I needed was so tantalizing close in the VDBE struct.

Shane,

Why do you want the rowid of the conflicting row?

I only ask because I suspect there may be a better way to do what you
want to do.

Dennis Cote



I have two tables, an Objects table with a foreign key into a second
Strings table which is composed of unique values.  It is a many to
one relationship, that is, several Objects may reference the same
String.   When an Object is added, its associated String is added to
the Strings table.   If the String already exists in the Strings
table, I'd like the new Object to reference the existing copy.

Currently, I've implemented it as so (leaving out error handling, etc.):

begin transaction
insert into Strings (value) VALUES ( 'foo')
if string insert result is SQLITE_OK
  get rowid of last insert (sqlite3_last_insert_rowid)
else if result is SQLITE_CONSTRAINT
  select rowid from Strings where value = 'foo'
end if
if rowid
  insert into Objects (string_id) VALUES (rowid)
end if
if no error
  commit transaction
else
  rollback transaction
end if

With my dataset, there is about a 10% chance of the string being a
duplicate -- that is about 1 in 10 string inserts hit the UNIQUE
constraint violation.

I've tested viloating the internals of the VBDE and pulling the
rowid of the duplicate off the VBDE stack when the SQLITE_CONSTRAINT
result is returned and it is measurably (5-10%) faster then doing the
subsequent SELECT.

Any help or suggestions with how to do this better would be appreciated.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] UNIQUE constraint on column

2007-01-31 Thread Nicolas Williams
On Wed, Jan 31, 2007 at 05:23:29PM -0500, Shane Harrelson wrote:
 
 I have two tables, an Objects table with a foreign key into a second
 Strings table which is composed of unique values.  It is a many to
 one relationship, that is, several Objects may reference the same
 String.   When an Object is added, its associated String is added to
 the Strings table.   If the String already exists in the Strings
 table, I'd like the new Object to reference the existing copy.
 
 Currently, I've implemented it as so (leaving out error handling, etc.):
 
 begin transaction
 insert into Strings (value) VALUES ( 'foo')
 if string insert result is SQLITE_OK

Sounds like you should want to use INSERT OR IGNORE ... INTO Strings and
then SELECT the rowid of the string for use in INSERTing INTO Object.

Nico
-- 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] UNIQUE constraint on column

2007-01-31 Thread drh
Shane Harrelson [EMAIL PROTECTED] wrote:
 
 I have two tables, an Objects table with a foreign key into a second
 Strings table which is composed of unique values.  It is a many to
 one relationship, that is, several Objects may reference the same
 String.   When an Object is added, its associated String is added to
 the Strings table.   If the String already exists in the Strings
 table, I'd like the new Object to reference the existing copy.
 
 Currently, I've implemented it as so (leaving out error handling, etc.):
 
 begin transaction
 insert into Strings (value) VALUES ( 'foo')
 if string insert result is SQLITE_OK
get rowid of last insert (sqlite3_last_insert_rowid)
 else if result is SQLITE_CONSTRAINT
select rowid from Strings where value = 'foo'
 end if
 if rowid
insert into Objects (string_id) VALUES (rowid)
 end if
 if no error
commit transaction
 else
rollback transaction
 end if
 
 With my dataset, there is about a 10% chance of the string being a
 duplicate -- that is about 1 in 10 string inserts hit the UNIQUE
 constraint violation.
 
 I've tested viloating the internals of the VBDE and pulling the
 rowid of the duplicate off the VBDE stack when the SQLITE_CONSTRAINT
 result is returned and it is measurably (5-10%) faster then doing the
 subsequent SELECT.
 

What you are doing is the most efficient way that I can think of
right off hand.  If violating the internals is something you want
to do that's fine - just remember that it is likely to break in
irrepaiable ways in some future point release.  No tears.

Notice that an sqlite3_last_constraint_rowid() function doesn't really
work because an insert might fail due to multiple constraint violations
all on different rows.  In your schema, perhaps, there can be no more
than one constraint violated at a time, but it is easy enough to 
construct a schema where multiple rows can violate a constraint, so the
sqlite3_last_constraint_rowid() idea does not generalize well.

--
D. Richard Hipp  [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] UNIQUE constraint on column

2007-01-30 Thread Shane Harrelson

when i try to insert a row into a table that has a UNIQUE constraint
on a column, and I get the SQLITE_CONSTRAINT result code because i'm
inserting a duplicate value, is there anyway to determine the rowid of
the conflict?

looking at the internals of the VDBE, i found that the rowid of the
conflicting row is pushed on top of the VDBE stack.

if i'm willing to violate the interface, i can dereference the rowid
from the internals of the VDBE struct.  i'd rather not do this... is
there a more formal mechanism for getting this value without having do
to do a separate select query?  something like
sqlite3_last_insert_rowid() -- ie.  sqlite3_last_conflict_rowid()?

thanks.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] UNIQUE Constraint but case sensitive

2005-06-24 Thread Dan Kennedy

 Hi,
 
 I need to create a unique constraint on a column of type varchar, but it is
 not case sensitive by default. 
 Does any one know how to make a unique constraint case sensitive? 
 

CREATE TABLE abc(a UNIQUE COLLATE NOCASE);

Bear in mind that the NOCASE collation sequence only knows about the 26
letters used in the English language.





__ 
Yahoo! Mail 
Stay connected, organized, and protected. Take the tour: 
http://tour.mail.yahoo.com/mailtour.html 



[sqlite] UNIQUE Constraint but case sensitive

2005-06-23 Thread Ajay









 
  
  
  
  
  Hi,
  I need
  to create a unique constraint on a column of type varchar, but it is not case
  sensitive by default. 
  Does any one know how to make a unique constraint case sensitive? 
  Thanks
  and best regards, 
  
 


Ajay Sonawane