Re: [sqlite] Index creation on huge table will never finish.
Chris Jones wrote:
> Hi all, I have a very simple schema. I need to assign a unique identifier to a large collection of strings, each at most 80 bytes, although typically shorter. The problem is I have 112 million of them.

Maybe you could start by breaking the data into 8 equal groups and making a table of each group. Then merge the original groups pairwise, then merge those 4 groups, and finally the 2 semifinal groups (kinda like March Madness, come to think of it). Since each merge would be of already sorted/indexed data, it might save a lot of time.

Or perhaps do a block sort based on the first character of the string (or the nth char if most of the first chars are the same) and have a bunch of smaller tables with that character as (part of) the table name. The globally unique identifier could be the character concatenated with the rowid in its table.

HTH,
Gerry

- To unsubscribe, send email to [EMAIL PROTECTED] -
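For what it's worth, Gerry's second suggestion (bucket tables keyed on the first character, with the bucket character plus the per-table rowid as the global id) can be sketched in a few lines of Python. The table naming scheme and the `insert` helper below are made up for illustration; this is a rough sketch of the idea, not code from the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def bucket_for(s):
    # One table per leading character; "fen_%02x" is just an
    # illustrative naming scheme for the bucket tables.
    return "fen_%02x" % ord(s[0])

def insert(s):
    tbl = bucket_for(s)
    # Each small bucket table carries its own (much smaller) unique index.
    conn.execute("CREATE TABLE IF NOT EXISTS %s (fen TEXT UNIQUE)" % tbl)
    conn.execute("INSERT OR IGNORE INTO %s (fen) VALUES (?)" % tbl, (s,))
    rowid = conn.execute("SELECT rowid FROM %s WHERE fen = ?" % tbl,
                         (s,)).fetchone()[0]
    # Global identifier: bucket character concatenated with the
    # rowid within that bucket, as Gerry describes.
    return s[0] + str(rowid)
```

Inserting the same string twice yields the same identifier, so the strings stay deduplicated while each unique-index stays a fraction of the full 112M keys.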
Re: [sqlite] Index creation on huge table will never finish.
You stated in your OP:

> I need to assign a unique identifier to a large collection of strings

Obviously I misunderstood that to mean you wanted the strings tagged uniquely, not that the strings were unique. In your case, it seems, you will have to put up with checking each string, and as the db gets bigger that task will take longer. Maybe someone else has other ideas...

On 3/21/07, Chris Jones <[EMAIL PROTECTED]> wrote:
> I don't think that solves my problem. Sure, it guarantees that the IDs are unique, but not the strings. My whole goal is to be able to create a unique identifier for each string, in such a way that I don't have the same string listed twice with different identifiers. In your solution, there is no way to look up a string to see if it already exists, since there is no index on the string.
> Thanks, Chris

--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
Re: [sqlite] Index creation on huge table will never finish.
I don't think that solves my problem. Sure, it guarantees that the IDs are unique, but not the strings. My whole goal is to be able to create a unique identifier for each string, in such a way that I don't have the same string listed twice with different identifiers. In your solution, there is no way to look up a string to see if it already exists, since there is no index on the string.

Thanks,
Chris

--
View this message in context: http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9607996
Sent from the SQLite mailing list archive at Nabble.com.
Re: [sqlite] Index creation on huge table will never finish.
On 3/21/07, Chris Jones <[EMAIL PROTECTED]> wrote:
> Hi all, I have a very simple schema. I need to assign a unique identifier to a large collection of strings, each at most 80 bytes, although typically shorter. The problem is I have 112 million of them. My schema looks as follows:
>
> CREATE TABLE rawfen ( fen VARCHAR(80) );
> CREATE INDEX rawfen_idx_fen ON rawfen(fen);
>
> Unfortunately, loading this table takes virtually forever. After 24 hours it's not finished, and that is inserting rows in transactions of 100,000 rows per transaction. I tried dropping the index and building it after row insertion. That has two problems. First, since I have no index, I can't guarantee string uniqueness (and I'd like to). Second, it still doesn't solve my speed problem. Insertion without the index takes a little over an hour, but the index creation never finishes. Well, I gave it 6 hours and it was unclear if it was making any progress.

It is the uniqueness that is killing it. It doesn't matter how you do it: as your bucket becomes bigger, it has to do more work to determine whether the string is unique or not. I tried a little test. First, with

CREATE TABLE tbl (fen PRIMARY KEY)

I got results that looked like so, committing every 100,000 rows...

 10:  9 wallclock secs ( 7.36 usr + 1.63 sys =  8.99 CPU)
 20: 12 wallclock secs ( 8.06 usr + 2.87 sys = 10.93 CPU)
 30: 18 wallclock secs ( 8.27 usr + 3.29 sys = 11.56 CPU)
 40: 14 wallclock secs ( 8.32 usr + 3.50 sys = 11.82 CPU)
 50: 14 wallclock secs ( 8.45 usr + 3.71 sys = 12.16 CPU)
 60: 15 wallclock secs ( 8.49 usr + 3.89 sys = 12.38 CPU)
 70: 16 wallclock secs ( 8.68 usr + 4.15 sys = 12.83 CPU)
 80: 15 wallclock secs ( 8.61 usr + 4.16 sys = 12.77 CPU)
 90: 16 wallclock secs ( 8.69 usr + 4.29 sys = 12.98 CPU)
100: 17 wallclock secs ( 8.65 usr + 4.38 sys = 13.03 CPU)

Then I tried with

CREATE TABLE tbl (fen)

It was a flatter curve, like so...
 10:  5 wallclock secs ( 5.19 usr + 0.09 sys =  5.28 CPU)
 20:  6 wallclock secs ( 5.23 usr + 0.09 sys =  5.32 CPU)
 30:  5 wallclock secs ( 5.24 usr + 0.09 sys =  5.33 CPU)
 40:  6 wallclock secs ( 5.23 usr + 0.10 sys =  5.33 CPU)
 50:  5 wallclock secs ( 5.22 usr + 0.09 sys =  5.31 CPU)
 60:  5 wallclock secs ( 5.24 usr + 0.10 sys =  5.34 CPU)
 70:  5 wallclock secs ( 5.23 usr + 0.09 sys =  5.32 CPU)
 80:  6 wallclock secs ( 5.24 usr + 0.10 sys =  5.34 CPU)
 90:  6 wallclock secs ( 5.26 usr + 0.10 sys =  5.36 CPU)
100:  5 wallclock secs ( 5.24 usr + 0.10 sys =  5.34 CPU)

Well, since you want to tag each string with a unique id, how about

CREATE TABLE tbl (id INTEGER PRIMARY KEY, fen)

I get a flat curve again...

 10:  6 wallclock secs ( 5.22 usr + 0.10 sys =  5.32 CPU)
 20:  5 wallclock secs ( 5.24 usr + 0.09 sys =  5.33 CPU)
 30:  6 wallclock secs ( 5.26 usr + 0.10 sys =  5.36 CPU)
 40:  5 wallclock secs ( 5.26 usr + 0.10 sys =  5.36 CPU)
 50:  5 wallclock secs ( 5.27 usr + 0.11 sys =  5.38 CPU)
 60:  6 wallclock secs ( 5.27 usr + 0.10 sys =  5.37 CPU)
 70:  6 wallclock secs ( 5.27 usr + 0.09 sys =  5.36 CPU)
 80:  6 wallclock secs ( 5.27 usr + 0.10 sys =  5.37 CPU)
 90:  6 wallclock secs ( 5.26 usr + 0.10 sys =  5.36 CPU)
100:  5 wallclock secs ( 5.26 usr + 0.10 sys =  5.36 CPU)

Problem solved... id is your unique tag. It will be done in about 1.5 hours on my MacBook Pro.
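The flat curve in that last schema makes sense: an `id INTEGER PRIMARY KEY` column is an alias for SQLite's internal rowid, so each insert is a plain append with no extra index to maintain. A minimal sketch of the idea (note Chris's follow-up elsewhere in the thread: this alone does not deduplicate the strings):

```python
import sqlite3

# Sketch of the "id INTEGER PRIMARY KEY" schema from the thread.
# The id is the rowid alias, so it is assigned for free on insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, fen)")

# Dummy stand-ins for the real strings.
conn.executemany("INSERT INTO tbl (fen) VALUES (?)",
                 [("pos-%d" % i,) for i in range(1000)])
conn.commit()

# Every row received a unique, monotonically assigned id.
ids = [r[0] for r in conn.execute("SELECT id FROM tbl ORDER BY id")]
```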
[sqlite] Index creation on huge table will never finish.
Hi all,

I have a very simple schema. I need to assign a unique identifier to a large collection of strings, each at most 80 bytes, although typically shorter. The problem is I have 112 million of them. My schema looks as follows:

CREATE TABLE rawfen ( fen VARCHAR(80) );
CREATE INDEX rawfen_idx_fen ON rawfen(fen);

Unfortunately, loading this table takes virtually forever. After 24 hours it's not finished, and that is inserting rows in transactions of 100,000 rows per transaction.

I tried dropping the index and building it after row insertion. That has two problems. First, since I have no index, I can't guarantee string uniqueness (and I'd like to). Second, it still doesn't solve my speed problem. Insertion without the index takes a little over an hour, but the index creation never finishes. Well, I gave it 6 hours and it was unclear if it was making any progress.

I've read elsewhere that this is a data locality issue, which certainly makes sense. And in those threads, a suggestion has been made to insert in sorted order. But it's unclear to me exactly what the sorting function would need to be - it's likely my sorting function (say strcmp()) wouldn't match what sqlite expects. It's also a little unclear if I can even sort this many keys externally with any ease. After all, that's why I'm trying to use a database.

Last, is this something that is likely to affect me if I port over to another database? Do others, say MySQL for instance, handle this better?

Does anyone have any suggestions? Upgrading my server isn't an option - it already has 4 GB of memory and very fast disks in a RAID-5 configuration (with a 512 MB cache on the RAID controller).

Thanks!

--
View this message in context: http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9604705
Sent from the SQLite mailing list archive at Nabble.com.
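On the sort-order question raised above: SQLite's default collation is BINARY, which compares text bytewise like memcmp(), so for plain ASCII/UTF-8 keys a strcmp()-style sort does match the index order. A rough sketch of the "sort first, then load, then index" approach (the sample strings are made-up stand-ins for the real data, which would need an external sort at 112M rows):

```python
import sqlite3

# Stand-in for the 112M strings; a real run would use an external sort.
strings = ["e4", "d4", "c4", "Nf3", "e4"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rawfen (fen VARCHAR(80))")

# Deduplicate, then sort bytewise to match SQLite's BINARY collation
# (memcmp order), so index construction sees keys in index order.
for s in sorted(set(strings), key=lambda s: s.encode("utf-8")):
    conn.execute("INSERT INTO rawfen (fen) VALUES (?)", (s,))
conn.commit()

# Build the unique index only after the sorted bulk load.
conn.execute("CREATE UNIQUE INDEX rawfen_idx_fen ON rawfen(fen)")
```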
Re: [sqlite] beginner's questions: atexit problem
Are you mixing Cygwin and MinGW libraries and/or header files?

--- timm2 <[EMAIL PROTECTED]> wrote:
> Using dev-c++ 4.9.9.2-wx-beta-6.8 with mingw 3.4.2 I have got the error:
>
> sqlite3.A(.text+0x44):fake: multiple definition of `atexit'
> C:/../lib/gcc/mingw32/3.4.2/../../../crt2.o(.text+0x260):crt1.c: first defined here
>
> I did not find a helpful hint in the archives of this list. Could you help me?
> Thanks for any hint,
> Tim
[sqlite] beginner's questions: atexit problem
Hello,

I would like to try the SQLite database, but I am not able to compile the quickstart example. Using dev-c++ 4.9.9.2-wx-beta-6.8 with mingw 3.4.2 I get the error:

sqlite3.A(.text+0x44):fake: multiple definition of `atexit'
C:/../lib/gcc/mingw32/3.4.2/../../../crt2.o(.text+0x260):crt1.c: first defined here

I did not find a helpful hint in the archives of this list. Could you help me?

Thanks for any hint,
Tim
Re: [sqlite] data type problem
[EMAIL PROTECTED] wrote:
> but if you create a table using:
>
> CREATE TABLE XTollData (
>     DutyID char (32) NOT NULL ,
>     CarNumber char (10) NULL
> );
>
> SQLite3_Column_decltype will return the result of 'DutyID' as 'char(32)'. That is, SQLite3_Column_decltype treats '[char](32)' as 'char', and treats 'char(32)' as 'char(32)'. I think this IS a bug.

I did some checking, and standard SQL:1999 does not allow the data type (i.e. the char in your definition) to be quoted in a column definition. Therefore the column definitions given in your original post are not valid SQL syntax. You should use

CREATE TABLE "XTollData" (
    "DutyID" char(32) NOT NULL ,
    "CarNumber" char(10) NULL
);

or, if you must use the non-standard square bracket quotes,

CREATE TABLE [XTollData] (
    [DutyID] char(32) NOT NULL ,
    [CarNumber] char(10) NULL
);

to define your table using valid SQL (or extended SQL) syntax.

SQLite accepts a wide variety of nonsense as the column data type because it is generally ignored internally, since SQLite uses dynamic typing. While I think it should return whatever it accepts as the column's declared type, it really shouldn't accept the syntax you used to define your table. It should have given you a syntax error on your create statement.

I have copied the pertinent sections from the standard syntax specs below.

<column definition> ::=
    <column name> { <data type> | <domain name> }
    [ <reference scope check> ]
    [ <default clause> ]
    [ <column constraint definition> ... ]
    [ <collate clause> ]

<column name> ::= <identifier>

<identifier> ::= <actual identifier>

<actual identifier> ::= <regular identifier> | <delimited identifier>

<delimited identifier> ::= <double quote> <delimited identifier body> <double quote>

<delimited identifier body> ::= <delimited identifier part> ...

<delimited identifier part> ::= <nondoublequote character> | <doublequote symbol>

<nondoublequote character> ::= !! See the Syntax Rules

<doublequote symbol> ::= "" !! two consecutive double quote characters

<data type> ::=
    <predefined type>
  | <row type>
  | <path-resolved user-defined type name>
  | <reference type>
  | <collection type>

<predefined type> ::=
    <character string type> [ CHARACTER SET <character set specification> ]
  | <national character string type>
  | <binary large object string type>
  | <bit string type>
  | <numeric type>
  | <boolean type>
  | <datetime type>
  | <interval type>

<character string type> ::=
    CHARACTER [ <left paren> <length> <right paren> ]
  | CHAR [ <left paren> <length> <right paren> ]
  | CHARACTER VARYING <left paren> <length> <right paren>
  | CHAR VARYING <left paren> <length> <right paren>
  | VARCHAR <left paren> <length> <right paren>
  | CHARACTER LARGE OBJECT [ <left paren> <large object length> <right paren> ]
  | CHAR LARGE OBJECT [ <left paren> <large object length> <right paren> ]
  | CLOB [ <left paren> <large object length> <right paren> ]

HTH
Dennis Cote
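One quick way to inspect what SQLite actually recorded as the declared type is PRAGMA table_info, which reports the type as written in the CREATE statement. A small sketch using Python's bundled SQLite and the standard double-quoted form from Dennis's message:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "XTollData" ('
             ' "DutyID" char(32) NOT NULL,'
             ' "CarNumber" char(10))')

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
# the type column holds the declared type as written.
types = {name: decl for _, name, decl, *_ in
         conn.execute("PRAGMA table_info(XTollData)")}
print(types)
```

With the unquoted type name, the declared type survives intact as char(32), which is what SQLite3_Column_decltype would hand back as well.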
Re: [sqlite] data type problem
[EMAIL PROTECTED] wrote:
> but if you create a table using:
>
> CREATE TABLE XTollData (
>     DutyID char (32) NOT NULL ,
>     CarNumber char (10) NULL
> );
>
> SQLite3_Column_decltype will return the result of 'DutyID' as 'char(32)'. That is, SQLite3_Column_decltype treats '[char](32)' as 'char', and treats 'char(32)' as 'char(32)'. I think this IS a bug.
> A Delphi component (ASGSQLite3) uses SQLite3_Column_decltype to determine the data type and data size of a field. And this "feature" makes it get the wrong size for '[char](32)' (as 1 byte), and the right size for 'char(32)' (as 32 bytes).

I agree that what you have found is a bug, and you should report it by clicking the bugs link at http://www.sqlite.org/. In the meantime there is a workaround: eliminate the unnecessary square bracket quoting on your column type names (i.e. use char and not [char]).

Dennis Cote
Re: [sqlite] Different IN behavior from 3.2.x to 3.3.x
Paolo Vernazza <[EMAIL PROTECTED]> wrote:
> Try this with an old 3.2.x sqlite version
>
> SQLite version 3.2.8
> Enter ".help" for instructions
> sqlite> CREATE Table test (a, b, c);
> sqlite> INSERT into test (a,b,c) VALUES ('1', '2', '3');
> sqlite> SELECT * FROM test WHERE a||b||c = '123';
> 1|2|3
> sqlite> SELECT * FROM test WHERE a||b||c IN ('123');
> 1|2|3
>
> Now try with sqlite 3.3.13
>
> SQLite version 3.3.13
> Enter ".help" for instructions
> sqlite> CREATE Table test (a, b, c);
> sqlite> INSERT into test (a,b,c) VALUES ('1', '2', '3');
> sqlite> SELECT * FROM test WHERE a||b||c = '123';
> 1|2|3
> sqlite> SELECT * FROM test WHERE a||b||c IN ('123');
> sqlite>
>
> Is that a sqlite bug? Am I doing something wrong? How should I change my query to have the old behavior?

See ticket #2248 and check-in [3656]:

http://www.sqlite.org/cvstrac/tktview?tn=2248
http://www.sqlite.org/cvstrac/chngview?cn=3656

--
D. Richard Hipp <[EMAIL PROTECTED]>
[sqlite] Different IN behavior from 3.2.x to 3.3.x
Try this with an old 3.2.x sqlite version:

SQLite version 3.2.8
Enter ".help" for instructions
sqlite> CREATE Table test (a, b, c);
sqlite> INSERT into test (a,b,c) VALUES ('1', '2', '3');
sqlite> SELECT * FROM test WHERE a||b||c = '123';
1|2|3
sqlite> SELECT * FROM test WHERE a||b||c IN ('123');
1|2|3
sqlite>

Now try with sqlite 3.3.13:

SQLite version 3.3.13
Enter ".help" for instructions
sqlite> CREATE Table test (a, b, c);
sqlite> INSERT into test (a,b,c) VALUES ('1', '2', '3');
sqlite> SELECT * FROM test WHERE a||b||c = '123';
1|2|3
sqlite> SELECT * FROM test WHERE a||b||c IN ('123');
sqlite>

But it works if values are not numbers:

sqlite> INSERT into test (a,b,c) VALUES ('1a', '2b', '3c');
sqlite> SELECT * FROM test WHERE a||b||c = '1a2b3c';
1a|2b|3c
sqlite> SELECT * FROM test WHERE a||b||c IN ('1a2b3c');
1a|2b|3c
sqlite>

Is that a sqlite bug? Am I doing something wrong? How should I change my query to have the old behavior?

Thanks
Paolo
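For anyone reading this later: the discrepancy came down to which type affinity got applied to the IN operands, and the 3.3.x regression was subsequently fixed (see the ticket referenced in the reply). With the SQLite bundled in current Python, both forms return the row again; a quick way to reproduce the experiment:

```python
import sqlite3

# Rebuild Paolo's test case and run both queries against the
# SQLite bundled with Python (a current, post-fix version).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a, b, c)")
conn.execute("INSERT INTO test (a, b, c) VALUES ('1', '2', '3')")

rows_eq = conn.execute(
    "SELECT * FROM test WHERE a||b||c = '123'").fetchall()
rows_in = conn.execute(
    "SELECT * FROM test WHERE a||b||c IN ('123')").fetchall()
print(rows_eq, rows_in)
```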
Re: [sqlite] data type problem
but if you create a table using:

CREATE TABLE XTollData (
    DutyID char (32) NOT NULL ,
    CarNumber char (10) NULL
);

SQLite3_Column_decltype will return the result of 'DutyID' as 'char(32)'. That is, SQLite3_Column_decltype treats '[char](32)' as 'char', and treats 'char(32)' as 'char(32)'. I think this IS a bug.

A Delphi component (ASGSQLite3) uses SQLite3_Column_decltype to determine the data type and data size of a field. And this "feature" makes it get the wrong size for '[char](32)' (as 1 byte), and the right size for 'char(32)' (as 32 bytes).

- Original Message -
From: <[EMAIL PROTECTED]>
Sent: Tuesday, March 20, 2007 7:48 PM
Subject: Re: [sqlite] data type problem

<[EMAIL PROTECTED]> wrote:
> if you create a table using the following statement (script generated from MS SQL Server 2000)
>
> CREATE TABLE [XTollData] (
>     [DutyID] [char] (32) NOT NULL ,
>     [CarNumber] [char] (10) NULL
> );
>
> SQLite3_Column_decltype will treat DutyID as data type 'char' but not 'char(32)'

This is a feature, not a bug. If you want the database to enforce a maximum string length, then add

CHECK( length(DutyID)<=32 )

--
D. Richard Hipp <[EMAIL PROTECTED]>
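The CHECK workaround quoted above can be tried directly, since SQLite does not enforce the (32) in char(32) by itself. A quick sketch, reusing the column names from the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforce the 32-character limit with CHECK, as suggested in the reply.
conn.execute("CREATE TABLE XTollData ("
             " DutyID char(32) NOT NULL CHECK (length(DutyID) <= 32),"
             " CarNumber char(10))")

# A 32-character value fits.
conn.execute("INSERT INTO XTollData (DutyID) VALUES (?)", ("A" * 32,))

# A 33-character value violates the CHECK constraint.
try:
    conn.execute("INSERT INTO XTollData (DutyID) VALUES (?)", ("A" * 33,))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```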