Re: [sqlite] Index creation on huge table will never finish.

2007-03-21 Thread Gerry Snyder

Chris Jones wrote:

Hi all,

I have a very simple schema.  I need to assign a unique identifier to a
large collection of strings, each at most 80-bytes, although typically
shorter.

The problem is I have 112 million of them.


Maybe you could start by breaking the data into 8 equal groups and making 
a table for each group. Then merge the original groups pairwise, then 
merge those 4 groups, and finally the 2 semifinal groups (kinda like 
March Madness, come to think of it). Since each merge will be of 
already sorted/indexed data, it might save a lot of time.
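
For concreteness, a minimal sketch of one merge round, assuming 
hypothetical per-group tables g1 and g2 that were each loaded and 
indexed on their own:

CREATE TABLE g12 ( fen VARCHAR(80) );
INSERT INTO g12
    SELECT fen FROM g1
    UNION              -- UNION also drops strings the two groups share
    SELECT fen FROM g2;
CREATE INDEX g12_idx_fen ON g12(fen);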


Or perhaps do a block sort based on the first character of the string 
(or nth char if most of the first chars are the same) and have a bunch 
of smaller tables with that character as (part of) the table name. The 
global unique identifier could be the character concatenated with the 
rowid in its table.
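
A rough sketch of that variant, again with hypothetical table names:

-- one small table per leading character (fen_a, fen_b, ...)
CREATE TABLE fen_a ( fen VARCHAR(80) );
CREATE INDEX fen_a_idx_fen ON fen_a(fen);
-- the application routes each string by substr(fen, 1, 1); the global
-- identifier is the leading character glued to that table's rowid:
SELECT 'a' || rowid AS global_id FROM fen_a WHERE fen = 'a-sample-string';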


HTH,

Gerry




Re: [sqlite] Index creation on huge table will never finish.

2007-03-21 Thread P Kishor

You stated in your OP


I need to assign a unique identifier to a large collection of strings


Obviously I misunderstood that to mean you wanted the strings tagged
uniquely, not that the strings themselves were unique. In that case, it
seems you will have to put up with checking each string, and as the db
gets bigger, that task will take longer.
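
To make that concrete, here is a hedged sketch of the per-string check,
assuming a UNIQUE constraint on the column (the value shown is
hypothetical):

CREATE TABLE rawfen ( fen VARCHAR(80) UNIQUE );
-- INSERT OR IGNORE silently skips a string that is already stored
INSERT OR IGNORE INTO rawfen(fen) VALUES ('some-string');
-- the rowid then serves as the string's identifier
SELECT rowid FROM rawfen WHERE fen = 'some-string';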

Maybe someone else has other ideas...

On 3/21/07, Chris Jones <[EMAIL PROTECTED]> wrote:


I don't think that solves my problem.  Sure, it guarantees that the IDs are
unique, but not the strings.

My whole goal is to be able to create a unique identifier for each string,
in such a way that I don't have the same string listed twice with different
identifiers.

In your solution, there is no way to look up a string to see if it already
exists, since there is no index on the string.

Thanks,
Chris
--
View this message in context: 
http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9607996
Sent from the SQLite mailing list archive at Nabble.com.







--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
-
collaborate, communicate, compete
=




Re: [sqlite] Index creation on huge table will never finish.

2007-03-21 Thread Chris Jones

I don't think that solves my problem.  Sure, it guarantees that the IDs are
unique, but not the strings.  

My whole goal is to be able to create a unique identifier for each string,
in such a way that I don't have the same string listed twice with different
identifiers.

In your solution, there is no way to look up a string to see if it already
exists, since there is no index on the string.

Thanks,
Chris
-- 
View this message in context: 
http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9607996
Sent from the SQLite mailing list archive at Nabble.com.





Re: [sqlite] Index creation on huge table will never finish.

2007-03-21 Thread P Kishor

On 3/21/07, Chris Jones <[EMAIL PROTECTED]> wrote:


Hi all,

I have a very simple schema.  I need to assign a unique identifier to a
large collection of strings, each at most 80-bytes, although typically
shorter.

The problem is I have 112 million of them.

My schema looks as follows:

CREATE TABLE rawfen ( fen VARCHAR(80) );
CREATE INDEX rawfen_idx_fen ON rawfen(fen);

Unfortunately, loading this table takes virtually forever. After 24
hours it's not finished, and that is inserting rows in batches of
100,000 per transaction.

I tried dropping the index, and building it after row insertion.

That has two problems.  First, since I have no index, I can't guarantee
string uniqueness (and I'd like to).

Second, it still doesn't solve my speed problem.   Insertion without the
index takes a little over an hour, but the index creation never finishes.
Well, I gave it 6 hours and it was unclear if it was making any progress.


..

It is the uniqueness that is killing it, no matter how you do it. As
your bucket becomes bigger, it has to do more work to determine whether
each string is unique or not. I tried a little test. First with

CREATE TABLE tbl (fen PRIMARY KEY)

I got results that looked like so, committing every 100,000 rows...

10:  9 wallclock secs ( 7.36 usr +  1.63 sys =  8.99 CPU)
20: 12 wallclock secs ( 8.06 usr +  2.87 sys = 10.93 CPU)
30: 18 wallclock secs ( 8.27 usr +  3.29 sys = 11.56 CPU)
40: 14 wallclock secs ( 8.32 usr +  3.50 sys = 11.82 CPU)
50: 14 wallclock secs ( 8.45 usr +  3.71 sys = 12.16 CPU)
60: 15 wallclock secs ( 8.49 usr +  3.89 sys = 12.38 CPU)
70: 16 wallclock secs ( 8.68 usr +  4.15 sys = 12.83 CPU)
80: 15 wallclock secs ( 8.61 usr +  4.16 sys = 12.77 CPU)
90: 16 wallclock secs ( 8.69 usr +  4.29 sys = 12.98 CPU)
100: 17 wallclock secs ( 8.65 usr +  4.38 sys = 13.03 CPU)

Then I tried with

CREATE TABLE tbl (fen)

It was a much flatter curve, like so...

10:  5 wallclock secs ( 5.19 usr +  0.09 sys =  5.28 CPU)
20:  6 wallclock secs ( 5.23 usr +  0.09 sys =  5.32 CPU)
30:  5 wallclock secs ( 5.24 usr +  0.09 sys =  5.33 CPU)
40:  6 wallclock secs ( 5.23 usr +  0.10 sys =  5.33 CPU)
50:  5 wallclock secs ( 5.22 usr +  0.09 sys =  5.31 CPU)
60:  5 wallclock secs ( 5.24 usr +  0.10 sys =  5.34 CPU)
70:  5 wallclock secs ( 5.23 usr +  0.09 sys =  5.32 CPU)
80:  6 wallclock secs ( 5.24 usr +  0.10 sys =  5.34 CPU)
90:  6 wallclock secs ( 5.26 usr +  0.10 sys =  5.36 CPU)
100:  5 wallclock secs ( 5.24 usr +  0.10 sys =  5.34 CPU)

Well, since you want to tag each string with a unique id, how about

CREATE TABLE tbl (id INTEGER PRIMARY KEY, fen)

I get a flat curve again...

10:  6 wallclock secs ( 5.22 usr +  0.10 sys =  5.32 CPU)
20:  5 wallclock secs ( 5.24 usr +  0.09 sys =  5.33 CPU)
30:  6 wallclock secs ( 5.26 usr +  0.10 sys =  5.36 CPU)
40:  5 wallclock secs ( 5.26 usr +  0.10 sys =  5.36 CPU)
50:  5 wallclock secs ( 5.27 usr +  0.11 sys =  5.38 CPU)
60:  6 wallclock secs ( 5.27 usr +  0.10 sys =  5.37 CPU)
70:  6 wallclock secs ( 5.27 usr +  0.09 sys =  5.36 CPU)
80:  6 wallclock secs ( 5.27 usr +  0.10 sys =  5.37 CPU)
90:  6 wallclock secs ( 5.26 usr +  0.10 sys =  5.36 CPU)
100:  5 wallclock secs ( 5.26 usr +  0.10 sys =  5.36 CPU)

Problem solved... id is your unique tag. It will be done in about 1.5
hours on my MacBook Pro.
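
For reference, a sketch of the load pattern behind those timings, in
plain SQL (values are illustrative; the loop around each batch lives in
the host language):

CREATE TABLE tbl (id INTEGER PRIMARY KEY, fen);
BEGIN;
INSERT INTO tbl(fen) VALUES ('string-000001');
INSERT INTO tbl(fen) VALUES ('string-000002');
-- ... up to 100,000 inserts per transaction ...
COMMIT;
-- INTEGER PRIMARY KEY is an alias for the rowid, so assigning id adds
-- no extra index maintenance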


--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
-
collaborate, communicate, compete
=




[sqlite] Index creation on huge table will never finish.

2007-03-21 Thread Chris Jones

Hi all,

I have a very simple schema.  I need to assign a unique identifier to a
large collection of strings, each at most 80-bytes, although typically
shorter.

The problem is I have 112 million of them.

My schema looks as follows:

CREATE TABLE rawfen ( fen VARCHAR(80) );
CREATE INDEX rawfen_idx_fen ON rawfen(fen);

Unfortunately, loading this table takes virtually forever. After 24
hours it's not finished, and that is inserting rows in batches of
100,000 per transaction.

I tried dropping the index, and building it after row insertion.

That has two problems.  First, since I have no index, I can't guarantee
string uniqueness (and I'd like to).

Second, it still doesn't solve my speed problem.   Insertion without the
index takes a little over an hour, but the index creation never finishes. 
Well, I gave it 6 hours and it was unclear if it was making any progress.

I've read elsewhere that this is a data locality issue, which certainly
makes sense.

And in those threads, a suggestion has been made to insert in sorted order. 
But it's unclear to me exactly what the sorting function would need to be -
it's likely my sorting function (say strcmp()) wouldn't match what sqlite
expects.  It's also a little unclear if I can even sort this many keys
externally with any ease.  After all, that's why I'm trying to use a
database.
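
One concrete shape the sorted-order suggestion could take, as a hedged
sketch: SQLite's default BINARY collation compares bytes (memcmp order),
so for 8-bit-clean text a plain byte sort should agree with the index
order. Whether sorting 112 million keys up front is practical is exactly
the open question here; the sketch only shows where the sort would slot in:

-- 1. bulk-load into an unindexed staging table (fast, no rebalancing)
CREATE TABLE staging ( fen VARCHAR(80) );
-- ... large batched transactions of plain INSERTs ...
-- 2. build the final table in index order, dropping duplicates
CREATE TABLE rawfen ( id INTEGER PRIMARY KEY, fen VARCHAR(80) UNIQUE );
INSERT INTO rawfen(fen)
    SELECT DISTINCT fen FROM staging ORDER BY fen;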

Last, is this something that is likely to affect me if I port over to
another database?  Do others, say mySQL for instance, handle this better?

Does anyone have any suggestions? Upgrading my server isn't an option -
it already has 4 GB of memory and very fast disks in a RAID-5 configuration
(with a 512 MB cache on the RAID controller).

Thanks!
-- 
View this message in context: 
http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9604705
Sent from the SQLite mailing list archive at Nabble.com.





Re: [sqlite] beginner's questions: atexit problem

2007-03-21 Thread Joe Wilson
Are you mixing Cygwin and MinGW libraries and/or header files?

--- timm2 <[EMAIL PROTECTED]> wrote:
> Using dev-c++ 4.9.9.2-wx-beta-6.8 with mingw 3.4.2 I have got the error:
> 
> sqlite3.A(.text+0x44):fake: multiple definition of `atexit'
> C:/../lib/gcc/mingw32/3.4.2/../../../crt2.o(.text+0x260):crt1.c: first 
> defined here
> 
> I did not find a helpful hint in the list archives. 
> Could you help me?
>  Thanks for any hint,
> Tim




 




[sqlite] beginner's questions: atexit problem

2007-03-21 Thread timm2
Hello,

I would like to try the SQLite database, but I am not able to compile the 
quickstart example.
example.

Using dev-c++ 4.9.9.2-wx-beta-6.8 with mingw 3.4.2 I have got the error:

sqlite3.A(.text+0x44):fake: multiple definition of `atexit'
C:/../lib/gcc/mingw32/3.4.2/../../../crt2.o(.text+0x260):crt1.c: first defined 
here

I did not find a helpful hint in the list archives. 
Could you help me?
 Thanks for any hint,
Tim





Re: [sqlite] data type problem

2007-03-21 Thread Dennis Cote
[EMAIL PROTECTED] wrote:
> but if you create a table using:
>
> CREATE TABLE XTollData (
>   DutyID char (32) NOT NULL ,
>   CarNumber char (10) NULL 
>  );
>
> SQLite3_Column_decltype will return the result of 'DutyID' as 'char(32)'.
> That is, SQLite3_Column_decltype treats '[char](32)' as 'char' and
> 'char(32)' as 'char(32)'.
> I think this IS a bug.
>   
I did some checking and standard SQL:1999 does not allow the data type
(i.e. the char in your definition) in a column definition to be quoted.
Therefore the column definitions given in your original post are not
valid SQL syntax.

You should use

CREATE TABLE "XTollData" (
"DutyID" char(32) NOT NULL ,
"CarNumber" char(10) NULL
);

or if you must use the non-standard square bracket quotes

CREATE TABLE [XTollData] (
[DutyID] char(32) NOT NULL ,
[CarNumber] char(10) NULL
);

to define your table using valid SQL (or extended SQL) syntax.

SQLite accepts a wide variety of nonsense as the column data type
because it is generally ignored internally since SQLite uses dynamic
typing. While I think it should return whatever it accepts as the
column's declared type, it really shouldn't accept the syntax you used
to define your table. It should have given you a syntax error on your
create statement.
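
For illustration, a hedged sketch of the spellings in play (table names
are hypothetical); per the reports in this thread, sqlite3_column_decltype
comes back 'char(32)' for the first two and just 'char' for the third:

CREATE TABLE t1 ( DutyID   char(32) NOT NULL );     -- standard
CREATE TABLE t2 ( [DutyID] char(32) NOT NULL );     -- bracket-quoted name only
CREATE TABLE t3 ( [DutyID] [char] (32) NOT NULL );  -- bracket-quoted type name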

I have copied the pertinent sections from the standard syntax specs below.

<column definition> ::=
    <column name>
    { <data type> | <domain name> }
    [ <reference scope check> ]
    [ <default clause> ]
    [ <column constraint definition> ... ]
    [ <collate clause> ]

<column name> ::= <identifier>

<identifier> ::= <actual identifier>

<actual identifier> ::=
    <regular identifier>
  | <delimited identifier>

<delimited identifier> ::=
    <double quote> <delimited identifier body> <double quote>

<delimited identifier body> ::= <delimited identifier part> ...

<delimited identifier part> ::=
    <nondoublequote character>
  | <doublequote symbol>

<nondoublequote character> ::= !! See the Syntax Rules
<doublequote symbol> ::= "" !! two consecutive double quote characters

<data type> ::=
    <predefined type>
  | <row type>
  | <path-resolved user-defined type name>
  | <reference type>
  | <collection type>

<predefined type> ::=
    <character string type> [ CHARACTER SET <character set specification> ]
  | <national character string type>
  | <binary large object string type>
  | <bit string type>
  | <numeric type>
  | <boolean type>
  | <datetime type>
  | <interval type>

<character string type> ::=
    CHARACTER [ <left paren> <length> <right paren> ]
  | CHAR [ <left paren> <length> <right paren> ]
  | CHARACTER VARYING <left paren> <length> <right paren>
  | CHAR VARYING <left paren> <length> <right paren>
  | VARCHAR <left paren> <length> <right paren>
  | CHARACTER LARGE OBJECT [ <left paren> <large object length> <right paren> ]
  | CHAR LARGE OBJECT [ <left paren> <large object length> <right paren> ]
  | CLOB [ <left paren> <large object length> <right paren> ]

HTH
Dennis Cote




Re: [sqlite] data type problem

2007-03-21 Thread Dennis Cote
[EMAIL PROTECTED] wrote:
> but if you create a table using:
>
> CREATE TABLE XTollData (
>   DutyID char (32) NOT NULL ,
>   CarNumber char (10) NULL 
>  );
>
> SQLite3_Column_decltype will return the result of 'DutyID' as 'char(32)'.
> That is, SQLite3_Column_decltype treats '[char](32)' as 'char' and
> 'char(32)' as 'char(32)'.
> I think this IS a bug.
> A Delphi component (ASGSQLite3) uses SQLite3_Column_decltype to determine the
> data type and data size of a field. And this "feature" makes it get the wrong
> size for '[char](32)' (as 1 byte), and the right size for 'char(32)' (as 32
> bytes).
>
>   
I agree that what you have found is a bug, and you should report it by
clicking the bugs link at http://www.sqlite.org/.

In the meantime there is a workaround: eliminate the unnecessary
square-bracket quoting on your column type names (i.e. use char and not
[char]).
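
A quick sketch of that workaround (hypothetical table names), based on
the decltype values reported upthread:

-- reported to yield decltype 'char', which the Delphi layer sizes at 1 byte:
CREATE TABLE before_fix ( [DutyID] [char] (32) NOT NULL );
-- with the type name unquoted, decltype comes back 'char(32)':
CREATE TABLE after_fix ( [DutyID] char (32) NOT NULL );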

Dennis Cote




Re: [sqlite] Different IN behavior from 3.2.x to 3.3.x

2007-03-21 Thread drh
Paolo Vernazza <[EMAIL PROTECTED]> wrote:
> Try this with an old 3.2.x sqlite version
> 
> SQLite version 3.2.8
> Enter ".help" for instructions
> sqlite> CREATE Table test (a, b, c);
> sqlite> INSERT into test (a,b,c) VALUES ('1', '2', '3');
> sqlite> SELECT * FROM test WHERE a||b||c = '123';
> 1|2|3
> sqlite> SELECT * FROM test WHERE a||b||c IN ('123');
> 1|2|3
> sqlite>
> 
> Now try with sqlite 3.3.13
> SQLite version 3.3.13
> Enter ".help" for instructions
> sqlite> CREATE Table test (a, b, c);
> sqlite> INSERT into test (a,b,c) VALUES ('1', '2', '3');
> sqlite> SELECT * FROM test WHERE a||b||c = '123';
> 1|2|3
> sqlite> SELECT * FROM test WHERE a||b||c IN ('123');
> sqlite>
> 
> 
> Is that a sqlite bug? Am I doing something wrong?
> How should I change my query to have the old behavior?
> 

See ticket #2248 and check-in [3656]

  http://www.sqlite.org/cvstrac/tktview?tn=2248
  http://www.sqlite.org/cvstrac/chngview?cn=3656

--
D. Richard Hipp  <[EMAIL PROTECTED]>





[sqlite] Different IN behavior from 3.2.x to 3.3.x

2007-03-21 Thread Paolo Vernazza

Try this with an old 3.2.x sqlite version

SQLite version 3.2.8
Enter ".help" for instructions
sqlite> CREATE Table test (a, b, c);
sqlite> INSERT into test (a,b,c) VALUES ('1', '2', '3');
sqlite> SELECT * FROM test WHERE a||b||c = '123';
1|2|3
sqlite> SELECT * FROM test WHERE a||b||c IN ('123');
1|2|3
sqlite>

Now try with sqlite 3.3.13
SQLite version 3.3.13
Enter ".help" for instructions
sqlite> CREATE Table test (a, b, c);
sqlite> INSERT into test (a,b,c) VALUES ('1', '2', '3');
sqlite> SELECT * FROM test WHERE a||b||c = '123';
1|2|3
sqlite> SELECT * FROM test WHERE a||b||c IN ('123');
sqlite>

But it works if values are not numbers:
sqlite> INSERT into test (a,b,c) VALUES ('1a', '2b', '3c');
sqlite> SELECT * FROM test WHERE a||b||c = '1a2b3c';
1a|2b|3c
sqlite> SELECT * FROM test WHERE a||b||c IN ('1a2b3c');
1a|2b|3c
sqlite>

Is that a sqlite bug? Am I doing something wrong?
How should I change my query to have the old behavior?

Thanks

Paolo





Re: [sqlite] data type problem

2007-03-21 Thread qinligeng
but if you create a table using:

CREATE TABLE XTollData (
  DutyID char (32) NOT NULL ,
  CarNumber char (10) NULL 
 );

SQLite3_Column_decltype will return the result of 'DutyID' as 'char(32)'.
That is, SQLite3_Column_decltype treats '[char](32)' as 'char' and 'char(32)'
as 'char(32)'.
I think this IS a bug.
A Delphi component (ASGSQLite3) uses SQLite3_Column_decltype to determine the
data type and data size of a field. And this "feature" makes it get the wrong
size for '[char](32)' (as 1 byte), and the right size for 'char(32)' (as 32
bytes).

- Original Message - 
From: <[EMAIL PROTECTED]>
To: 
Sent: Tuesday, March 20, 2007 7:48 PM
Subject: Re: [sqlite] data type problem


<[EMAIL PROTECTED]> wrote:
> if you create a table using the following statement (script generated by
> MS SQL Server 2000)
> CREATE TABLE [XTollData] (
>  [DutyID] [char] (32) NOT NULL ,
>  [CarNumber] [char] (10) NULL 
> );
> 
> SQLite3_Column_decltype will treat DutyID as data type 'char' but not 
> 'char(32)'


This is a feature, not a bug.  If you want the database to enforce
a maximum string length, then add

   CHECK( length(DutyID)<=32 )

--
D. Richard Hipp  <[EMAIL PROTECTED]>

