Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Chris Jones

Ah yes, I should read more carefully :)  

Thanks, right. I was originally guaranteeing uniqueness by fetching first and
inserting only when there was no match (I needed a rowid if the row already
existed anyway). Now I'm guaranteeing uniqueness by letting sort do the work
for me, but as in my last response, it probably makes sense to add the
constraint to my schema as a sanity check if nothing else.
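The fetch-then-insert pattern mentioned above could be sketched roughly like this (a minimal sketch using Python's sqlite3 module; the table and column names are taken from the schema further down the thread, and the helper name and sample strings are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE here is the schema-level sanity check discussed above.
conn.execute("CREATE TABLE rawfen ( fen VARCHAR(80) UNIQUE )")

def get_or_insert(conn, s):
    """Return the rowid for string s, inserting it only if it is absent."""
    row = conn.execute("SELECT rowid FROM rawfen WHERE fen = ?", (s,)).fetchone()
    if row is not None:
        return row[0]  # row already existed; reuse its rowid
    return conn.execute("INSERT INTO rawfen (fen) VALUES (?)", (s,)).lastrowid

a = get_or_insert(conn, "some position")
b = get_or_insert(conn, "some position")  # second call finds the existing row
```

With the UNIQUE constraint in place, even a buggy caller that bypasses the lookup would get a constraint error rather than a silent duplicate.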

Thanks,
Chris



Derrell.Lipman wrote:
> 
> Chris Jones <[EMAIL PROTECTED]> writes:
> 
> So to guarantee that the *strings* are unique, you need a UNIQUE index on
> the
> string field.  The ROWID is the INTEGER PRIMARY KEY whether you specify a
> different name for it or not, but that will not guarantee that each of
> your
> *strings* is unique.  Only a UNIQUE index on the string field will do
> that.
> 
> As long as you understand this, no need to reply.
> 
> Cheers,
> 
> Derrell
> 

-- 
View this message in context: 
http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9626741
Sent from the SQLite mailing list archive at Nabble.com.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Index creation on huge table will never finish.

2007-03-22 Thread Chris Jones

Thanks everyone for your feedback.

I ended up doing a presort on the data and then adding the data in order.
At first I was a little concerned about how I was going to implement an
external sort on a data set that huge, then realized that the unix "sort"
command can handle large files, and in fact does so pretty efficiently.

So, I did a "sort -u -S 1800M fenout.txt > fenoutsort.txt"

The sort took about 45 minutes, which is acceptable for me (it was much
longer without the -S option to tell it to make use of more memory), and
then loading the table was very efficient.  Inserting all the rows into my
table in sorted order took only 18 minutes.  
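The load step described above could look something like the following (a minimal sketch in Python's sqlite3 module; the table name is from the schema in this thread, while the sample strings stand in for the lines of the sorted, de-duplicated output file and are purely illustrative):

```python
import sqlite3

# Stand-in for the lines of the "sort -u" output file, already in
# byte order with duplicates removed.
sorted_unique = ["alpha", "bravo", "charlie"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rawfen ( fen VARCHAR(80) UNIQUE )")
with conn:  # one transaction for the whole (small) sample load
    conn.executemany("INSERT INTO rawfen (fen) VALUES (?)",
                     [(s,) for s in sorted_unique])
(count,) = conn.execute("SELECT COUNT(*) FROM rawfen").fetchone()
```

Because the rows arrive in index order, each insert touches the rightmost edge of the index b-tree instead of a random page, which is why the sorted load stays fast.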

So, all in all, I can now load the table in just about an hour, which is
great news for me.

Thanks!
Chris

-- 
View this message in context: 
http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9618709
Sent from the SQLite mailing list archive at Nabble.com.





Re: [sqlite] Index creation on huge table will never finish.

2007-03-21 Thread Chris Jones

I don't think that solves my problem.  Sure, it guarantees that the IDs are
unique, but not the strings.  

My whole goal is to be able to create a unique identifier for each string,
in such a way that I don't have the same string listed twice with different
identifiers.

In your solution, there is no way to lookup a string to see if it already
exists, since there is no index on the string.
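One way to get both guarantees at once, sketched with Python's sqlite3 module (the table, column, and index names follow the schema in this thread; the sample string is hypothetical): a UNIQUE index on the string column both rejects duplicates and supports the lookup, and INSERT OR IGNORE is one idiomatic way to use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rawfen ( fen VARCHAR(80) )")
conn.execute("CREATE UNIQUE INDEX rawfen_idx_fen ON rawfen(fen)")

conn.execute("INSERT OR IGNORE INTO rawfen (fen) VALUES (?)", ("pos1",))
conn.execute("INSERT OR IGNORE INTO rawfen (fen) VALUES (?)", ("pos1",))  # ignored

(count,) = conn.execute("SELECT COUNT(*) FROM rawfen").fetchone()
(rowid,) = conn.execute(
    "SELECT rowid FROM rawfen WHERE fen = ?", ("pos1",)).fetchone()
```

The second insert is silently dropped by the index, and the lookup afterwards uses that same index to find the surviving row's rowid.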

Thanks,
Chris
-- 
View this message in context: 
http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9607996
Sent from the SQLite mailing list archive at Nabble.com.





[sqlite] Index creation on huge table will never finish.

2007-03-21 Thread Chris Jones

Hi all,

I have a very simple schema.  I need to assign a unique identifier to a
large collection of strings, each at most 80 bytes, although typically
shorter.

The problem is I have 112 million of them.

My schema looks as follows:

CREATE TABLE rawfen ( fen VARCHAR(80) );
CREATE INDEX rawfen_idx_fen ON rawfen(fen);

Unfortunately, loading data into this table takes virtually forever.  After
24 hours it's not finished, and that is inserting rows in transactions of
100,000 rows per transaction.
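The batching described above might be sketched like this in Python's sqlite3 module (table and column names and the batch size come from this thread; the helper name and sample rows are hypothetical). Grouping many inserts into one transaction avoids a disk sync per row:

```python
import sqlite3
from itertools import islice

def insert_batched(conn, rows, batch=100_000):
    """Insert rows in chunks, one transaction per chunk."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch))
        if not chunk:
            break
        with conn:  # BEGIN ... COMMIT around the whole chunk
            conn.executemany("INSERT INTO rawfen (fen) VALUES (?)", chunk)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rawfen ( fen VARCHAR(80) )")
# Tiny batch size just to exercise the chunking logic.
insert_batched(conn, ((f"pos{i}",) for i in range(5)), batch=2)
(count,) = conn.execute("SELECT COUNT(*) FROM rawfen").fetchone()
```

Batching removes the per-row commit cost, but as the post explains, it does not fix the random index-page access pattern that makes the load slow.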

I tried dropping the index, and building it after row insertion.

That has two problems.  First, since I have no index, I can't guarantee
string uniqueness (and I'd like to).

Second, it still doesn't solve my speed problem.   Insertion without the
index takes a little over an hour, but the index creation never finishes. 
Well, I gave it 6 hours and it was unclear if it was making any progress.

I've read elsewhere that this is a data locality issue, which certainly
makes sense.

And in those threads, a suggestion has been made to insert in sorted order. 
But it's unclear to me exactly what the sorting function would need to be -
it's likely my sorting function (say strcmp()) wouldn't match what sqlite
expects.  It's also a little unclear if I can even sort this many keys
externally with any ease.  After all, that's why I'm trying to use a
database.
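On the sort-order question: SQLite's default collation is BINARY, a plain byte comparison (memcmp-style), so a byte-wise external sort (e.g. "sort" under the C locale) produces exactly the order the index expects. A minimal sketch checking this with Python's sqlite3 module (sample strings are arbitrary; the mixed case exercises the fact that uppercase bytes sort before lowercase in ASCII):

```python
import sqlite3

strings = ["rook", "Knight", "pawn", "Bishop"]  # arbitrary sample data

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t ( s TEXT )")
conn.executemany("INSERT INTO t (s) VALUES (?)", [(s,) for s in strings])
# Order produced by SQLite's default BINARY collation.
sqlite_order = [r[0] for r in conn.execute("SELECT s FROM t ORDER BY s")]

# Order produced by comparing the UTF-8 bytes directly.
byte_order = sorted(strings, key=lambda s: s.encode("utf-8"))
```

The two orders agree, which is why a `LC_ALL=C sort` of the input file matches what a subsequent CREATE INDEX wants (a locale-aware sort, by contrast, might not).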

Last, is this something that is likely to affect me if I port over to
another database?  Do others, say MySQL for instance, handle this better?

Does anyone have any suggestions?  Upgrading my server isn't an option -
it already has 4 GB of memory and very fast disks in a RAID-5 configuration
(with a 512MB cache on the RAID controller).

Thanks!
-- 
View this message in context: 
http://www.nabble.com/Index-creation-on-huge-table-will-never-finish.-tf3444218.html#a9604705
Sent from the SQLite mailing list archive at Nabble.com.

