To answer your question on compound index. It just means to define an index
which includes more than one column. eg
**
CREATE TABLE t1(c1 INT, c2 INT, c3 INT, c4 INT)
CREATE INDEX i1_2_4 ON t1(c1, c4, c2)
**
HTH,
Mamta


On 2/6/07, Nurullah Akkaya <[EMAIL PROTECTED]> wrote:


On Feb 6, 2007, at 11:25 AM, Michael Segel wrote:

> Sorry to top post, on my crackberry...
>
> I think you missed my point.
> Select the count of your documents that use the word 'the'.
>
> Ok so let's say that you want to search for all of the documents
> that use the word 'the'.
> You first lookup the integer representation of the word. Let's say
> that its = 100.
>
> How many times is the value 100 going to be in your index?
that varies with the document set with 2 million documents i have
around 2.5 million 'the' entries.

>
> Ok?
>
> But to your other point... You see that your data is not
> contiguous. Hmmm ok,so assuming that your primary index is wordID,
> how do you handle documents that have multiple words? So if you
> search on 'the' you'll get one set of data and if you then search
> on the wordID for 'is', you'll have data that isn't in sort order
> on the disk.
assume the following ids.
the -> 100
is -> 150
101 -> linux

i want my tables to be sorted like the following. not just the word
the but all id's are sorted
100
100
100
100
100
100
101
101
101
101
150
150
150

from my knowledge of databases they sorted in random order thus we
have indexes pointing where the data is. from the upper example i am
going to read one big chunk of data from the disk but in the bottom
example i will read 100 then jump a buch of records and read next.

100
101
150
101
101
100
150

where can i learn more about the compound index. create index
statement in ref manual doesn't mention it?

>
> Now here's something that may help,
> Drop all of your indexes and create a single compound index where
> the first field is wordID.
>
> That may help you out...
>
>
> Sent via BlackBerry.
>
> -Mike Segel
> Principal
> MSCC
> 312 952 8175
>
>
> -----Original Message-----
> From: Nurullah Akkaya <[EMAIL PROTECTED]>
> Date: Tue, 6 Feb 2007 11:14:02
> To:Derby Discussion <derby-user@db.apache.org>
> Subject: Re: keeping the table ordered
>
> It is not quite clear to me what you are trying to achieve.  Why do
> you want a sequential read?  Scanning the entire table of 100
> million records should take longer time than looking up a record
> using a index on wordid.  Have you retrieved the query plan and
> made sure the index on wordid is used?  Or are you talking about
> doing a lookup of many different wordids in sorted order?
>
>
>
> i did not meant sequential scanning of the whole table i meant disk
> i/o( bottom paragraph explains it )
> yes i checked the query plan and derby uses index to lookup records
> and index look up checks only two index pages. so i came to the
> conclusion that most of the time is lost making random i/o request
> for the data thats why i am trying to keep the table sorted. since
> sequential hard disk access is much faster than random i/o .
>
>
>
>
>
>
> On Feb 6, 2007, at 8:09 AM, Michael Segel wrote:
>
>
>
>
>
>
>
>
> What exactly are you trying to do?
> Based on the little snippet, it looks like this is an exercise to
> create a
> "google like" search on a series of documents.
>
>
> The problem is that your wordID, while an integer, is not going to
> be unique
> enough.
>
>
>
> wordId isn't unique at all each word in a document gets a
> corresponding posting entry i look up wordId for the word the then
> select all docId's containg the wordId. that posting list is
> basicly a big inverted list. what i am trying to do is keep the
> table sorted by wordId so insted of keeping values randomly on disk
> they are being written sequentialy to the file so that instead of
> doing random i/o i just do a sequential read from the hard drive. i
> don't want  sequential scanning of the whole table.
>
>
>
>
>
> For example, search your documents where the wordID is the integer
> look up for
> the word "the".
>
>
> Do you see the problem?
>
>
> --
> --
> Michael Segel
> Principal
> Michael Segel Consulting Corp.
> derby [EMAIL PROTECTED]: <mailto:[EMAIL PROTECTED]>
> (312) 952-8175 [mobile]
>
>
>
>
>
>
>
>
>
>


Reply via email to