Intermedia Performance Benchmarks anyone ?

2001-10-04 Thread Martin Kendall

Hello all,

Although I have installed interMedia before as part of my general DBA duties,
I have not had to meet any particular requirements on throughput rate or
indexing.

I need some information on dealing with large volumes of product data
(e.g. 1 million products in a retail application) and on performing
'intelligent' searches against the metadata (typographical error matching,
synonyms, etc.) as well as the more usual parametric search (i.e. an
advanced search page with lots of metadata-specific fields).

Indexing time and max throughput are also of interest.

Any data based on experience would be appreciated.

Thanks

Martin
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Martin Kendall
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists

To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).



RE: Intermedia Performance Benchmarks anyone ?

2001-10-04 Thread Koivu, Lisa

Hi Martin, 


I've had to implement interMedia in the past.  With 1 million records, interMedia will do the job in an acceptable amount of time.

However, when testing it I found that the more advanced features I used (fuzzy, etc.), the worse performance became.  It works well if you are doing a straightforward search.

I also found there were problems with creating the library on HP-UX (part of the install).  I had to create it manually.

There are also tricks for indexing more than one column in one index.  The documentation says you can't, but it can be done.  I know there are people on the list that have done this. 

I didn't run any formal benchmarks.  We had indexed ~6 million records on long fields (like description), and performance was directly related to how selective the keywords were.

Indexing time: if I remember right, creating one index on one of these description columns ran in ~2 hours.  I also batched my updates (I didn't run ctxsrv).  The index ended up being ~1.5GB.
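
A rough back-of-the-envelope reading of these figures, as a sketch only: the inputs are the approximate numbers quoted above, the function names are made up for illustration, and treating 1 GB as 1024**3 bytes is an assumption.

```python
# Approximate figures from the post; none of this was formally benchmarked.
records = 6_000_000           # ~6 million indexed records
build_hours = 2               # ~2 hours to create one index
index_bytes = 1.5 * 1024**3   # ~1.5GB index, assuming 1 GB = 1024**3 bytes


def records_per_second(n_records, hours):
    """Average indexing rate over the whole index build."""
    return n_records / (hours * 3600)


def index_bytes_per_record(total_bytes, n_records):
    """Average index space consumed per indexed record."""
    return total_bytes / n_records


rate = records_per_second(records, build_hours)
per_record = index_bytes_per_record(index_bytes, records)
print(f"{rate:.0f} records/sec, {per_record:.0f} bytes of index per record")
```

These are averages over the whole build, so they say nothing about how the rate changes as the index grows.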

Hope this helps and let me know if you have any other questions, I'll try and answer them.  


Lisa Koivu
Oracle Database Administrator
Fairfield Resorts, Inc.
954-935-4117





RE: Intermedia Performance Benchmarks anyone ?

2001-10-04 Thread Jack C. Applewhite

Martin,

We use interMedia Text to index and query up to about 10-15 million CLOB
documents (up to 5KB each).  We're on 8.1.6.0.0 under Win2k - 2 550MHz CPUs,
2GB RAM, 18 36GB drives.

Because a domain index cannot be partitioned, we have the documents spread
across 5 tables (on 6 drives).  One is a 2 partition table (each partition
on its own drive) containing the current two months of docs, the other 4
hold the 4 prior months' docs.  We can query the entire 6 months of docs via
a Union View on them - even Contains() queries work fine on this view.

When we add a new month's partition, the prior month's partition gets turned
into a table (segment exchange).  The interMedia Text indexes on the
partitioned table and the new prior month are rebuilt.

Lately we've been getting about 3.5 million docs/month and the index rebuild
takes about 7 hours - that's 7 hrs. for the index on the prior month and 7
more hours for the index on the partitioned table, which only contains one
month of docs at that point.

Since we're adding docs every day, we sync the interMedia index every
morning.  Last night we added about 200,000 docs and it took about 3 hours
for the index to resync.  We don't use ctxsrv, but use CTX_DDL.Sync_Index.

When we get over about 4.5 million docs in a table, the resync really slows
down.  The in-memory part still happens at about 150 docs/sec, but when
interMedia writes to disk it slows down a bunch.  What took 3 hours today
will take 10 hours in a couple of weeks.
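
To put the rebuild and resync figures side by side, here is a small sketch of the arithmetic. It uses only the numbers quoted above; the helper name is made up, and the 10-hour case assumes the nightly batch stays at about 200,000 docs, which the post does not state.

```python
def docs_per_second(docs, hours):
    """Average throughput over a whole run."""
    return docs / (hours * 3600)


IN_MEMORY_RATE = 150  # docs/sec reported for the in-memory phase

rebuild = docs_per_second(3_500_000, 7)     # full rebuild of one month's index
sync_now = docs_per_second(200_000, 3)      # last night's resync
sync_later = docs_per_second(200_000, 10)   # assumed: same batch size, 10 hours

# Time the in-memory phase alone would need for the nightly batch.
in_memory_minutes = 200_000 / IN_MEMORY_RATE / 60

print(f"rebuild: {rebuild:.1f} docs/sec")
print(f"sync today: {sync_now:.1f} docs/sec")
print(f"sync at 10 hours: {sync_later:.1f} docs/sec")
print(f"in-memory phase for 200,000 docs: {in_memory_minutes:.0f} minutes")
```

On these numbers the full rebuild averages close to the in-memory rate, while the incremental resync averages well under a fifth of it, which fits the observation that the disk-write phase is where the time goes.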

That's why I plan on spreading the DR$<index_name>$I segment across multiple
drives by spreading the datafiles of its tablespace across those drives.

BTW, that brings up some performance points - be sure you cache the
DR$<index_name>$R segment (use CACHE not CACHE READS, due to bugs in Oracle):

  Alter Table DR$<index_name>$R Modify LOB (Data) (Cache) ;

Also ensure that your LOBs are out-of-line and stored in their own
segment(s) on drive(s) separate from the "regular" data.  Make sure that
your I_TABLE_CLAUSE, R_TABLE_CLAUSE, and I_INDEX_CLAUSE all specify
tablespaces on their own drives to spread the I/O out even further.  We're
getting 2GB more RAM on a new server, so I plan on caching the 900MB
DR$<index_name>$X segment, which is the index on the DR$<index_name>$I token
table.

I've learned a lot about how interMedia Text processes different kinds of
queries by watching disk I/O on Win2k's Performance Monitor while I issue
various "flavors".  Our folks use lots of complex query terms with heavy use
of the Stemmer.  I've gotten them to switch from using tons of ORs to using
the Equivalence operator and we're getting much better results using NEAR
than simple ANDs.  Performance is very good, with CONTAINS queries returning
results in less than a second for terms that are rare in the docs, up to a
minute for terms that are common in lots (e.g. hundreds of thousands) of
docs.

If you're going to do synonym searches, you'd better start looking for a
good thesaurus - the one Oracle ships is pretty limited.  We've not found a
good one for the technical lingo our docs contain, so we don't do ABOUT
queries at this time.

Get familiar with CTX_Query.Explain; it will help you understand things like
what the Stemmer *really* does and how complex queries are parsed.

Hope this helps.

Jack


Jack C. Applewhite
Database Administrator/Developer
OCP Oracle8 DBA
iNetProfit, Inc.
Austin, Texas
www.iNetProfit.com
[EMAIL PROTECTED]
(512)327-9068




RE: Intermedia Performance Benchmarks anyone ?

2001-10-04 Thread Mohan, Ross

*excellent* post. thanks.

Anyone out there put the indexes and tables on solid state disk? They
have SSD up to about 10G and higher, I hear. Just curious, not trying
to invoke a global listserv discussion on how it "can't work" or "wouldn't
be worth it, especially on Microsoft platforms", etc.

It would be neat to hear about an interMedia indexing miracle. This really
neat tool just sounds WAAY too slow to scale at this point, which answers
a pet question of mine. (Something like "Why do services like 'Ask Jeeves'
suck so hard?")

In Love and Peas, 

etc. 


RE: Intermedia Performance Benchmarks anyone ?

2001-10-04 Thread Jack C. Applewhite

Ross,

I disagree that interMedia Text is "way too slow to scale".  Our experience
has convinced us that I/O bottlenecks are the main performance killers with
large interMedia Text indexes.  The problem is that it takes some experience
to find out how this "special" kind of index is structured (6 or 8 separate
table and index segments per "index") and how it behaves.  As usual, the
Oracle docs are pitifully inadequate - you've gotta search through TechNet
and MetaLink for details and bug workarounds (like CACHE instead of CACHE
READS for DR$<index_name>$R).

Caching the DR$<index_name>$R segment helped immensely, and I can see that
when pieces of the DR$<index_name>$X index are cached, queries with terms in
those pieces are lightning fast.  I am betting that when I spread the
DR$<index_name>$I table across multiple drives, instead of the single drive
ours is currently on, we'll see much better performance of NEAR queries
(which depend on the word position info. there), as well as faster index
resyncs.

In 9i, Domain Indexes become partitionable, so I'm looking forward (in about
a year - experiences with 6.0, 7.0, 8.0 and 8.1.5 have made me wary) to
putting our 6 (or more) months of docs into one partitioned table.  There may
be other I/O distributing kinds of enhancements by then, as well.  For sure
I'll have explored every trick I can think of!   ;-)

With more drives and a bit more RAM, I think we can handle 10 million docs
per month (60 million total online), even on our lil' ol' Win2k box.  That's
just x3 to x4 of what we do now.
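
The "x3 to x4" estimate can be checked against the volumes given earlier in the thread. This is only a sketch of that arithmetic; the variable names are made up, and the current figures (about 3.5 million docs/month, 10-15 million docs in total) are taken from the earlier post.

```python
current_monthly = 3_500_000                 # docs/month, from the earlier post
current_total = (10_000_000, 15_000_000)    # "10-15 million" docs online
target_monthly = 10_000_000
target_total = 60_000_000

monthly_factor = target_monthly / current_monthly
total_factor_low = target_total / current_total[1]    # versus 15 million
total_factor_high = target_total / current_total[0]   # versus 10 million

print(f"monthly intake: x{monthly_factor:.1f}")
print(f"total online: x{total_factor_low:.0f} to x{total_factor_high:.0f}")
```

The monthly intake grows by just under x3, and the total online grows by x4 when measured against the upper end of the current range.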

Jack


Jack C. Applewhite
Database Administrator/Developer
OCP Oracle8 DBA
iNetProfit, Inc.
Austin, Texas
www.iNetProfit.com
[EMAIL PROTECTED]
(512)327-9068




RE: Intermedia Performance Benchmarks anyone ?

2001-10-04 Thread Christopher Spence

I looked into SSD; they are like $15-17k/GB.  Very expensive.
And with Oracle buffering, I would expect the performance gain wouldn't be huge.
There were some solutions that were like $2500 for 1 GB, but they were not
sharable between machines in a cluster.


RE: Intermedia Performance Benchmarks anyone ?

2001-10-05 Thread Martin Kendall

Jack,

Thanks for your time on this.  Most revealing and useful for what I have
ahead of me.

Kind regards from the UK.

Martin


RE: Intermedia Performance Benchmarks anyone ?

2001-10-05 Thread Mario Alberto Ramos Arellano

Excellent doc.

I just wonder whether interMedia has bugs other than the cache-related ones.

Mario Alberto Ramos


RE: Intermedia Performance Benchmarks anyone ?

2001-10-05 Thread Jack C. Applewhite

Mario,

Yes.

In 8.1.6.0.0 on Win2k you can't use the SCORE() operator to return the
actual document score produced by the CONTAINS operator when the query is
executed against a Union View.  That prevents us from ordering a result set
by the Score() when we query our 6 Month View - a minor annoyance.

However, SCORE() works just fine against interMedia Text indexed tables -
either "regular" or partitioned.

That's the only other real bug I can remember that we've run up against.

Jack


Jack C. Applewhite
Database Administrator/Developer
OCP Oracle8 DBA
iNetProfit, Inc.
Austin, Texas
www.iNetProfit.com
[EMAIL PROTECTED]
(512)327-9068

