RE: Intermedia Performance Benchmarks anyone ?

Christopher Spence Thu, 04 Oct 2001 19:46:36 -0700

I looked into SSD; it runs about $15-17k per GB, which is very expensive.
And with Oracle's buffering, I wouldn't expect the performance gain to be
huge.  There were some solutions at around $2,500 for 1 GB, but they could
not be shared between machines in a cluster.


-----Original Message-----
To: Multiple recipients of list ORACLE-L
Sent: 10/4/01 6:35 PM

*excellent* post. thanks.

Anyone out there put the indexes and tables on solid state disk? They
have SSD up to about 10G and higher, I hear....just curious, not trying
to invoke a global listserv discussion on how it "can't work" or
"wouldn't be worth it, especially on Microsoft platforms", etc.

It would be neat to hear about an interMedia indexing miracle. This
really neat tool just sounds WAAAAAAY too slow to scale at this point,
which answers a pet question of mine. (Something like "Why do services
like 'Ask Jeeves' suck so hard?")

In Love and Peas, 

etc. 

-----Original Message-----
Sent: Thursday, October 04, 2001 5:47 PM
To: Multiple recipients of list ORACLE-L


Martin,

We use interMedia Text to index and query up to about 10-15 million CLOB
documents (up to 5KB each).  We're on 8.1.6.0.0 under Win2k - two 550MHz
CPUs, 2GB RAM, eighteen 36GB drives.

Because a domain index cannot be partitioned, we have the documents
spread across 5 tables (on 6 drives).  One is a 2-partition table (each
partition on its own drive) containing the current two months of docs;
the other 4 hold the 4 prior months' docs.  We can query the entire 6
months of docs via a Union View on them - even Contains() queries work
fine on this view.
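
A rough sketch of what such a layout could look like.  All table, column,
and view names are hypothetical (the post does not give the real schema),
and UNION ALL is an assumption about how the "Union View" is built.  Each
base table is assumed to carry its own interMedia Text index on doc_text.

```sql
-- Hypothetical names: docs_current is the 2-partition table,
-- docs_m3 .. docs_m6 hold one prior month each.
CREATE OR REPLACE VIEW all_docs AS
  SELECT doc_id, doc_date, doc_text FROM docs_current
  UNION ALL
  SELECT doc_id, doc_date, doc_text FROM docs_m3
  UNION ALL
  SELECT doc_id, doc_date, doc_text FROM docs_m4
  UNION ALL
  SELECT doc_id, doc_date, doc_text FROM docs_m5
  UNION ALL
  SELECT doc_id, doc_date, doc_text FROM docs_m6;

-- A CONTAINS query against the view; the predicate has to be
-- answered by the Text index on each underlying table.
SELECT doc_id, doc_date
  FROM all_docs
 WHERE CONTAINS(doc_text, 'bearing failure') > 0;
```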

When we add a new month's partition, the prior month's partition gets
turned into a table (segment exchange).  The interMedia Text indexes on
the partitioned table and the new prior month are rebuilt.
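
One way the monthly rollover might be scripted, as a sketch only.  The
object names, the range partitioning on a doc_date column, and the choice
to drop and re-create the Text indexes (rather than some other rebuild
method) are all assumptions, not details from the post.

```sql
-- Assumption: drop the Text index before the partition DDL,
-- then re-create it afterwards.
DROP INDEX docs_current_txt;

-- Add the partition for the new month.
ALTER TABLE docs_current
  ADD PARTITION p_2001_10
  VALUES LESS THAN (TO_DATE('2001-11-01', 'YYYY-MM-DD'));

-- Segment exchange: swap the oldest partition with an empty
-- standalone table that has the same column layout.
ALTER TABLE docs_current
  EXCHANGE PARTITION p_2001_08 WITH TABLE docs_2001_08;

-- The exchanged partition is now empty and can go.
ALTER TABLE docs_current DROP PARTITION p_2001_08;

-- Re-create the Text indexes on both tables.
CREATE INDEX docs_current_txt ON docs_current (doc_text)
  INDEXTYPE IS CTXSYS.CONTEXT;

CREATE INDEX docs_2001_08_txt ON docs_2001_08 (doc_text)
  INDEXTYPE IS CTXSYS.CONTEXT;
```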

Lately we've been getting about 3.5 million docs/month and the index
rebuild takes about 7 hours - that's 7 hrs. for the index on the prior
month and 7 more hours for the index on the partitioned table, which
only contains one month of docs at that point.

Since we're adding docs every day, we sync the interMedia index every
morning.  Last night we added about 200,000 docs and it took about 3
hours for the index to resync.  We don't use ctxsrv, but use
CTX_DDL.Sync_Index.
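
A minimal sketch of a morning sync done this way.  The index name is
hypothetical; the pending-row check uses the CTX_USER_PENDING view and is
an optional extra, not something the post describes.

```sql
-- Optional: see how many rows are queued for each Text index.
SELECT pnd_index_name, COUNT(*) AS pending_rows
  FROM ctx_user_pending
 GROUP BY pnd_index_name;

-- Process the pending rows for one index (hypothetical name).
BEGIN
  CTX_DDL.SYNC_INDEX('DOCS_CURRENT_TXT');
END;
/
```

Run from a scheduled SQL*Plus script or a DBMS_JOB entry, this takes the
place of a continuously running ctxsrv process.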

When we get over about 4.5 million docs in a table, the resync really
slows down.  The in-memory part still happens at about 150 docs/sec, but
when interMedia writes to disk it slows down a bunch.  What took 3 hours
today will take 10 hours in a couple of weeks.
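
To put those numbers side by side, here is a back-of-the-envelope
calculation.  It assumes the in-memory phase and the disk-write phase run
one after the other, which is a simplification and not something the
post states.

```sql
SELECT ROUND(200000 / (3 * 3600), 1)        AS overall_docs_per_sec, -- 18.5
       ROUND(200000 / 150 / 60, 1)          AS in_memory_minutes,    -- 22.2
       ROUND(3 * 60 - 200000 / 150 / 60, 1) AS remaining_minutes     -- 157.8
  FROM dual;
```

Under that assumption, roughly 22 minutes of a 3-hour sync go to the
in-memory work and the rest to writing the index to disk, which is why
the I/O layout of the index segments matters so much here.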

That's why I plan on spreading the DR$<>$I segment across multiple
drives by spreading the datafiles of its tablespace across those drives.
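
As a sketch, adding datafiles on other drives to the tablespace that
holds the DR$<>$I table could look like this.  The tablespace name, file
paths, and sizes are made up for illustration.

```sql
-- Hypothetical tablespace and paths: one extra datafile per drive.
ALTER TABLESPACE text_i_ts
  ADD DATAFILE 'F:\oradata\prod\text_i_02.dbf' SIZE 2000M;

ALTER TABLESPACE text_i_ts
  ADD DATAFILE 'G:\oradata\prod\text_i_03.dbf' SIZE 2000M;
```

Note that Oracle does not restripe existing extents; only extents
allocated after the files are added can land on the new drives.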

BTW, that brings up some performance points - be sure you cache the
DR$<>$R segment (use CACHE not CACHE READS, due to bugs in Oracle):

  Alter Table DR$<YourIndexName>$R Modify LOB (Data) (Cache) ;

Also ensure that your LOBs are out-of-line and stored in their own
segment(s) on drive(s) separate from the "regular" data.  Make sure that
your I_TABLE_CLAUSE, R_TABLE_CLAUSE, and I_INDEX_CLAUSE all specify
tablespaces on their own drives to spread the I/O out even further.
We're getting 2GB more RAM on a new server, so I plan on caching the
900MB DR$<>$X segment, which is the index on the DR$<>$I token table.
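
A hedged sketch of how those storage settings fit together.  The
preference name, tablespace names, and table definition are hypothetical;
only the attribute names (I_TABLE_CLAUSE, R_TABLE_CLAUSE, I_INDEX_CLAUSE)
come from the text above.

```sql
-- Base table with the CLOB stored out-of-line in its own tablespace.
CREATE TABLE docs_2001_08 (
  doc_id    NUMBER PRIMARY KEY,
  doc_date  DATE,
  doc_text  CLOB
)
TABLESPACE data_ts
LOB (doc_text) STORE AS (TABLESPACE lob_ts DISABLE STORAGE IN ROW);

-- Storage preference that puts each index segment in its own tablespace.
BEGIN
  CTX_DDL.CREATE_PREFERENCE('docs_storage', 'BASIC_STORAGE');
  CTX_DDL.SET_ATTRIBUTE('docs_storage', 'I_TABLE_CLAUSE',
                        'tablespace text_i_ts');
  CTX_DDL.SET_ATTRIBUTE('docs_storage', 'I_INDEX_CLAUSE',
                        'tablespace text_x_ts');
  -- Caching the $R LOB here has the same effect as the ALTER TABLE above.
  CTX_DDL.SET_ATTRIBUTE('docs_storage', 'R_TABLE_CLAUSE',
                        'tablespace text_r_ts lob (data) store as (cache)');
END;
/

CREATE INDEX docs_2001_08_txt ON docs_2001_08 (doc_text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('storage docs_storage');
```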

I've learned a lot about how interMedia Text processes different kinds
of queries by watching disk I/O on Win2k's Performance Monitor while I
issue various "flavors".  Our folks use lots of complex query terms with
heavy use of the Stemmer.  I've gotten them to switch from using tons of
ORs to using the Equivalence operator and we're getting much better
results using NEAR than simple ANDs.  Performance is very good, with
CONTAINS queries returning results in less than a second for terms that
are rare in the docs, up to a minute for terms that are common in lots
(e.g. hundreds of thousands) of docs.
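
For readers who have not used these operators, a few illustrative query
strings.  The table, column, and search terms are invented; the operators
themselves (| for OR, = for Equivalence, & for AND, NEAR, and $ for the
Stemmer) are standard interMedia Text query syntax.

```sql
-- OR: each term is looked up and scored separately.
SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'impeller | rotor') > 0;

-- Equivalence: "rotor" is treated as an acceptable substitute
-- for "impeller" and the pair is scored as one term.
SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'impeller = rotor') > 0;

-- AND: both words anywhere in the document.
SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'pump & failure') > 0;

-- NEAR: both words within a span of 10 words.
SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'NEAR((pump, failure), 10)') > 0;

-- Stemmer: expands "fail" to words sharing the same stem.
SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, '$fail & pump') > 0;
```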

If you're going to do synonym searches, you'd better start looking for a
good thesaurus - the one Oracle ships is pretty limited.  We've not
found a good one for the technical lingo our docs contain, so we don't
do ABOUT queries at this time.
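
For reference, this is roughly what the two query styles look like.  The
thesaurus name tech_thes is hypothetical and is assumed to have been
loaded beforehand (for example with ctxload); the table, column, and
terms are invented as well.

```sql
-- Synonym expansion against a named thesaurus.
SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'SYN(impeller, tech_thes)') > 0;

-- Theme-based ABOUT query.
SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'ABOUT(pump maintenance)') > 0;
```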

Get familiar with CTX_Query.Explain, it will help you understand things
like what the Stemmer *really* does and how complex queries are parsed.
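
A short sketch of a CTX_QUERY.EXPLAIN session.  The index name, explain
table name, query string, and explain_id are placeholders; the explain
table columns follow the layout given in the interMedia Text reference.

```sql
-- Table that receives the parsed query tree.
CREATE TABLE text_explain (
  explain_id   VARCHAR2(30),
  id           NUMBER,
  parent_id    NUMBER,
  operation    VARCHAR2(30),
  options      VARCHAR2(30),
  object_name  VARCHAR2(64),
  position     NUMBER,
  cardinality  NUMBER
);

-- Parse (but do not run) a query against a hypothetical index.
BEGIN
  CTX_QUERY.EXPLAIN(
    index_name    => 'DOCS_CURRENT_TXT',
    text_query    => '$fail & pump',
    explain_table => 'text_explain',
    sharelevel    => 0,
    explain_id    => 'stem_test');
END;
/

-- Walk the tree to see how the stem operator was expanded.
SELECT id, parent_id, operation, options, object_name, position
  FROM text_explain
 WHERE explain_id = 'stem_test'
 ORDER BY id;
```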

Hope this helps.

Jack

--------------------------------
Jack C. Applewhite
Database Administrator/Developer
OCP Oracle8 DBA
iNetProfit, Inc.
Austin, Texas
www.iNetProfit.com
[EMAIL PROTECTED]
(512)327-9068


-----Original Message-----
Kendall
Sent: Thursday, October 04, 2001 10:00 AM
To: Multiple recipients of list ORACLE-L


Hello all,

Although I have installed interMedia as part of my general DBA duties
before, I have not had to meet any particular requirements on throughput
rate or indexing.

I need some information on being able to deal with large volumes of
product data (e.g. 1 million products in a retail application) and be
able to perform 'intelligent' searches against the metadata (things like
typographical error matching, synonyms etc.) as well as the more usual
parametric search (i.e. advanced search page with lots of metadata
specific fields).

Indexing time and max throughput are also of interest.

Any data based on experience would be appreciated.

Thanks

Martin

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Jack C. Applewhite
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).
-- 
Author: Mohan, Ross
  INET: [EMAIL PROTECTED]

-- 
Author: Christopher Spence
  INET: [EMAIL PROTECTED]
