CC'ing java-dev...
Marcelo and I have been iterating / drilling down on this odd
exception when flushing a segment. Lucene is running embedded in
Oracle, using Oracle's JVM.
After adding numerous prints, it looks as if DocumentsWriter's
quickSort suddenly stops working correctly. We pass in an array of
distinct Posting instances, and somehow after quickSort is done the
array is messed up: sometimes the same Posting instance occurs twice,
and the order is not quite sorted (though close). Very weird.
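(For reference, this kind of corruption can be caught right after the sort
with an identity-based check; a minimal sketch, where Posting and
comparePostings are assumed names standing in for the 2.3 DocumentsWriter
internals:)

    // Hedged sketch: verify the sort left the array ordered and free of
    // duplicate instances.  IdentityHashMap compares keys by ==, which is
    // exactly the "same instance twice" condition seen in the trace.
    static void checkSorted(Posting[] postings, int numPostings) {
      java.util.IdentityHashMap<Posting,Boolean> seen =
          new java.util.IdentityHashMap<Posting,Boolean>();
      for (int i = 0; i < numPostings; i++) {
        if (seen.put(postings[i], Boolean.TRUE) != null)
          throw new RuntimeException("same Posting instance twice, at " + i);
        if (i > 0 && comparePostings(postings[i-1], postings[i]) > 0)
          throw new RuntimeException("out of order at " + i);
      }
    }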
Marcelo has a single doc that works when indexed once, then fails when
indexed again. A standalone test with just that doc always works.
It very much seems like a JRE bug at this point.
Marcelo, in one similar issue on JRE 1.6.0_04 and 1.6.0_05, adding
-Xbatch or -Xint sometimes worked around the issue. I don't know if
you're able to try such parameters with Oracle but it may be worth a
shot if so?
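(For concreteness, those workarounds are plain JVM flags; MyIndexer is a
hypothetical driver class, and this is Sun HotSpot usage that may or may not
carry over to the Oracle JVM:

    java -Xbatch MyIndexer   # compile methods in the foreground, no background JIT
    java -Xint MyIndexer     # interpreted mode only, JIT disabled entirely
)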
Mike
Marcelo Ochoa wrote:
Michael:
I have noticed that the difference between steps 1) and 2) is that the
DocumentsWriter class has been JITed.
So if that's true, it is very bad news :( because Oracle will not patch the
code as fast as the Lucene dev team does.
I'll test with the latest code and send the .trc file again.
Also, I can test Lucene 2.3.1 with Oracle 10g, which is based on JDK 1.4.2
and does not use JIT strategies; instead of JIT, it can translate Java to
.so libraries out of the box.
Best regards, Marcelo.
On Fri, May 9, 2008 at 9:58 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
That is VERY strange, that the exception is hit only the 2nd time. I'll
ponder...
In the meantime, can you run this one? Just more prints...
Mike
On May 9, 2008, at 8:10 AM, Marcelo Ochoa wrote:
Michael:
I have applied the patch manually and it was similar :)
But anyway, here is the .trc file made with your working copy.
Note this:
1) If I upload a new version of DocumentsWriter.class to the DB, index
creation works OK.
2) If I drop the index and create it again, it fails.
Is the DocumentsWriter class using a static variable which is not cleared
between steps 1 and 2? The Oracle JVM uses a special garbage collector which
collects different objects from one execution to the next.
Best regards, Marcelo.
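(For illustration, the failure pattern Marcelo is asking about would look
something like this hypothetical sketch; it is not actual Lucene code. In
the Oracle JVM, static state can survive from one call to the next within a
session, so a static scratch array that a fresh JVM would reinitialize could
carry stale contents into the second index build:)

    // Hypothetical sketch of session-persistent static state; not Lucene code.
    class Scratch {
      static Posting[] cache;            // survives across calls in the Oracle JVM
      static Posting[] get(int size) {
        if (cache == null || cache.length < size)
          cache = new Posting[size];
        return cache;                    // second run may see stale Posting
      }                                  // references left over from the first
    }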
On Fri, May 9, 2008 at 8:37 AM, Marcelo Ochoa
<[EMAIL PROTECTED]>
wrote:
Michael:
Could you send me your working copy of the DocumentsWriter class?
Something is different in my Lucene 2.3.1 download and the patch did not
apply automatically :(
Marcelo.
On Fri, May 9, 2008 at 6:29 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
That is great that you narrowed it down to a single doc!
Alas, when I build a standalone test case indexing that one doc, it indexes
fine on Lucene 2.3.1, using JRE 1.5.0_13 on Mac OS X 10.4, and 1.6.0_05 on
Debian Linux.
I like that you uncommented the addPosition output; I did the same in my
run, and diffed my log with yours: they are identical up until the flush
occurs.
Here's what I know so far. When you hit the issue, something has gone
horribly wrong with the sorting of the Posting[] by term text. E.g., the
first 3 terms in your flush are these:
term=aampmdgjuyng
term=aamrkamls
term=aampqam
Already that's out of order (the 2nd & 3rd terms should be swapped). Then
the next 2 terms are the exact same Posting instance, which should be
impossible! (Search for [EMAIL PROTECTED] in your test_ora_15649.trc.)
Lucene calls sortPostings() to do this sort. That method first calls
compactPostings() to collapse out all the null entries in the hash table,
in place. Then it calls doPostingsSort, which calls quickSort.
The original hash table has unique Posting instances, i.e., there should be
no dups in there (the same Posting instance is stored in only one place).
I'm pretty sure that's the case in your run, based on the prints from
addPosition.
So compacting it should not produce any dups, either. And neither should
quickSort. I'm very much at a loss as to what's going on here.
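(For reference, a minimal sketch of the compact-then-sort sequence described
above; this is a hypothetical simplification, not the actual 2.3
DocumentsWriter code:)

    // Hedged sketch: collapse null slots in place, then sort the live prefix.
    // Compaction only shifts existing references left, and a comparison sort
    // only permutes elements, so neither step can introduce duplicates.
    static int compactPostings(Posting[] hash) {
      int upto = 0;
      for (int i = 0; i < hash.length; i++)
        if (hash[i] != null)
          hash[upto++] = hash[i];
      return upto;                       // number of live postings
    }

    static void sortPostings(Posting[] hash) {
      int numPostings = compactPostings(hash);
      quickSort(hash, 0, numPostings - 1);   // orders by term text
    }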
OK, I added more prints; the attached patch shows the Posting[] before &
after compacting, and after sorting. Can you apply this patch, get the
single document to fail again, and send me the output?
Thanks!
Mike
Marcelo Ochoa wrote:
Well, here is a full .trc file of the row which causes the exception.
Best regards, Marcelo.
On Thu, May 8, 2008 at 9:08 PM, Marcelo Ochoa
<[EMAIL PROTECTED]>
wrote:
Hi Michael:
I have identified a problematic doc; the problem does not depend on RAM
usage, it is a consequence of the data included in the doc.
Using batch-mode indexing of Lucene Domain Index, I narrowed it down by
partitioning into batches of 115, 10, and finally 1 row.
So attached is the .trc file of an index creation operation on a table of
one row.
Tomorrow morning I'll create the index including your trace information,
because I commented out those prints to get smaller .trc files.
Best regards, Marcelo.
On Thu, May 8, 2008 at 2:20 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Hmm, somewhat spooky. This time it crashed when flushing a different term.
Is this DB changing? Or maybe the order of the docs changes? Or any other
source of change that could explain this shift in the bug?
Is it possible to turn on assertions for org.apache.lucene.* and re-run?
I'm very curious to see if you trip any of the existing asserts.
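(On a standalone Sun JVM that would be, e.g., the line below; I'm not sure
how to pass JVM options to the Oracle JVM, and IndexOraMain is the
standalone driver mentioned later in this thread:

    java -ea:org.apache.lucene... IndexOraMain   # asserts on for org.apache.lucene and subpackages
)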
Mike
Marcelo Ochoa wrote:
Michael:
Here a new dump with latest debugging information:
http://www.exa.unicen.edu.ar/temp/test_ora_8974.trc.gz
Best regards, Marcelo.
On Thu, May 8, 2008 at 9:23 AM, Marcelo Ochoa
<[EMAIL PROTECTED]>
wrote:
Michael:
I am in Argentina, GMT-3:00. It's 9:21 am.
But I start my working day at the University at 1 pm :) During the morning
I work on other projects at home.
Best regards, Marcelo.
On Thu, May 8, 2008 at 9:16 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
OK, thanks!
Which country/timezone are you in? I'm in the US, on the East coast. It's
8:16 AM local time now.
Well, we're not sure whether it's a Lucene or an Oracle JVM issue just
yet... but whatever it is, it's a doozie!
Mike
On May 8, 2008, at 8:07 AM, Marcelo Ochoa wrote:
Hi Michael:
Sorry for not replying to the email yesterday. We are in different country
timezones :)
I can test your code to compare the differences, but the Oracle-Lucene
integration runs inside the Oracle JVM:
http://download.oracle.com/docs/cd/B28359_01/java.111/b31225/chone.htm#BABJCDIE
The Oracle JVM in an 11g database is a JDK 1.5 certified stack, but with
some differences; for example, the memory and thread managers are
integrated with the database's, and all JDBC operations are in a local
memory space.
So I need to reproduce the problem inside that JVM; maybe I have hit an
Oracle JVM bug :(
I have patched the code and created a new dump file; I'll put it on our web
site in four hours. Right now I am at home and my DSL connection doesn't
have a good uplink :(
Best regards, Marcelo.
On Wed, May 7, 2008 at 7:40 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
OK, I parsed out your log and made a standalone, rather hacked up, Java
program that attempts to reproduce the exception, but it runs to completion
just fine :(
I'm attaching the sources as a zip file.
One strange difference is I never fill up my RAM buffer, i.e., my program
only flushes when I close the writer. I'm setting the buffer to 53 MB too,
so I expected it to flush. I'm not sure why that difference is there.
Could you look over the sources and see if it properly mimics what you are
doing inside Oracle? I'm pretty sure I'm getting all the docs, because if I
uncomment that Sys.stdout.println and then diff my output from yours, all
of the "FINE: Document" lines are accounted for.
Can you try running this ("javac *.java; java IndexOraMain") and see if you
can get the crash?
One difference is I'm running with Sun's JVM (I tried 1.5 & 1.6). I've
tried on trunk, 2.3.x trunk, and 2.3.1. It runs fine in all cases... maybe
I am missing something?
Mike
On May 7, 2008, at 5:32 PM, Marcelo Ochoa wrote:
Great.
If you have more info just drop me an email.
Best regards, Marcelo.
On Wed, May 7, 2008 at 6:22 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Excellent, I pulled this down!
I'll turn this into a standalone test.
Thanks!
I don't need your Lucene index, just yet anyway.
Mike
On May 7, 2008, at 3:52 PM, Marcelo Ochoa wrote:
Hi Michael:
Sorry for the long delay.
I have tested when the exception is raised based on how many rows are
indexed. I found that up to 101000 rows the index is created correctly.
So the .trc file was generated trying to create an index on a table with
102000 rows. Here is a link to download it:
http://www.exa.unicen.edu.ar/temp/test_ora_18656.trc.gz
As you can see, one Field of the Lucene document holds the Oracle rowid;
this field is a primary key, so there will be at least one entry for each
document indexed.
If you want, I can make an export of the Lucene index structure in
traditional Lucene filesystem format; I used this functionality to see the
index with Luke. Remember that the Lucene-Oracle integration uses BLOBs for
storing the Lucene index.
I can make two exports:
- one for a table with 101000 rows, whose state is consistent;
- another for a table with 102000 rows, which will be corrupted.
Best regards, Marcelo.
On Wed, May 7, 2008 at 2:03 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
These are certainly "interesting" looking documents, but Lucene should
handle them just fine ... so something weird is going on.
Looking forward to your log so I can try to make a standalone test case.
Mike
Marcelo Ochoa wrote:
Michael:
Documents are the source of Oracle PL/SQL code, hexadecimal codes, and
others; for example:
SQL> select text from test_source_big where rownum<10;
TEXT
--------------------------------------------------------------------------------
-- pragma BUILTIN('VALUE', 1, 1, 1);
-- pragma FIPSFLAG('VALUE', 1450);
-- function REF (item "<ADT_WITH_OID>") return REF "<OPAQUE_1>";
-- pragma BUILTIN('REF', 1, 1, 1);
-- pragma FIPSFLAG('REF', 1450);
-- function DEREF (r REF "<OPAQUE_1>") return "<OPAQUE_1>";
-- pragma BUILTIN('DEREF', 1, 1, 1);
But others are like this (PL/SQL wrapped code):
TEXT
--------------------------------------------------------------------------------
ibamprTTQr5r5sJf1qGlZi5r0RZJpqamL8OUrIYhFKamUMMJ4aampmGiwLyn
Yw7VG420/3jD
WqcIoYfCkRhEm8CEApGIB0UObqampqapDuUvdrCo
+6ampi5sbKampqbKTcU2xqFapqamZ0Lh
YXS9pqaJSXDZyKYOevu6QDapDuUvdkmoa3r7ukCiaVzny4ca3lrnQ711VnjM
l/HIL+anRNCU
2VfdioEX5pRCMX46E9sxe5aT4dQt9KLP80LWbJY8wcMvGzVah8I2pqkuFAZt
WnpYegBYBn6v
XpNvQtyVFP88Qvrx3lpgv4EtbHVWiJZETH8m25/xnFZY5kxp3KUsGQCV7B
+dQ0Nl8CvAQAE+
XBTD95ys+VeV4dkJ5VpNZpZEc/
P8rMDhpqkO5S92sKj7Z16hvaamrOXmuG5GNqZDvbRlFIlJ
cNnI5
+Wer69tIbK9a3r7t2PD92nph56SC682Lubm55NveitWhnDCkTSO9ZUa07pZB
n6QnFb7
hrSiZrzj27c+NlwCj025vBRZ0/
GcVvvKfRihbGHQwQiyvsutsNXUvd2nyLqT1Ijq0gku4TXq
WcUtEHYK2XBvpREuARyVIXWsoaVmLmvRFkmmphTB9pOeReAJ9M1xGUvEOcqm
Q720ZRSpGj7w
lDD36HBxMh45wNXBrOJ1w
+D4daaWlk0O45OsRJvxbOMw4eGmpt2MKo1Cjl5WysTCX4LifWPs
NqzidcPg
+FcrnK9Wj7Psl3oY3nyyvcpNxTYHmHuJh9Qh6laGNjBCNTq0X2lpXOdKAYny
q9OX
Maybe these are the problem; the text column (VARCHAR2(4000)) is a Field
value in a Lucene document.
I am using Mandriva Linux 2008 32-bit, with Oracle 11g.
I will generate a full log which includes the documents added to the index.
Best regards, Marcelo.
On Wed, May 7, 2008 at 12:11 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
It looks like it crashes on a fresh index, after filling up RAM @ 53 MB
with 85564 docs.
What do your documents look like?
Could you print each doc as you add it, and then send me that log? I can
extract docs from there and try to replay them on my end to see if I can
get the error to happen.
Which OS are you running on?
Mike
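(A minimal way to do that, as a hedged sketch; the method name is
hypothetical and the actual integration code differs:)

    // Log each document before handing it to the writer, so the docs can later
    // be extracted from the log and replayed in a standalone test.
    void addDocumentLogged(IndexWriter writer, Document doc) throws IOException {
      System.out.println("FINE: Document " + doc);   // one grep-able line per doc
      writer.addDocument(doc);
    }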
On May 7, 2008, at 10:58 AM, Marcelo Ochoa wrote:
Michael:
The full .trc file is around 18 MB compressed.
If you want, I can put it on our University web site to download over HTTP.
Best regards, Marcelo.
On Wed, May 7, 2008 at 11:57 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Super, I got it, thanks! I'll look through it...
Mike
On May 7, 2008, at 10:10 AM, Marcelo Ochoa wrote:
Michael:
Did you receive the attachment in this mail?
I have rewritten the buffered output implementation. The first test suite
works well, but the index on the big_source table fails again.
I'll re-run the test to send the full infoStream log.
Best regards, Marcelo.
---------- Forwarded message ----------
From: Marcelo Ochoa <[EMAIL PROTECTED]>
Date: Wed, May 7, 2008 at 9:07 AM
Subject: Re: Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1
To: [EMAIL PROTECTED]
Hi Michael:
First thanks a lot for your time.
See comments below.
Is there any way to capture & serialize the actual documents being added
(this way I can "replay" those docs to reproduce it)?
Documents are a VARCHAR2 column from the all_source Oracle system view; in
fact it is a table created as:
create table test_source_big as (select * from all_source);
Are you using threads? Is autoCommit true or false?
The Oracle JVM uses a single-thread model by default, except that Lucene is
starting a parallel thread. The infoStream information shows only one
thread.
AutoCommit is false.
I am creating the Lucene IndexWriter with this code:

    IndexWriter writer = null;
    Parameters parameters = dir.getParameters();
    int mergeFactor =
        Integer.parseInt(parameters.getParameter("MergeFactor",
            "" + LogMergePolicy.DEFAULT_MERGE_FACTOR));
    int maxBufferedDocs =
        Integer.parseInt(parameters.getParameter("MaxBufferedDocs",
            "" + IndexWriter.DEFAULT_MAX_BUFFERED_DOCS));
    int maxMergeDocs =
        Integer.parseInt(parameters.getParameter("MaxMergeDocs",
            "" + LogDocMergePolicy.DEFAULT_MAX_MERGE_DOCS));
    int maxBufferedDeleteTerms =
        Integer.parseInt(parameters.getParameter("MaxBufferedDeleteTerms",
            "" + IndexWriter.DEFAULT_MAX_BUFFERED_DELETE_TERMS));
    Analyzer analyzer = getAnalyzer(parameters);
    boolean useCompountFileName =
        "true".equalsIgnoreCase(parameters.getParameter("UseCompoundFile", "false"));
    boolean autoTuneMemory =
        "true".equalsIgnoreCase(parameters.getParameter("AutoTuneMemory", "true"));
    writer = new IndexWriter(dir, autoCommitEnable, analyzer, createEnable);
    if (autoTuneMemory) {
      long memLimit =
          ((OracleRuntime.getJavaPoolSize() / 100) * 50) / (1024 * 1024);
      logger.info(".getIndexWriterForDir - Memory limit for indexing (Mb): " + memLimit);
      writer.setRAMBufferSizeMB(memLimit);
    } else
      writer.setMaxBufferedDocs(maxBufferedDocs);
    writer.setMaxMergeDocs(maxMergeDocs);
    writer.setMaxBufferedDeleteTerms(maxBufferedDeleteTerms);
    writer.setMergeFactor(mergeFactor);
    writer.setUseCompoundFile(useCompountFileName);
    if (logger.isLoggable(Level.FINE))
      writer.setInfoStream(System.out);
The example passes these relevant parameters:
AutoTuneMemory:true;LogLevel:FINE;Analyzer:org.apache.lucene.analysis.StopAnalyzer;MergeFactor:500
So, because AutoTuneMemory is true, instead of setting MaxBufferedDocs I am
setting RAMBufferSizeMB(53), which is calculated from Oracle SGA free
memory.
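(For concreteness, a worked example of where the 53 comes from, using a
hypothetical java pool size; OracleRuntime.getJavaPoolSize() is assumed to
return bytes, as the code above does:)

    // Hypothetical pool of ~111 MB: half of it, converted to MB, gives the
    // 53 MB RAM buffer seen in the infoStream output.
    long javaPoolSize = 111200000L;                               // bytes (assumed)
    long memLimit = ((javaPoolSize / 100) * 50) / (1024 * 1024);  // = 53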
Are you using payloads?
No.
Were there any previous exceptions in this IndexWriter before flushing this
segment? Could you post the full infoStream output?
There is no previous exception. Attached is a .trc file generated by Oracle
11g; it has the infoStream information plus logging information from the
Oracle-Lucene data cartridge.
<snip>
Could you apply the patch below & re-run? It will likely produce
--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home