CC'ing java-dev...

Marcelo and I have been iterating / drilling down on this odd
exception when flushing a segment.  Lucene is running embedded in
Oracle, using Oracle's JVM.

After adding numerous prints, it looks as if DocumentsWriter's
quickSort suddenly stops working correctly.  We pass in an array of
distinct Posting instances, and then somehow after quickSort is
done, the array is weirdly messed up: sometimes the same Posting
instance occurs twice, and the order is close to, but not quite, sorted.
Very weird.

Marcelo has a single doc which when indexed once, works, then when
indexed again, fails.  A standalone test with just that doc always
works.

It very much seems like a JRE bug at this point.

Marcelo, in one similar issue on JRE 1.6.0_04 and 1.6.0_05, adding
-Xbatch or -Xint sometimes worked around the issue.  I don't know if
you're able to try such parameters with Oracle but it may be worth a
shot if so?

Mike

Marcelo Ochoa wrote:
Michael:
  I have noticed that the difference between steps 1) and 2) is that
the DocumentsWriter class is JITed.
  So if that's true it's very bad news :( because Oracle will not patch
the code as fast as the Lucene dev team does.
  I'll test with the latest code and send the .trc file again.
  Also, I can test Lucene 2.3.1 with Oracle 10g, which is based on
JDK 1.4.2 and does not use JIT strategies; instead of JIT it is possible
to translate Java to .so libraries out of the box.
  Best regards, Marcelo.

On Fri, May 9, 2008 at 9:58 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:

That is VERY strange about hitting the exception only on the 2nd time.
I'll ponder...

In the meantime, can you run this one?  Just more prints...

Mike





On May 9, 2008, at 8:10 AM, Marcelo Ochoa wrote:

Michael:
 I have applied the patch manually and it was similar :)
 But anyway, here is the .trc file made with your working copy.
 Note this:
  1) If I upload a new version of DocumentsWriter.class to the DB,
index creation works OK.
  2) If I drop the index and create it again, it fails.
 Is the DocumentsWriter class using a static variable which is not
cleared between steps 1 and 2?
Oracle JVM uses a special garbage collector which collects different
objects from one execution to another.
 Best regards, Marcelo.

On Fri, May 9, 2008 at 8:37 AM, Marcelo Ochoa <[EMAIL PROTECTED]>
wrote:

Michael:
 Could you send me your working copy of the DocumentsWriter class?
 Something is different in my Lucene 2.3.1 download file and the
patch did not apply automatically :(
 Marcelo.

On Fri, May 9, 2008 at 6:29 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:

That is great, that you narrowed it down to a single doc!

Alas, when I build a standalone test case indexing that one doc, it indexes fine on Lucene 2.3.1, using JRE 1.5.0_13 on Mac OS X 10.4, and
1.6.0_05 on Debian Linux.

I like that you uncommented the addPosition output; I did the same in my run, and diffed my log with yours, and they are identical, up until
when the flush occurs.

Here's what I know so far. When you hit the issue, something has gone horribly wrong with the sorting of the Posting[] by term text. EG the
first 3 terms in your flush are these:

 term=aampmdgjuyng
 term=aamrkamls
 term=aampqam

Already that's out of order (the 2nd & 3rd terms should be swapped).
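Just to spell the ordering claim out (term values taken from the trace above; Lucene 2.3 sorts postings by term text via plain String comparison):

```java
public class TermOrderCheck {
    public static void main(String[] args) {
        // the first three flushed terms, in the order they appeared
        String[] flushed = { "aampmdgjuyng", "aamrkamls", "aampqam" };
        // correct lexicographic order would be:
        //   aampmdgjuyng, aampqam, aamrkamls
        // i.e. the 2nd and 3rd flushed terms are indeed swapped:
        System.out.println(flushed[1].compareTo(flushed[2]) > 0); // true
    }
}
```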

Then the next 2 terms are the exact same Posting instance, which
should be impossible!  (Search for
[EMAIL PROTECTED] in your
test_ora_15649.trc).

Lucene calls sortPostings() to do this sort.  That method first calls
compactPostings() to collapse out all the null entries in the hash
table, in place, and then doPostingsSort(), which calls quickSort().
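For reference, the two steps can be sketched roughly like this (a simplified stand-in, not Lucene's actual code: the real quickSort is hand-rolled, and the real Posting carries more fields than just the term text):

```java
import java.util.Arrays;

public class PostingSortSketch {
    static class Posting {
        final String term;
        Posting(String term) { this.term = term; }
    }

    // roughly what compactPostings() does: squeeze the non-null entries
    // of the hash table to the front of the array, in place
    static int compact(Posting[] hash) {
        int upto = 0;
        for (Posting p : hash)
            if (p != null) hash[upto++] = p;
        return upto; // number of live postings
    }

    // roughly what doPostingsSort()/quickSort() does: sort the first
    // numPostings entries by term text
    static void sort(Posting[] postings, int numPostings) {
        Arrays.sort(postings, 0, numPostings,
                    (a, b) -> a.term.compareTo(b.term));
    }

    public static void main(String[] args) {
        Posting[] hash = { null, new Posting("b"), null, new Posting("a") };
        int n = compact(hash);
        sort(hash, n);
        System.out.println(n + " " + hash[0].term + " " + hash[1].term); // 2 a b
    }
}
```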

The original hash table has unique Posting instances.  Ie, there
should be no dups in there (the same Posting instance is stored in
only one place). I'm pretty sure that's the case in your run, based
on the prints from addPosition.

So, compacting it should not produce any dups, either.

And, neither should quickSort. I'm very much at a loss as to what's
going on here.
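If it helps narrow down where the corruption first appears, a check like this (a hypothetical helper, not part of Lucene) could be run over the Posting[] after compacting and again after sorting; it flags both symptoms seen in the trace, duplicate instances and out-of-order terms:

```java
import java.util.IdentityHashMap;

public class PostingArrayCheck {
    static class Posting {
        final String term;
        Posting(String term) { this.term = term; }
    }

    // returns null if the first numPostings entries are clean, else a
    // description of the first violated invariant; identity (==) is used
    // for duplicate detection, matching "same Posting instance twice"
    static String check(Posting[] postings, int numPostings, boolean sorted) {
        IdentityHashMap<Posting, Boolean> seen = new IdentityHashMap<>();
        for (int i = 0; i < numPostings; i++) {
            Posting p = postings[i];
            if (p == null)
                return "null entry at " + i;
            if (seen.put(p, Boolean.TRUE) != null)
                return "duplicate Posting instance at " + i + " term=" + p.term;
            if (sorted && i > 0 && postings[i - 1].term.compareTo(p.term) > 0)
                return "out of order at " + i + ": " + postings[i - 1].term
                        + " > " + p.term;
        }
        return null;
    }

    public static void main(String[] args) {
        Posting dup = new Posting("aampqam");
        Posting[] bad = { new Posting("aampmdgjuyng"),
                          new Posting("aamrkamls"), dup, dup };
        // reports the out-of-order pair first (aamrkamls > aampqam)
        System.out.println(check(bad, 4, true));
    }
}
```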

OK I added more prints, attached the patch, showing the Posting[]
before & after compacting, and after sorting.  Can you apply this
patch & get the single document to fail again & send me the output?
Thanks!

Mike




Marcelo Ochoa wrote:

Well, here is a full .trc file for the row which causes the exception.
Best regards, Marcelo.

On Thu, May 8, 2008 at 9:08 PM, Marcelo Ochoa <[EMAIL PROTECTED]>
wrote:

Hi Michael:
I have identified a problematic doc; the problem does not depend on
RAM usage, it is a consequence of the data included in the doc.
Using the batch indexing mode of Lucene Domain Index I narrowed it
down by partitioning into batches of 115, 10, and finally 1 row.
So attached is the .trc file of an index creation operation on a table
of one row.
 Tomorrow morning I'll create the index including your trace
information, because I had commented it out to get smaller .trc files.
 Best regards, Marcelo.
On Thu, May 8, 2008 at 2:20 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:

Hmm, somewhat spooky.  This time it crashed when flushing a different
term.
Is this DB changing?  Or maybe the order of the docs changes?  Or, any
other source of change that could explain this shift in the bug?

Is it possible to turn on assertions for org.apache.lucene.* and
re-run?
I'm very curious to see if you trip any of the existing asserts.
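If passing -ea style flags to the embedded JVM isn't possible, assertions can also be requested programmatically via the standard ClassLoader API; a sketch (it must run before any org.apache.lucene class is initialized, and I don't know whether Oracle's JVM honors it):

```java
public class EnableLuceneAsserts {
    public static void main(String[] args) {
        ClassLoader cl = ClassLoader.getSystemClassLoader();
        // same effect as -ea:org.apache.lucene... for classes this
        // loader initializes from now on
        cl.setPackageAssertionStatus("org.apache.lucene", true);
        System.out.println("asserts requested for org.apache.lucene");
    }
}
```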

Mike

Marcelo Ochoa wrote:

Michael:
 Here a new dump with latest debugging information:
http://www.exa.unicen.edu.ar/temp/test_ora_8974.trc.gz
 Best regards, Marcelo.

On Thu, May 8, 2008 at 9:23 AM, Marcelo Ochoa
<[EMAIL PROTECTED]>
wrote:

Michael:
 I am in Argentina, GMT-3:00.
 It's 9:21am.
 But I am starting my working day at the University at 1pm :)
During the morning I work on other projects from my house.
 Best regards, Marcelo.

 On Thu, May 8, 2008 at 9:16 AM, Michael McCandless


<[EMAIL PROTECTED]> wrote:

 OK, thanks!

Which country/timezone are you in?  I'm in the US, on the East coast.
It's 8:16 AM local time now.

Well, we're not sure whether it's a Lucene or an Oracle JVM issue just
yet... but whatever it is, it's a doozie!

 Mike



 On May 8, 2008, at 8:07 AM, Marcelo Ochoa wrote:

Hi Michael:
 Sorry for not replying to the email yesterday.
 We have different country timezones :)
 I can test your code to compare the differences, but the
Oracle-Lucene integration runs inside the Oracle JVM:

http://download.oracle.com/docs/cd/B28359_01/java.111/b31225/chone.htm#BABJCDIE

Oracle JVM in an 11g database is a JDK 1.5 certified stack, but with
some differences; for example, the memory and Thread managers are
integrated with the database's, and all JDBC operations are in a local
memory space.
So I need to reproduce the problem inside that JVM; maybe I have hit
an Oracle JVM bug :(
I have patched the code and created a new dump file; I'll put it on
our web site in four hours.  Right now I am at home and my DSL
connection doesn't have a good uplink :(
 Best regards, Marcelo.

On Wed, May 7, 2008 at 7:40 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:


OK I parsed out your log and made a standalone, rather hacked up, Java
program that attempts to reproduce the exception, but, it runs to
completion just fine :(

 I'm attaching the sources as a zip file.

One strange difference is I never fill up my RAM buffer, ie, my
program only flushes when I close the writer.  I'm setting the buffer
to 53 MB too, so I expected it to flush.  I'm not sure why that
difference is there.

Could you look over the sources and see if it properly mimics what you
are doing inside Oracle?  I'm pretty sure I'm getting all the docs
because if I uncomment that System.out.println and then diff my output
with yours, all of the "FINE: Document" lines are accounted for.

Can you try running this ("javac *.java; java IndexOraMain") and see
if you can get the crash?

One difference is I'm running with Sun's JVM (I tried 1.5 & 1.6).
I've tried on trunk, 2.3.x trunk, and 2.3.1.  It runs fine in all
cases... maybe I am missing something?

 Mike




 On May 7, 2008, at 5:32 PM, Marcelo Ochoa wrote:


Great.
If you have more info just drop me an email.
Best regards, Marcelo.
On Wed, May 7, 2008 at 6:22 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:



 Excellent, I pulled this down!

 I'll turn this into a standalone test.

 Thanks!

 I don't need your Lucene index, just yet anyway.

 Mike



 On May 7, 2008, at 3:52 PM, Marcelo Ochoa wrote:



Hi Michael:
 Sorry for the long delay.
I have tested when the exception is raised, based on how many rows are
indexed.
 I found that up to 101000 rows the index is created correctly.
So the .trc file was generated trying to create an index on a table
with 102000 rows.  Here is a link to download it:
http://www.exa.unicen.edu.ar/temp/test_ora_18656.trc.gz
As you can see, one Field in the Lucene document has the Oracle rowid;
this field is a primary key, so there will be at least one entry for
each document indexed.
 If you want, I can make an export of the Lucene index structure in
traditional Lucene filesystem format; I used this functionality to see
the index with Luke.  Remember that the Lucene-Oracle integration uses
BLOBs for storing the Lucene index.
 I can make two exports:
 - one for a table with 101000 rows, state consistent.
 - another one for a table with 102000 rows, which will be corrupted.
 Best regards, Marcelo.
On Wed, May 7, 2008 at 2:03 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:




These are certainly "interesting" looking documents, but Lucene should
handle them just fine ... so something weird is going on.

 Looking forward to your log so I can try to make a standalone test
case.

 Mike



 Marcelo Ochoa wrote:




Michael:
Documents are source of Oracle PLSQL code, hexadecimal codes, and
others; for example:
SQL> select text from test_source_big where rownum<10;

TEXT
--------------------------------------------------------------------------------
 --    pragma BUILTIN('VALUE', 1, 1, 1);
 --    pragma FIPSFLAG('VALUE', 1450);

 --  function REF (item "<ADT_WITH_OID>") return REF "<OPAQUE_1>";
 --    pragma BUILTIN('REF', 1, 1, 1);
 --    pragma FIPSFLAG('REF', 1450);

 --  function DEREF (r REF "<OPAQUE_1>") return "<OPAQUE_1>";
 --    pragma BUILTIN('DEREF', 1, 1, 1);
 But others are like this (PLSQL wrapped code):

TEXT
--------------------------------------------------------------------------------
ibamprTTQr5r5sJf1qGlZi5r0RZJpqamL8OUrIYhFKamUMMJ4aampmGiwLynYw7VG420/3jD
WqcIoYfCkRhEm8CEApGIB0UObqampqapDuUvdrCo+6ampi5sbKampqbKTcU2xqFapqamZ0Lh
YXS9pqaJSXDZyKYOevu6QDapDuUvdkmoa3r7ukCiaVzny4ca3lrnQ711VnjMl/HIL+anRNCU
2VfdioEX5pRCMX46E9sxe5aT4dQt9KLP80LWbJY8wcMvGzVah8I2pqkuFAZtWnpYegBYBn6v
XpNvQtyVFP88Qvrx3lpgv4EtbHVWiJZETH8m25/xnFZY5kxp3KUsGQCV7B+dQ0Nl8CvAQAE+
XBTD95ys+VeV4dkJ5VpNZpZEc/P8rMDhpqkO5S92sKj7Z16hvaamrOXmuG5GNqZDvbRlFIlJ
cNnI5+Wer69tIbK9a3r7t2PD92nph56SC682Lubm55NveitWhnDCkTSO9ZUa07pZBn6QnFb7
hrSiZrzj27c+NlwCj025vBRZ0/GcVvvKfRihbGHQwQiyvsutsNXUvd2nyLqT1Ijq0gku4TXq
WcUtEHYK2XBvpREuARyVIXWsoaVmLmvRFkmmphTB9pOeReAJ9M1xGUvEOcqmQ720ZRSpGj7w
lDD36HBxMh45wNXBrOJ1w+D4daaWlk0O45OsRJvxbOMw4eGmpt2MKo1Cjl5WysTCX4LifWPs
NqzidcPg+FcrnK9Wj7Psl3oY3nyyvcpNxTYHmHuJh9Qh6laGNjBCNTq0X2lpXOdKAYnyq9OX

 Maybe these are the problem; the text column (VARCHAR2(4000)) is a
Field value in the Lucene document.
 I am using Mandriva Linux 2008 32 bits, with Oracle 11g.
 I will generate a full log which includes the documents added to the
index.
 Best regards, Marcelo.

On Wed, May 7, 2008 at 12:11 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:





It looks like it crashes on a fresh index, after filling up RAM @
53 MB with 85564 docs.

 What do your documents look like?

Could you print each doc as you add it, and then send me that log?  I
can extract docs from there and try to replay them on my end to see if
I can get the error to happen.

 Which OS are you running on?

 Mike

 On May 7, 2008, at 10:58 AM, Marcelo Ochoa wrote:






Michael:
 Full .trc file is around 18Mb compressed.
If you want, I can put it at our University Web site to download using
http.

 Best regards, Marcelo.

On Wed, May 7, 2008 at 11:57 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:






 Super, I got it, thanks!  I'll look through it...

 Mike



 On May 7, 2008, at 10:10 AM, Marcelo Ochoa wrote:









Michael:
 Did you receive the attachment in this mail?
I have rewritten the buffered output implementation.
The first test suite works well, but the index on the big_source table
fails again.

 I'll re-run the test to send the full infoStream log.
 Best regards, Marcelo.

---------- Forwarded message ----------
From: Marcelo Ochoa <[EMAIL PROTECTED]>
Date: Wed, May 7, 2008 at 9:07 AM
Subject: Re: Help to solve an issue when upgrading Lucene-Oracle
integration to lucene 2.3.1
To: [EMAIL PROTECTED]


Hi Michael:
 First thanks a lot for your time.
 See comments below.







Is there any way to capture & serialize the actual documents being
added (this way I can "replay" those docs to reproduce it)?
















 Documents are a column VARCHAR2 from Oracle's all_source System view;
in fact it is a table created as:
 create table test_source_big as (select * from all_source);

 Are you using threads?  Is autoCommit true or false?

Oracle JVM uses by default a single Thread model, except that Lucene
is starting a parallel Thread.  InfoStream information shows only one
Thread.
 AutoCommit is false.
 I am creating the IndexWriter with this code:

 IndexWriter writer = null;
 Parameters parameters = dir.getParameters();
 int mergeFactor =
     Integer.parseInt(parameters.getParameter("MergeFactor",
                                              "" + LogMergePolicy.DEFAULT_MERGE_FACTOR));
 int maxBufferedDocs =
     Integer.parseInt(parameters.getParameter("MaxBufferedDocs",
                                              "" + IndexWriter.DEFAULT_MAX_BUFFERED_DOCS));
 int maxMergeDocs =
     Integer.parseInt(parameters.getParameter("MaxMergeDocs",
                                              "" + LogDocMergePolicy.DEFAULT_MAX_MERGE_DOCS));
 int maxBufferedDeleteTerms =
     Integer.parseInt(parameters.getParameter("MaxBufferedDeleteTerms",
                                              "" + IndexWriter.DEFAULT_MAX_BUFFERED_DELETE_TERMS));
 Analyzer analyzer = getAnalyzer(parameters);
 boolean useCompountFileName =
     "true".equalsIgnoreCase(parameters.getParameter("UseCompoundFile", "false"));
 boolean autoTuneMemory =
     "true".equalsIgnoreCase(parameters.getParameter("AutoTuneMemory", "true"));
 writer = new IndexWriter(dir, autoCommitEnable, analyzer, createEnable);
 if (autoTuneMemory) {
   long memLimit = ((OracleRuntime.getJavaPoolSize()/100)*50)/(1024*1024);
   logger.info(".getIndexWriterForDir - Memory limit for indexing (Mb): " + memLimit);
   writer.setRAMBufferSizeMB(memLimit);
 } else
   writer.setMaxBufferedDocs(maxBufferedDocs);
 writer.setMaxMergeDocs(maxMergeDocs);
 writer.setMaxBufferedDeleteTerms(maxBufferedDeleteTerms);
 writer.setMergeFactor(mergeFactor);
 writer.setUseCompoundFile(useCompountFileName);
 if (logger.isLoggable(Level.FINE))
   writer.setInfoStream(System.out);

 The example passes these relevant parameters:

AutoTuneMemory:true;LogLevel:FINE;Analyzer:org.apache.lucene.analysis.StopAnalyzer;MergeFactor:500

 So, because AutoTuneMemory is true, instead of setting
MaxBufferedDocs I am setting RAMBufferSizeMB(53), which is calculated
using Oracle SGA free memory.
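For reference, the auto-tune arithmetic works out like this (a small extraction of the expression above; OracleRuntime.getJavaPoolSize() is assumed to return the java pool size in bytes, so the expression takes 50% of it and converts bytes to MB, with integer truncation at each step):

```java
public class MemLimitCalc {
    // Same integer arithmetic as the autoTuneMemory branch above, with
    // the pool size passed in instead of read from OracleRuntime (which
    // is only available inside the Oracle JVM).
    static long memLimitMB(long javaPoolSizeBytes) {
        return ((javaPoolSizeBytes / 100) * 50) / (1024 * 1024);
    }

    public static void main(String[] args) {
        // a 200 MB java pool yields a 100 MB RAM buffer
        System.out.println(memLimitMB(200L * 1024 * 1024)); // 100
    }
}
```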







 Are you using payloads?

 No.








 Were there any previous exceptions in this IndexWriter before
flushing this segment?  Could you post the full infoStream output?

There was no previous exception.  Attached is a .trc file generated by
Oracle 11g; it has infoStream information plus logging information
from the Oracle-Lucene data cartridge.

 <snip>







Could you apply the patch below & re-run?  It will likely produce
--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/

