CC'ing java-dev...
Marcelo and I have been iterating / drilling down on this odd
exception when flushing a segment. Lucene is running embedded in
Oracle, using Oracle's JVM.
After adding numerous prints, it looks as if DocumentsWriter's
quickSort suddenly stops working correctly. We pass in an array of
distinct Posting instances, and somehow after quickSort is done the
array is messed up: sometimes the same Posting instance occurs twice,
and the order is not quite sorted (though close). Very weird.
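(For reference, this kind of corruption can be caught right after the sort
with an identity-based check; a minimal sketch, where Posting and
comparePostings are assumed names standing in for the 2.3 DocumentsWriter
internals:)

    // Hedged sketch: verify the sort left the array ordered and free of
    // duplicate instances.  IdentityHashMap compares keys by ==, which is
    // exactly the "same instance twice" condition seen in the trace.
    static void checkSorted(Posting[] postings, int numPostings) {
      java.util.IdentityHashMap<Posting,Boolean> seen =
          new java.util.IdentityHashMap<Posting,Boolean>();
      for (int i = 0; i < numPostings; i++) {
        if (seen.put(postings[i], Boolean.TRUE) != null)
          throw new RuntimeException("same Posting instance twice, at " + i);
        if (i > 0 && comparePostings(postings[i-1], postings[i]) > 0)
          throw new RuntimeException("out of order at " + i);
      }
    }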
Marcelo has a single doc that works when indexed once, then fails when
indexed again. A standalone test with just that doc always works.
It very much seems like a JRE bug at this point.
Marcelo, in one similar issue on JRE 1.6.0_04 and 1.6.0_05, adding
-Xbatch or -Xint sometimes worked around the issue. I don't know if
you're able to try such parameters with Oracle but it may be worth a
shot if so?
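(For concreteness, those workarounds are plain JVM flags; MyIndexer is a
hypothetical driver class, and this is Sun HotSpot usage that may or may not
carry over to the Oracle JVM:

    java -Xbatch MyIndexer   # compile methods in the foreground, no background JIT
    java -Xint MyIndexer     # interpreted mode only, JIT disabled entirely
)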
Mike
Marcelo Ochoa wrote:
Michael:
I have noticed that the difference between steps 1) and 2) is that the
DocumentsWriter class has been JITed.
So if that's true, it is very bad news :( because Oracle will not patch the
code as fast as the Lucene dev team does.
I'll test with the latest code and send the .trc file again.
Also, I can test Lucene 2.3.1 with Oracle 10g, which is based on JDK 1.4.2
and does not use JIT strategies; instead of JIT, it can translate Java to
.so libraries out of the box.
Best regards, Marcelo.
On Fri, May 9, 2008 at 9:58 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
That is VERY strange, that the exception is hit only the 2nd time. I'll
ponder...
In the meantime, can you run this one? Just more prints...
Mike
On May 9, 2008, at 8:10 AM, Marcelo Ochoa wrote:
Michael:
I have applied the patch manually and it was similar :)
But anyway, here is the .trc file made with your working copy.
Note this:
1) If I upload a new version of DocumentsWriter.class to the DB, index
creation works OK.
2) If I drop the index and create it again, it fails.
Is the DocumentsWriter class using a static variable which is not cleared
between steps 1 and 2? The Oracle JVM uses a special garbage collector which
collects different objects from one execution to the next.
Best regards, Marcelo.
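(For illustration, the failure pattern Marcelo is asking about would look
something like this hypothetical sketch; it is not actual Lucene code. In
the Oracle JVM, static state can survive from one call to the next within a
session, so a static scratch array that a fresh JVM would reinitialize could
carry stale contents into the second index build:)

    // Hypothetical sketch of session-persistent static state; not Lucene code.
    class Scratch {
      static Posting[] cache;            // survives across calls in the Oracle JVM
      static Posting[] get(int size) {
        if (cache == null || cache.length < size)
          cache = new Posting[size];
        return cache;                    // second run may see stale Posting
      }                                  // references left over from the first
    }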
On Fri, May 9, 2008 at 8:37 AM, Marcelo Ochoa
<[EMAIL PROTECTED]>
wrote:
Michael:
Could you send me your working copy of the DocumentsWriter class?
Something is different in my Lucene 2.3.1 download and the patch did not
apply automatically :(
Marcelo.
On Fri, May 9, 2008 at 6:29 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
That is great that you narrowed it down to a single doc!
Alas, when I build a standalone test case indexing that one doc, it indexes
fine on Lucene 2.3.1, using JRE 1.5.0_13 on Mac OS X 10.4, and 1.6.0_05 on
Debian Linux.
I like that you uncommented the addPosition output; I did the same in my
run, and diffed my log with yours: they are identical up until the flush
occurs.
Here's what I know so far. When you hit the issue, something has gone
horribly wrong with the sorting of the Posting[] by term text. E.g., the
first 3 terms in your flush are these:
term=aampmdgjuyng
term=aamrkamls
term=aampqam
Already that's out of order (the 2nd & 3rd terms should be swapped). Then
the next 2 terms are the exact same Posting instance, which should be
impossible! (Search for [EMAIL PROTECTED] in your test_ora_15649.trc.)
Lucene calls sortPostings() to do this sort. That method first calls
compactPostings() to collapse out all the null entries in the hash table,
in place. Then it calls doPostingsSort, which calls quickSort.
The original hash table has unique Posting instances, i.e., there should be
no dups in there (the same Posting instance is stored in only one place).
I'm pretty sure that's the case in your run, based on the prints from
addPosition.
So compacting it should not produce any dups, either. And neither should
quickSort. I'm very much at a loss as to what's going on here.
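(For reference, a minimal sketch of the compact-then-sort sequence described
above; this is a hypothetical simplification, not the actual 2.3
DocumentsWriter code:)

    // Hedged sketch: collapse null slots in place, then sort the live prefix.
    // Compaction only shifts existing references left, and a comparison sort
    // only permutes elements, so neither step can introduce duplicates.
    static int compactPostings(Posting[] hash) {
      int upto = 0;
      for (int i = 0; i < hash.length; i++)
        if (hash[i] != null)
          hash[upto++] = hash[i];
      return upto;                       // number of live postings
    }

    static void sortPostings(Posting[] hash) {
      int numPostings = compactPostings(hash);
      quickSort(hash, 0, numPostings - 1);   // orders by term text
    }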
OK, I added more prints; the attached patch shows the Posting[] before &
after compacting, and after sorting. Can you apply this patch, get the
single document to fail again, and send me the output?
Thanks!
Mike
Marcelo Ochoa wrote:
Well, here is a full .trc file of the row which causes the exception.
Best regards, Marcelo.
On Thu, May 8, 2008 at 9:08 PM, Marcelo Ochoa
<[EMAIL PROTECTED]>
wrote:
Hi Michael:
I have identified a problematic doc; the problem does not depend on RAM
usage, it is a consequence of the data included in the doc.
Using batch-mode indexing of Lucene Domain Index, I narrowed it down by
partitioning into batches of 115, 10, and finally 1 row.
So attached is the .trc file of an index creation operation on a table of
one row.
Tomorrow morning I'll create the index including your trace information,
because I commented out those prints to get smaller .trc files.
Best regards, Marcelo.
On Thu, May 8, 2008 at 2:20 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Hmm, somewhat spooky. This time it crashed when flushing a different term.
Is this DB changing? Or maybe the order of the docs changes? Or any other
source of change that could explain this shift in the bug?
Is it possible to turn on assertions for org.apache.lucene.* and re-run?
I'm very curious to see if you trip any of the existing asserts.
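(On a standalone Sun JVM that would be, e.g., the line below; I'm not sure
how to pass JVM options to the Oracle JVM, and IndexOraMain is the
standalone driver mentioned later in this thread:

    java -ea:org.apache.lucene... IndexOraMain   # asserts on for org.apache.lucene and subpackages
)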
Mike
Marcelo Ochoa wrote:
Michael:
Here a new dump with latest debugging information:
http://www.exa.unicen.edu.ar/temp/test_ora_8974.trc.gz
Best regards, Marcelo.
On Thu, May 8, 2008 at 9:23 AM, Marcelo Ochoa
<[EMAIL PROTECTED]>
wrote:
Michael:
I am in Argentina, GMT-3:00. It's 9:21 am.
But I start my working day at the University at 1 pm :) During the morning
I work on other projects at home.
Best regards, Marcelo.
On Thu, May 8, 2008 at 9:16 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
OK, thanks!
Which country/timezone are you in? I'm in the US, on the East coast. It's
8:16 AM local time now.
Well, we're not sure whether it's a Lucene or an Oracle JVM issue just
yet... but whatever it is, it's a doozie!
Mike
On May 8, 2008, at 8:07 AM, Marcelo Ochoa wrote:
Hi Michael:
Sorry for not replying to the email yesterday. We are in different country
timezones :)
I can test your code to compare the differences, but the Oracle-Lucene
integration runs inside the Oracle JVM:
http://download.oracle.com/docs/cd/B28359_01/java.111/b31225/chone.htm#BABJCDIE
The Oracle JVM in an 11g database is a JDK 1.5 certified stack, but with
some differences; for example, the memory and thread managers are
integrated with the database's, and all JDBC operations are in a local
memory space.
So I need to reproduce the problem inside that JVM; maybe I have hit an
Oracle JVM bug :(
I have patched the code and created a new dump file; I'll put it on our web
site in four hours. Right now I am at home and my DSL connection doesn't
have a good uplink :(
Best regards, Marcelo.
On Wed, May 7, 2008 at 7:40 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
OK, I parsed out your log and made a standalone, rather hacked up, Java
program that attempts to reproduce the exception, but it runs to completion
just fine :(
I'm attaching the sources as a zip file.
One strange difference is I never fill up my RAM buffer, i.e., my program
only flushes when I close the writer. I'm setting the buffer to 53 MB too,
so I expected it to flush. I'm not sure why that difference is there.
Could you look over the sources and see if it properly mimics what you are
doing inside Oracle? I'm pretty sure I'm getting all the docs, because if I
uncomment that Sys.stdout.println and then diff my output from yours, all
of the "FINE: Document" lines are accounted for.
Can you try running this ("javac *.java; java IndexOraMain") and see if you
can get the crash?
One difference is I'm running with Sun's JVM (I tried 1.5 & 1.6). I've
tried on trunk, 2.3.x trunk, and 2.3.1. It runs fine in all cases... maybe
I am missing something?
Mike
On May 7, 2008, at 5:32 PM, Marcelo Ochoa wrote:
Great.
If you have more info just drop me an email.
Best regards, Marcelo.
On Wed, May 7, 2008 at 6:22 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Excellent, I pulled this down!
I'll turn this into a standalone test.
Thanks!
I don't need your Lucene index, just yet anyway.
Mike
On May 7, 2008, at 3:52 PM, Marcelo Ochoa wrote:
Hi Michael:
Sorry for the long delay.
I have tested when the exception is raised based on how many rows are
indexed. I found that up to 101000 rows the index is created correctly.
So the .trc file was generated trying to create an index on a table with
102000 rows. Here is a link to download it:
http://www.exa.unicen.edu.ar/temp/test_ora_18656.trc.gz
As you can see, one Field of the Lucene document holds the Oracle rowid;
this field is a primary key, so there will be at least one entry for each
document indexed.
If you want, I can make an export of the Lucene index structure in
traditional Lucene filesystem format; I used this functionality to see the
index with Luke. Remember that the Lucene-Oracle integration uses BLOBs for
storing the Lucene index.
I can make two exports:
- one for a table with 101000 rows, whose state is consistent;
- another for a table with 102000 rows, which will be corrupted.
Best regards, Marcelo.
On Wed, May 7, 2008 at 2:03 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
These are certainly "interesting" looking documents, but Lucene should
handle them just fine ... so something weird is going on.
Looking forward to your log so I can try to make a standalone test case.
Mike
Marcelo Ochoa wrote:
Michael:
Documents are the source of Oracle PL/SQL code, hexadecimal codes, and
others; for example:
SQL> select text from test_source_big where rownum<10;
TEXT
--------------------------------------------------------------------------------
-- pragma BUILTIN('VALUE', 1, 1, 1);
-- pragma FIPSFLAG('VALUE', 1450);
-- function REF (item "<ADT_WITH_OID>") return REF "<OPAQUE_1>";
-- pragma BUILTIN('REF', 1, 1, 1);
-- pragma FIPSFLAG('REF', 1450);
-- function DEREF (r REF "<OPAQUE_1>") return "<OPAQUE_1>";
-- pragma BUILTIN('DEREF', 1, 1, 1);
But others are like this (PL/SQL wrapped code):
TEXT
--------------------------------------------------------------------------------
ibamprTTQr5r5sJf1qGlZi5r0RZJpqamL8OUrIYhFKamUMMJ4aampmGiwLyn
Yw7VG420/3jD
WqcIoYfCkRhEm8CEApGIB0UObqampqapDuUvdrCo
+6ampi5sbKampqbKTcU2xqFapqamZ0Lh
YXS9pqaJSXDZyKYOevu6QDapDuUvdkmoa3r7ukCiaVzny4ca3lrnQ711VnjM
l/HIL+anRNCU
2VfdioEX5pRCMX46E9sxe5aT4dQt9KLP80LWbJY8wcMvGzVah8I2pqkuFAZt
WnpYegBYBn6v
XpNvQtyVFP88Qvrx3lpgv4EtbHVWiJZETH8m25/xnFZY5kxp3KUsGQCV7B
+dQ0Nl8CvAQAE+
XBTD95ys+VeV4dkJ5VpNZpZEc/
P8rMDhpqkO5S92sKj7Z16hvaamrOXmuG5GNqZDvbRlFIlJ
cNnI5
+Wer69tIbK9a3r7t2PD92nph56SC682Lubm55NveitWhnDCkTSO9ZUa07pZB
n6QnFb7
hrSiZrzj27c+NlwCj025vBRZ0/
GcVvvKfRihbGHQwQiyvsutsNXUvd2nyLqT1Ijq0gku4TXq
WcUtEHYK2XBvpREuARyVIXWsoaVmLmvRFkmmphTB9pOeReAJ9M1xGUvEOcqm
Q720ZRSpGj7w
lDD36HBxMh45wNXBrOJ1w
+D4daaWlk0O45OsRJvxbOMw4eGmpt2MKo1Cjl5WysTCX4LifWPs
NqzidcPg
+FcrnK9Wj7Psl3oY3nyyvcpNxTYHmHuJh9Qh6laGNjBCNTq0X2lpXOdKAYny
q9OX
Maybe these are the problem; the text column (VARCHAR2(4000)) is a Field
value in a Lucene document.
I am using Mandriva Linux 2008 32-bit, with Oracle 11g.
I will generate a full log which includes the documents added to the index.
Best regards, Marcelo.
On Wed, May 7, 2008 at 12:11 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
It looks like it crashes on a fresh index, after filling up RAM @ 53 MB
with 85564 docs.
What do your documents look like?
Could you print each doc as you add it, and then send me that log? I can
extract docs from there and try to replay them on my end to see if I can
get the error to happen.
Which OS are you running on?
Mike
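(A minimal way to do that, as a hedged sketch; the method name is
hypothetical and the actual integration code differs:)

    // Log each document before handing it to the writer, so the docs can later
    // be extracted from the log and replayed in a standalone test.
    void addDocumentLogged(IndexWriter writer, Document doc) throws IOException {
      System.out.println("FINE: Document " + doc);   // one grep-able line per doc
      writer.addDocument(doc);
    }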
On May 7, 2008, at 10:58 AM, Marcelo Ochoa wrote:
Michael:
The full .trc file is around 18 MB compressed.
If you want, I can put it on our University web site to download over HTTP.
Best regards, Marcelo.
On Wed, May 7, 2008 at 11:57 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Super, I got it, thanks! I'll look through it...
Mike
On May 7, 2008, at 10:10 AM, Marcelo Ochoa wrote:
Michael:
Did you receive the attachment in this mail?
I have rewritten the buffered output implementation. The first test suite
works well, but the index on the big_source table fails again.
I'll re-run the test to send the full infoStream log.
Best regards, Marcelo.
---------- Forwarded message ----------
From: Marcelo Ochoa <[EMAIL PROTECTED]>
Date: Wed, May 7, 2008 at 9:07 AM
Subject: Re: Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1
To: [EMAIL PROTECTED]
Hi Michael:
First thanks a lot for your time.
See comments below.
Is there any way to capture & serialize the actual documents being added
(this way I can "replay" those docs to reproduce it)?
Documents are a VARCHAR2 column from the all_source Oracle system view; in
fact it is a table created as:
create table test_source_big as (select * from all_source);
Are you using threads? Is autoCommit true or false?
The Oracle JVM uses a single-thread model by default, except that Lucene is
starting a parallel thread. The infoStream information shows only one
thread.
AutoCommit is false.
I am creating the Lucene IndexWriter with this code:

    IndexWriter writer = null;
    Parameters parameters = dir.getParameters();
    int mergeFactor =
        Integer.parseInt(parameters.getParameter("MergeFactor",
            "" + LogMergePolicy.DEFAULT_MERGE_FACTOR));
    int maxBufferedDocs =
        Integer.parseInt(parameters.getParameter("MaxBufferedDocs",
            "" + IndexWriter.DEFAULT_MAX_BUFFERED_DOCS));
    int maxMergeDocs =
        Integer.parseInt(parameters.getParameter("MaxMergeDocs",
            "" + LogDocMergePolicy.DEFAULT_MAX_MERGE_DOCS));
    int maxBufferedDeleteTerms =
        Integer.parseInt(parameters.getParameter("MaxBufferedDeleteTerms",
            "" + IndexWriter.DEFAULT_MAX_BUFFERED_DELETE_TERMS));
    Analyzer analyzer = getAnalyzer(parameters);
    boolean useCompountFileName =
        "true".equalsIgnoreCase(parameters.getParameter("UseCompoundFile", "false"));
    boolean autoTuneMemory =
        "true".equalsIgnoreCase(parameters.getParameter("AutoTuneMemory", "true"));
    writer = new IndexWriter(dir, autoCommitEnable, analyzer, createEnable);
    if (autoTuneMemory) {
      long memLimit =
          ((OracleRuntime.getJavaPoolSize() / 100) * 50) / (1024 * 1024);
      logger.info(".getIndexWriterForDir - Memory limit for indexing (Mb): " + memLimit);
      writer.setRAMBufferSizeMB(memLimit);
    } else
      writer.setMaxBufferedDocs(maxBufferedDocs);
    writer.setMaxMergeDocs(maxMergeDocs);
    writer.setMaxBufferedDeleteTerms(maxBufferedDeleteTerms);
    writer.setMergeFactor(mergeFactor);
    writer.setUseCompoundFile(useCompountFileName);
    if (logger.isLoggable(Level.FINE))
      writer.setInfoStream(System.out);
The example passes these relevant parameters:
AutoTuneMemory:true;LogLevel:FINE;Analyzer:org.apache.lucene.analysis.StopAnalyzer;MergeFactor:500
So, because AutoTuneMemory is true, instead of setting MaxBufferedDocs I am
setting RAMBufferSizeMB(53), which is calculated from Oracle SGA free
memory.
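(For concreteness, a worked example of where the 53 comes from, using a
hypothetical java pool size; OracleRuntime.getJavaPoolSize() is assumed to
return bytes, as the code above does:)

    // Hypothetical pool of ~111 MB: half of it, converted to MB, gives the
    // 53 MB RAM buffer seen in the infoStream output.
    long javaPoolSize = 111200000L;                               // bytes (assumed)
    long memLimit = ((javaPoolSize / 100) * 50) / (1024 * 1024);  // = 53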
Are you using payloads?
No.
Were there any previous exceptions in this IndexWriter before flushing this
segment? Could you post the full infoStream output?
There is no previous exception. Attached is a .trc file generated by Oracle
11g; it has the infoStream information plus logging information from the
Oracle-Lucene data cartridge.
<snip>
Could you apply the patch below & re-run? It will likely produce
--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home