[ 
https://issues.apache.org/jira/browse/PYLUCENE-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400092#comment-13400092
 ] 

Patrick J. McNerthney commented on PYLUCENE-17:
-----------------------------------------------

I just ran into this very problem with our use of JCC to integrate Python with 
the Eclipse BIRT Report Runtime.  I have a test which is similar to Greg's 
which is wrapped in a shell script that repeatedly runs the test, since the 
error only occurs when the class is in need of being initialized.  This test 
won't run for more than a minute before causing a seg fault. I initially tried 
adding the volatile keyword, but that had no effect.

I believe I have figured out the exact cause of the seg fault.  It is not the 
class$ pointer being null, but bad values in the mids$ array.  The sequence of 
events is something like so:

1. Thread A calls initializeClass, finds class$ to be null and starts to 
initialize it.
2. Thread A sets mids$ to an empty array and begins populating it.
3. Before Thread A sets class$ to a value, Thread B interrupts it and starts 
executing.
4. Thread B calls initializeClass, finds class$ to be null and starts to 
initialize it.
5. Thread B sets mids$ to an empty array and begins populating it.
6. Before Thread B finishes populating mids$, Thread A interrupts Thread B and 
starts executing, leaving empty mids$ entry values.
7. Thread A continues executing, sets class$ to its value and assumes that the 
class is now properly initialized.
8. Thread A accesses one of the mids$ entry values that were left empty by 
Thread B because it was interrupted.
9. Thread A causes a seg fault trying to use the empty mid value.
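
The sequence above can be replayed deterministically in a small model. This is 
only a sketch, not the generated JCC code: ClassState, Initializer and 
replay_race are hypothetical names, the mids vector stands in for mids$, the 
class_set flag stands in for class$ being non-null, and a zero entry stands in 
for an empty mid value. The two "threads" are stepped by hand in one real 
thread so that the interleaving is always the same.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the generated class's static state.
struct ClassState {
    std::vector<int> mids;  // models mids$; 0 models an empty entry
    bool class_set;         // models class$ != NULL
    ClassState() : class_set(false) {}
};

// One thread's progress through an unsynchronized initializer, split into
// steps so that a particular interleaving can be replayed by hand.
class Initializer {
public:
    Initializer(ClassState &state, std::size_t count)
        : state_(state), count_(count), next_(0) {}

    // Replace the shared array with a fresh, empty one.
    void allocate() { state_.mids.assign(count_, 0); next_ = 0; }

    // Fill in the next entry of whatever array is currently shared.
    void populate_one() {
        assert(next_ < state_.mids.size());
        state_.mids[next_] = static_cast<int>(next_) + 1;  // any nonzero ID
        ++next_;
    }

    // Mark the class as initialized.
    void publish() { state_.class_set = true; }

private:
    ClassState &state_;
    std::size_t count_;
    std::size_t next_;
};

std::size_t count_empty(const std::vector<int> &mids) {
    std::size_t empty = 0;
    for (std::size_t i = 0; i < mids.size(); ++i)
        if (mids[i] == 0) ++empty;
    return empty;
}

// Replays the interleaving and returns the number of empty entries that
// Thread A can observe after it has marked the class as initialized.
std::size_t replay_race() {
    ClassState state;
    Initializer a(state, 4), b(state, 4);

    a.allocate();        // A sets mids to an empty array ...
    a.populate_one();    // ... and begins populating it
    a.populate_one();
    b.allocate();        // B interrupts and replaces mids with an empty array
    b.populate_one();    // B is interrupted before it finishes
    a.populate_one();    // A resumes where it left off
    a.populate_one();
    a.publish();         // A believes the class is fully initialized

    return count_empty(state.mids);
}
```

In this model replay_race() reports one empty entry: the slot that Thread A 
had filled before Thread B replaced the array, and that Thread B had not yet 
reached when it was interrupted.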

I found that my test no longer failed once I changed the initializeClass 
method to build the mids$ array in a locally scoped variable, populate that, 
and only then assign the filled-in array to mids$ (and to do the same with 
fids$).
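
As a rough illustration of that change, the sketch below replays the same kind 
of interleaving with each thread filling a private array first. It is a 
single-threaded model, not the patched JCC code: SharedState, LocalInitializer 
and replay_with_local_arrays are hypothetical names, and a zero entry again 
stands in for an empty mid value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int> MethodIds;  // 0 models an empty entry

// Hypothetical stand-in for the generated class's static state.
struct SharedState {
    MethodIds mids;   // models mids$
    bool class_set;   // models class$ != NULL
    SharedState() : class_set(false) {}
};

// An initializer that fills a private array and only assigns it to the
// shared state once every entry has been populated.
class LocalInitializer {
public:
    LocalInitializer(SharedState &shared, std::size_t count)
        : shared_(shared), local_(count, 0), next_(0) {}

    void populate_one() {
        assert(next_ < local_.size());
        local_[next_] = static_cast<int>(next_) + 1;  // any nonzero ID
        ++next_;
    }

    void publish() {
        assert(next_ == local_.size());  // never publish a partial array
        shared_.mids = local_;
        shared_.class_set = true;
    }

private:
    SharedState &shared_;
    MethodIds local_;
    std::size_t next_;
};

std::size_t count_empty(const MethodIds &mids) {
    std::size_t empty = 0;
    for (std::size_t i = 0; i < mids.size(); ++i)
        if (mids[i] == 0) ++empty;
    return empty;
}

// Replays an interleaving in which B interrupts A part-way through and is
// itself interrupted; returns the empty entries visible after A publishes.
std::size_t replay_with_local_arrays() {
    SharedState shared;
    LocalInitializer a(shared, 4), b(shared, 4);

    a.populate_one();   // A begins filling its private array
    a.populate_one();
    b.populate_one();   // B interrupts, works on its own private array
    a.populate_one();   // A resumes; B's partial work is invisible to it
    a.populate_one();
    a.publish();        // the shared array is complete from the start

    return count_empty(shared.mids);
}
```

Here replay_with_local_arrays() reports no empty entries. Note that B still 
holds a second, partly filled array that it will go on to complete and 
publish, which is the duplicated work described next.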

However, I do not care for that solution, because multiple threads still end up 
thinking they need to initialize the class, so some objects are allocated 
multiple times and then lost. My preferred solution is to use a mutex 
with a double check for class$ being null, the first check outside of the mutex 
and the second check inside the mutex. This avoids the mutex overhead once 
things are initialized, but still provides proper threading synchronization.
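
A minimal sketch of that double-checked pattern follows. It is not the 
attached patch: it uses std::mutex in place of the JCCEnv mutex and a 
std::atomic<bool> flag in place of the class$ pointer, and initialize_class, 
run_concurrently, all_populated and kMethodCount are hypothetical names chosen 
for illustration. Build it with thread support enabled (for example -pthread 
with GCC).

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

const std::size_t kMethodCount = 8;    // hypothetical number of method IDs

std::mutex init_mutex;                 // stands in for the JCCEnv mutex
std::atomic<bool> initialized(false);  // stands in for class$ != NULL
std::vector<int> mids;                 // stands in for mids$; 0 means empty
std::atomic<int> init_runs(0);         // how often the slow path actually ran

void initialize_class() {
    // First check, outside the mutex: cheap once initialization is done.
    if (initialized.load(std::memory_order_acquire))
        return;

    std::lock_guard<std::mutex> lock(init_mutex);

    // Second check, inside the mutex: another thread may have finished
    // initializing while this one was waiting for the lock.
    if (initialized.load(std::memory_order_relaxed))
        return;

    std::vector<int> local(kMethodCount, 0);
    for (std::size_t i = 0; i < kMethodCount; ++i)
        local[i] = static_cast<int>(i) + 1;  // models a method ID lookup
    mids.swap(local);
    init_runs.fetch_add(1);

    // Set the flag last, so a thread that sees it also sees a full table.
    initialized.store(true, std::memory_order_release);
}

// Calls initialize_class() from several threads at once and returns how
// many times the initialization body ran.
int run_concurrently(unsigned thread_count) {
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < thread_count; ++i)
        threads.push_back(std::thread(initialize_class));
    for (std::size_t i = 0; i < threads.size(); ++i)
        threads[i].join();
    return init_runs.load();
}

// True if the table has the expected size and no empty entries.
bool all_populated() {
    if (mids.size() != kMethodCount)
        return false;
    for (std::size_t i = 0; i < mids.size(); ++i)
        if (mids[i] == 0)
            return false;
    return true;
}
```

Calling run_concurrently(16) should run the initialization body exactly once, 
after which all_populated() holds. The atomic flag with acquire/release 
ordering is an assumption of this sketch; it is what makes the unlocked first 
check well defined under the C++11 memory model.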

Attached is a patch against the current pylucene trunk which uses the JCCEnv 
mutex to protect the initializeClass code. This has only been tested under 
Linux. I attempted to implement Windows support, but have no idea if it is 
correct.
                
> Possible race condition with pylucene attachCurrentThread
> ---------------------------------------------------------
>
>                 Key: PYLUCENE-17
>                 URL: https://issues.apache.org/jira/browse/PYLUCENE-17
>             Project: PyLucene
>          Issue Type: Bug
>         Environment: Linux 2.6.39
> Sun jdk 1.6.26
>            Reporter: Greg Bowyer
>              Labels: pylucene
>         Attachments: backtrace, lucene-threadtest.py
>
>
> It looks like there is a possible race that can cause null pointer exceptions 
> in the JVM, making it crash.
> Because it's a race, it is hard to reproduce; the best luck I have had so far 
> is dropping my FS cache in the OS, which seems to slow down the 
> initialisation of the JVM enough to make it easier to reproduce.
> Attached is my test case
> Test session follows
> ---------------------------------------------------------------
> greg@localhost ~/programming/python $ sudo bash -c 'echo 3 > 
> /proc/sys/vm/drop_caches'
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f79226b35c8, pid=26581, tid=140158003312384
> #
> # JRE version: 6.0_26-b03
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x4b05c8]  instanceKlass::cached_itable_index(unsigned 
> long)+0x18
> #
> # An error report file with more information is saved as:
> # /home/greg/programming/python/hs_err_pid26581.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted (core dumped)
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py 
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py 
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py 
> greg@localhost ~/programming/python $ rm -r /tmp/test-index/
> greg@localhost ~/programming/python $ sudo bash -c 'echo 3 > 
> /proc/sys/vm/drop_caches'
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> [thread 139988165344768 also had an error][thread 139988165344768 also had an 
> error]#
> #  SIGSEGV (0xb)
>  at pc=0x00007f5197550a29, pid=27657, tid=139988039468800
> #
> # JRE version: 6.0_26-b03
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x4f2a29]  unsigned+0x299
> #
> # An error report file with more information is saved as:
> # /home/greg/programming/python/hs_err_pid27657.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted (core dumped)
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py 
> greg@localhost ~/programming/python $ sudo bash -c 'echo 3 > 
> /proc/sys/vm/drop_caches'
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f51bc2eaa1e, pid=28124, tid=139988377052928
> #
> # JRE version: 6.0_26-b03
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x4f2a1e]  unsigned+0x28e
> #
> # An error report file with more information is saved as:
> # /home/greg/programming/python/hs_err_pid28124.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted (core dumped)
> greg@localhost ~/programming/python $ 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
