[ https://issues.apache.org/jira/browse/PYLUCENE-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400092#comment-13400092 ]
Patrick J. McNerthney commented on PYLUCENE-17:
-----------------------------------------------

I just ran into this very problem with our use of JCC to integrate Python with the Eclipse BIRT Report Runtime. I have a test similar to Greg's, wrapped in a shell script that runs it repeatedly, since the error only occurs when the class still needs to be initialized. The test won't run for more than a minute before causing a seg fault.

I initially tried adding the volatile keyword, but that had no effect.

I believe I have figured out the exact cause of the seg fault. It is not the class$ pointer being null, but bad values in the mids$ array. The sequence of events is something like this:

1. Thread A calls initializeClass, finds class$ to be null and starts to initialize it.
2. Thread A sets mids$ to an empty array and begins populating it.
3. Before Thread A sets class$ to a value, Thread B interrupts it and starts executing.
4. Thread B calls initializeClass, finds class$ to be null and starts to initialize it.
5. Thread B sets mids$ to an empty array and begins populating it.
6. Before Thread B finishes populating mids$, Thread A interrupts Thread B and starts executing, leaving empty mids$ entry values.
7. Thread A continues executing, sets class$ to its value and assumes that the class is now properly initialized.
8. Thread A accesses one of the mids$ entry values that were left empty by Thread B because it was interrupted.
9. Thread A causes a seg fault trying to use the empty mid value.

I found that if I changed the initializeClass method to first store the mids$ array in a locally scoped variable, populate that, and only then set mids$ to the filled-in array (and do the same with fids$), my test would not fail. However, I do not care for that solution, because multiple threads still end up thinking they need to initialize the class, causing some objects to be allocated multiple times and lost.
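For illustration only, the "fill a local array, then publish it" idea described above might look roughly like the following C++ sketch. This is not the JCC-generated code or the attached patch: MethodId, kMethodCount, published_mids, build_count and both functions are hypothetical stand-ins (real code stores jmethodID values in mids$). The sketch also publishes with a std::atomic compare-exchange so that a losing thread frees its duplicate, which is an assumption added here; a plain assignment, as described in the comment, would leave the duplicates lost.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for jmethodID; 0 plays the role of an empty entry.
using MethodId = int;
constexpr std::size_t kMethodCount = 4;

// Hypothetical stand-in for the static mids$ pointer.
static std::atomic<const MethodId *> published_mids{nullptr};
// Counts how many threads built a table, to show the duplicated work.
static std::atomic<int> build_count{0};

// Fill a private table first, then publish it in a single step, so no
// other thread can observe a half-populated table.
const MethodId *initialize_class() {
    const MethodId *table = published_mids.load(std::memory_order_acquire);
    if (table != nullptr)
        return table;

    MethodId *local = new MethodId[kMethodCount];
    for (std::size_t i = 0; i < kMethodCount; ++i)
        local[i] = static_cast<MethodId>(i) + 1;  // never 0, i.e. never empty
    build_count.fetch_add(1);

    // Publish only if no other thread got there first; otherwise discard
    // our copy and use the table that is already published.
    const MethodId *expected = nullptr;
    if (published_mids.compare_exchange_strong(expected, local,
                                               std::memory_order_acq_rel,
                                               std::memory_order_acquire))
        return local;
    delete[] local;
    return expected;
}

// Race several threads through initialize_class() and report whether every
// thread saw a fully populated table.
bool all_threads_saw_full_table(int thread_count) {
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&ok] {
            const MethodId *table = initialize_class();
            for (std::size_t i = 0; i < kMethodCount; ++i)
                if (table[i] == 0)
                    ok.store(false);
        });
    }
    for (auto &thread : threads)
        thread.join();
    return ok.load();
}
```

Calling all_threads_saw_full_table(8) from a test program exercises the race. build_count may end up greater than 1, which is exactly the duplicated initialization work that this approach does not prevent.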
My preferred solution is to use a mutex with a double check for class$ being null: the first check outside the mutex and the second inside it. This avoids the mutex overhead once things are initialized, but still provides proper thread synchronization.

Attached is a patch against the current pylucene trunk which uses the JCCEnv mutex to protect the initializeClass code. This has only been tested under Linux. I attempted to implement Windows support, but have no idea if it is correct.

> Possible race condition with pylucene attachCurrentThread
> ---------------------------------------------------------
>
>                 Key: PYLUCENE-17
>                 URL: https://issues.apache.org/jira/browse/PYLUCENE-17
>             Project: PyLucene
>          Issue Type: Bug
>         Environment: Linux 2.6.39
>                      Sun jdk 1.6.26
>            Reporter: Greg Bowyer
>              Labels: pylucene
>         Attachments: backtrace, lucene-threadtest.py
>
>
> It looks like there is a possible race that can cause null pointer exceptions
> in the JVM, making it crash.
> Because it's a race, it is hard to reproduce; the best luck I have had so far
> is dropping my FS cache in the OS, which seems to slow down the
> initialisation of the JVM enough to make it easier to reproduce.
> Attached is my test case
> Test session follows
> ---------------------------------------------------------------
> greg@localhost ~/programming/python $ sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f79226b35c8, pid=26581, tid=140158003312384
> #
> # JRE version: 6.0_26-b03
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # V [libjvm.so+0x4b05c8] instanceKlass::cached_itable_index(unsigned long)+0x18
> #
> # An error report file with more information is saved as:
> # /home/greg/programming/python/hs_err_pid26581.log
> #
> # If you would like to submit a bug report, please visit:
> # http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted (core dumped)
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py
> greg@localhost ~/programming/python $ rm -r /tmp/test-index/
> greg@localhost ~/programming/python $ sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py
> #
> # A fatal error has been detected by the Java Runtime Environment:
> [thread 139988165344768 also had an error][thread 139988165344768 also had an error]#
> # SIGSEGV (0xb)
> at pc=0x00007f5197550a29, pid=27657, tid=139988039468800
> #
> # JRE version: 6.0_26-b03
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # V [libjvm.so+0x4f2a29] unsigned+0x299
> #
> # An error report file with more information is saved as:
> # /home/greg/programming/python/hs_err_pid27657.log
> #
> # If you would like to submit a bug report, please visit:
> # http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted (core dumped)
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py
> greg@localhost ~/programming/python $ sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'
> greg@localhost ~/programming/python $ python ./lucene-threadtest.py
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f51bc2eaa1e, pid=28124, tid=139988377052928
> #
> # JRE version: 6.0_26-b03
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # V [libjvm.so+0x4f2a1e] unsigned+0x28e
> #
> # An error report file with more information is saved as:
> # /home/greg/programming/python/hs_err_pid28124.log
> #
> # If you would like to submit a bug report, please visit:
> # http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted (core dumped)
> greg@localhost ~/programming/python $

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira