SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Nam Dang
Dear all,

I am running a small benchmark for Ceph with multithreading and cephfs-java API.
I encountered this issue even when I use only two threads, and I used
only open file and creating directory operations.

The piece of code is simply:
String parent = filePath.substring(0, filePath.lastIndexOf('/'));
mount.mkdirs(parent, 0755); // create parents if the path does not exist
int fileID = mount.open(filePath, CephConstants.O_CREAT, 0666); // open the file

Each thread creates its own Ceph mount (using mount.mount(null)), and I
don't have any interlocking mechanism across the threads at all.
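For reference, the per-thread pattern described above can be sketched as a stand-alone program. CephMount below is a local stub standing in for com.ceph.fs.CephMount, and the O_CREAT value is an assumption, so only the threading structure mirrors the real benchmark:

```java
// Minimal runnable sketch of the benchmark's threading pattern: each thread
// creates and mounts its own client, with no locking shared across threads.
public class PerThreadMountSketch {
    // Stub standing in for com.ceph.fs.CephMount (NOT the real binding).
    static class CephMount {
        void mount(String root) { /* real code: native mount, one per thread */ }
        void mkdirs(String path, int mode) { /* real code: native mkdirs */ }
        int open(String path, int flags, int mode) { return 3; /* fake fd */ }
    }
    static final int O_CREAT = 0x40; // assumed flag value, for illustration only

    public static void main(String[] args) throws InterruptedException {
        Runnable worker = () -> {
            CephMount mount = new CephMount();
            mount.mount(null); // each thread mounts independently
            String filePath = "/bench/dir/file.dat";
            String parent = filePath.substring(0, filePath.lastIndexOf('/'));
            mount.mkdirs(parent, 0755);                       // create parents
            int fileID = mount.open(filePath, O_CREAT, 0666); // open the file
            System.out.println(Thread.currentThread().getName() + " fd=" + fileID);
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```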
It appears the error is a SIGSEGV raised from libcephfs. The message is as follows:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7ff6af978d39, pid=14063, tid=140697400411904
#
# JRE version: 6.0_26-b03
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# C  [libcephfs.so.1+0x139d39]  Mutex::Lock(bool)+0x9
#
# An error report file with more information is saved as:
# /home/namd/cephBench/hs_err_pid14063.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

I have also attached the hs_err_pid14063.log for your reference.
An excerpt from the file:

Stack: [0x7ff6aa828000,0x7ff6aa929000],
sp=0x7ff6aa9274f0,  free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libcephfs.so.1+0x139d39]  Mutex::Lock(bool)+0x9

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.ceph.fs.CephMount.native_ceph_mkdirs(JLjava/lang/String;I)I+0
j  com.ceph.fs.CephMount.mkdirs(Ljava/lang/String;I)V+6
j  Benchmark$CreateFileStats.executeOp(IILjava/lang/String;Lcom/ceph/fs/CephMount;)J+37
j  Benchmark$StatsDaemon.benchmarkOne()V+22
j  Benchmark$StatsDaemon.run()V+26
v  ~StubRoutines::call_stub

So I think the problem may be due to Ceph's internal locking mechanism.
But Dr. Weil previously answered my email stating that each mount is
done independently, so multithreading should not lead to this problem.
If there is any way to work around this, please let me know.

Best regards,

Nam Dang
Email: n...@de.cs.titech.ac.jp
HP: (+81) 080-4465-1587
Yokota Lab, Dept. of Computer Science
Tokyo Institute of Technology
Tokyo, Japan




Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Nam Dang
It turned out my monitor went down without my knowing.
So my bad, it wasn't because of Ceph.

Best regards,

Nam Dang
Tokyo Institute of Technology
Tokyo, Japan


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Noah Watkins

On May 31, 2012, at 6:20 AM, Nam Dang wrote:

> It turned out my monitor went down without my knowing.
> So my bad, it wasn't because of Ceph.

I believe the segfault here is from a null client pointer being dereferenced in the C 
wrappers. Which patch set are you using?



Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Noah Watkins

On May 31, 2012, at 6:20 AM, Nam Dang wrote:

>> Stack: [0x7ff6aa828000,0x7ff6aa929000],
>> sp=0x7ff6aa9274f0,  free space=1021k
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
>> code)
>> C  [libcephfs.so.1+0x139d39]  Mutex::Lock(bool)+0x9
>> 
>> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
>> j  com.ceph.fs.CephMount.native_ceph_mkdirs(JLjava/lang/String;I)I+0
>> j  com.ceph.fs.CephMount.mkdirs(Ljava/lang/String;I)V+6
>> j  
>> Benchmark$CreateFileStats.executeOp(IILjava/lang/String;Lcom/ceph/fs/CephMount;)J+37
>> j  Benchmark$StatsDaemon.benchmarkOne()V+22
>> j  Benchmark$StatsDaemon.run()V+26
>> v  ~StubRoutines::call_stub

Nevermind to my last comment. Hmm, I've seen this, but very rarely.

- Noah



Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Nam Dang
Hi Noah,

By the way, the test suite of cephfs-java has a bug: the permission
value should be written as 0777 instead of 777, since the literal has
to be octal. With 777 I got directories with weird permission settings.
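A quick stand-alone illustration of the mix-up (plain Java, no Ceph involved): without the leading zero, 777 is a decimal literal whose bit pattern is octal 1411, not rwxrwxrwx:

```java
// Demonstrates the octal-vs-decimal permission mix-up described above.
public class OctalModeDemo {
    public static void main(String[] args) {
        int wrong = 777;   // decimal literal: mode bits 01411 (weird permissions)
        int right = 0777;  // octal literal: rwxrwxrwx, decimal 511
        System.out.println("777  as mode -> 0" + Integer.toOctalString(wrong)); // 01411
        System.out.println("0777 as mode -> 0" + Integer.toOctalString(right)); // 0777
    }
}
```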

Best regards
Nam Dang
Tokyo Institute of Technology
Tokyo, Japan




Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Noah Watkins

On May 31, 2012, at 8:48 AM, Nam Dang wrote:

> Hi Noah,
> 
> By the way, the test suite of cephfs-java has a bug. You should put
> the permission value in the form of 0777 instead of 777 since the
> number has to be octal. With 777 I got directories with weird
> permission settings.

Thanks Nam, I'll fix this up.



Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Greg Farnum
On Thursday, May 31, 2012 at 7:43 AM, Noah Watkins wrote:
> Nevermind to my last comment. Hmm, I've seen this, but very rarely.
Noah, do you have any leads on this? Do you think it's a bug in your Java code 
or in the C/++ libraries?
Nam: it definitely shouldn't be segfaulting just because a monitor went down. :)
-Greg



Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Noah Watkins

On May 31, 2012, at 3:39 PM, Greg Farnum wrote:
>> 
>> Nevermind to my last comment. Hmm, I've seen this, but very rarely.
> Noah, do you have any leads on this? Do you think it's a bug in your Java 
> code or in the C/++ libraries?

I _think_ this is because the JVM uses its own threading library, while Ceph 
assumes pthreads and pthread-compatible mutexes--is that assumption about Ceph 
correct? Hence the error that shows Mutex::Lock(bool) as the frame referenced for 
context during the segfault. To verify this, all that is needed is some 
synchronization added on the Java side.

There are only two segfaults that I've ever encountered, one in which the C 
wrappers are used with an unmounted client, and the error Nam is seeing 
(although they could be related). I will re-submit an updated patch for the 
former, which should rule that out as the culprit.

Nam: where are you grabbing the Java patches from? I'll push some updates.


The only other scenario that comes to mind is related to signaling:

The RADOS Java wrappers suffered from an interaction between the JVM and RADOS 
client signal handlers, in which either the JVM or RADOS would replace the 
handlers for the other (not sure which order). Anyway, the solution was to link 
in the JVM libjsig.so signal chaining library. This might be the same thing we 
are seeing here, but I'm betting it is the first theory I mentioned.

- Noah


Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Nam Dang
I pulled the Java lib from https://github.com/noahdesu/ceph/tree/wip-java-cephfs
However, I use ceph 0.47.1 installed directly from Ubuntu's repository
with apt-get, not the one that I built with the Java library. I
assumed that would be fine, since the Java lib is just a wrapper.

>>There are only two segfaults that I've ever encountered, one in which the C 
>>wrappers are used with an unmounted client, and the error Nam is seeing 
>>(although they
>> could be related). I will re-submit an updated patch for the former, which 
>> should rule that out as the culprit.

No, this occurs when I call mount(null) with the monitor taken down.
The library should throw an exception instead, but since the SIGSEGV
originates from libcephfs.so, I guess it's more related to Ceph's
internal code.

Best regards,

Nam Dang
Tokyo Institute of Technology
Tokyo, Japan




Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Nam Dang
I made a mistake in the previous email. As Noah said, this problem is
due to the wrapper being used with a client that was not successfully
mounted. However, I think if the mount fails, the wrapper should throw
an exception instead of letting the client continue.
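One way the wrapper could fail fast, sketched as plain Java. The class name, the CephNotMountedException type, and the mounted flag here are illustrative assumptions, not the actual cephfs-java API:

```java
// Sketch: guard each operation on a mounted-state flag and throw, rather
// than letting a null native client be dereferenced (the segfault above).
public class GuardedMount {
    public static class CephNotMountedException extends IllegalStateException {
        public CephNotMountedException() { super("client is not mounted"); }
    }

    private volatile boolean mounted = false;

    public void mount(String root) {
        // Real binding: call the native mount and set the flag only on success.
        mounted = true;
    }

    private void ensureMounted() {
        if (!mounted) throw new CephNotMountedException();
    }

    public void mkdirs(String path, int mode) {
        ensureMounted(); // throws instead of segfaulting on an unmounted client
        // Real binding: native_ceph_mkdirs(instance, path, mode);
    }
}
```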

Best regards,
Nam Dang
Tokyo Institute of Technology
Tokyo, Japan




Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-05-31 Thread Noah Watkins

On May 31, 2012, at 9:44 PM, Nam Dang wrote:

> I pulled the Java lib from 
> https://github.com/noahdesu/ceph/tree/wip-java-cephfs
> However, I use ceph 0.47.1 installed directly from Ubuntu's repository
> with apt-get, not the one that I built with the java library. I
> assumed that since the java lib is just a wrapper.
> 
>>> There are only two segfaults that I've ever encountered, one in which the C 
>>> wrappers are used with an unmounted client, and the error Nam is seeing 
>>> (although they
>>> could be related). I will re-submit an updated patch for the former, which 
>>> should rule that out as the culprit.
> 
> No, this occurs when I call mount(null) with the monitor being taken
> down. The library should throw an Exception instead,

I agree. I'll push changes to the tree soon. Thanks.

- Noah


Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-06-03 Thread Noah Watkins
Hi Nam,

I pushed the following changes to my branch. It'd be great if you
could give it a shot:

  git://github.com/noahdesu/ceph.git wip-java-cephfs

- Throws CephNotMountedException when the client is not mounted
- Adds synchronization to the native code
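At the Java layer, the synchronization change might look roughly like the following; the method names are stand-ins for the real JNI symbols, so this is only a sketch of the idea:

```java
// Sketch: serialize entry into the native library with one lock per mount,
// so two Java threads can't race inside libcephfs on the same client.
public class SynchronizedMount {
    private final Object nativeLock = new Object();

    // Stand-in for the JNI call into libcephfs (not the real symbol).
    private int nativeMkdirs(String path, int mode) { return 0; }

    public int mkdirs(String path, int mode) {
        synchronized (nativeLock) { // one thread at a time in native code
            return nativeMkdirs(path, mode);
        }
    }
}
```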

Thanks,
Noah



Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-06-04 Thread Greg Farnum
On Thursday, May 31, 2012 at 4:58 PM, Noah Watkins wrote:
> 
> On May 31, 2012, at 3:39 PM, Greg Farnum wrote:
> > > 
> > > Nevermind to my last comment. Hmm, I've seen this, but very rarely.
> > Noah, do you have any leads on this? Do you think it's a bug in your Java 
> > code or in the C/++ libraries?
> 
> 
> 
> I _think_ this is because the JVM uses its own threading library, and Ceph 
> assumes pthreads and pthread compatible mutexes--is that assumption about 
> Ceph correct? Hence the error that looks like Mutex::lock(bool) being 
> reference for context during the segfault. To verify this all that is needed 
> is some synchronization added to the Java.
I'm not quite sure what you mean here. Ceph is definitely using pthread 
threading and mutexes, but I don't see how the use of a different threading 
library can break pthread mutexes (which are just using the kernel futex stuff, 
AFAIK).
But I admit I'm not real good at handling those sorts of interactions, so maybe 
I'm missing something?

> There are only two segfaults that I've ever encountered, one in which the C 
> wrappers are used with an unmounted client, and the error Nam is seeing 
> (although they could be related). I will re-submit an updated patch for the 
> former, which should rule that out as the culprit.
> 
> Nam: where are you grabbing the Java patches from? I'll push some updates.
> 
> 
> The only other scenario that comes to mind is related to signaling:
> 
> The RADOS Java wrappers suffered from an interaction between the JVM and 
> RADOS client signal handlers, in which either the JVM or RADOS would replace 
> the handlers for the other (not sure which order). Anyway, the solution was 
> to link in the JVM libjsig.so signal chaining library. This might be the same 
> thing we are seeing here, but I'm betting it is the first theory I mentioned.
Hmm. I think that's an issue we've run into but I thought it got fixed for 
librados. Perhaps I'm mixing that up with libceph, or just pulling past 
scenarios out of thin air. It never manifested as Mutex count bugs, though!
-Greg



Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-06-04 Thread Noah Watkins
On Mon, Jun 4, 2012 at 1:17 PM, Greg Farnum  wrote:

> I'm not quite sure what you mean here. Ceph is definitely using pthread 
> threading and mutexes, but I don't see how the use of a different threading 
> library can break pthread mutexes (which are just using the kernel futex 
> stuff, AFAIK).
> But I admit I'm not real good at handling those sorts of interactions, so 
> maybe I'm missing something?

The basic idea was that threads in Java did not map 1:1 with kernel
threads (think co-routines), which would break a lot of stuff,
especially futex. Looking at some documentation, old JVMs had
something called Green Threads, but have now been abandoned in favor
of native threads. So maybe this theory is now irrelevant, and
evidence seems to suggest you're right and Java is using native
threads.

>> The RADOS Java wrappers suffered from an interaction between the JVM and 
>> RADOS client signal handlers, in which either the JVM or RADOS would replace 
>> the handlers for the other (not sure which order). Anyway, the solution was 
>> to link in the JVM libjsig.so signal chaining library. This might be the 
>> same thing we are seeing here, but I'm betting it is the first theory I 
>> mentioned.

> Hmm. I think that's an issue we've run into but I thought it got fixed for 
> librados. Perhaps I'm mixing that up with libceph, or just pulling past 
> scenarios out of thin air. It never manifested as Mutex count bugs, though!

I haven't tested the Rados wrappers in a while. I've never had to link
in the signal chaining library for libcephfs.

I wonder if the Mutex::lock(bool) being printed out is a red herring...


Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-06-04 Thread Greg Farnum
On Monday, June 4, 2012 at 1:47 PM, Noah Watkins wrote:
> On Mon, Jun 4, 2012 at 1:17 PM, Greg Farnum  (mailto:g...@inktank.com)> wrote:
> 
> > I'm not quite sure what you mean here. Ceph is definitely using pthread 
> > threading and mutexes, but I don't see how the use of a different threading 
> > library can break pthread mutexes (which are just using the kernel futex 
> > stuff, AFAIK).
> > But I admit I'm not real good at handling those sorts of interactions, so 
> > maybe I'm missing something?
> 
> 
> 
> The basic idea was that threads in Java did not map 1:1 with kernel
> threads (think co-routines), which would break a lot of stuff,
> especially futex. Looking at some documentation, old JVMs had
> something called Green Threads, but have now been abandoned in favor
> of native threads. So maybe this theory is now irrelevant, and
> evidence seems to suggest you're right and Java is using native
> threads.

Gotcha, that makes sense.
 
> 
> > > The RADOS Java wrappers suffered from an interaction between the JVM and 
> > > RADOS client signal handlers, in which either the JVM or RADOS would 
> > > replace the handlers for the other (not sure which order). Anyway, the 
> > > solution was to link in the JVM libjsig.so signal chaining library. This 
> > > might be the same thing we are seeing here, but I'm betting it is the 
> > > first theory I mentioned.
> 
> > Hmm. I think that's an issue we've run into but I thought it got fixed for 
> > librados. Perhaps I'm mixing that up with libceph, or just pulling past 
> > scenarios out of thin air. It never manifested as Mutex count bugs, though!
> 
> I haven't tested the Rados wrappers in a while. I've never had to link
> in the signal chaining library for libcephfs.
> 
> I wonder if the Mutex::lock(bool) being printed out is a red herring... 
Well, it's a SIGSEGV. So my guess is that's the frame that happens to be going 
outside its allowed bounds, probably because it's the first frame actually 
accessing memory off of a bad (probably NULL) pointer. For instance, if it 
failed not only to mount the client, but even to create the context object? 



Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-06-04 Thread Sage Weil
On Mon, 4 Jun 2012, Noah Watkins wrote:
> I wonder if the Mutex::lock(bool) being printed out is a red herring... 

FWIW this assert usually means a use-after-free.

sage



Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-06-05 Thread Nam Dang
Hi Noah,

Thank you for the push. Now I don't get the SIGSEGV anymore. By the way,
is there any way to move the ceph_is_mounted() function into the Java
library's code instead of putting it in the main Ceph code?

> The basic idea was that threads in Java did not map 1:1 with kernel
> threads (think co-routines), which would break a lot of stuff,
> especially futex. Looking at some documentation, old JVMs had
> something called Green Threads, but have now been abandoned in favor
> of native threads. So maybe this theory is now irrelevant, and
> evidence seems to suggest you're right and Java is using native
> threads.

I checked the error without multithreading and it was the same. And I'm
using Java 6, so threads should be real native threads.
My guess is that it's related to some internal locking mechanism.

Best regards,
Nam Dang


Re: SIGSEGV in cephfs-java, but probably in Ceph

2012-06-05 Thread Noah Watkins
Thanks for testing this, Nam.

On Tue, Jun 5, 2012 at 4:22 AM, Nam Dang  wrote:
> Hi Noah,
>
> Thank you for the push. Now I don't have SIGSEGV anymore. By the way,
> is there any way to move the ceph_is_mounted() function into the java
> library's code instead of putting it in the main Ceph's code?

The state info needed to implement ceph_is_mounted() is private, so I
think yes it does need to be in Ceph. Am I misunderstanding your
question? I am going to post the Ceph mounting patch back to the
mailing list soon, so the devs may want to change it :)

- Noah