Re: [libvirt] libvirt will wait 20 minutes or hang when the network interface down
Hi Michal,

In most cases the thread will wait about 20 minutes. I don't think 20 minutes is acceptable when a router that connects a lot of devices goes down. More importantly, some threads can hang. I am not sure whether this is a curl bug, but the curl pages I referenced propose using these two options to fix this problem.

B.R.
Benjamin Wang

-----Original Message-----
From: Michal Privoznik [mailto:mpriv...@redhat.com]
Sent: December 24, 2012 16:57
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com; James Ye (jiaye); Yang Zhou (yangzho)
Subject: Re: [libvirt] libvirt will wait 20 minutes or hang when the network interface down

On 22.12.2012 09:59, Benjamin Wang (gendwang) wrote:
> Hi,
>
> I found that when the network interface is down, in most scenarios libvirt waits 20 minutes and then reports an exception. In rare scenarios, the polling thread hangs even after the network has recovered.
>
> The formal description is on the libcurl website: http://curl.haxx.se/docs/faq.html (Section "4.19 Why doesn't cURL return an error when the network cable is unplugged?")
> A similar case of a thread hang: http://curl.haxx.se/mail/lib-2010-07/0108.html
>
> The 20-minute wait is normal TCP behavior, but consider a server that manages tens of thousands of devices through libvirt: when 1000 devices are down, threads are tied up (leaked) for a long period. A thread hang leaks the thread forever.
>
> I tried adding the following code to esx_vi.c, and it seems to avoid the above issues. Would you give your comments?
>
>     if (curl->headers == NULL) {
>         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>                        _("Could not build CURL header list"));
>         return -1;
>     }
>
>    +curl_easy_setopt(curl->handle, CURLOPT_LOW_SPEED_LIMIT, 10);
>    +curl_easy_setopt(curl->handle, CURLOPT_LOW_SPEED_TIME, 120);
>     curl_easy_setopt(curl->handle, CURLOPT_USERAGENT, "libvirt-esx");
>     curl_easy_setopt(curl->handle, CURLOPT_HEADER, 0);
>     curl_easy_setopt(curl->handle, CURLOPT_FOLLOWLOCATION, 0);
>     curl_easy_setopt(curl->handle, CURLOPT_SSL_VERIFYPEER,
>
> B.R.
> Benjamin Wang

I wonder if this isn't actually a curl bug, since curl (must) know the interface is down. That is, I think curl_easy_perform(), which is wrapped in esxVI_CURL_Perform(), should have returned an error.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
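As a rough model of what the two proposed options ask libcurl to do: per the libcurl documentation, a transfer is aborted when its speed stays below CURLOPT_LOW_SPEED_LIMIT bytes per second for CURLOPT_LOW_SPEED_TIME seconds. The sketch below is a simplified simulation over hypothetical per-second throughput samples; `low_speed_abort_at` is an illustrative name, not a libcurl function, and libcurl itself works with an averaged speed, so real abort times can differ slightly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of libcurl's low-speed check (hypothetical helper, not a
 * libcurl API). samples[i] is the number of bytes transferred during second i.
 * Returns the index of the second at which the transfer would be aborted,
 * or -1 if the speed never stays below `limit` for `time_secs` seconds in a row.
 */
long low_speed_abort_at(const long *samples, size_t n,
                        long limit, long time_secs)
{
    long slow_for = 0;  /* consecutive seconds spent below the limit */

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < limit) {
            slow_for++;
            if (slow_for >= time_secs)
                return (long)i;
        } else {
            slow_for = 0;  /* any second at or above the limit resets the window */
        }
    }
    return -1;
}
```

With the values from the patch (LIMIT=10, TIME=120), five seconds of normal traffic followed by silence (cable pulled) is abandoned at second 124 (5 + 120 - 1), i.e. after about two minutes rather than after the much longer TCP-level timeout.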
[libvirt] libvirt will wait 20 minutes or hang when the network interface down
Hi,

I found that when the network interface is down, in most scenarios libvirt waits 20 minutes and then reports an exception. In rare scenarios, the polling thread hangs even after the network has recovered.

The formal description is on the libcurl website: http://curl.haxx.se/docs/faq.html (Section "4.19 Why doesn't cURL return an error when the network cable is unplugged?")
A similar case of a thread hang: http://curl.haxx.se/mail/lib-2010-07/0108.html

The 20-minute wait is normal TCP behavior, but consider a server that manages tens of thousands of devices through libvirt: when 1000 devices are down, threads are tied up (leaked) for a long period. A thread hang leaks the thread forever.

I tried adding the following code to esx_vi.c, and it seems to avoid the above issues. Would you give your comments?

    if (curl->headers == NULL) {
        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                       _("Could not build CURL header list"));
        return -1;
    }

   +curl_easy_setopt(curl->handle, CURLOPT_LOW_SPEED_LIMIT, 10);
   +curl_easy_setopt(curl->handle, CURLOPT_LOW_SPEED_TIME, 120);
    curl_easy_setopt(curl->handle, CURLOPT_USERAGENT, "libvirt-esx");
    curl_easy_setopt(curl->handle, CURLOPT_HEADER, 0);
    curl_easy_setopt(curl->handle, CURLOPT_FOLLOWLOCATION, 0);
    curl_easy_setopt(curl->handle, CURLOPT_SSL_VERIFYPEER,

B.R.
Benjamin Wang
[libvirt] Connection release is not correct in libvirt and libvirt java
Hi,

The following is the current code that releases a connection in libvirt:

    int virConnectClose(virConnectPtr conn)
    {
        ...
        if (!VIR_IS_CONNECT(conn)) {
            virLibConnError(VIR_ERR_INVALID_CONN, __FUNCTION__);
            goto error;
        }
        ...
    error:
        virDispatchError(NULL);
        return ret;
    }

Now if the cable is unplugged and the application calls virConnectClose to release the connection, the code enters the error path and the connection can't be released. I have changed the following two parts to fix this issue. Please give your comments:

Changed code 1:

    int virConnectClose(virConnectPtr conn)
    {
        ...
   +    if (NULL == conn) {
   +        return 0;
   +    }
        ...
   -    if (!VIR_IS_CONNECT(conn)) {
   -        virLibConnError(VIR_ERR_INVALID_CONN, __FUNCTION__);
   -        goto error;
   -    }
        ...
    error:
        virDispatchError(NULL);
        return ret;
    }

Changed code 2:

    int virUnrefConnect(virConnectPtr conn)
    {
        ...
   +    if (NULL == conn) {
   +        return 0;
   +    }
   -    if ((!VIR_IS_CONNECT(conn))) {
   -        virLibConnError(VIR_ERR_INVALID_ARG, _("no connection"));
   -        return -1;
   -    }
        ...
    }

libvirt-java has a similar issue. I have changed the code in Connect.java as follows. Please also give your comments.

    public int close() throws LibvirtException {
        int success = 0;
        if (VCP != null) {
   +        try {
                success = libvirt.virConnectClose(VCP);
                processError();
   +        }
   +        finally {
                // If leave an invalid pointer dangling around JVM crashes and burns
                // if someone tries to call a method on us
                // We rely on the underlying libvirt error handling to detect that
                // it's called with a null virConnectPointer
                VCP = null;
   +        }
        }
        return success;
    }

B.R.
Benjamin Wang
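The idea behind both changes is that releasing a connection should depend only on the local object, never on whether the remote end is still reachable, and that closing NULL should be a harmless no-op, like free(NULL). A minimal sketch of that contract in C, using a hypothetical `demo_conn` type rather than libvirt's real connection internals (the real virConnectClose also validates the object and dispatches errors, as the code above shows):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical connection object; NOT libvirt's internal virConnect struct. */
typedef struct demo_conn {
    int refs;          /* reference count */
    int transport_up;  /* 0 once the peer is unreachable (e.g. cable pulled) */
} demo_conn;

static int demo_conns_freed = 0;  /* bookkeeping for the demonstration only */

demo_conn *demo_conn_open(void)
{
    demo_conn *c = calloc(1, sizeof(*c));
    if (c != NULL) {
        c->refs = 1;
        c->transport_up = 1;
    }
    return c;
}

/*
 * Drop one reference. Like free(NULL), closing NULL is a harmless no-op.
 * The transport state is deliberately ignored: local resources must be
 * released even when the remote side can no longer be reached.
 * Returns the number of remaining references (0 once the object is freed).
 */
int demo_conn_close(demo_conn *c)
{
    if (c == NULL)
        return 0;
    if (--c->refs > 0)
        return c->refs;
    free(c);
    demo_conns_freed++;
    return 0;
}
```

The Java `try`/`finally` change follows the same principle on the wrapper side: the handle is cleared no matter how the native close call ends.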
Re: [libvirt] JNA Error Callback could cause core dump.
Hi,

I am using JNA 3.4.1. The problem is caused by libvirt-java. You are right.

B.R.
Benjamin Wang

-----Original Message-----
From: Claudio Bley [mailto:cb...@av-test.de]
Sent: October 19, 2012 19:36
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com; Guannan Ren; Daniel Veillard; Yang Zhou (yangzho)
Subject: Re: JNA Error Callback could cause core dump.

"BW" == Benjamin Wang (gendwang) <gendw...@cisco.com> writes:

BW> Hi,
BW> When I changed the code as follows:
BW>
BW>     public class Connect {
BW>         // Load the native part
BW>         static {
BW>             Libvirt.INSTANCE.virInitialize();
BW>             try {
BW>                 ErrorHandler.processError(Libvirt.INSTANCE);
BW>             } catch (Exception e) {
BW>                 e.printStackTrace();
BW>             }
BW>    +        Libvirt.INSTANCE.virSetErrorFunc(null, new ErrorCallback());
BW>         }
BW>
BW> The problem is caused because when JNA calls setErrorFunc, it creates an ErrorCallback object, but when GC runs, the object is garbage-collected.

Yes, that's why you should keep a reference to the object around.

BW> But even if I change the code as follows, when GC runs the callback object is moved, and then C can't find this object. Both scenarios cause a core dump. It seems that JNA mustn't provide the ErrorCallback class,

First off, JNA does not provide this class; it is provided by the libvirt-java wrapper.

Which version of JNA did you use? As I said in a previous mail, I had crashes with JNA 3.4.2. Consequently, I cannot reproduce the crash using your code, JNA 3.4.2 and with this series (https://www.redhat.com/archives/libvir-list/2012-October/msg00578.html) applied (at least patch #15 is needed when using JNA 3.4.2).

--
AV-Test GmbH, Klewitzstr. 7, 39112 Magdeburg, Germany
Phone: +49 391 6075466, Fax: +49 391 6075469
Web: http://www.av-test.org

Eingetragen am / Registered at: Amtsgericht Stendal (HRB 114076)
Geschaeftsfuehrer (CEO): Andreas Marx, Guido Habicht, Maik Morgenstern
[libvirt] JNA Error Callback could cause core dump.
Hi,

When I changed the code as follows:

    public class Connect {
        // Load the native part
        static {
            Libvirt.INSTANCE.virInitialize();
            try {
                ErrorHandler.processError(Libvirt.INSTANCE);
            } catch (Exception e) {
                e.printStackTrace();
            }
   +        Libvirt.INSTANCE.virSetErrorFunc(null, new ErrorCallback());
        }

the server generated the following core dump:

    Program terminated with signal 6, Aborted.
    #0  0x003f9b030265 in raise () from /lib64/libc.so.6
    (gdb) where
    #0  0x003f9b030265 in raise () from /lib64/libc.so.6
    #1  0x003f9b031d10 in abort () from /lib64/libc.so.6
    #2  0x003f9b06a84b in __libc_message () from /lib64/libc.so.6
    #3  0x003f9b07230f in _int_free () from /lib64/libc.so.6
    #4  0x003f9b07276b in free () from /lib64/libc.so.6
    #5  0x2cf46868 in ?? ()
    #6  0x in ?? ()

The problem is caused because when JNA calls setErrorFunc, it creates an ErrorCallback object, but when GC runs, that object is garbage-collected.

But even if I change the code as follows, when GC runs the callback object is moved, and then C can't find this object. Both scenarios cause a core dump. It seems that JNA mustn't provide the ErrorCallback class, because nobody can use it. Please correct me.

    public class Connect {
   +    private static final ErrorCallback callback = new ErrorCallback();
        // Load the native part
        static {
            Libvirt.INSTANCE.virInitialize();
            try {
                ErrorHandler.processError(Libvirt.INSTANCE);
            } catch (Exception e) {
                e.printStackTrace();
            }
   +        Libvirt.INSTANCE.virSetErrorFunc(null, callback);
        }

B.R.
Benjamin Wang
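On the native side, the error handler is just a function pointer that the library stores and calls later, possibly long after the registering call has returned. In C that is safe only if the pointer (and any user data) stays valid for as long as it is registered. As we understand the JNA documentation, the native function pointer JNA hands to C is tied to the Java Callback object, so the Java side must keep a strong reference to that object for as long as native code may call it. A C sketch of the same contract, with hypothetical `demo_*` names that only mirror the shape of virSetErrorFunc(userData, handler):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration API modelled on the shape of virSetErrorFunc(). */
typedef void (*demo_error_func)(void *user_data, const char *message);

static demo_error_func g_handler = NULL;  /* kept by the "library" */
static void *g_user_data = NULL;          /* kept by the "library" */

void demo_set_error_func(void *user_data, demo_error_func handler)
{
    g_user_data = user_data;
    g_handler = handler;
}

/* Called later from library code, possibly long after registration. */
void demo_raise_error(const char *message)
{
    if (g_handler != NULL)
        g_handler(g_user_data, message);
}

/* Caller side: the state lives in static storage, so it outlives the registration. */
static int error_count = 0;

static void count_errors(void *user_data, const char *message)
{
    int *counter = user_data;
    (void)message;
    (*counter)++;
}
```

Registering `count_errors` with `&error_count` is safe here because both have static lifetime; registering the address of a local variable, or (in JNA terms) a Callback object nobody references any more, would leave the library holding a dangling pointer.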
Re: [libvirt] Memory free in libvirt JNA
Hi Claudio,

Sorry for my late response. I have gone through Claudio's solution. It's good, but I think it is not a general solution, for two reasons:
1. This solution must use PointerByReference to wrap the Pointer. This is not clean.
2. Libvirt provides a virFree method, but a general library may not provide its own memory management functions.

My proposal is as follows:
1. Add a new class, Libc.java:

    public interface Libc extends Library {
        Libc INSTANCE = (Libc) Native.loadLibrary("c", Libc.class);
        public void free(Pointer p);
    }

2. Change the code as follows:

    public SchedParameter[] getSchedulerParameters() throws LibvirtException {
        IntByReference nParams = new IntByReference();
        SchedParameter[] returnValue = new SchedParameter[0];
   -    String scheduler = libvirt.virDomainGetSchedulerType(VDP, nParams);
   +    Pointer pScheduler = libvirt.virDomainGetSchedulerType(VDP, nParams);
        processError();
   -    if (scheduler != null) {
   +    if (pScheduler != null) {
   +        String scheduler = pScheduler.getString(0);
   +        libc.free(pScheduler);
            virSchedParameter[] nativeParams = new virSchedParameter[nParams.getValue()];
            returnValue = new SchedParameter[nParams.getValue()];
            libvirt.virDomainGetSchedulerParameters(VDP, nativeParams, nParams);

What is your opinion?

B.R.
Benjamin Wang

-----Original Message-----
From: Claudio Bley [mailto:cb...@av-test.de]
Sent: October 8, 2012 20:33
To: veill...@redhat.com
Cc: Benjamin Wang (gendwang); libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Memory free in libvirt JNA

Hi Daniel,

At Fri, 28 Sep 2012 22:34:13 +0800, Daniel Veillard wrote:
> sorry for the delay, I need to focus on something else ATM!

Me too. So, no worries! ;)

> First, do you have a small pointer indicating where in JNA that kind of native deallocation must take place, since most of the time JNA can do the marshalling all by itself?

This affects mostly Strings.
JNA takes the safe assumption that functions return const char*, because it can't distinguish a string returned as const char* from one returned as char*. See https://github.com/twall/jna/blob/master/www/FrequentlyAskedQuestions.md#how-do-i-read-back-a-functions-string-result

So, here is a list of methods of org.libvirt.jna.Libvirt which return a string (/probably/ a char*, not const char*) and need to be checked:

  virConnectBaselineCPU
  virConnectDomainXMLFromNative
  virConnectDomainXMLToNative
  virConnectFindStoragePoolSources
  virConnectGetCapabilities
  virConnectGetHostname
  virConnectGetType
  virConnectGetURI
  virDomainGetName
  virDomainGetOSType
  virDomainGetXMLDesc
  virDomainSnapshotGetXMLDesc
  virInterfaceGetMACString
  virInterfaceGetName
  virInterfaceGetXMLDesc
  virNWFilterGetName
  virNWFilterGetXMLDesc
  virNetworkGetBridgeName
  virNetworkGetName
  virNetworkGetXMLDesc
  virNodeDeviceGetName
  virNodeDeviceGetParent
  virNodeDeviceGetXMLDesc
  virSecretGetUsageID
  virSecretGetXMLDesc
  virStoragePoolGetName
  virStoragePoolGetXMLDesc
  virStorageVolGetKey
  virStorageVolGetName
  virStorageVolGetPath
  virStorageVolGetXMLDesc

> And second, would you have an idea how to systematically detect such leaks? The kind of loop suggested to expose the issue is not really practical to chase the leaks...

I tried valgrind, but it didn't produce any output. mtrace wasn't very helpful either. So, I just hacked this up:

,[ memcheck.py ]
| import gdb
|
| allocations = {}
|
| class AllocBreak(gdb.FinishBreakpoint):
|     def stop(self):
|         global allocations
|
|         if self.return_value != None:
|             callstack = []
|             frame = gdb.selected_frame()
|
|             while frame:
|                 name = frame.name()
|                 func = frame.function()
|                 sal = frame.find_sal()
|
|                 funcname = func.print_name if func else '?'
|                 line = sal.line
|                 filename = sal.symtab.filename if sal.symtab else '?'
|
|                 callstack.append((name, filename, line, funcname))
|
|                 frame = frame.older()
|
|             addr = int(str(self.return_value), 16)
|             allocations[addr] = callstack
|
|
| class MemAlloc (gdb.Command):
|     """Track allocations."""
|     def __init__(self):
|         super(MemAlloc, self).__init__("memalloc", gdb.COMMAND_NONE)
|
|     def invoke(self, arg, from_tty):
|         top = gdb.selected_frame()
|         frame = top.older()
|
|         if frame:
|             func = frame.function()
|
|             if func: # and func.name.startswith("virAlloc"):
|                 ab = AllocBreak(top, True)
|                 ab.silent = True
|
|
| class MemFree(gdb.Command):
|     """Track de-allocations."""
|     def __init__(self):
|         super(MemFree, self).__init__("memfree", gdb.COMMAND_NONE)
|
|     def invoke(self, arg, from_tty):
|         global allocations
Re: [libvirt] Memory free in libvirt JNA
Hi Claudio,

Thanks for telling me about the new JNA version. I tried the API that JNA provides, and it works well. The updated code is as follows:

    public SchedParameter[] getSchedulerParameters() throws LibvirtException {
        IntByReference nParams = new IntByReference();
        SchedParameter[] returnValue = new SchedParameter[0];
   -    String scheduler = libvirt.virDomainGetSchedulerType(VDP, nParams);
   +    Pointer pScheduler = libvirt.virDomainGetSchedulerType(VDP, nParams);
        processError();
   -    if (scheduler != null) {
   +    if (pScheduler != null) {
   +        String scheduler = pScheduler.getString(0);
   +        Native.free(Pointer.nativeValue(pScheduler));
            virSchedParameter[] nativeParams = new virSchedParameter[nParams.getValue()];
            returnValue = new SchedParameter[nParams.getValue()];
            libvirt.virDomainGetSchedulerParameters(VDP, nativeParams, nParams);

If there is no issue, I recommend using this approach throughout the JNA code.

BTW: Not every returned String should be freed by JNA. For example, in the Domain class, the results returned by getName/getUUIDString must not be freed, because they were not allocated temporarily. We must analyze this case by case.

B.R.
Benjamin Wang

-----Original Message-----
From: Claudio Bley [mailto:cb...@av-test.de]
Sent: October 11, 2012 23:36
To: Benjamin Wang (gendwang)
Cc: veill...@redhat.com; libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Memory free in libvirt JNA

At Thu, 11 Oct 2012 08:37:23 +, Benjamin Wang (gendwang) wrote:
> Hi Claudio,
> Sorry for my late response. I have gone through Claudio's solution. It's good, but I think it is not a general solution, for two reasons:
> 1. This solution must use PointerByReference to wrap the Pointer. This is not clean.

Yes, as I said, this adds another level of indirection --- which is pretty useless in Java.

> 2. Libvirt provides a virFree method, but a general library may not provide its own memory management functions.

Sorry, I don't get your point here.

> My proposal is as follows:
> 1. Add a new class, Libc.java:
>
>     public interface Libc extends Library {
>         Libc INSTANCE = (Libc) Native.loadLibrary("c", Libc.class);
>         public void free(Pointer p);
>     }

Not every platform has a shared library called "c". On Windows this would be msvcrt.dll for the Microsoft runtime. So, you would need to branch on the platform to load whatever library seems appropriate.

Also, I just discovered that since version 3.3.0 JNA provides a public free method itself. Since I get crashes when using callback functions with JNA 3.2.7 in certain circumstances, it is better just to require a newer version of JNA, IMHO.

I'll post a few patches with improvements and memory fixes tomorrow.

--
AV-Test GmbH, Henricistraße 20, 04155 Leipzig, Germany
Phone: +49 341 265 310 19
Web: http://www.av-test.org

Eingetragen am / Registered at: Amtsgericht Stendal (HRB 114076)
Geschaeftsfuehrer (CEO): Andreas Marx, Guido Habicht, Maik Morgenstern
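The case-by-case rule Benjamin describes is an ownership convention at the C level: some libvirt functions return a pointer into an object the caller does not own (the libvirt docs say the result of virDomainGetName() need not be deallocated and lives as long as the domain object), while others, such as virDomainGetXMLDesc() or virDomainGetSchedulerType(), return a freshly allocated string the caller must free. JNA's default String mapping treats every result like the first kind, which is why the second kind leaks unless it is mapped to Pointer and released. A short C sketch of the two kinds, using hypothetical `demo_*` functions rather than the real libvirt API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical domain object; stands in for a libvirt virDomainPtr. */
typedef struct demo_domain {
    char name[64];
} demo_domain;

/*
 * Borrowed result, in the style of virDomainGetName(): the string lives as
 * long as the domain object, and the caller must NOT free it.
 */
const char *demo_domain_get_name(const demo_domain *dom)
{
    return dom->name;
}

/*
 * Owned result, in the style of virDomainGetXMLDesc(): freshly allocated,
 * so the caller MUST free() it.
 */
char *demo_domain_get_xml(const demo_domain *dom)
{
    int len = snprintf(NULL, 0, "<domain><name>%s</name></domain>", dom->name);
    if (len < 0)
        return NULL;

    char *xml = malloc((size_t)len + 1);
    if (xml != NULL)
        snprintf(xml, (size_t)len + 1, "<domain><name>%s</name></domain>", dom->name);
    return xml;
}
```

In JNA terms, only functions of the second kind need the `Pointer` + `Native.free()` treatment; applying it to the first kind would free memory that still belongs to the domain object.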
Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!
-----Original Message-----
From: Matthias Bolte [mailto:matthias.bo...@googlemail.com]
Sent: October 7, 2012 2:14
To: Benjamin Wang (gendwang)
Cc: Daniel P. Berrange; libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!

2012/10/2 Benjamin Wang (gendwang) <gendw...@cisco.com>:
> Hi Daniel,
> Is this problem fixed in the latest version? What about question 2, which is related to OpenSSL callbacks in multiple threads?

As Daniel said, we cannot assume that libcurl was built with the OpenSSL backend. We would need some way to detect this first.

[Benjamin]: I agree. But if libcurl accesses ESXi over HTTPS, OpenSSL will be used, and libvirt must call CRYPTO_set_id_callback/CRYPTO_set_locking_callback to support multiple threads.

Also, wasn't there a license problem with OpenSSL and the (L)GPL? Can libvirt legally be used with a libcurl that is linked with OpenSSL?

[Benjamin]: I think there is no open source license issue. We will not change the libcurl or OpenSSL source code. All we need is to call the OpenSSL API (CRYPTO_set_id_callback/CRYPTO_set_locking_callback) to support multiple threads.

--
Matthias Bolte
http://photron.blogspot.com
Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!
Hi Daniel,

Is this problem fixed in the latest version? What about question 2, which is related to OpenSSL callbacks in multiple threads?

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel P. Berrange [mailto:berra...@redhat.com]
Sent: October 2, 2012 16:02
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!

On Tue, Oct 02, 2012 at 02:57:46AM +, Benjamin Wang (gendwang) wrote:
> Hi Daniel,
> My comments are as follows:
> 1. Currently the curl_easy_init method is called from the esxVI_CURL_Connect method in esx_vi.c, and the curl_global_init method is called by curl_easy_init. If we move curl_global_init to virInitialize, do we still need to call curl_easy_init from esxVI_CURL_Connect? Did the latest version fix this problem?

That is actually the problem. The CURL docs explicitly tell you that it is *not* safe to rely on curl_easy_init in a multithreaded program. You must call curl_global_init explicitly.

Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
Re: [libvirt] Two core dumps are generated in multi-thread scenarios
Hi Matthias,

This can't be reproduced every time; I reproduced this case twice. But after I set CURLOPT_NOSIGNAL to 1, I didn't see a similar core again, and everything seems to work well. What do you mean by "stuck in a DNS lookup"?

B.R.
Benjamin Wang

-----Original Message-----
From: Matthias Bolte [mailto:matthias.bo...@googlemail.com]
Sent: September 30, 2012 4:20
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: Two core dumps are generated in multi-thread scenarios

2012/9/23 Benjamin Wang (gendwang) <gendw...@cisco.com>:
> Hi,
> I found two core dumps generated in multi-thread scenarios in the ESX part.
> Case 1: libcurl multi-thread support
> Core dump:
>     #12 0x2aaabea89712 in addbyter () from /usr/local/lib/libcurl.so.4
>     #13 0x2aaabea89b86 in dprintf_formatf () from /usr/local/lib/libcurl.so.4
>     #14 0x2aaabea8b055 in curl_mvsnprintf () from /usr/local/lib/libcurl.so.4
>     #15 0x2aaabea7678f in Curl_failf () from /usr/local/lib/libcurl.so.4
>     #16 0x2aaabea6d871 in Curl_resolv_timeout () from /usr/local/lib/libcurl.so.4
>     #17 0x0006e8a8f230 in ?? ()
> Fix code: in esxVI_CURL_Connect() in esx_vi.c, I added a new line as follows:
>     curl_easy_setopt(curl->handle, CURLOPT_NOSIGNAL, 1);

It took me a moment reading the libcurl code until I figured out what might be happening here. The problem is that Curl_resolv_timeout uses SIGALRM + sigsetjmp/siglongjmp to implement the timeout logic. This implementation is not thread-safe, as the SIGALRM handler might run on a different thread than the one that started the call to Curl_resolv_timeout. This in turn results in the call to Curl_resolv_timeout being continued via siglongjmp (called from the SIGALRM handler) on a different thread.

Setting CURLOPT_NOSIGNAL to 1 makes libcurl avoid the SIGALRM + sigsetjmp/siglongjmp implementation. This solves the problem, but at the cost of losing the timeout capability.

In your case a DNS lookup took longer than libcurl was willing to wait, and a timeout aborted it.
But the call to Curl_failf (as part of the timeout error handling) was made on the wrong thread (I think), making it segfault.

IMHO there is no ideal solution here: with CURLOPT_NOSIGNAL set to 0 (the default), libcurl can perform a DNS lookup with a timeout, but the error handling might occur on the wrong thread. With CURLOPT_NOSIGNAL set to 1, the segfault is avoided, but libcurl might get stuck in a DNS lookup.

Are you able to reproduce this problem, and can you confirm that setting CURLOPT_NOSIGNAL to 1 fixes it?
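Matthias's explanation refers to the classic alarm()-based timeout pattern. A minimal single-threaded sketch of that pattern, assuming POSIX signals; `blocking_lookup` is a hypothetical stand-in for a resolver call that never answers, not libcurl code. It works here only because the handler jumps back on the same thread that called sigsetjmp(). In a multithreaded process, SIGALRM is delivered to the process and may be handled by any thread that does not block it, so the jump can land on the wrong stack. That is the hazard CURLOPT_NOSIGNAL avoids, at the price of the resolver timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf timeout_env;

/* SIGALRM handler: jump back to the point saved by sigsetjmp(). */
static void on_alarm(int sig)
{
    (void)sig;
    siglongjmp(timeout_env, 1);
}

/* Hypothetical stand-in for a DNS lookup that never answers. */
static void blocking_lookup(void)
{
    for (;;)
        pause();  /* sleep until a signal arrives */
}

/*
 * Run blocking_lookup() with an alarm()-based timeout.
 * Returns 0 if the lookup finished, -1 if it timed out.
 * Only safe in a single-threaded process: siglongjmp() into another
 * thread's stack is undefined behaviour.
 */
int lookup_with_timeout(unsigned int seconds)
{
    struct sigaction sa, old_sa;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old_sa);

    /* savemask=1 so siglongjmp() restores the mask (SIGALRM is blocked inside the handler). */
    if (sigsetjmp(timeout_env, 1) == 0) {
        alarm(seconds);
        blocking_lookup();
        rc = 0;   /* reached only if the lookup returns on its own */
    } else {
        rc = -1;  /* arrived here via siglongjmp() from on_alarm() */
    }

    alarm(0);                           /* cancel any pending alarm */
    sigaction(SIGALRM, &old_sa, NULL);  /* restore the previous handler */
    return rc;
}
```

Calling `lookup_with_timeout(1)` returns -1 after about one second. With CURLOPT_NOSIGNAL set, libcurl skips this signal path entirely, which matches the trade-off Matthias describes.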
Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!
Hi Daniel,

My comments are as follows:
1. Currently the curl_easy_init method is called from the esxVI_CURL_Connect method in esx_vi.c, and the curl_global_init method is called by curl_easy_init. If we move curl_global_init to virInitialize, do we still need to call curl_easy_init from esxVI_CURL_Connect? Did the latest version fix this problem?
2. If we need to use OpenSSL in multiple threads, we must register the two callbacks. Currently libcurl doesn't do it. If we don't register these two callbacks in libvirt, how should this be handled?

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel P. Berrange [mailto:berra...@redhat.com]
Sent: October 1, 2012 16:24
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!

On Sat, Sep 29, 2012 at 01:31:07PM +, Benjamin Wang (gendwang) wrote:
> Hi,
> I am running libvirt with the ESXi driver in a multithreaded scenario to access ESXi over HTTPS. Sometimes a core dump is generated, as follows:
>     #0  0x003f9b030265 in raise () from /lib64/libc.so.6
>     #1  0x003f9b031d10 in abort () from /lib64/libc.so.6
>     #2  0x003f9b06a84b in __libc_message () from /lib64/libc.so.6
>     #3  0x003f9b072fae in _int_malloc () from /lib64/libc.so.6
>     #4  0x003f9b074cde in malloc () from /lib64/libc.so.6
>     #5  0x003f9b07963b in strerror () from /lib64/libc.so.6
>     #6  0x003fa188032a in ERR_load_ERR_strings () from /lib64/libcrypto.so.6
>     #7  0x003fa187fde9 in ERR_load_crypto_strings () from /lib64/libcrypto.so.6
>     #8  0x003fa48309d9 in SSL_load_error_strings () from /lib64/libssl.so.6
>     #9  0x2aaaba8e612e in Curl_ossl_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
>     #10 0x2aaaba8ee6c1 in curl_global_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
>     #11 0x2aaaba8ee6f8 in curl_easy_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
>     #12 0x2aaaba0d932b in esxVI_SessionIsActive (ctx=0x2aaac093ca80, sessionID=0x2aaac06932a0 "`3i\300\252*", userName=0x2aaac0ae6e80 "root", output=0x) at esx/esx_vi_methods.generated.c:599
>     #13 0x2aaaba0c7a60 in esxStorageVolumeLookupByKey (conn=0x7412, key=0x76c1 <Address 0x76c1 out of bounds>) at esx/esx_storage_driver.c:825
>
> I checked, and currently the ESXi driver doesn't initialize OpenSSL, and libcurl doesn't set up OpenSSL for multithreaded use either. According to the OpenSSL API, libvirt should

No code in libvirt should assume curl uses openssl - it may well have been compiled with gnutls, or nss instead.

The actual flaw here is that libvirt does not invoke 'curl_global_init' from virInitialize.

Daniel
[libvirt] Potential race condition problem
Hi,

Currently the virInitialize() method defined in libvirt.c has the following code:

    int virInitialize(void)
    {
        if (initialized)
            return 0;

        initialized = 1;

        if (virThreadInitialize() < 0 ||
            virErrorInitialize() < 0 ||
            virRandomInitialize(time(NULL) ^ getpid()) ||
            virNodeSuspendInit() < 0)
            return -1;
        ...
    }

When two threads call virInitialize, there is no lock protecting the "initialized" variable. If the first thread enters this method and sets "initialized" to 1, the second thread could see that "initialized" is 1 (I say "could" because "initialized" is not volatile). In that situation, the second thread could use resources that virInitialize is supposed to initialize before the first thread has finished the initialization.

If you have any comments, please let me know. Thanks!

B.R.
Benjamin Wang
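The standard POSIX tool for "run this initialization exactly once, and make every other caller wait until it has finished" is pthread_once(), which avoids both halves of the race described above without relying on volatile (in C, volatile gives neither atomicity nor inter-thread ordering). A minimal sketch, assuming POSIX threads; the `demo_*` names are hypothetical and this is an illustration of the technique, not libvirt's actual code. The comment in the once routine marks where subsystem setup, including a call such as curl_global_init(), would belong.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t demo_init_once = PTHREAD_ONCE_INIT;
static int demo_init_result = -1;  /* written only inside the once routine */
static int demo_init_runs = 0;     /* how many times the body really ran */

/* One-time body (hypothetical): where subsystem setup would go. */
static void demo_initialize_once(void)
{
    demo_init_runs++;
    /* ... thread, error and random subsystems, curl_global_init(), ... */
    demo_init_result = 0;
}

/*
 * Safe to call from many threads at once: pthread_once() runs the body
 * exactly once, and every caller returns only after it has completed.
 */
int demo_initialize(void)
{
    if (pthread_once(&demo_init_once, demo_initialize_once) != 0)
        return -1;
    return demo_init_result;
}

static void *demo_worker(void *arg)
{
    *(int *)arg = demo_initialize();
    return NULL;
}

/* Start up to 16 threads that initialize concurrently; return how many saw success. */
int demo_init_from_threads(int n)
{
    pthread_t tids[16];
    int results[16];
    int started = 0, ok = 0;

    if (n > 16)
        n = 16;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, demo_worker, &results[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (results[i] == 0)
            ok++;
    }
    return ok;
}
```

However many threads race into `demo_initialize()`, the body runs once and every thread sees the finished result, so no extra locking is needed in the Java wrapper for this particular race.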
Re: [libvirt] Potential race condition problem
Hi,

OK. Now I am using JNA to access libvirt. If we add another mutex to protect the "initialized" variable, that mutex must itself be initialized with pthread_mutex_init first, and only once. But it seems there is no way to do that in the libvirt code. So I did the following:

1. Change the libvirt JNA code in Connect.java

Old code:

    public Connect(String uri) throws LibvirtException {
        VCP = libvirt.virConnectOpen(uri);
        // Check for an error
        processError();
        ErrorHandler.processError(Libvirt.INSTANCE);
    }

New code:

    public Connect(String uri) throws LibvirtException {
        synchronized (this.getClass()) {
            VCP = libvirt.virConnectOpen(uri);
        }
        // Check for an error
        processError();
        ErrorHandler.processError(Libvirt.INSTANCE);
    }

This makes sure that only one thread at a time can execute Connect. A server application only needs to do this once, so the performance is OK.

2. Change the libvirt code in libvirt.c

Old code:

    static int initialized = 0;

New code:

    static int volatile initialized = 0;

This makes sure the initialization is executed only once.

Would you give your comments on this solution?

B.R.
Benjamin Wang

From: Guannan Ren [mailto:g...@redhat.com]
Sent: September 29, 2012 15:43
To: Benjamin Wang (gendwang)
Cc: Daniel Veillard; libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Potential race condition problem

On 09/29/2012 03:07 PM, Benjamin Wang (gendwang) wrote:
> Hi,
> Currently the virInitialize() method defined in libvirt.c has the following code:
>
>     int virInitialize(void)
>     {
>         if (initialized)
>             return 0;
>
>         initialized = 1;
>
>         if (virThreadInitialize() < 0 ||
>             virErrorInitialize() < 0 ||
>             virRandomInitialize(time(NULL) ^ getpid()) ||
>             virNodeSuspendInit() < 0)
>             return -1;
>         ...
>     }
>
> When two threads call virInitialize, there is no lock protecting the "initialized" variable. If the first thread enters this method and sets "initialized" to 1, the second thread could see that "initialized" is 1 (I say "could" because "initialized" is not volatile). In that situation, the second thread could use resources that virInitialize is supposed to initialize before the first thread has finished the initialization.
>
> If you have any comments, please let me know. Thanks!
>
> B.R.
> Benjamin Wang

As the comment above the function says, it's better to call this routine at startup in multithreaded applications to avoid a potential race when initializing the library.

Guannan
Re: [libvirt] Potential race condition problem
Hi,

I think you misunderstood my meaning. My solution includes step 1 + step 2. Step 1 implements mutual exclusion between threads. Step 2 handles the visibility of "initialized". Without step 2, the initialization could be executed several times.

B.R.
Benjamin Wang

From: Guannan Ren [mailto:g...@redhat.com]
Sent: September 29, 2012 17:22
To: Benjamin Wang (gendwang)
Cc: Daniel Veillard; libvir-list@redhat.com; Yang Zhou (yangzho); cb...@av-test.de
Subject: Re: [libvirt] Potential race condition problem

On 09/29/2012 03:52 PM, Benjamin Wang (gendwang) wrote:
> Hi,
> OK. Now I am using JNA to access libvirt. If we add another mutex to protect the "initialized" variable, that mutex must itself be initialized with pthread_mutex_init first, and only once. But it seems there is no way to do that in the libvirt code. So I did the following:
>
> 1. Change the libvirt JNA code in Connect.java
> Old code:
>     public Connect(String uri) throws LibvirtException {
>         VCP = libvirt.virConnectOpen(uri);
>         // Check for an error
>         processError();
>         ErrorHandler.processError(Libvirt.INSTANCE);
>     }
> New code:
>     public Connect(String uri) throws LibvirtException {
>         synchronized (this.getClass()) {
>             VCP = libvirt.virConnectOpen(uri);
>         }
>         // Check for an error
>         processError();
>         ErrorHandler.processError(Libvirt.INSTANCE);
>     }
> This makes sure that only one thread at a time can execute Connect. A server application only needs to do this once, so the performance is OK.
>
> 2. Change the libvirt code in libvirt.c
> Old code:
>     static int initialized = 0;
> New code:
>     static int volatile initialized = 0;
> This makes sure the initialization is executed only once.
>
> Would you give your comments on this solution?
>
> B.R.
> Benjamin Wang

As far as I know, operations on a volatile variable are not atomic, and using the volatile keyword as a portable synchronization mechanism is discouraged in C. But in Java, there is a global ordering on the reads and writes to a volatile variable. So maybe your first solution is good enough.
Guannan
[libvirt] Core dump caused by misusing openssl in multithread scenario!
Hi,

I am running libvirt with the ESXi driver in a multi-threaded scenario, accessing ESXi over HTTPS. Sometimes a core dump is generated, as follows:

#0 0x003f9b030265 in raise () from /lib64/libc.so.6
#1 0x003f9b031d10 in abort () from /lib64/libc.so.6
#2 0x003f9b06a84b in __libc_message () from /lib64/libc.so.6
#3 0x003f9b072fae in _int_malloc () from /lib64/libc.so.6
#4 0x003f9b074cde in malloc () from /lib64/libc.so.6
#5 0x003f9b07963b in strerror () from /lib64/libc.so.6
#6 0x003fa188032a in ERR_load_ERR_strings () from /lib64/libcrypto.so.6
#7 0x003fa187fde9 in ERR_load_crypto_strings () from /lib64/libcrypto.so.6
#8 0x003fa48309d9 in SSL_load_error_strings () from /lib64/libssl.so.6
#9 0x2aaaba8e612e in Curl_ossl_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
#10 0x2aaaba8ee6c1 in curl_global_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
#11 0x2aaaba8ee6f8 in curl_easy_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
#12 0x2aaaba0d932b in esxVI_SessionIsActive (ctx=0x2aaac093ca80, sessionID=0x2aaac06932a0 "`3i\300\252*", userName=0x2aaac0ae6e80 "root", output=0x) at esx/esx_vi_methods.generated.c:599
#13 0x2aaaba0c7a60 in esxStorageVolumeLookupByKey (conn=0x7412, key=0x76c1 <Address 0x76c1 out of bounds>) at esx/esx_storage_driver.c:825

I checked, and the ESXi driver currently does not initialize OpenSSL for multi-threaded use. libcurl does not do it either. According to the OpenSSL API, libvirt should register two callbacks to support multiple threads. The detailed description is here: http://www.openssl.org/docs/crypto/threads.html

I have changed the code as follows:

1. virInitialize() in libvirt.c

Old code:

int virInitialize(void)
{
    ...
    virLogSetFromEnv();
    virNetTLSInit();
    ...
}

New code:

int virInitialize(void)
{
    ...
    virLogSetFromEnv();
    virNetTLSInit();
    virOpenSSLInit();
    ...
}

2. In virnetserver.c

New code:

#include <pthread.h>
#include <openssl/crypto.h>

/* One mutex per OpenSSL lock type, plus a per-type usage counter. */
static pthread_mutex_t *lock_cs;
static long *lock_count;

/* OpenSSL calls this to take (CRYPTO_LOCK set in mode) or release lock "type". */
static void virOpenSSLLockCallback(int mode, int type,
                                   const char *file ATTRIBUTE_UNUSED,
                                   int line ATTRIBUTE_UNUSED)
{
    if (mode & CRYPTO_LOCK) {
        pthread_mutex_lock(&(lock_cs[type]));
        lock_count[type]++;
    } else {
        pthread_mutex_unlock(&(lock_cs[type]));
    }
}

/* OpenSSL uses this to tell threads apart. */
static unsigned long virOpenSSLIdCallback(void)
{
    return (unsigned long)pthread_self();
}

void virOpenSSLInit(void)
{
    int i;

    lock_cs = OPENSSL_malloc(CRYPTO_num_locks() * sizeof(pthread_mutex_t));
    lock_count = OPENSSL_malloc(CRYPTO_num_locks() * sizeof(long));
    if (lock_cs == NULL || lock_count == NULL)
        return;

    for (i = 0; i < CRYPTO_num_locks(); i++) {
        lock_count[i] = 0;
        pthread_mutex_init(&(lock_cs[i]), NULL);
    }

    CRYPTO_set_id_callback(virOpenSSLIdCallback);
    CRYPTO_set_locking_callback(virOpenSSLLockCallback);
}

To be honest, virOpenSSLInit/virOpenSSLIdCallback/virOpenSSLLockCallback should not be defined in this file. However, the Makefile generated by autoconf/automake does not seem to pick up a new file on its own.

What do you think of this solution? If you have any comments, please feel free to contact me.

BTW: if I add a new source/header file, is there a simple way to update the Makefile?

B.R.
Benjamin Wang
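The patch above follows a general pattern: one mutex per lock "type", and a callback whose mode bit selects lock or unlock. The sketch below shows that pattern in isolation, with a pthread_once guard so the setup runs exactly once even if several threads call it. The guard matters here because frames #10 and #11 of the backtrace show curl_easy_init() calling curl_global_init() lazily from a worker thread, and the libcurl documentation warns that curl_global_init() is not thread-safe. This sketch does not link against OpenSSL. SKETCH_LOCK, SKETCH_UNLOCK and SKETCH_NUM_LOCKS are hypothetical stand-ins for CRYPTO_LOCK, CRYPTO_UNLOCK and CRYPTO_num_locks(), and 41 is an arbitrary lock count.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for OpenSSL's CRYPTO_LOCK / CRYPTO_UNLOCK flags and
 * CRYPTO_num_locks(); this sketch does not link against OpenSSL. */
#define SKETCH_LOCK      1
#define SKETCH_UNLOCK    2
#define SKETCH_NUM_LOCKS 41

static pthread_mutex_t *sketch_locks;
static pthread_once_t sketch_once = PTHREAD_ONCE_INIT;
static int sketch_init_ok;

/* Runs exactly once: allocate and initialise one mutex per lock type. */
static void sketch_do_init(void)
{
    sketch_locks = calloc(SKETCH_NUM_LOCKS, sizeof(*sketch_locks));
    if (sketch_locks == NULL)
        return;
    for (int i = 0; i < SKETCH_NUM_LOCKS; i++)
        pthread_mutex_init(&sketch_locks[i], NULL);
    sketch_init_ok = 1;
}

/* Safe to call from any number of threads; only the first call allocates. */
int sketch_locking_init(void)
{
    pthread_once(&sketch_once, sketch_do_init);
    return sketch_init_ok ? 0 : -1;
}

/* Same shape as virOpenSSLLockCallback: the mode bit picks lock vs. unlock,
 * and type picks which mutex. */
void sketch_locking_callback(int mode, int type)
{
    if (mode & SKETCH_LOCK)
        pthread_mutex_lock(&sketch_locks[type]);
    else
        pthread_mutex_unlock(&sketch_locks[type]);
}

/* Demo: several threads bump one shared counter, guarded by lock type 5. */
static long sketch_counter;

static void *sketch_worker(void *arg)
{
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++) {
        sketch_locking_callback(SKETCH_LOCK, 5);
        sketch_counter++;
        sketch_locking_callback(SKETCH_UNLOCK, 5);
    }
    return NULL;
}

/* Returns the final counter value (nthreads * iters when every thread started
 * and the locking works), or -1 if setup failed or nthreads is too large. */
long sketch_counter_demo(int nthreads, int iters)
{
    pthread_t tids[16];
    int created = 0;

    if (sketch_locking_init() != 0 || nthreads > 16)
        return -1;
    sketch_counter = 0;
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tids[t], NULL, sketch_worker, &iters) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tids[t], NULL);
    return sketch_counter;
}
```

Calling sketch_counter_demo(4, 100000) should return 400000. If the lock/unlock calls in sketch_worker() are removed, the result typically comes out lower because increments are lost.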
Re: [libvirt] Two core dumps are generated in multi-thread scenarios
Hi,

Old code (in esx_vi.c):

    curl_easy_setopt(curl->handle, CURLOPT_USERAGENT, "libvirt-esx");
    curl_easy_setopt(curl->handle, CURLOPT_HEADER, 0);

New code:

    curl_easy_setopt(curl->handle, CURLOPT_NOSIGNAL, 1);
    curl_easy_setopt(curl->handle, CURLOPT_USERAGENT, "libvirt-esx");
    curl_easy_setopt(curl->handle, CURLOPT_HEADER, 0);

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel Veillard [mailto:veill...@redhat.com]
Sent: September 23, 2012 16:52
To: Benjamin Wang (gendwang)
Cc: Matthias Bolte; libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Two core dumps are generated in multi-thread scenarios

On Sun, Sep 23, 2012 at 03:32:52AM +, Benjamin Wang (gendwang) wrote:
> Hi,
> I found two core dumps generated in multi-threaded scenarios in the ESX driver.
> Case 1: libcurl multi-thread support
> Core dump:
> #12 0x2aaabea89712 in addbyter () from /usr/local/lib/libcurl.so.4
> #13 0x2aaabea89b86 in dprintf_formatf () from /usr/local/lib/libcurl.so.4
> #14 0x2aaabea8b055 in curl_mvsnprintf () from /usr/local/lib/libcurl.so.4
> #15 0x2aaabea7678f in Curl_failf () from /usr/local/lib/libcurl.so.4
> #16 0x2aaabea6d871 in Curl_resolv_timeout () from /usr/local/lib/libcurl.so.4
> #17 0x0006e8a8f230 in ?? ()
> Fix: in esxVI_CURL_Connect() in esx_vi.c, I added the following line:
> curl_easy_setopt(curl->handle, CURLOPT_NOSIGNAL, 1);

Where exactly in the function? Can you send a diff of your change?

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
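This is why CURLOPT_NOSIGNAL helps, as I understand libcurl. When libcurl is built with its synchronous resolver, Curl_resolv_timeout() (frame #16 in the Case 1 backtrace) enforces the DNS timeout with alarm() and a SIGALRM handler that long-jumps back into the resolving code. CURLOPT_NOSIGNAL tells libcurl not to use signals at all. The underlying problem is that a signal-based timer belongs to the whole process, not to a thread, so curl handles running concurrently in different threads interfere with each other. The libcurl-free sketch below only illustrates that property of alarm(). alarm_is_process_wide() is a hypothetical helper.

```c
#include <unistd.h>

/* alarm() keeps a single timer per process, not per thread.  This simulates
 * two threads that each arm their own timeout: the second call silently
 * replaces the first, and alarm() returns the seconds that were left on it. */
unsigned alarm_is_process_wide(void)
{
    alarm(30);                 /* "thread A" arms a 30-second timeout       */
    unsigned left = alarm(5);  /* "thread B" arms 5 s, discarding A's timer */
    alarm(0);                  /* cancel the pending alarm before returning */
    return left;
}
```

The returned value is close to 30, which shows that thread A's timeout was overwritten rather than kept alongside thread B's. One trade-off is documented for CURLOPT_NOSIGNAL: with the standard synchronous resolver, name resolution can then no longer time out. The libcurl documentation suggests building with the threaded or c-ares resolver to get asynchronous DNS timeouts back.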
[libvirt] Two core dumps are generated in multi-thread scenarios
Hi,

I found two core dumps generated in multi-threaded scenarios in the ESX driver.

Case 1: libcurl multi-thread support

Core dump:
#12 0x2aaabea89712 in addbyter () from /usr/local/lib/libcurl.so.4
#13 0x2aaabea89b86 in dprintf_formatf () from /usr/local/lib/libcurl.so.4
#14 0x2aaabea8b055 in curl_mvsnprintf () from /usr/local/lib/libcurl.so.4
#15 0x2aaabea7678f in Curl_failf () from /usr/local/lib/libcurl.so.4
#16 0x2aaabea6d871 in Curl_resolv_timeout () from /usr/local/lib/libcurl.so.4
#17 0x0006e8a8f230 in ?? ()

Fix: in esxVI_CURL_Connect() in esx_vi.c, I added the following line:

    curl_easy_setopt(curl->handle, CURLOPT_NOSIGNAL, 1);

Case 2: libssl multi-thread support

Core dump:
#0 0x003f9b030265 in raise () from /lib64/libc.so.6
#1 0x003f9b031d10 in abort () from /lib64/libc.so.6
#2 0x003f9b06a84b in __libc_message () from /lib64/libc.so.6
#3 0x003f9b072fae in _int_malloc () from /lib64/libc.so.6
#4 0x003f9b074cde in malloc () from /lib64/libc.so.6
#5 0x003f9b07963b in strerror () from /lib64/libc.so.6
#6 0x003fa188032a in ERR_load_ERR_strings () from /lib64/libcrypto.so.6
#7 0x003fa187fde9 in ERR_load_crypto_strings () from /lib64/libcrypto.so.6
#8 0x003fa48309d9 in SSL_load_error_strings () from /lib64/libssl.so.6
#9 0x2aaaba8e612e in Curl_ossl_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
#10 0x2aaaba8ee6c1 in curl_global_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
#11 0x2aaaba8ee6f8 in curl_easy_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
#12 0x2aaaba0d932b in esxVI_RegisterVM_Task (ctx=0x2aaaba0d96d1, _this=0x5cf54b20, path=0x50e921c0 "10.74.125.50", name=0x2aaac0ae6e80 "root", asTemplate=3228119712, pool=0x5cf54b20, host=0x2aaac0693270, output=0x50e921a0) at esx/esx_vi_methods.generated.c:480

Possible problem: two callback functions (locking_function and threadid_func) need to be set, see
http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION

Could you give some comments on these two core dumps?

B.R.
Benjamin Wang
Re: [libvirt] Libvir JNA report SIGSEGV
Hi,

I have located the problem. The root cause is that the esxConnectToHost/esxConnectToVCenter functions defined in esx_driver.c collect the username and password, and the memory for both is allocated by JNA. However, esxConnectToHost/esxConnectToVCenter then free that Java-allocated memory themselves:

    VIR_FREE(username);
    VIR_FREE(unescapedPassword);

When the JVM runs the GC, it crashes because the same memory is freed twice. If I comment out these two lines, the system works well, but I don't think that is a good solution. What is your opinion?

B.R.
Benjamin Wang

-----Original Message-----
From: Benjamin Wang (gendwang)
Sent: September 6, 2012 21:43
To: 'veill...@redhat.com'
Cc: libvir-list@redhat.com
Subject: RE: [libvirt] Libvir JNA report SIGSEGV

Hi,

I have looked into the code for several days, but I didn't find the root cause. Even if I only call new Connect, the problem occurs, so it should be related to Connect.java or ConnectAuthDefault.java. Would you take a quick look at the issue and give me some pointers? Then I can try to fix it. Thanks!

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel Veillard [mailto:veill...@redhat.com]
Sent: September 6, 2012 19:05
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com
Subject: Re: [libvirt] Libvir JNA report SIGSEGV

On Thu, Sep 06, 2012 at 09:06:14AM +, Benjamin Wang (gendwang) wrote:
> Hi,
> Actually, I also did another test, as follows. When I comment out the new Connect, the program works well, so this problem is related to Libvirt JNA. If I manually run garbage collection for this program, it still works well, but if I run garbage collection for the previous program, it crashes. I guess this problem is caused by the ConnectAuth callback: when garbage collection is executed, the callback memory is moved.

Okay, maybe some memory needs to be pinned in some way. I take patches!

Daniel
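Going by the analysis above, the username and password returned through the authentication callback end up owned by the ESX driver, which releases them with VIR_FREE(). Whatever the callback stores there must therefore be a plain heap buffer that the driver is allowed to free, not memory that JNA or the JVM will release again later. Below is a minimal C sketch of that ownership handoff. It uses hypothetical names (struct sketch_cred, sketch_auth_cb(), sketch_connect()) rather than libvirt's real types.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical credential record: the callback fills in "result", and
 * ownership of that buffer passes to the caller. */
struct sketch_cred {
    const char *prompt;
    char *result;
};

/* malloc'd copy of s (strdup() is POSIX, not ISO C). */
static char *sketch_copy(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Callback side: hand over a fresh heap copy, never a pointer to memory
 * that someone else (here: the JVM) will also free. */
int sketch_auth_cb(struct sketch_cred *cred, const char *answer)
{
    cred->result = sketch_copy(answer);
    return cred->result != NULL ? 0 : -1;
}

/* Library side: use the value, then free it, the same way esxConnectToHost()
 * calls VIR_FREE(username).  Returns the answer's length, or -1 on failure. */
long sketch_connect(const char *answer)
{
    struct sketch_cred cred = { "Enter username", NULL };
    long len;

    if (sketch_auth_cb(&cred, answer) < 0)
        return -1;
    len = (long)strlen(cred.result);
    free(cred.result);   /* the library owns the buffer, so it frees it */
    cred.result = NULL;
    return len;
}
```

Under this convention, the Java side would pass libvirt a natively allocated copy instead of memory whose lifetime JNA manages. Whether that is the right fix for Libvirt JNA is only a suggestion; it has not been verified in this thread.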
Re: [libvirt] Memory free in libvirt JNA
Hi,

I wrote some code to verify the memory leak problem, as follows.

C code in the shared library:

void checkJNAMemLeak1(int **head, int *length)
{
    long i = 0;
    *head = (int *)malloc(sizeof(int) * 1);
    for (i = 0; i < 1; i++) {
        (*head)[i] = 1;
    }
    *length = 1;
}

Java code:

public static void testJNAMemLeak1() {
    PointerByReference head = new PointerByReference();
    IntByReference length = new IntByReference();
    while (true) {
        libben.checkJNAMemLeak1(head, length);
        System.out.println(length.getValue());
        sleep(1);
    }
}

When we check memory with the top command, VIRT and RES increase very quickly. When we check with jconsole, there is nothing in the Java heap, and even running GC manually from jconsole changes nothing. If I change the Java code as follows:

public static void testJNAMemLeak1() {
    PointerByReference head = new PointerByReference();
    IntByReference length = new IntByReference();
    while (true) {
        libben.checkJNAMemLeak1(head, length);
        System.out.println(length.getValue());
        sleep(1);
        libc.free(head.getValue());
    }
}

then everything works well, and VIRT and RES no longer increase. I think we must provide free functions for all the memory allocated by libvirt.

B.R.
Benjamin Wang

-----Original Message-----
From: Benjamin Wang (gendwang)
Sent: September 7, 2012 15:22
To: libvir-list@redhat.com
Cc: 'veill...@redhat.com'; Yang Zhou (yangzho)
Subject: RE: Memory free in libvirt JNA

Hi,

The Overview part of the JNA API documentation says the following:

1. Description 1: If the native method returns char* and actually allocates memory, a return type of Pointer should be used to avoid leaking the memory. It is then up to you to take the necessary steps to free the allocated memory.

2. Description 2: Declare the method as returning a Structure of the appropriate type, then invoke Structure.toArray(int) to convert to an array of initialized structures of the appropriate size. Note that your Structure class must have a no-args constructor, and you are responsible for freeing the returned memory if applicable in whatever way is appropriate for the called function.

The example code is as follows:

// Original C code
struct Display* get_displays(int* pcount);
void free_displays(struct Display* displays);

// Equivalent JNA mapping
Display get_displays(IntByReference pcount);
void free_displays(Display[] displays);
...
IntByReference pcount = new IntByReference();
Display d = lib.get_displays(pcount);
Display[] displays = (Display[])d.toArray(pcount.getValue());
...
lib.free_displays(displays);

In other words, all memory allocated by native code must be freed explicitly on the JNA side, so we must add some methods that support freeing that memory. Any comments?

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel Veillard [mailto:veill...@redhat.com]
Sent: August 20, 2012 14:25
To: Benjamin Wang (gendwang)
Cc: st...@tvnet.hu; daniel.schwa...@dtnet.de
Subject: Re: Memory free in libvirt JNA

On Mon, Aug 20, 2012 at 05:15:45AM +, Benjamin Wang (gendwang) wrote:
> Hi Veillard,
> Thanks for your reply. I checked the current Libvirt-JNA implementation and found a method named free defined in the Domain class, which is used to free the domain object. If this is mandatory, we would have to add a lot of methods to the current Libvirt-JNA implementation to free the memory allocated by the libvirt API. Please correct me!

As far as I understand, free() is aliased as finalize() on that object, so the Java runtime will call free() automatically on garbage collection. I'm not a Java expert; check some Java literature for more details about how this is done and the cases where free() might be better called directly.

Daniel
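The JNA rules quoted above come down to one convention: the library that allocates a buffer should also export the function that frees it, as the get_displays()/free_displays() pair does. Below is a minimal C sketch of such a pair. It is modelled on checkJNAMemLeak1(), but sketch_fill_ints() and sketch_free_ints() are hypothetical names, and the explicit element count is an assumption added for the sketch.

```c
#include <stdlib.h>

/* Same shape as checkJNAMemLeak1(), but with the element count passed in.
 * Returns 0 on success, -1 if the allocation fails. */
int sketch_fill_ints(int **head, int *length, int n)
{
    *head = malloc(sizeof(int) * (size_t)n);
    if (*head == NULL) {
        *length = 0;
        return -1;
    }
    for (int i = 0; i < n; i++)
        (*head)[i] = 1;
    *length = n;
    return 0;
}

/* Matching release function exported by the same library, so the caller
 * never has to know which allocator produced the buffer. */
void sketch_free_ints(int *head)
{
    free(head);
}
```

From Java, the fixed loop would then call the library's own sketch_free_ints(head.getValue()) instead of libc.free(head.getValue()). Keeping the free function inside the library means callers keep working even if the library's allocator changes.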
Re: [libvirt] Memory free in libvirt JNA
Hi,

The Overview part of the JNA API documentation says the following:

1. Description 1: If the native method returns char* and actually allocates memory, a return type of Pointer should be used to avoid leaking the memory. It is then up to you to take the necessary steps to free the allocated memory.

2. Description 2: Declare the method as returning a Structure of the appropriate type, then invoke Structure.toArray(int) to convert to an array of initialized structures of the appropriate size. Note that your Structure class must have a no-args constructor, and you are responsible for freeing the returned memory if applicable in whatever way is appropriate for the called function.

The example code is as follows:

// Original C code
struct Display* get_displays(int* pcount);
void free_displays(struct Display* displays);

// Equivalent JNA mapping
Display get_displays(IntByReference pcount);
void free_displays(Display[] displays);
...
IntByReference pcount = new IntByReference();
Display d = lib.get_displays(pcount);
Display[] displays = (Display[])d.toArray(pcount.getValue());
...
lib.free_displays(displays);

In other words, all memory allocated by native code must be freed explicitly on the JNA side, so we must add some methods that support freeing that memory. Any comments?

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel Veillard [mailto:veill...@redhat.com]
Sent: August 20, 2012 14:25
To: Benjamin Wang (gendwang)
Cc: st...@tvnet.hu; daniel.schwa...@dtnet.de
Subject: Re: Memory free in libvirt JNA

On Mon, Aug 20, 2012 at 05:15:45AM +, Benjamin Wang (gendwang) wrote:
> Hi Veillard,
> Thanks for your reply. I checked the current Libvirt-JNA implementation and found a method named free defined in the Domain class, which is used to free the domain object. If this is mandatory, we would have to add a lot of methods to the current Libvirt-JNA implementation to free the memory allocated by the libvirt API. Please correct me!

As far as I understand, free() is aliased as finalize() on that object, so the Java runtime will call free() automatically on garbage collection. I'm not a Java expert; check some Java literature for more details about how this is done and the cases where free() might be better called directly.

Daniel
Re: [libvirt] Libvir JNA report SIGSEGV
Hi,

The problem only occurs in the JNA part; plain C libvirt works well. Even if I only create a connection outside the loop, the problem still happens. The following is the simplest way to reproduce it:

public static void testcase1() throws LibvirtException {
    Connect conn = null;
    // connect to the hypervisor
    conn = new Connect("esx://10.74.125.69:443/?no_verify=1&transport=https", new ConnectAuthDefault(), 0);
    while (true) {
        int[] array = new int[1];
        try {
            Thread.sleep(1000);
        } catch (Exception e) {}
    }
}

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel Veillard [mailto:veill...@redhat.com]
Sent: September 6, 2012 15:49
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Libvir JNA report SIGSEGV

On Wed, Sep 05, 2012 at 08:59:07AM +, Benjamin Wang (gendwang) wrote:
> Hi,
> I tried to verify JNA in a concurrent situation but ran into some problems. The following is my example code:
>
> public static void testcase1() throws LibvirtException {
>     Connect conn = null;
>     Connect conn1 = null;
>     // connect to the hypervisor
>     conn = new Connect("esx://10.74.125.68:443/?no_verify=1&transport=https", new ConnectAuthDefault(), 0);
>     System.out.println(conn.getVersion());
>     // connect to the hypervisor
>     conn1 = new Connect("esx://10.74.125.90:443/?no_verify=1&transport=https", new ConnectAuthDefault(), 0);
>     System.out.println(conn1.getVersion());
>     while (true) {
>         int[] array = new int[1];
>         Long version = conn.getVersion();
>         Long version1 = conn1.getVersion();
>         try {
>             Thread.sleep(1000);
>         } catch (Exception e) {
>         }
>     }
> }
>
> When I add the line int[] array = new int[1], the following error is generated very quickly:
>
> # An unexpected error has been detected by Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x003f9b07046e, pid=30049, tid=1109510464
> #
> # Java VM: OpenJDK 64-Bit Server VM (1.6.0-b09 mixed mode linux-amd64)
> # Problematic frame:
> # C [libc.so.6+0x7046e]
> #
> # An error report file with more information is saved as:
>
> I have tried to write similar code in C, as follows, and it works well:
>
> static void virXenBasic_TC001(void)
> {
>     virConnectPtr conn = NULL;
>     virConnectPtr conn1 = NULL;
>     unsigned long version = 0;
>     unsigned long version1 = 0;
>     char *hostname = NULL;
>
>     conn = virConnectOpenAuth("esx://10.74.125.21/?no_verify=1", virConnectAuthPtrDefault, 0);
>     if (conn == NULL) {
>         fprintf(stderr, "Failed to open connection to qemu:///system\n");
>         return;
>     }
>     conn1 = virConnectOpenAuth("esx://192.168.119.40/?no_verify=1", virConnectAuthPtrDefault, 0);
>     if (conn1 == NULL) {
>         fprintf(stderr, "Failed to open connection to qemu:///system\n");
>         return;
>     }
>     while (true) {
>         hostname = malloc(sizeof(char) * 1);
>         virConnectGetVersion(conn, &version);
>         virConnectGetVersion(conn, &version1);
>         free(hostname);
>         sleep(1);
>     }
>     return;
> }

Maybe you need to increase the stack or memory size of your Java process or something; in my opinion that doesn't look related to libvirt at all. Well, maybe the bindings fail somewhere when checking for an allocation error, but is it in JNA?

Daniel
Re: [libvirt] Libvir JNA report SIGSEGV
Hi,

Actually, I also did another test, as follows. When I comment out the new Connect, the program works well, so this problem is related to Libvirt JNA. If I manually run garbage collection for this program, it still works well, but if I run garbage collection for the previous program, it crashes. I guess this problem is caused by the ConnectAuth callback: when garbage collection is executed, the callback memory is moved.

B.R.
Benjamin Wang

public static void testcase1() throws LibvirtException {
    while (true) {
        int[] array = new int[1];
        try {
            Thread.sleep(1000);
        } catch (Exception e) {}
    }
}

-----Original Message-----
From: Daniel Veillard [mailto:veill...@redhat.com]
Sent: September 6, 2012 16:53
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com
Subject: Re: [libvirt] Libvir JNA report SIGSEGV

On Thu, Sep 06, 2012 at 07:53:24AM +, Benjamin Wang (gendwang) wrote:
> Hi,
> The problem only occurs in the JNA part; plain C libvirt works well. Even if I only create a connection outside the loop, the problem still happens. The following is the simplest way to reproduce it:
>
> public static void testcase1() throws LibvirtException {
>     Connect conn = null;
>     // connect to the hypervisor
>     conn = new Connect("esx://10.74.125.69:443/?no_verify=1&transport=https", new ConnectAuthDefault(), 0);
>     while (true) {
>         int[] array = new int[1];
>         try {
>             Thread.sleep(1000);
>         } catch (Exception e) {}
>     }
> }

Then it's a Java bug. The loop doesn't call or use libvirt in any way. If it crashes in the loop, it's Java crashing to me!

Daniel
Re: [libvirt] Libvir JNA report SIGSEGV
Hi,

I have looked into the code for several days, but I didn't find the root cause. Even if I only call new Connect, the problem occurs, so it should be related to Connect.java or ConnectAuthDefault.java. Would you take a quick look at the issue and give me some pointers? Then I can try to fix it. Thanks!

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel Veillard [mailto:veill...@redhat.com]
Sent: September 6, 2012 19:05
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com
Subject: Re: [libvirt] Libvir JNA report SIGSEGV

On Thu, Sep 06, 2012 at 09:06:14AM +, Benjamin Wang (gendwang) wrote:
> Hi,
> Actually, I also did another test, as follows. When I comment out the new Connect, the program works well, so this problem is related to Libvirt JNA. If I manually run garbage collection for this program, it still works well, but if I run garbage collection for the previous program, it crashes. I guess this problem is caused by the ConnectAuth callback: when garbage collection is executed, the callback memory is moved.

Okay, maybe some memory needs to be pinned in some way. I take patches!

Daniel
[libvirt] Libvir JNA report SIGSEGV
Hi,

I tried to verify JNA in a concurrent situation but ran into some problems. The following is my example code:

public static void testcase1() throws LibvirtException {
    Connect conn = null;
    Connect conn1 = null;
    // connect to the hypervisor
    conn = new Connect("esx://10.74.125.68:443/?no_verify=1&transport=https", new ConnectAuthDefault(), 0);
    System.out.println(conn.getVersion());
    // connect to the hypervisor
    conn1 = new Connect("esx://10.74.125.90:443/?no_verify=1&transport=https", new ConnectAuthDefault(), 0);
    System.out.println(conn1.getVersion());
    while (true) {
        int[] array = new int[1];
        Long version = conn.getVersion();
        Long version1 = conn1.getVersion();
        try {
            Thread.sleep(1000);
        } catch (Exception e) {
        }
    }
}

When I add the line int[] array = new int[1], the following error is generated very quickly:

# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x003f9b07046e, pid=30049, tid=1109510464
#
# Java VM: OpenJDK 64-Bit Server VM (1.6.0-b09 mixed mode linux-amd64)
# Problematic frame:
# C [libc.so.6+0x7046e]
#
# An error report file with more information is saved as:

I have tried to write similar code in C, as follows, and it works well:

static void virXenBasic_TC001(void)
{
    virConnectPtr conn = NULL;
    virConnectPtr conn1 = NULL;
    unsigned long version = 0;
    unsigned long version1 = 0;
    char *hostname = NULL;

    conn = virConnectOpenAuth("esx://10.74.125.21/?no_verify=1", virConnectAuthPtrDefault, 0);
    if (conn == NULL) {
        fprintf(stderr, "Failed to open connection to qemu:///system\n");
        return;
    }
    conn1 = virConnectOpenAuth("esx://192.168.119.40/?no_verify=1", virConnectAuthPtrDefault, 0);
    if (conn1 == NULL) {
        fprintf(stderr, "Failed to open connection to qemu:///system\n");
        return;
    }
    while (true) {
        hostname = malloc(sizeof(char) * 1);
        virConnectGetVersion(conn, &version);
        virConnectGetVersion(conn, &version1);
        free(hostname);
        sleep(1);
    }
    return;
}

B.R.
Benjamin Wang
[libvirt] Question about contribution
Hi,

I am from Cisco. We want to use and contribute to libvirt. One simple question: if our product needs a new feature in libvirt, what is the process for submitting our contribution? Does it have to be approved by some committee?

B.R.
Benjamin