Re: Memory management bugs
On 8/3/2010 3:44 PM, Lyublena Antova wrote:
> I tried to use Xerces with the pluggable MemoryManager and I discovered
> that on several occasions objects are instantiated with the global new
> operator that does not use the memory manager. Here are some of those
> cases:
> * initializing the EncodingValidator in EncodingValidator.cpp
> * creating a DOMImplementationListImpl in DOMImplementationImpl.cpp and
>   DOMImplementationRegistry.cpp
> * creating a DOMNodeListImpl in DOMNodeImpl.cpp
> * creating a DOMDocumentTypeImpl in DOMImplementationImpl.cpp
> * ...
>
> In our code we essentially forbid the use of plain global “new” so the
> above cases blow up when Xerces is linked against our codebase.
>
> To my understanding the pluggable memory manager is used either:
> * by making classes derive from the XMemory class, which overloads new
>   and delete, or
> * by using the global overloaded placement new operators that take a
>   DOMDocument(Impl) object
>
> The problem classes mentioned above are not derived from XMemory but
> occasionally get instantiated with a plain “new” operator instead of
> the placement “new”-s.
>
> I have a fix that makes those classes inherit from the XMemory class,
> and thus get instantiated with the global memory manager. That caused
> some problems because on some occasions the global placement “new”-s
> were shadowed by the XMemory member “new”-s, which produced unexpected
> results. The solution was to force the use of the global new (::new) so
> that the operator calls resolve correctly.

Just out of curiosity, can you provide an example of where this occurred
in the Xerces-C code?

> Was there any reason why the classes above do not inherit from XMemory
> in the first place?

Just inheriting from XMemory isn't always the right fix. In some cases,
there may be an available MemoryManager instance, either as a function
parameter or as a class member. It may also have been done that way to
avoid multiple inheritance, particularly in the DOM implementation
classes.

> On a broader note, is there a particular reason not to have a placement
> new operator that takes a MemoryManager instance? Perhaps deallocation
> issues?

Do you mean a global placement new operator? If so, I suspect the
deallocation issue is why it doesn't exist.

Dave

To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org
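The deallocation issue mentioned in the reply can be illustrated outside Xerces. The following is a minimal sketch, not Xerces code: Manager, CountingManager, destroyWith and Widget are hypothetical names, and the interface is only modelled loosely on a pluggable memory manager. It shows that a global placement new taking a manager is easy to write, but that a plain delete-expression cannot pass the manager back, so the object has to be destroyed through a helper that knows which manager owns it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical interface, loosely modelled on a pluggable memory manager.
class Manager {
public:
    virtual ~Manager() {}
    virtual void* allocate(std::size_t n) = 0;
    virtual void deallocate(void* p) = 0;
};

// Hypothetical manager that counts live blocks so leaks are visible.
class CountingManager : public Manager {
public:
    int live = 0;
    void* allocate(std::size_t n) override {
        void* p = std::malloc(n);
        if (!p) throw std::bad_alloc();
        ++live;
        return p;
    }
    void deallocate(void* p) override {
        if (!p) return;
        assert(live > 0);
        --live;
        std::free(p);
    }
};

// Global placement form: `new (mgr) T(args)` allocates through mgr.
void* operator new(std::size_t n, Manager& mgr) { return mgr.allocate(n); }

// Matching placement delete. The language calls it only when the
// constructor throws; `delete p` never reaches it.
void operator delete(void* p, Manager& mgr) noexcept { mgr.deallocate(p); }

// The caller has to remember the manager and release the object by hand.
template <typename T>
void destroyWith(Manager& mgr, T* obj) {
    if (!obj) return;
    obj->~T();
    mgr.deallocate(obj);
}

struct Widget {
    int id;
    explicit Widget(int i) : id(i) {}
};
```

With these definitions, `Widget* w = new (mgr) Widget(7);` has to be paired with `destroyWith(mgr, w);`. Writing `delete w;` would hand the block to the ordinary global operator delete instead of the manager that allocated it.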
Memory management bugs
I tried to use Xerces with the pluggable MemoryManager and I discovered
that on several occasions objects are instantiated with the global new
operator that does not use the memory manager. Here are some of those
cases:
* initializing the EncodingValidator in EncodingValidator.cpp
* creating a DOMImplementationListImpl in DOMImplementationImpl.cpp and
  DOMImplementationRegistry.cpp
* creating a DOMNodeListImpl in DOMNodeImpl.cpp
* creating a DOMDocumentTypeImpl in DOMImplementationImpl.cpp
* ...

In our code we essentially forbid the use of plain global "new" so the
above cases blow up when Xerces is linked against our codebase.

To my understanding the pluggable memory manager is used either:
* by making classes derive from the XMemory class, which overloads new
  and delete, or
* by using the global overloaded placement new operators that take a
  DOMDocument(Impl) object

The problem classes mentioned above are not derived from XMemory but
occasionally get instantiated with a plain "new" operator instead of the
placement "new"-s.

I have a fix that makes those classes inherit from the XMemory class, and
thus get instantiated with the global memory manager. That caused some
problems because on some occasions the global placement "new"-s were
shadowed by the XMemory member "new"-s, which produced unexpected
results. The solution was to force the use of the global new (::new) so
that the operator calls resolve correctly.

Was there any reason why the classes above do not inherit from XMemory in
the first place?

On a broader note, is there a particular reason not to have a placement
new operator that takes a MemoryManager instance? Perhaps deallocation
issues?

Thanks,
Lyublena
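The shadowing described in this message follows from how C++ looks up operator new: once a class (or one of its bases) declares any operator new, an unqualified new-expression for that class considers only the member overloads, and the global placement forms are reachable only through ::new. The following is a minimal sketch of that rule, not Xerces code: Document, MemoryBase and Node are hypothetical stand-ins for a DOM document, XMemory and a DOM node class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in for a document that owns every block it hands out.
struct Document {
    std::vector<void*> blocks;
    Document() {}
    Document(const Document&) = delete;
    Document& operator=(const Document&) = delete;
    ~Document() { for (void* b : blocks) std::free(b); }
    void* allocate(std::size_t n) {
        void* p = std::malloc(n);
        if (!p) throw std::bad_alloc();
        blocks.push_back(p);
        return p;
    }
};

// Global placement form taking the document.
void* operator new(std::size_t n, Document* doc) {
    assert(doc != nullptr);
    return doc->allocate(n);
}
// Matching placement delete (used only if a constructor throws); the
// document keeps ownership of the block, so there is nothing to release.
void operator delete(void*, Document*) noexcept {}

// Hypothetical stand-in for a base class with member operator new/delete.
struct MemoryBase {
    static int memberCalls;
    static void* operator new(std::size_t n) {
        ++memberCalls;
        void* p = std::malloc(n);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) noexcept { std::free(p); }
};
int MemoryBase::memberCalls = 0;

struct Node : MemoryBase {
    int value = 0;
};

// Unqualified new: only MemoryBase::operator new is considered.
Node* makeOnHeap() { return new Node; }

// `new (doc) Node` would not compile in this sketch, because lookup stops at
// the member overload and never sees the global Document* form. If the base
// also declared a (size_t, void*) placement form, the call would compile and
// silently pick that member instead. Qualifying with :: forces global lookup.
Node* makeInDocument(Document* doc) { return ::new (doc) Node; }
```

A node created by makeInDocument must not be passed to `delete`, since that would route through the member operator delete; in this sketch the caller runs the destructor explicitly and the document frees the block.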
[jira] Commented: (XERCESC-1936) ICUTransService and IconvGNUransService CAN NOT deal with huge file.
[ https://issues.apache.org/jira/browse/XERCESC-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894980#action_12894980 ]

kirby zhou commented on XERCESC-1936:

The following two commands are more convenient for debugging in a UTF-8 locale:

]# ( echo ''; echo ''; for ((i=0;i<2;++i)); do echo -en '\xd6\xd0\xce\xc4\xba\xba\xd7\xd6A'; done ; echo; echo '' ) > ~/small.xml
]# ( echo ''; echo ''; for ((i=0;i<10;++i)); do echo -en '\xd6\xd0\xce\xc4\xba\xba\xd7\xd6A'; done ; echo; echo '' ) > ~/big.xml

diff -x .svn -x CVS -ru --show-c-function xerces-c-3.1.1.bak/src/xercesc/util/Transcoders/IconvGNU/IconvGNUTransService.cpp xerces-c-3.1.1/src/xercesc/util/Transcoders/IconvGNU/IconvGNUTransService.cpp
--- xerces-c-3.1.1.bak/src/xercesc/util/Transcoders/IconvGNU/IconvGNUTransService.cpp 2010-01-20 16:45:02.0 +0800
+++ xerces-c-3.1.1/src/xercesc/util/Transcoders/IconvGNU/IconvGNUTransService.cpp 2010-08-04 02:07:06.0 +0800
@@ -1049,6 +1049,9 @@ XMLSize_t IconvGNUTranscoder::transco
     for (size_t cnt = 0; cnt < maxChars && srcLen; cnt++) {
         size_t rc = iconvFrom(startSrc, &srcLen, &orgTarget, uChSize());
         if (rc == (size_t)-1) {
+            if (errno == EINVAL) {
+                break;
+            }
             if (errno != E2BIG || prevSrcLen == srcLen) {
                 ThrowXMLwithMemMgr(TranscodingException, XMLExcepts::Trans_BadSrcSeq, getMemoryManager());
             }
diff -x .svn -x CVS -ru --show-c-function xerces-c-3.1.1.bak/src/xercesc/util/Transcoders/ICU/ICUTransService.cpp xerces-c-3.1.1/src/xercesc/util/Transcoders/ICU/ICUTransService.cpp
--- xerces-c-3.1.1.bak/src/xercesc/util/Transcoders/ICU/ICUTransService.cpp 2010-01-20 16:45:02.0 +0800
+++ xerces-c-3.1.1/src/xercesc/util/Transcoders/ICU/ICUTransService.cpp 2010-08-04 02:28:46.0 +0800
@@ -666,7 +666,7 @@ ICUTranscoder::transcodeTo( const XMLC
     );
     // Rememember the status before we possibly overite the error code
-    const bool res = (err == U_ZERO_ERROR);
+    const bool res = (err == U_ZERO_ERROR || (err == U_BUFFER_OVERFLOW_ERROR && startSrc > srcPtr));
     // Put the old handler back
     err = U_ZERO_ERROR;

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
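The EINVAL check in the IconvGNU hunk relies on documented iconv behaviour: when the input ends in the middle of a multi-byte character, iconv converts everything before it, returns (size_t)-1 with errno set to EINVAL, and leaves the input pointer at the start of the incomplete character. The following is a minimal sketch of that idea with plain POSIX iconv, not the Xerces transcoder. convertChunk is a hypothetical helper, and the sketch assumes a descriptor opened with iconv_open("UCS-4LE", "UTF-8"); that pair was picked only because it is widely available, whereas the report concerns GBK.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Converts as much of `chunk` as possible, appends the result to `out`, and
// returns the number of input bytes consumed. A trailing incomplete
// character is left unconsumed so the caller can retry it together with the
// next chunk instead of treating it as an error.
std::size_t convertChunk(iconv_t cd, const std::string& chunk, std::string& out)
{
    assert(cd != (iconv_t)-1);
    std::string input(chunk);            // iconv needs a mutable buffer
    char* src = &input[0];
    std::size_t srcLeft = input.size();

    while (srcLeft > 0) {
        char buf[256];
        char* dst = buf;
        std::size_t dstLeft = sizeof(buf);
        const std::size_t rc = iconv(cd, &src, &srcLeft, &dst, &dstLeft);
        const int err = errno;           // save before any other call
        out.append(buf, sizeof(buf) - dstLeft);
        if (rc == static_cast<std::size_t>(-1)) {
            if (err == EINVAL) break;    // incomplete character at the end
            if (err == E2BIG) continue;  // output buffer full, keep going
            throw std::runtime_error("invalid multi-byte sequence");
        }
    }
    return input.size() - srcLeft;
}
```

A caller would keep the bytes after the returned offset and prepend them to the next fragment before calling convertChunk again.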
[jira] Updated: (XERCESC-1936) ICUTransService and IconvGNUransService CAN NOT deal with huge file.
[ https://issues.apache.org/jira/browse/XERCESC-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Kolpackov updated XERCESC-1936:

Hi,

Can you attach the sample files to the bug report? The content that you
have pasted in the description is all garbled.

Also, would you be able to come up with a patch for this issue?
[jira] Created: (XERCESC-1936) ICUTransService and IconvGNUransService CAN NOT deal with huge file.
ICUTransService and IconvGNUransService CAN NOT deal with huge file.

Key: XERCESC-1936
URL: https://issues.apache.org/jira/browse/XERCESC-1936
Project: Xerces-C++
Issue Type: Bug
Components: Utilities
Affects Versions: 2.8.0, 3.1.1
Environment: RHEL-5.5
    glibc-2.5-49.el5_5.2
    libicu-3.6-5.11.4
Reporter: kirby zhou

If a huge file is passed to XMLReader, it calls TransService multiple
times and splits the file content into several fragments.
Unfortunately, a fragment can end with an incomplete multi-byte
character, and neither ICUTransService nor IconvGNUTransService deals
with that case: ICUTransService does not handle U_TRUNCATED_CHAR_FOUND,
and IconvGNUTransService does not handle EINVAL.

Both 2.8.0 and 3.1.1 have the same bug.

For example, create two XML files like this:

]# ( echo ''; echo ''; for ((i=0;i<2;++i)); do echo -n '中文汉字A'; done ; echo; echo '' ) > ~/small.xml
]# ( echo ''; echo ''; for ((i=0;i<10;++i)); do echo -n '中文汉字A'; done ; echo; echo '' ) > ~/big.xml

# small.xml and big.xml are analogous.
]# samples/SAXPrint -x=gbk ~/small.xml
中文汉字A中文汉字A

# with icu
]# samples/SAXPrint -x=gbk ~/big.xml
Fatal Error at file /root/big.xml, line 3, char 16377
  Message: char 0x6C49 is not representable in 'gbk' encoding

# with iconvgnu
]# samples/SAXPrint -x=gbk ~/big.xml
Fatal Error at file /root/big.xml, line 3, char 16377
  Message: invalid multi-byte sequence
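The failure mode in this report, a fixed-size fragment ending between the two bytes of a character, can be reproduced without any transcoding library. The following is a minimal sketch, not Xerces code: completePrefix and splitOnCharBoundaries are hypothetical helpers, and the byte model is a simplification of GBK that treats every byte in 0x81..0xFE as the lead byte of a two-byte character and everything else as a single byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest prefix of `bytes` that ends on a character
// boundary under the simplified two-byte model described above. The scan
// starts at the front because a trail byte can look like a lead byte.
std::size_t completePrefix(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        const std::size_t width = (c >= 0x81 && c <= 0xFE) ? 2 : 1;
        if (i + width > bytes.size()) break;  // lead byte without trail byte
        i += width;
    }
    return i;
}

// Splits `input` into pieces of at most `chunkSize` bytes, pushing a
// dangling lead byte into the following piece instead of cutting through
// the character.
std::vector<std::string> splitOnCharBoundaries(const std::string& input,
                                               std::size_t chunkSize)
{
    assert(chunkSize >= 2);
    std::vector<std::string> pieces;
    std::size_t pos = 0;
    while (pos < input.size()) {
        const std::string chunk = input.substr(pos, chunkSize);
        std::size_t usable = completePrefix(chunk);
        if (usable == 0) usable = chunk.size();  // truncated final character
        pieces.push_back(chunk.substr(0, usable));
        pos += usable;
    }
    return pieces;
}
```

Feeding the GBK bytes of the sample text through splitOnCharBoundaries with a small chunk size yields pieces that each end on a character boundary, which is the property the transcoders need when the reader hands them one fragment at a time.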