[
https://issues.apache.org/jira/browse/XERCESC-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083262#comment-16083262
]
Scott Cantor commented on XERCESC-2094:
---------------------------------------
What's happening is that the code auto-creates a transcoder when the linefeed
is seen, then finishes parsing the prolog and explicitly resets the encoding
to the value in the file, overwriting the original one.

I'm not sure there's a correct fix, because it's consuming the subsequent bits
of the prolog using what may be an incorrect encoding, and there's a comment in
the file noting that the implicit transcoder is created on the assumption that
it can't be changed afterward.

It would be possible to just fail when it tries to set the encoding afterward,
but I think it's legal to spread the declaration across lines, so I think the
real bug is that it can't consume enough of the prolog before deciding what
to do.
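
The overwrite pattern described above can be sketched in isolation. This is a
simplified model, not Xerces-C code: Transcoder, makeTranscoder, LeakyReader,
OwningReader and the two helper functions are hypothetical names invented for
illustration, and the assumption that an unknown encoding name yields a null
transcoder is part of the model, not a statement about the library.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a transcoder; counts live instances.
struct Transcoder {
    static int live;
    std::string encoding;
    explicit Transcoder(std::string enc) : encoding(std::move(enc)) { ++live; }
    ~Transcoder() { --live; }
};
int Transcoder::live = 0;

// Assumption of this model: unknown encoding names yield a null pointer.
Transcoder* makeTranscoder(const std::string& name) {
    if (name == "UTF-8" || name == "ISO-8859-1") return new Transcoder(name);
    return nullptr;
}

// Leaky variant: a raw owning pointer that is overwritten without being freed.
struct LeakyReader {
    Transcoder* fTranscoder = nullptr;
    void refreshCharBuffer() {               // implicit creation on first decode
        if (!fTranscoder) fTranscoder = new Transcoder("UTF-8");
    }
    void setEncoding(const std::string& name) {
        fTranscoder = makeTranscoder(name);  // old transcoder is lost here
    }
    ~LeakyReader() { delete fTranscoder; }
};

// Owning variant: replacing the transcoder always releases the previous one.
struct OwningReader {
    std::unique_ptr<Transcoder> fTranscoder;
    void refreshCharBuffer() {
        if (!fTranscoder) fTranscoder.reset(new Transcoder("UTF-8"));
    }
    bool setEncoding(const std::string& name) {
        fTranscoder.reset(makeTranscoder(name));  // deletes the old object first
        return fTranscoder != nullptr;
    }
};

// Each helper returns how many transcoders are still alive after the reader
// has gone out of scope, relative to the count before it was created.
int leakedAfterLeakyRun() {
    const int before = Transcoder::live;
    {
        LeakyReader reader;
        reader.refreshCharBuffer();          // linefeed seen: implicit transcoder
        assert(Transcoder::live == before + 1);
        reader.setEncoding("U");             // declared encoding is invalid
    }
    return Transcoder::live - before;
}

int leakedAfterOwningRun() {
    const int before = Transcoder::live;
    {
        OwningReader reader;
        reader.refreshCharBuffer();
        reader.setEncoding("U");
    }
    return Transcoder::live - before;
}
```

In this model, leakedAfterLeakyRun() leaves one transcoder unreclaimed, while
leakedAfterOwningRun() leaves none. Releasing the old object only addresses the
leak; it does not address the concern that part of the prolog has already been
decoded with the implicit transcoder.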
> Memory leak related to invalid encoding
> ---------------------------------------
>
> Key: XERCESC-2094
> URL: https://issues.apache.org/jira/browse/XERCESC-2094
> Project: Xerces-C++
> Issue Type: Bug
> Affects Versions: 3.1.3, 3.1.4
> Environment: Probably all. In that case Ubuntu 16.04 x86_64
> Reporter: Even Rouault
> Attachments: xerces-c-leak.xml
>
>
> Issue originally found through OSS-Fuzz on GDAL (see
> https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=1685 for reference; the
> link will not be publicly accessible until 90 days have passed), but it can be
> reproduced with the Xerces-C SAX2Count utility.
> On the attached file, Valgrind reports a memory leak.
> The content of the file is:
> {{{
> <?xml[newline character]
> version="1.0" encoding="U"?><foo xmlns="http://schemas.opengis.net/gml"/>
> }}}
> valgrind --leak-check=full /home/even/install-xerces-c-3.1.4/bin/SAX2Count
> xerces-c-leak.xml
> ==21268== Memcheck, a memory error detector
> ==21268== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==21268== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
> ==21268== Command: /home/even/install-xerces-c-3.1.4/bin/SAX2Count
> /home/even/gdal/trunk/gdal/xerces-c-leak.xml
> ==21268==
> Fatal Error at file /home/even/gdal/trunk/gdal/xerces-c-leak.xml, line 1,
> char 35
> Message: unable to create converter for 'U' encoding
> ==21268==
> ==21268== HEAP SUMMARY:
> ==21268== in use at exit: 76,348 bytes in 10 blocks
> ==21268== total heap usage: 9,244 allocs, 9,234 frees, 1,282,907 bytes
> allocated
> ==21268==
> ==21268== 52 (40 direct, 12 indirect) bytes in 1 blocks are definitely lost
> in loss record 4 of 10
> ==21268== at 0x4C2E0EF: operator new(unsigned long) (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==21268== by 0x4FF9B58: xercesc_3_1::MemoryManagerImpl::allocate(unsigned
> long) (MemoryManagerImpl.cpp:40)
> ==21268== by 0x4F7EE05: xercesc_3_1::XMemory::operator new(unsigned long,
> xercesc_3_1::MemoryManager*) (XMemory.cpp:68)
> ==21268== by 0x4F7E660:
> xercesc_3_1::ENameMapFor<xercesc_3_1::XMLUTF8Transcoder>::makeNew(unsigned
> long, xercesc_3_1::MemoryManager*) const (TransENameMap.c:50)
> ==21268== by 0x4F7AF20:
> xercesc_3_1::XMLTransService::makeNewTranscoderFor(unsigned short const*,
> xercesc_3_1::XMLTransService::Codes&, unsigned long,
> xercesc_3_1::MemoryManager*) (TransService.cpp:147)
> ==21268== by 0x5010A75: xercesc_3_1::XMLReader::refreshCharBuffer()
> (XMLReader.cpp:523)
> ==21268== by 0x4FFA5AA: peekNextChar (XMLReader.hpp:767)
> ==21268== by 0x4FFA5AA: xercesc_3_1::ReaderMgr::peekNextChar()
> (ReaderMgr.cpp:158)
> ==21268== by 0x5016297: xercesc_3_1::XMLScanner::scanProlog()
> (XMLScanner.cpp:1238)
> ==21268== by 0x4FEE371:
> xercesc_3_1::IGXMLScanner::scanDocument(xercesc_3_1::InputSource const&)
> (IGXMLScanner.cpp:206)
> ==21268== by 0x5017E6D: xercesc_3_1::XMLScanner::scanDocument(unsigned
> short const*) (XMLScanner.cpp:400)
> ==21268== by 0x5018221: xercesc_3_1::XMLScanner::scanDocument(char const*)
> (XMLScanner.cpp:408)
> ==21268== by 0x5044F47: xercesc_3_1::SAX2XMLReaderImpl::parse(char const*)
> (SAX2XMLReaderImpl.cpp:451)
> ==21268==
> ==21268== LEAK SUMMARY:
> ==21268== definitely lost: 40 bytes in 1 blocks
> ==21268== indirectly lost: 12 bytes in 1 blocks
> ==21268== possibly lost: 0 bytes in 0 blocks
> ==21268== still reachable: 76,296 bytes in 8 blocks
> ==21268== suppressed: 0 bytes in 0 blocks
> ==21268== Reachable blocks (those to which a pointer was found) are not shown.
> ==21268== To see them, rerun with: --leak-check=full --show-leak-kinds=all
> ==21268==
> ==21268== For counts of detected and suppressed errors, rerun with: -v
> ==21268== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
> I've found that the leak occurs only if both of the following conditions are
> met: there is a newline character between <?xml and version="1.0", and the
> value of the encoding attribute is an invalid encoding name.