[xml] This mailing list will be retired by the end of Oct 2022

2022-10-21 Thread Nick Wellnhofer via xml
According to [1], GNOME's Mailman platform is being decommissioned which 
probably means that this mailing list will go away soon.


Nick

[1] https://mail.gnome.org/archives/foundation-list/2022-October/msg2.html
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


[xml] Release of libxml2 2.10.3

2022-10-14 Thread Nick Wellnhofer via xml

Version 2.10.3 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.10/

### Security

- [CVE-2022-40304] Fix dict corruption caused by entity reference cycles
- [CVE-2022-40303] Fix integer overflows with XML_PARSE_HUGE
- Fix overflow check in SAX2.c

### Portability

- win32: Fix build with VS2013

### Build system

- cmake: Set SOVERSION

Nick



[xml] Release of libxml2 2.10.2

2022-08-29 Thread Nick Wellnhofer via xml

Version 2.10.2 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.10/

This should really fix the build with Python 3.10.

### Improvements

- Remove set-but-unused variable in xmlXPathScanName
- Silence -Warray-bounds warning

### Build system

- build: require automake-1.16.3 or later (Xi Ruoyao)
- Remove generated files from distribution

### Test suite

- Don't create missing.xml when running testapi

Nick



[xml] Release of libxml2 2.10.1

2022-08-25 Thread Nick Wellnhofer via xml

Version 2.10.1 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.10/

This fixes some showstoppers reported by early adopters of 2.10.0. Thanks for 
the reports!


### Regressions

- Fix xmlCtxtReadDoc with encoding

### Bug fixes

- Fix HTML parser with threads and --without-legacy

### Build system

- Fix build with Python 3.10
- cmake: Disable version script on macOS
- Remove Makefile rule to build testapi.c

### Documentation

- Switch back to HTML output for API documentation
- Port doc/examples/index.py to Python 3
- Fix order of exports in libxml2-api.xml
- Remove libxml2-refs.xml

Nick



[xml] Release of libxml2 2.10.0

2022-08-17 Thread Nick Wellnhofer via xml

Version 2.10.0 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.10/

In this release, I started to remove a lot of old cruft like build systems for 
outdated platforms which haven't been touched in 10+ years.


The Docbook parser module has been removed completely. As far as I can tell, 
this was experimental code which never really worked and generated a 
deprecation warning for 15+ years.


Some other modules are now disabled by default and will eventually be removed 
completely:


- Support for XPointer locations (ranges and points): This was based on
  a W3C specification which never got beyond Working Draft status. To my
  knowledge, there's no software supporting this spec which is still
  maintained. You now have to enable this code by passing the
  `--with-xptr-locs` configuration option. Be warned that this part of
  the code base is buggy and had many security issues in the past.

- Support for the built-in FTP client (`--with-ftp`).

- Support for "legacy" functions (`--with-legacy`).

I also started to deprecate several functions of the public API. Most of them 
should be completely unused and will generate a deprecation warning now.


Special thanks to David Seifert and Daniel Engberg who contributed many 
improvements to the build system, and to David Kilzer for many patches that 
harden security.


It's likely that this release will break a few things. If you're concerned 
about stability, I'd suggest waiting for 2.10.1, which I plan to release in 6-8 
weeks. Going forward, patch releases will only contain important bug fixes. My 
plan is to bump the minor version about every six months and possibly make 
bug-fix releases for older branches as well.
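Downstream code that wants to follow this release scheme can decode libxml2's numeric version. The sketch below assumes the encoding used by `LIBXML_VERSION` in `xmlversion.h` (major * 10000 + minor * 100 + patch, so 2.10.3 is 21003). `decode_version` and `is_patch_upgrade` are hypothetical helpers, and treating a newer patch release on the same branch as the low-risk upgrade path is a reading of the plan above, not an official policy.

```c
/* A decoded libxml2 version number. */
typedef struct {
    int major, minor, patch;
} xml_version;

/* Split an encoded version such as 21003 into 2, 10, 3. */
xml_version decode_version(int n) {
    xml_version v = { n / 10000, (n / 100) % 100, n % 100 };
    return v;
}

/* True if `to` is a later patch release on the same major.minor branch
 * as `from`, i.e. a bug-fix-only update under the plan described above. */
int is_patch_upgrade(int from, int to) {
    xml_version a = decode_version(from);
    xml_version b = decode_version(to);
    return a.major == b.major && a.minor == b.minor && b.patch > a.patch;
}
```

At runtime, the version of the loaded library is also available as a string in the `xmlParserVersion` global, so `decode_version(atoi(xmlParserVersion))` could be compared with the compile-time `LIBXML_VERSION`.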


Here's the full changelog:

### Security

- [CVE-2022-2309] Reset nsNr in xmlCtxtReset
- Reserve byte for NUL terminator and report errors consistently in xmlBuf and
  xmlBuffer (David Kilzer)
- Fix missing NUL terminators in xmlBuf and xmlBuffer functions (David Kilzer)
- Fix integer overflow in xmlBufferDump() (David Kilzer)
- xmlBufAvail() should return length without including a byte for NUL
  terminator (David Kilzer)
- Fix ownership of xmlNodePtr & xmlAttrPtr fields in xmlSetTreeDoc() (David
  Kilzer)
- Use xmlNewDocText in xmlXIncludeCopyRange
- Fix use-after-free bugs when calling xmlTextReaderClose() before
  xmlFreeTextReader() on post-validating parser (David Kilzer)
- Use UPDATE_COMPAT() consistently in buf.c (David Kilzer)
- fix: xmlXPathParserContext could be double-deleted in OOM case (jinsub ahn)

### Removals and deprecations

- Disable XPointer location support by default
- Remove outdated xml2Conf.sh
- Deprecate module init and cleanup functions
- Remove obsolete XML Software Autoupdate (XSA) file
- Remove DOCBparser
- Remove obsolete Python test framework
- Remove broken VxWorks support
- Remove broken Mac OS 9 support
- Remove broken bakefile support
- Remove broken Visual Studio 2010 support
- Remove broken Windows CE support
- Deprecate IDREF-related functions in valid.h
- Deprecate legacy functions
- Disable legacy support by default
- Deprecate all functions in nanoftp.h
- Disable FTP support by default
- Add XML_DEPRECATED macro
- Remove elfgcchack.h

### Regressions

- Skip incorrectly opened HTML comments
- Restore behavior of htmlDocContentDumpFormatOutput() (David Kilzer)

### Bug fixes

- Fix memory leak with invalid XSD
- Make XPath depth check work with recursive invocations
- Fix memory leak in xmlLoadEntityContent error path
- Avoid double-free if malloc fails in inputPush
- Properly fold whitespace around the QName value when validating an XSD
  schema. (Damjan Jovanovic)
- Add whitespace folding for some atomic data types that it's missing on.
  (Damjan Jovanovic)
- Don't add IDs containing unexpanded entity references

### Improvements

- Avoid calling xmlSetTreeDoc
- Simplify xmlFreeNode
- Don't reset nsDef when changing node content
- Fix unintended fall-through in xmlNodeAddContentLen
- Remove unused xmlBuf functions (David Kilzer)
- Implement xpath1() XPointer scheme
- Add configuration flag for XPointer locations support
- Fix compiler warnings in Python code
- Mark more static data as `const` (David Kilzer)
- Make xmlStaticCopyNode non-recursive
- Clean up encoding switching code
- Simplify recursive pthread mutex
- Use non-recursive mutex in dict.c
- Fix parser progress checks
- Avoid arithmetic on freed pointers
- Improve buffer allocation scheme
- Remove unneeded #includes
- Add support for some non-standard escapes in regular expressions. (Damjan
  Jovanovic)
- htmlParseComment: handle abruptly-closed comments (Mike Dalessio)
- Add let variable tag support (Oliver Diehl)
- Add value-of tag support (Oliver Diehl)
- Remove useless call to xmlRelaxNGCleanupTypes
- Don't include ICU headers in public headers
- Update `xmlStrlen()` to use POSIX / ISO C `strlen()` (Mike Dalessio)
- Fix unused variable warnings with disabled features
- Only warn on invalid redeclarations of 

Re: [xml] How can I parse an XML file whose filesystem path is a Unicode string?

2022-08-02 Thread Nick Wellnhofer via xml

On 31/07/2022 17:40, Paul Kinnucan via xml wrote:
My Xerces-c implementation uses a custom entity resolver to 
resolve file entities. I might need a custom entity resolver to fix the 
problem with the libxml2 implementation. However, libxml2 does not seem to 
support custom entity resolvers. At least, I have not been able to find this 
feature in the doc or the libxml2 code base on GitHub.


You can install a custom entity loader with xmlSetExternalEntityLoader:

https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-parser.html#xmlSetExternalEntityLoader

Another option is to use "input callbacks":

https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-xmlIO.html#xmlRegisterInputCallbacks

Nick



Re: [xml] https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43743

2022-06-27 Thread Nick Wellnhofer via xml

On 24/06/2022 21:48, enh via xml wrote:
did anyone report https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43743 
to libxml2 directly?


No, this wasn't reported. For now, these issues should be reported to the 
libxml2 bug tracker. That said, I will resubscribe to OSS-Fuzz soon and handle 
new issues directly.


sadly, it looks like there are actually a bunch of fuzzer-found bugs that may 
never have been reported upstream? (i haven't checked; i'm just guessing.) see 
https://bugs.chromium.org/p/oss-fuzz/issues/list?q=libxml2=2 
for example.


Most of the timeout and OOM issues are hard to fix. I'll try to address some 
of them in the coming months.


Nick


[xml] Release of libxml2 2.9.14

2022-05-02 Thread Nick Wellnhofer via xml

Version 2.9.14 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.9/

Note that starting with 2.9.13, libxml2 tarballs are published on 
download.gnome.org instead of ftp.xmlsoft.org.


### Security

- [CVE-2022-29824] Integer overflow in xmlBuf and xmlBuffer
- Fix potential double-free in xmlXPtrStringRangeFunction
- Fix memory leak in xmlFindCharEncodingHandler
- Normalize XPath strings in-place
- Prevent integer-overflow in htmlSkipBlankChars() and xmlSkipBlankChars()
  (David Kilzer)
- Fix leak of xmlElementContent (David Kilzer)

### Bug fixes

- Fix parsing of subtracted regex character classes
- Fix recursion check in xinclude.c
- Reset last error in xmlCleanupGlobals
- Fix certain combinations of regex range quantifiers
- Fix range quantifier on subregex

### Improvements

- Fix recovery from invalid HTML start tags

### Build system, portability

- Define LFS macros before including system headers
- Initialize XPath floating-point globals
- configure: check for icu DEFS (James Hilliard)
- configure.ac: produce tar.xz only (GNOME policy) (David Seifert)
- CMakeLists.txt: Fix LIBXML_VERSION_NUMBER
- Fix build with older Python versions
- Fix --without-valid build

Thanks to all contributors!

Nick


[xml] Is anyone still using XPointer ranges?

2022-04-07 Thread Nick Wellnhofer via xml
I'm curious if there are people out there who still use XPointer ranges, 
specifically things like the range-to XPath extension function. This part of 
the code base is extremely buggy and the latest spec seems to be a Working 
Draft from 2002 which was never finished [1]. The xpointer() scheme is listed 
as "being reviewed" in the XPointer registry since 2006 [2]. I couldn't find 
any other projects that are still maintained and implement this feature. Here 
are some that don't:


- Xerces: "The XPointer xpointer() Scheme is currently not supported." [3]
- Mvp.Xml: "XPointer xpointer() Scheme (XPath subset only)" [4]

Since I have no plans to work on this part of the code base, I'm thinking 
about phasing out support for this feature. xpointer() expressions will 
continue to work, but without any XPath extensions for locations, ranges and 
points, just like the xpath1() scheme, which we should start supporting as well.


Nick


[1] https://www.w3.org/TR/xptr-xpointer/
[2] https://www.w3.org/2005/04/xpointer-schemes/
[3] https://xerces.apache.org/xerces2-j/faq-xinclude.html
[4] http://mvp-xml.sourceforge.net/xinclude/




Re: [xml] Euro sign in xml:id

2022-04-06 Thread Nick Wellnhofer via xml

On 06/04/2022 00:40, Einhard Leichtfuß wrote:

I noticed that xmllint complains about the Euro sign ("€") in an xml:id.
  - "validity error : xml:id : attribute value € is not an NCName"

The W3C's XML specification, however, seems to allow this:
  - https://www.w3.org/TR/xml-id/#processing
  - https://www.w3.org/TR/xmlschema-2/#ID
  - https://www.w3.org/TR/xml-names/#NT-NCName
  - https://www.w3.org/TR/xml/#NT-NameStartChar
  * '€' is #x20ac which is in the range [#x2070-#x218F], a subset of
NameStartChar, and may, therefore, occur anywhere in an NCName.

Am I mistaken above, should I look at another specification, or is this
a bug?


This is a bug. The xmlValidate*Name functions in tree.c weren't updated to XML 
1.0, Fifth Edition, which includes the following change:


https://www.w3.org/XML/xml-V10-4e-errata#E09

This issue is now tracked here:

https://gitlab.gnome.org/GNOME/libxml2/-/issues/364
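The relevant productions are easy to check by hand. The sketch below transcribes NameStartChar and NameChar from XML 1.0 Fifth Edition, minus the colon as NCName requires. It works on already-decoded Unicode code points and is not libxml2's own implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* NameStartChar (XML 1.0 5th ed., production [4]) without ':'. */
bool is_ncname_start_char(uint32_t c) {
    return (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z') ||
           (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6) ||
           (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D) ||
           (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D) ||
           (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF) ||
           (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF) ||
           (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
}

/* NameChar (production [4a]) without ':'. */
bool is_ncname_char(uint32_t c) {
    return is_ncname_start_char(c) || c == '-' || c == '.' ||
           (c >= '0' && c <= '9') || c == 0xB7 ||
           (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
}

/* An NCName is one start char followed by any number of name chars. */
bool is_ncname(const uint32_t *cps, size_t n) {
    if (n == 0 || !is_ncname_start_char(cps[0]))
        return false;
    for (size_t i = 1; i < n; i++)
        if (!is_ncname_char(cps[i]))
            return false;
    return true;
}
```

U+20AC falls inside [#x2070-#x218F], so a lone "€" is a valid NCName under these rules.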

Nick



Re: [xml] Support libxml2 and libxslt on Open Collective

2022-02-27 Thread Nick Wellnhofer via xml

On 23/02/2022 23:39, Eberhard wrote:

Dumb question.  How do I contribute in dollars?  I get Euros and no option
to change.  E


Everything should be set to USD now.

Nick


Re: [xml] Release of libxml2 2.9.13

2022-02-23 Thread Nick Wellnhofer via xml

On 23/02/2022 08:17, Stefan Behnel wrote:
Could you make the archives available in a (second) format that matches all 
(previous) releases?


The archives are automatically converted to .tar.xz when uploaded to the GNOME 
download server. I have no influence on that. Personally, I'd prefer .tar.gz 
for compatibility reasons, but I don't have a strong opinion.


I asked on GNOME infra if it is possible to offer .tar.gz downloads, but this 
would require changes to the upload script.


https://gitlab.gnome.org/Infrastructure/Infrastructure/-/issues/768

Nick


Re: [xml] Release of libxml2 2.9.13

2022-02-21 Thread Nick Wellnhofer via xml

On 21/02/2022 14:57, Mike Dalessio wrote:
I'm not asking specifically for a CVSS score for this vulnerability, and I'm 
certainly not asking you to create a CVE for every memory fix that's found. 
I'm only asking for a more accessible explanation of the conditions under 
which an application might be vulnerable to this already-published CVE.


From my limited analysis, there are two scenarios:

1. When using the reader API (xmlreader.h, xmlTextReader)

  Conditions:

  - Create a reader with parser option XML_PARSE_DTDVALID (or "parser
property" XML_PARSER_VALIDATE) but without parser option XML_PARSE_NOENT
(XML_PARSER_SUBST_ENTITIES)
  - Parse an untrusted document

  Impact:

  - Crash (DoS)
  - Memory disclosure via error channel

2. When using another parser API

  Conditions:

  - Parse an untrusted document with XML_PARSE_DTDVALID but without
XML_PARSE_NOENT
  - Delete a portion of the resulting document
  - Call xmlGetID on the document

  Potential impact:

  - Crash (DoS)
  - Arbitrary memory disclosure
  - Arbitrary code execution

Would this be an appropriate explanation for me to include in my security 
advisory?


> An application may be vulnerable to a denial-of-service attack if it parses 
> an untrusted document with parse options `DTDVALID` on, and `NOENT` off.


No, that's understating the severity. As I tried to explain, it's impossible 
to assess the severity without auditing each and every downstream project. 
Since clever exploitation of use-after-free errors can result in code 
execution, I have to assume the worst case if you force me to make a general 
statement.


DISCLAIMER: I make no guarantees regarding the accuracy and completeness of my 
statements above.


Nick


[xml] Support libxml2 and libxslt on Open Collective

2022-02-21 Thread Nick Wellnhofer via xml

Hello,

You can now support libxml2 and libxslt financially on Open Collective:

https://opencollective.com/libxml2

All donations go through the Open Source Collective, a non-profit organization 
providing financial and legal infrastructure for thousands of open source 
projects.


https://www.oscollective.org/

If you prefer, you can also support me directly. Just get in touch by email.

Nick



Re: [xml] Release of libxml2 2.9.13

2022-02-20 Thread Nick Wellnhofer via xml

On 20/02/2022 20:50, Mike Dalessio wrote:
Is there any additional information about CVE-2022-23308 (other than the 
commit log) that would help downstream projects triage? Was there a CVSS score 
calculated or severity assigned?


In this case, the CVE record is managed by a third party. It should be made 
public soon, but I have no influence on that. In my personal opinion, the 
whole CVE system is severely flawed with regard to OSS projects. Basically, 
anyone can request a CVE ID for arbitrary projects without having to 
coordinate with maintainers.


It's often hard, if not impossible, to come up with meaningful CVSS scores for 
vulnerabilities in software libraries. If there's a flaw in a certain library 
function, it really depends on how this function is used by downstream projects. 
If you look at major Linux distros, there are 500+ projects with a direct 
dependency on libxml2, and thousands with an indirect dependency. Most of them 
don't call the vulnerable functions at all, some others are libraries 
themselves, so it all depends on their users.


There are quite a few preconditions to be met to trigger a use-after-free in 
this particular case, so I'm not overly concerned. Even then, it seems 
anything but trivial to come up with a serious exploit. But I'm not really an 
expert and you never can tell without auditing tens or hundreds of downstream 
projects. Besides, I only have limited resources to assess the impact of 
security issues, and it's always possible that I missed something.


Note that for some reason, GitLab truncates the commit message after ~1000 
characters with no obvious way to expand it, at least on gitlab.gnome.org. You 
can see the full commit message on the GitHub mirror:



https://github.com/GNOME/libxml2/commit/652dd12a858989b14eed4e84e453059cd3ba340e

Nick





[xml] Release of libxml2 2.9.13

2022-02-20 Thread Nick Wellnhofer via xml

Version 2.9.13 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.9/

Note that starting with this release, libxml2 tarballs are published on 
download.gnome.org instead of ftp.xmlsoft.org.


### Security

- [CVE-2022-23308] Use-after-free of ID and IDREF attributes
  (Thanks to Shinji Sato for the report)
- Use-after-free in xmlXIncludeCopyRange (David Kilzer)
- Fix null deref in xmlSchemaGetComponentTargetNs (huangduirong)
- Fix memory leak in xmlXPathCompNodeTest
- Fix null pointer deref in xmlStringGetNodeList
- Fix several memory leaks found by Coverity (David King)

### Fixed regressions

- Fix regression in RelaxNG pattern matching
- Properly handle nested documents in xmlFreeNode
- Fix regression with PEs in external DTD
- Fix random dropping of characters on dumping ASCII encoded XML (Mohammad
  Razavi)
- Revert "Make schema validation fail with multiple top-level elements"
- Fix regression when parsing invalid HTML tags in push mode
- Fix regression parsing public IDs literals in HTML
- Fix buffering in xmlOutputBufferWrite
- Fix whitespace when serializing empty HTML documents
- Fix XPath recursion limit
- Fix regression in xmlNodeDumpOutputInternal
- Work around lxml API abuse

### Bug fixes

- Fix xmlSetTreeDoc with entity references
- Fix double counting of CRLF in comments
- Make sure to grow input buffer in xmlParseMisc
- Don't ignore xmllint options after "-"
- Don't normalize namespace URIs in XPointer xmlns() scheme
- Fix handling of XSD with empty namespace
- Also register HTML document nodes
- Make xmllint return an error if arguments are missing
- Fix handling of ctxt->base in xmlXPtrEvalXPtrPart
- Fix xmllint --maxmem
- Fix htmlReadFd, which was using a mix of xml and html context functions
  (Finn Barber)
- Move current position before possible calling of ctxt->sax->characters
  (Yulin Li)
- Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
  (David Kilzer)
- Patch to forbid epsilon-reduction of final states (Arne Becker)
- Avoid segfault at exit when using custom memory functions (Mike Dalessio)

### Tests, code quality, fuzzing

- Remove .travis.yml
- Make xmlFuzzReadString return a zero size in error case
- Fix unused function warning in testapi.c
- Update NewsML DTD in test suite
- Add more checks for malloc failures in xmllint.c
- Avoid potential integer overflow in xmlstring.c
- Run CI tests with UBSan implicit-conversion checks
- Fix casting of line numbers in SAX2.c
- Fix integer conversion warnings in hash.c
- Add explicit casts in runtest.c
- Fix integer conversion warning in xmlIconvWrapper
- Add suffix to unsigned constant in xmlmemory.c
- Add explicit casts in testchar.c
- Fix integer conversion warnings in xmlstring.c
- Add explicit cast in xmlURIUnescapeString
- Remove unused variable in xmlCharEncOutFunc (David King)

### Build system, portability

- Remove xmlwin32version.h
- Fix fuzzer test with VPATH build
- Support custom prefix when installing Python module
- Remove Makefile.win
- Remove CVS and SVN-related code
- Port python 3.x module to Windows and improve distutils (Chun-wei Fan)
- Correctly install the HTML examples into their subdirectory (Mattia Rizzolo)
- Refactor the settings of $docdir (Mattia Rizzolo)
- Remove unused configure checks (Ben Boeckel)
- python/Makefile.am: use *_LIBADD, not *_LDFLAGS for LIBS (Sam James)
- Fix check for libtool in autogen.sh
- Use version in configure.ac for CMake (Timothy Lyanguzov)
- Add CMake alias targets for embedded projects (Markus Rickert)

### Documentation

- Remove SVN keyword anchors
- Rework README
- Remove README.cvs-commits
- Remove old ChangeLog
- Update hyperlinks
- Remove README.docs
- Remove MAINTAINERS
- Remove xmltutorial.pdf
- Upload documentation to GitLab pages
- Document how to escape XML_CATALOG_FILES
- Fix libxml2.doap
- Update URL for libxml++ C++ binding (Kjell Ahlstedt)
- Generate devhelp2 index file (Emmanuele Bassi)
- Mention XML_CATALOG_FILES is space-separated (Jan Tojnar)
- Add documentation for xmllint exit code 10 (Rainer Canavan)
- Fix some validation errors in the FAQ (David King)
- Add instructions on how to use CMake to compile libxml (Markus Rickert)

Thanks to all contributors!

Nick



[xml] Intent to remove build systems for outdated platforms

2022-02-16 Thread Nick Wellnhofer via xml
I plan to remove several directories from the libxml2 repo containing build 
systems for outdated platforms.

VxWorks

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/VxWorks

Bakefile

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/bakefile

MacOS 9

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/macos

VMS

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/vms

Windows CE

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/win32/wince

Visual Studio 2010

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/win32/VC10

These files haven’t been updated in 10+ years and are most likely broken.

Nick



Re: [xml] Schema validation skipping IDC

2022-02-09 Thread Nick Wellnhofer via xml

On 09/02/2022 14:48, Stefan de Konink wrote:

On Wednesday, February 9, 2022 1:25:41 PM CET, Nick Wellnhofer wrote:
I'm always reluctant to add new features, especially if it sounds like it 
only solves a problem for a single user. Do you want to disable checking of 
identity constraints for performance reasons or is there another use case?


They are indeed for performance reasons: the syntax validation is 
extremely fast and powerful (even single-threaded, as expected), but IDC is 
(for the size of our documents) costly.


Can you provide more detail about the performance problem? Ideally by opening a 
GitLab issue.


Like Eric pointed out, supporting this use case now requires two schemas, 
one with and one without. Since our schema consists of 384 individual XSDs, 
that is less trivial to search and replace on the fly.


It seems that you only have to remove certain elements from the XSDs which 
should be easy to automate.


Nick



Re: [xml] Schema validation skipping IDC

2022-02-09 Thread Nick Wellnhofer via xml

On 01/02/2022 13:39, Stefan de Konink wrote:

Hi,

Would a patch be accepted that would create an option to disable identity 
constraints at runtime? Use case: only syntactically validate a file.


I'm always reluctant to add new features, especially if it sounds like it only 
solves a problem for a single user. Do you want to disable checking of 
identity constraints for performance reasons or is there another use case?


Nick



Re: [xml] Resuming maintenance

2022-01-14 Thread Nick Wellnhofer via xml

On 12/01/2022 17:30, Stefan de Konink wrote:
If you're seeing degraded performance on large documents, it's likely 
another issue with quadratic runtime. Fixing such issues algorithmically 
should typically yield much better results than trying to work around them 
with multi-threading.


What can I do to identify these things in a usable way? Would a profiler help 
in this case?


Yes, profiling is usually the quickest way to see which part of the code is 
causing performance issues. Then you could try to isolate the problem and come 
up with a test case where doubling the input size results in quadrupling the 
execution time or shows other superlinear behavior.


Nick


Re: [xml] Resuming maintenance

2022-01-12 Thread Nick Wellnhofer via xml

On 11/01/2022 11:38, Daniel Veillard wrote:

  So you want to reintegrate libxml2 within the GNOME framework? TBH
now that I have very limited bandwidth that's probably the right thing
to do.


I didn't mean the GNOME desktop environment itself, but the infrastructure 
that the GNOME Foundation offers. Mostly the GitLab instance which could be 
used to create and distribute releases and the GitLab Wiki which could be used 
for documentation.


It seems like a historical accident that libxml2 ended up under the GNOME 
umbrella, but why shouldn't we use the features we are offered? It certainly 
makes collaboration easier than maintaining your own website. It's also nice 
to have a self-hosted platform compared to something like GitHub.



  Happy to help you with any steps you may need to take over,


For now, it's enough to receive some formal blessing from you to start making 
releases on my own.


Nick


Re: [xml] Resuming maintenance

2022-01-12 Thread Nick Wellnhofer via xml

On 10/01/2022 20:47, Mike Dalessio wrote:
Although I'm relieved, the potential loss of maintainers from the project was 
alarming. Perhaps another goal to consider for the year is to expand the pool 
of contributors and maintainers. I (and others, I assume) am interested in 
volunteering more time so that the burden isn't carried by you alone, and so 
that if in the future you're unable to secure funding the user community will 
be able to sustain that loss.


Thanks again, and please think about what work volunteers can pick up to get 
more involved.


Anyone is invited to help with maintenance. But I can't think of many simple 
issues for people to get started. Fixing bugs and reviewing merge requests 
often requires deep knowledge of the code base, which in turn requires 
investing considerable amounts of time. On the other hand, everyone has to start 
somewhere. The best way is probably to start working on interesting issues, 
learn from any mistakes you make, and repeat.


Personally, I think the main problem is funding. The pool of competent 
programmers willing to spend months of their time to work on a rather outdated 
code base implementing mostly legacy technology for free is tiny or even 
non-existent. It's really the large corporations who could make a difference 
by sponsoring OSS maintenance directly. I'm sure you can find people like me 
who would work on OSS at a discount, but not without any monetary compensation.


Nick


Re: [xml] Resuming maintenance

2022-01-12 Thread Nick Wellnhofer via xml

On 10/01/2022 16:51, Stefan de Konink wrote:
This is great news, thanks Google for acknowledging the importance of 
maintaining core open source products. Your previous improvements on XSD 
validation made a great difference, but from my prototype in Python (LXML) I 
assume that multithreaded constraint validation and a more efficient way of 
storage would gain additional performance on files larger than 500MB. One may 
ask whether some 'green fund' would be able to donate money for these types of 
improvements.


I didn't make any performance improvements to the XSD code personally. You're 
probably seeing improvements from the following commit which wasn't authored 
by me:


https://gitlab.gnome.org/GNOME/libxml2/-/commit/faea2fa9

If you're seeing degraded performance on large documents, it's likely another 
issue with quadratic runtime. Fixing such issues algorithmically should 
typically yield much better results than trying to work around them with 
multi-threading.


Nick


[xml] Resuming maintenance

2022-01-10 Thread Nick Wellnhofer via xml

Hello,

Thanks to a donation from Google, I'm able to resume maintenance of libxml2 
(and libxslt) for the remainder of 2022.


My immediate plans are:

- Make a bug fix release fixing many regressions.
- Establish a new release schedule, possibly with multiple branches being
  maintained.
- Move releases from the old FTP server to GNOME's Gitlab infrastructure.
- Move documentation to GNOME infrastructure.
- Set up an official way to sponsor libxml2 maintainers.

In the future I'll focus less on security improvements and more on typical 
maintenance duties like bug fixes and modernizing the code base in a few ways.


Thanks (again) to Google for making this possible.

Nick


Re: [xml] userdata for SAX parsing with schema validation

2022-01-03 Thread Nick Wellnhofer via xml

On 23/12/2021 20:14, Lara Blatchford wrote:
Hi - I have a simple SAX handler set up, and schema validation errors are 
being caught by my structured error handler.  So far so good.


It appears that the userdata argument to xmlSAXUserParseMemory /must/ be the 
xmlSchemaSAXPlugPtr returned by the call to xmlSchemaSAXPlug, and that this 
pointer is passed as the ctx pointer to the SAX handler callbacks.


This is correct.

Is there any way for me to make a userdata pointer of my choosing available to 
my SAX handler callbacks while still getting schema validation?


From a quick look at the code, it seems that you can simply pass your user 
data pointer to xmlSchemaSAXPlug.



    // userdata arg is set to the pointer to the original SAX user data pointer

    xmlSchemaValidCtxtPtr oldXsdValidCtxt = NULL;;

    void *ctxptr = 

    xmlSchemaSAXPlugPtr saxPlug = xmlSchemaSAXPlug(xsdValidCtxt, 
,  );


You should pass your user data pointer here instead of a NULL 
xmlSchemaValidCtxtPtr:


void *user_data = my_user_data;
xmlSchemaSAXPlugPtr saxPlug = xmlSchemaSAXPlug(xsdValidCtxt,
, &user_data);

xmlSchemaSAXPlug will then swap your user data pointer for its own, and you 
have to pass that new pointer when calling xmlSAXUserParseMemory. The SAX 
callbacks, however, should still receive the original pointer.
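The pointer swap described above can be illustrated without libxml2. What follows is a minimal, self-contained sketch of the interposition pattern, not the actual xmlschemas.c code: `Handler`, `Plug`, `plugInstall`, `runParser` and `MyData` are hypothetical names, and the handler is reduced to a single callback.

```c
#include <stddef.h>

/* Hypothetical SAX-like handler with a single callback. */
typedef struct {
    void (*startElement)(void *ctx, const char *name);
} Handler;

/* Hypothetical plug: remembers the caller's handler and user data. */
typedef struct {
    const Handler *origHandler;
    void *origUserData;
    int elementsSeen;   /* the plug's own state, e.g. for validation */
} Plug;

/* Installed callback: receives the plug as ctx, does its own work,
 * then forwards the event with the caller's ORIGINAL user data. */
static void plugStartElement(void *ctx, const char *name) {
    Plug *plug = ctx;

    plug->elementsSeen++;
    if (plug->origHandler->startElement != NULL)
        plug->origHandler->startElement(plug->origUserData, name);
}

static const Handler plugHandler = { plugStartElement };

/* Swap both in/out parameters, mirroring the behavior described above. */
static void plugInstall(Plug *plug, const Handler **handler, void **userData) {
    plug->origHandler = *handler;
    plug->origUserData = *userData;
    plug->elementsSeen = 0;
    *handler = &plugHandler;
    *userData = plug;   /* the parser must now be given this pointer */
}

/* Stand-in for the parser: emits one event per element name. */
static void runParser(const Handler *handler, void *userData,
                      const char *const *names, size_t count) {
    for (size_t i = 0; i < count; i++)
        handler->startElement(userData, names[i]);
}

/* The caller's own callback and user data. */
typedef struct {
    int count;
} MyData;

static void myStartElement(void *ctx, const char *name) {
    (void) name;
    ((MyData *) ctx)->count++;
}
```

After `plugInstall`, the parser stand-in is given the swapped pointer (the plug), while `myStartElement` still receives the caller's own `MyData`.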


Nick


[xml] Stepping down

2021-07-22 Thread Nick Wellnhofer via xml
I never really asked for it but in the last years I became de-facto maintainer 
of both libxml2 and libxslt. Luckily, I was able to fund my involvement 
through Chrome VRP bug bounties and OSS-Fuzz integration rewards. Big thanks 
to Google for these outstanding programs.


Unfortunately, returns from security research are diminishing quickly and I 
see no way to obtain a minimal level of funding anymore. So I'm stepping down 
as contributor and maintainer.


Thanks to everyone who reported bugs and contributed patches!

Nick



Re: [xml] Release of libxml2 2.9.11

2021-05-14 Thread Nick Wellnhofer via xml

On 13/05/2021 23:13, Stefan Behnel wrote:

Difficult to say if this is an improvement or deliberate breakage.
Technically, it's not a semantic change in the XML output, rather a byte
level change in ignorable whitespace. But I'll need to look into it further
to understand what the best adaptation to this change is.


This is caused by one of my changes. I can have a look and revert to the old 
behavior.



More importantly, there also seem to be issues where additional closing
tags or duplicated PIs and comments are being written, e.g.


This is tracked here:

https://gitlab.gnome.org/GNOME/libxml2/-/issues/255

Nick



Re: [xml] warning: cast from 'unsigned char *' to 'unsigned short *'

2021-03-23 Thread Nick Wellnhofer via xml

On 23/03/2021 00:38, Jeffrey Walton via xml wrote:

encoding.c:500:26: warning: cast from 'const unsigned char *' to
   'unsigned short *' increases required alignment from 1 to 2 
[-Wcast-align]
 unsigned short* in = (unsigned short*) inb;



If the buffers are aligned, then you can use the following to squash
the warning:


This is a known issue. Internal use of these functions should be safe but the 
encoding API is also exposed publicly and could be used with unaligned 
pointers. So the warning is valid.
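For illustration, a common way to read 16-bit units from a byte pointer that may be unaligned is to copy the bytes instead of casting the pointer. A minimal sketch; `load_ushort` and `load_u16le` are hypothetical helpers, not libxml2 functions, and this thread does not say how libxml2 resolved the warning.

```c
#include <stdint.h>
#include <string.h>

/* Native-endian load that is safe for unaligned input: memcpy has no
 * alignment requirement, unlike dereferencing an 'unsigned short *'
 * obtained by casting a byte pointer. */
static unsigned short load_ushort(const unsigned char *p) {
    unsigned short v;

    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-order-explicit alternative for UTF-16LE input. */
static uint16_t load_u16le(const unsigned char *p) {
    return (uint16_t) (p[0] | (p[1] << 8));
}
```

Compilers typically turn the fixed-size memcpy into a single load on targets that allow unaligned access, so the safe version need not be slower.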


Nick


Re: [xml] libxml2 2.9.10 and Hang after Testing parser : 61 of 70 functions

2021-03-22 Thread Nick Wellnhofer via xml

On 22/03/2021 05:21, Jeffrey Walton via xml wrote:

I'm working on my old PowerMac G5, powerpc-apple-darwin9.8.0. I'm
trying to build an updated OpenSSH. libxml2 2.9.10 is a distant
dependency.


First of all, it's great to hear that libxml2 compiled at all and that most of 
the tests seem to pass.



libxml2's make check is hanging at:

 ...
 Testing nanoftp : 14 of 22 functions ...
 Testing nanohttp : 13 of 17 functions ...
 Testing parser : 61 of 70 functions ...
 

Does anyone have an idea what may be going sideways?


That's the 'testapi' test which causes the same problem on Windows. The test 
should complete eventually. It's just incredibly slow. One possible 
explanation is that somewhere an array is reallocated every time an element is 
appended. Some Linux allocators can handle repeated reallocations in linear 
time, but in general, you have to expect quadratic behavior. I just haven't 
found the time to investigate the issue.
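To make the suspected cause concrete: if an array grows by one element per append, each realloc may copy the whole array, which is quadratic overall, whereas growing the capacity geometrically keeps appends amortized linear. A minimal sketch assuming a plain `int` array; `IntVec` and `intvec_push` are hypothetical and not taken from libxml2 or testapi.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int *items;
    size_t len;
    size_t cap;
    size_t reallocs;   /* counts realloc calls, for illustration only */
} IntVec;

/* Append 'value', doubling the capacity when the array is full.
 * n appends trigger O(log n) reallocations, so the total copying
 * stays linear. Returns 0 on success, -1 on failure (vector unchanged). */
static int intvec_push(IntVec *vec, int value) {
    if (vec->len == vec->cap) {
        size_t newCap;
        int *tmp;

        if (vec->cap > SIZE_MAX / sizeof(int) / 2)
            return -1;                     /* doubling would overflow */
        newCap = vec->cap ? vec->cap * 2 : 8;
        tmp = realloc(vec->items, newCap * sizeof(int));
        if (tmp == NULL)
            return -1;
        vec->items = tmp;
        vec->cap = newCap;
        vec->reallocs++;
    }
    vec->items[vec->len++] = value;
    return 0;
}
```

With grow-by-one, the same 1000 appends would call realloc 1000 times instead of 8.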


Nick


Re: [xml] [PATCH] fix memory leak when xmlRegStatePush failed

2021-03-13 Thread Nick Wellnhofer via xml

On 12/01/2021 10:42, zhuyan (M) wrote:


In the function xmlRegStatePush, if xmlMalloc or xmlRealloc fails,


Yes, there are many issues that arise from poor handling of malloc failures. 
Fortunately, similar issues can be found quite effectively by changing the 
fuzzers to inject malloc failures. I already started to address these errors 
in a more systematic way, but I want to hold off further commits until after 
the next release.


Note that in this particular case, it is easier to make the static function 
xmlRegStatePush free the 'to' state on error.
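A minimal sketch of that ownership rule, using hypothetical `State` and `Automaton` types rather than libxml2's actual regexp structures: the push function takes ownership of the state, so on allocation failure it frees the state itself and callers cannot leak it.

```c
#include <stdlib.h>

typedef struct {
    int id;
} State;

typedef struct {
    State **states;
    int nbStates;
    int maxStates;
} Automaton;

/* Append 'state' and take ownership of it. On any failure the state
 * is freed here, so callers only have to check the return value. */
static int statePush(Automaton *am, State *state) {
    if (state == NULL)
        return -1;
    if (am->nbStates >= am->maxStates) {
        int newMax = am->maxStates ? am->maxStates * 2 : 4;
        State **tmp = realloc(am->states, (size_t) newMax * sizeof(*tmp));

        if (tmp == NULL) {
            free(state);   /* don't leak the state on error */
            return -1;
        }
        am->states = tmp;
        am->maxStates = newMax;
    }
    am->states[am->nbStates++] = state;
    return 0;
}
```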


Nick


Re: [xml] xmlGetNodePath() returns invalid path for XML_DTD_NODE

2021-03-13 Thread Nick Wellnhofer via xml

On 08/02/2021 18:01, Christoph M. Becker wrote:

On 08.02.2021 at 17:23, Nick Wellnhofer wrote:

This should be fixed for other node types as well. Does the attached
patch work for you?


Yes, that works fine.  Thank you!


This is fixed in master now:

https://gitlab.gnome.org/GNOME/libxml2/-/commit/e20c9c148c725e2933efa143ee6a543a5cae4204

Nick



Re: [xml] about xmlReadMemory()

2021-03-03 Thread Nick Wellnhofer via xml

On 03/03/2021 09:30, nicolas bats wrote:

Hi Nick,
I've experimented with xmlReadIO and it's cool.
this message just to check I'm doing right:
-I register an xmlInputReadCallback of type: size_t myCallback(void* context, 
char* buffer, int length)
-I do my stuff in the callback and if data I use exceed the length of the 
buffer, I realloc it.

Is this schema good?
Do I need to set size_t as the return type of myCallback?


No, the read callback is supposed to fill the buffer with up to 'length' 
bytes. Try something like:


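#include <string.h>
#include <libxml/parser.h>

/* Unread part of the in-memory document. A size_t offset avoids the
 * INT_MAX limit of xmlReadMemory. */
typedef struct {
    const char *ptr;
    size_t remaining;
} myContext;

/* xmlInputReadCallback: copy at most 'len' bytes into 'buffer' and
 * return the number of bytes copied. Returning 0 signals EOF. */
static int
myReadCallback(void *vcontext, char *buffer, int len) {
    myContext *context = vcontext;

    /* Cast to avoid a signed/unsigned comparison, assuming libxml2
     * passes a non-negative length. */
    if (context->remaining < (size_t) len)
        len = (int) context->remaining;
    memcpy(buffer, context->ptr, len);
    context->ptr += len;
    context->remaining -= len;

    return len;
}

xmlDocPtr
myReadMemory(const char *buffer, size_t size, const char *URL,
             const char *encoding, int options) {
    myContext context;

    context.ptr = buffer;
    context.remaining = size;

    /* No close callback is needed for a memory buffer. */
    return xmlReadIO(myReadCallback, NULL, &context, URL, encoding,
                     options);
}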


Re: [xml] about xmlReadMemory()

2021-03-02 Thread Nick Wellnhofer via xml

On 02/03/2021 16:28, nicolas bats via xml wrote:

Hi,
is there a reason why xmlReadMemory() accepts int as
the size of the array to transform to xmlDocPtr?

no doubt there's one...


That's simply a design mistake. The API was created 20 years ago when 64-bit 
systems were rare.


and in that case how could I retrieve a xmlDocPtr from 
memory where size is type of size_t?


If you want to process memory buffers larger than INT_MAX, you can use 
xmlReadIO with a custom read callback that uses a size_t to store the offset.


Nick


Re: [xml] xmlGetNodePath() returns invalid path for XML_DTD_NODE

2021-02-08 Thread Nick Wellnhofer via xml

On 28/01/2021 14:51, Christoph M. Becker via xml wrote:

-if ((node == NULL) || (node->type == XML_NAMESPACE_DECL))
+if ((node == NULL) || (node->type == XML_NAMESPACE_DECL)
+|| (node->type == XML_DTD_NODE))
  return (NULL);


This should be fixed for other node types as well. Does the attached patch 
work for you?


Nick

diff --git a/tree.c b/tree.c
index d2347dfdf..636f81fed 100644
--- a/tree.c
+++ b/tree.c
@@ -4881,7 +4881,9 @@ xmlGetNodePath(const xmlNode *node)
 }
 next = ((xmlAttrPtr) cur)->parent;
 } else {
-next = cur->parent;
+xmlFree(buf);
+xmlFree(buffer);
+return (NULL);
 }
 
 /*


Re: [xml] Issue in building for arm...

2021-01-18 Thread Nick Wellnhofer via xml

On 18/01/2021 12:30, Abu Muttalib via xml wrote:

In file included from /usr/include/python2.7/Python.h:8:0,
                  from libxml.c:15:
/usr/include/python2.7/pyconfig.h:14:54: fatal error: 
arm-linux-gnueabihf/python2.7/pyconfig.h: No such file or directory

compilation terminated.


Simply disable the Python bindings:

./configure --without-python

Nick



Re: [xml] Constraint validation for huge documents

2021-01-05 Thread Nick Wellnhofer via xml
The XML Schemas code hasn't been actively maintained for more than 10 years, 
so you're unlikely to receive a helpful answer regarding the code.


There was a recent patch which might help:


https://gitlab.gnome.org/GNOME/libxml2/-/commit/faea2fa9b890cc329f33ce518dfa1648e64e14d6

Other than that, you'll have to dig through the sources yourself.

Nick


Re: [xml] Fwd: Windows libxml2.lib missing?

2020-12-09 Thread Nick Wellnhofer via xml

On 09/12/2020 01:49, Pro Turm via xml wrote:
do you know why the provided Windows binaries dont contain any .lib files? No 
.lib has been provided here

http://xmlsoft.org/sources/win32/64bit/ 



It's explained in readme.txt.

Nick



Re: [xml] [PATCH] encoding: fix memleak in xmlRegisterCharEncodingHandler()

2020-12-07 Thread Nick Wellnhofer via xml

On 07/12/2020 13:19, Xiaoming Ni wrote:

The return type of xmlRegisterCharEncodingHandler() is void. The invoker
cannot determine whether xmlRegisterCharEncodingHandler() is executed
successfully. when nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the
"handler" is not added to the array "handlers". As a result, the memory
of "handler" cannot be managed and released: memory leakage.

so add "xmlfree(handler)" to fix memory leakage on the failure branch of
xmlRegisterCharEncodingHandler().

Reported-by: wuqing 
Signed-off-by: Xiaoming Ni 
---
  encoding.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)


Merged here: 
https://gitlab.gnome.org/GNOME/libxml2/-/commit/649d02eaa419fa72ae6b131718a4ac77063d7a5a


Nick



Re: [xml] [PATCH] xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val"

2020-12-07 Thread Nick Wellnhofer via xml

On 07/12/2020 13:17, Xiaoming Ni wrote:

The xmlSchemaGetFacetValueAsUlong() API is an external API.
The validity of external input parameters must be strictly verified.
Before accessing "facet->val->value", we need check whether "facet->val" is
a null pointer.

Signed-off-by: wuqing 
Signed-off-by: Xiaoming Ni 
---
  xmlschemastypes.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Merged here: 
https://gitlab.gnome.org/GNOME/libxml2/-/commit/cb7a572b3e7f568f1ebc8d91b1b8826a8ce3baa8


Nick



Re: [xml] ping //Re: [PATCH] xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add check "facet->val"

2020-12-06 Thread Nick Wellnhofer via xml

On 01/12/2020 08:05, Xiaoming Ni wrote:

ping


Your previous email didn't make it to the mailing list.


On 2020/11/24 14:55, Xiaoming Ni wrote:

The xmlSchemaGetFacetValueAsUlong() API is an external API.
The validity of external input parameters must be strictly verified.
Before accessing "facet->val->value", we need check whether "facet->val" is
a null pointer.

Signed-off-by: wuqing 
Signed-off-by: Xiaoming Ni 
---
  xmlschemastypes.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Please resend the full patch formatted with "git format-patch" or create a 
merge request on https://gitlab.gnome.org/GNOME/libxml2


Nick



Re: [xml] [PATCH] Fix xmlURIEscape memory leaks.

2020-11-09 Thread Nick Wellnhofer via xml
Merged here: 
https://gitlab.gnome.org/GNOME/libxml2/-/commit/7c06d99e1f4f853e3c5b307c0dc79c8a32a09855


Nick

On 27/10/2020 19:33, enh via xml wrote:

Found by running the fuzz/uri.c fuzzer under asan (internal Android bug
171610679).

Always free `ret` when exiting on failure. I've moved the definition of
NULLCHK down past where ret is always initialized to make it clear that
this is safe.

This patch also fixes the indentation of two of the NULLCHK call sites
to make it more obvious that NULLCHK isn't `if`-like.
---
  uri.c | 17 +
  1 file changed, 9 insertions(+), 8 deletions(-)



Re: [xml] [PATCH] Fix xmlURIEscape memory leaks.

2020-11-06 Thread Nick Wellnhofer via xml

On 06/11/2020 00:54, enh via xml wrote:

ping?

(let me know if this should be a pull request somewhere instead...)


Sending patches to the mailing list is fine. It might take another week or 
two, but the issue will be addressed eventually.


Nick


Re: [xml] Why does libxml2 limit port numbers to 999,999,999?

2020-10-17 Thread Nick Wellnhofer via xml
On Oct 17, 2020, at 12:24 , Richard W.M. Jones via xml  wrote:
> It seems like libxml2 chose to do this for convenience rather than
> correctness.

Yes, this is an arbitrary limit introduced to avoid integer overflow.
 
> I think it should accept port numbers at least up to
> signed int (the type used to store port numbers), and give an error if
> the port number overflows.

This is fixed now: 
https://gitlab.gnome.org/GNOME/libxml2/-/commit/b46016b8705b041c0678dd45e445dc73674b75d0
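For illustration, an overflow-checked decimal parser that accepts any value up to INT_MAX and reports an error beyond that could look like this. `parse_port` is a hypothetical helper; the commit above may implement the check differently.

```c
#include <limits.h>

/* Parse the decimal digits at *str into *port. Accepts values up to
 * INT_MAX and fails instead of overflowing. On success, returns 0 and
 * advances *str past the digits; on failure, returns -1 and leaves
 * *str and *port unchanged. */
static int parse_port(const char **str, int *port) {
    const char *cur = *str;
    int value = 0;

    if (*cur < '0' || *cur > '9')
        return -1;                         /* no digits */
    while (*cur >= '0' && *cur <= '9') {
        int digit = *cur - '0';

        /* value * 10 + digit > INT_MAX  <=>  value > (INT_MAX - digit) / 10 */
        if (value > (INT_MAX - digit) / 10)
            return -1;                     /* would overflow */
        value = value * 10 + digit;
        cur++;
    }
    *port = value;
    *str = cur;
    return 0;
}
```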

> Also could the uri->port field be changed to unsigned int without
> breaking ABI?

It’s a public struct member, so strictly speaking, no. But the risk of breaking 
stuff seems low.

Nick



Re: [xml] Fix character column number of XML parse error on line with closing tag of element with namespace preceding it

2020-08-09 Thread Nick Wellnhofer via xml
On Jun 15, 2020, at 17:29 , Frederic Vancraeyveldt  wrote:
> I traced the code and I have a suggested fix in libxml_parser.patch.

Thanks, this should be fixed now with this commit:

https://gitlab.gnome.org/GNOME/libxml2/-/commit/b82fa3dd26a72c89ced293d06269eb97bb252d76
 
> I also modified xmllint a little bit to be able to show the error using that 
> tool.
> 
> That modification is in patch (libxml_error.patch)

This changes the format of error messages for all users of libxml2 and could 
break things if someone tries to parse the error message, for example.

> I just joined this mailing list. Please advise me if there is a better way of 
> reporting these issues.

The best way is to use GitLab: https://gitlab.gnome.org/GNOME/libxml2

Nick



Re: [xml] GCC 10 analyzer findings

2020-07-24 Thread Nick Wellnhofer

On 16/07/2020 11:49, Jeffrey Walton via xml wrote:

I'm building libxml2-2.9.10 on Fedora 32 with GCC 10. GCC 10 includes
the analyzer. The analyzer can be enabled by adding -fanalyzer to
CFLAGS and LDFLAGS.

The analyzer is producing some use-after-free and double-free findings
on libxml2-2.9.10.


I gave it a try and here are my observations:

xmlMalloc and similar entry points are function pointers. To make the static 
analyzer understand that these are actually malloc calls, we need a special 
configuration where xmlMalloc is defined as a macro or function.


libxml2 typically zeroes freshly allocated memory with memset and assumes this 
initializes pointers with NULL. Although I haven't seen a platform where this 
doesn't work, the C standard makes no such guarantee. To avoid false 
positives, such pointers must be initialized in a different way.
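A minimal sketch of the portable alternative, using a hypothetical `Node` struct: `memset` sets every byte to zero, which the C standard does not guarantee to be a null pointer, whereas assigning a `{ 0 }`-initialized compound literal gives every pointer member the null pointer value.

```c
#include <stdlib.h>

typedef struct Node {
    struct Node *parent;
    struct Node *children;
    const char *name;
    int type;
} Node;

/* Allocate a node with all pointer members set to NULL. Members not
 * named in the initializer are initialized like static objects, so
 * this is correct even where a null pointer is not all-bits-zero,
 * which memset(node, 0, sizeof(*node)) cannot guarantee. */
static Node *node_new(int type) {
    Node *node = malloc(sizeof *node);

    if (node == NULL)
        return NULL;
    *node = (Node) { 0 };
    node->type = type;
    return node;
}
```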


There are quite a few bug reports about false positives with GCC's analyzer in 
the initial release, so I'd wait for a newer release before giving it another 
look.


Nick


Re: [xml] Question about xmlDoc->oldNs usage in php

2020-06-15 Thread Nick Wellnhofer

On 15/06/2020 05:04, Benjamin Eberlei wrote:
Now I am wondering what oldNs is even used for here, it seems from the libxml 
code it is really only needed to "cache" a pointer to the xmlNs that 
represents "xml" and nothing more.


No, the oldNs list is also appended to in `xmlDOMWrapStoreNs`. It seems that 
this namespace list is only used for memory management. From a quick look, 
your changes to the PHP code will cause a memory leak.


Nick



Re: [xml] [PATCH] win32: add "symbols" flag to configure.js

2020-05-04 Thread Nick Wellnhofer

On 10/04/2020 19:32, Michael Stahl wrote:

On 10.03.20 12:16, Nick Wellnhofer wrote:
Maybe we should simply add a feature to provide custom compiler and linker 
flags.


okay i've played around with that now, result is attached...


Merged here: 
https://gitlab.gnome.org/GNOME/libxml2/-/commit/a230b728f1289dd24c1666856ac4fb55579c6dfb


I renamed the variables to EXTRA_CFLAGS etc.

Nick



Re: [xml] libxml2 self test failures on NetBSD

2020-03-21 Thread Nick Wellnhofer

> On Mar 21, 2020, at 08:03 , Jeffrey Walton via xml  wrote:
> 
> I'm building libxml2-2.9.10 from sources. I'm seeing some libxml2 self
> test failures on NetBSD 8.1.

> File ./test/ebcdic_566012.xml generated an error

This issue was originally fixed with this commit:


https://gitlab.gnome.org/GNOME/libxml2/-/commit/4b4135977e82b7c9d3bba87a24fb7b5609312e14

I don’t know why it doesn’t work on NetBSD. I assume your build is with iconv 
and without ICU. Does `iconv -l |grep -i ebcdic` return any results?

Nick




Re: [xml] [PATCH] win32: add "symbols" flag to configure.js

2020-03-10 Thread Nick Wellnhofer

On 08/03/2020 17:40, Michael Stahl wrote:
hi, we want an easier way to get PDB files for MSVC release builds for 
crashreporting purpose...


There's also this GitLab issue: 
https://gitlab.gnome.org/GNOME/libxml2/issues/140

Maybe we should simply add a feature to provide custom compiler and linker 
flags.

Nick


Re: [xml] precisionDecimal support

2020-02-15 Thread Nick Wellnhofer

On 24/01/2020 15:35, Constantin Dogaru via xml wrote:
Would be open in accepting a contribution from Bloomberg that will add support 
for precisionDecimal in libxml2?


To be clear, you're talking about this XSD extension datatype?

https://www.w3.org/TR/xsd-precisionDecimal/

Technically, this document specifies an extension to XSD 1.1 while libxml2 
only supports version 1.0. But since it's just a new datatype, it should work 
with the old version as well.



Is this something that you would be interested in?


Me personally, no. But libxml2 is an open-source project, and in principle we 
accept contributions from everyone. Things can move a bit slow due to a lack 
of maintainers, though.


Nick


Re: [xml] Memory leak problem

2019-11-27 Thread Nick Wellnhofer

Hi Eric,

I'd use AddressSanitizer to debug this kind of problem. It's built into recent 
clang and gcc versions but probably doesn't support AIX. If you can produce a 
stand-alone test program that exhibits the memory leak, you could debug it 
under Linux, though.


Another option is libxml2's built-in memory debugging:

http://xmlsoft.org/xmlmem.html

It's rather limited but it might be your only option if you can't use external 
tools.


Nick


Re: [xml] DOM parser uses SAX2

2019-11-12 Thread Nick Wellnhofer

On 11/11/2019 19:59, Akash Opensource wrote:
 From test file I meant sample *.xml file that can be used to go for statement 
coverage.


For example I got a *.xml file in test folder of libxml2 extracted source 
directory that contained a long element name more than 4000 characters and it 
was possible for me to hit a debug point related to such a large element name 
handling.
Similarly xml files with utf8 and utf16 helped in covering encoding related if 
conditions in encoding.c source file.


All files in the `test` directory are parsed if you run `make check`. Some 
additional tests are run if you download the XML Conformance Test Suite. 
Execute `./runxmlconf` to get instructions. Test coverage isn't especially 
good, though.


Nick


Re: [xml] DOM parser uses SAX2

2019-11-11 Thread Nick Wellnhofer

On 11/11/2019 11:21, Akash Opensource via xml wrote:
But while checking libxml2 code I saw the DOM parser making calls to functions 
in sax2.c .


The event-based SAX parser is used to build a DOM tree. There's nothing 
special about that, just like you could walk a DOM tree to generate SAX events.
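To illustrate the point, here is a minimal sketch of building a tree from start/end events, keeping the innermost open element as the insertion point. It is similar in spirit to the xmlSAX2* handlers in SAX2.c but greatly simplified; `Elem`, `TreeBuilder`, `onStart`, `onEnd` and `elemFree` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Elem {
    char name[32];
    struct Elem *parent;
    struct Elem *firstChild;
    struct Elem *lastChild;
    struct Elem *next;
} Elem;

typedef struct {
    Elem *root;
    Elem *current;   /* innermost open element (top of the "stack") */
    int error;
} TreeBuilder;

/* Start-element event: create a node, append it to the current
 * element's children and make it the new insertion point. */
static void onStart(void *ctx, const char *name) {
    TreeBuilder *b = ctx;
    Elem *e = malloc(sizeof *e);

    if (e == NULL) {
        b->error = 1;
        return;
    }
    *e = (Elem) { { 0 } };
    strncpy(e->name, name, sizeof e->name - 1);
    e->parent = b->current;
    if (b->current == NULL) {
        if (b->root != NULL) {   /* only one root element allowed */
            free(e);
            b->error = 1;
            return;
        }
        b->root = e;
    } else if (b->current->lastChild == NULL) {
        b->current->firstChild = b->current->lastChild = e;
    } else {
        b->current->lastChild->next = e;
        b->current->lastChild = e;
    }
    b->current = e;
}

/* End-element event: the parent becomes the insertion point again. */
static void onEnd(void *ctx, const char *name) {
    TreeBuilder *b = ctx;

    (void) name;
    if (b->current != NULL)
        b->current = b->current->parent;
}

/* Free an element, its descendants and its following siblings. */
static void elemFree(Elem *e) {
    while (e != NULL) {
        Elem *next = e->next;

        elemFree(e->firstChild);
        free(e);
        e = next;
    }
}
```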


Also tell me is there a way to get statement coverage for all the path in 
libxml2 source files.


You can get coverage reports with the usual compiler options and tools.


Can I get more test files ?


What do you mean by "more test files"?

Nick


Re: [xml] [PATCH] xml2-config.in: fix regressions introduced by commit 2f2bf4b2c

2019-11-02 Thread Nick Wellnhofer
Thanks, applied here:

https://gitlab.gnome.org/GNOME/libxml2/commit/29740ed12f96149e795b22a147ada80b8776c8b4

Nick


> On Nov 2, 2019, at 13:07 , Dmitry V. Levin  wrote:
> 
> One of regressions introduced by commit
> 2f2bf4b2caa1cb9a4a5039b7a44db101943382d1 aka v2.9.10-rc1~56 is that
> cflags and libs variables are used uninitialized, resulting to
> the following behaviour:
> 
> $ cflags=foo libs=bar sh ./xml2-config.in --prefix
> @prefix@
> foo bar
> 
> Another regression is that the test for these variables is flawed.
> 
> Fixes: 2f2bf4b2c ("xml2-config.in: Output CFLAGS and LIBS on the same line")
> ---
> xml2-config.in | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/xml2-config.in b/xml2-config.in
> index cb4aa613..c25558c2 100644
> --- a/xml2-config.in
> +++ b/xml2-config.in
> @@ -4,6 +4,8 @@ prefix=@prefix@
> exec_prefix=@exec_prefix@
> includedir=@includedir@
> libdir=@libdir@
> +cflags=
> +libs=
> 
> usage()
> {
> @@ -102,7 +104,7 @@ while test $# -gt 0; do
> shift
> done
> 
> -if test "$cflags" -o "$libs"; then
> +if test -n "$cflags$libs"; then
> echo $cflags $libs
> fi
> 
> -- 
> ldv



Re: [xml] Research about vulnerabilities

2019-10-29 Thread Nick Wellnhofer

On 29/10/2019 14:30, Raphael de Carvalho Muniz wrote:
I found in the commit history of Libxml2 (commit 9acef28) the presence of the 
following code snippet in the libxml.c file (Lines 1,597 - 1,612).


More specifically python/libxml.c which is part of the Python bindings.

I believe 
that this commit presents a weakness that, If format strings can be influenced 
by an attacker, they can be exploited.


libxml_buildMessage is only called from error handlers which should never 
receive format strings from an external source.


You can't just pick a function that calls printf with a variable format string 
and assume that it's vulnerable. It depends on how the function is called and 
which format strings it receives.


Nick


Re: [xml] error compiling libxml2-2.9.9 with MinGW with MSYS

2019-05-12 Thread Nick Wellnhofer

On 12/05/2019 19:15, Test User via xml wrote:

../libxml2-2.9.9/nanohttp.c:915:28: error: 'F_GETFL' undeclared (first
use in this function)
  if ((status = fcntl(s, F_GETFL, 0)) != -1) {
 ^~~


Should be fixed with this commit from January:

https://gitlab.gnome.org/GNOME/libxml2/commit/d3de75782504c9136e504c6356bbae52fedf17e5

Nick



Re: [xml] Potential NULL pointer dereference in xmlregexp.c

2019-03-05 Thread Nick Wellnhofer

On 04/03/2019 20:37, Shaobo He via xml wrote:
I'm Shaobo He, a graduate student at University of Utah. I'm running a static 
analysis tool on libxml2 and noticed there may be a NULL pointer dereference 
in function `xmlRegexpIsDeterminist`. Basically, function `xmlNewAutomata` can 
return a NULL pointer when malloc fails. Please let me know if it makes sense 
or not.


Thanks for the report. Fixed here:

https://gitlab.gnome.org/GNOME/libxml2/commit/09797c13

Nick



Re: [xml] [PATCH] always define LIBXML_THREAD_ENABLED when enabled

2019-02-28 Thread Nick Wellnhofer

On 27/02/2019 15:43, Michael Haubenwallner wrote:

this is the followup patch proposal to
https://mail.gnome.org/archives/xml/2018-September/msg2.html


Thanks, applied here:

https://gitlab.gnome.org/GNOME/libxml2/commit/cf68fe3d505dd3f7525ccc28c90f87432a747aa4

Nick



Re: [xml] Release of libxml2-2.9.9

2019-01-30 Thread Nick Wellnhofer

On 30/01/2019 10:36, Alexander Dahl wrote:

What about CVE-2017-8872?

Debian (and SuSE) have a patch:

https://sources.debian.org/patches/libxml2/2.9.8+dfsg-1/0003-CVE-2017-8872.patch/

https://security-tracker.debian.org/tracker/CVE-2017-8872

According to https://bugzilla.gnome.org/show_bug.cgi?id=775200 and
https://gitlab.gnome.org/GNOME/libxml2/issues/26 that might have been fixed by
accident with git commit v2.9.8-26-g123234f2?

The Debian patch still applies on 2.9.9, but I don't understand libxml2 well
enough to say if it is harmful now and should be dropped?


The Debian patch is basically the same as commit 123234f2, so it can be dropped.

https://gitlab.gnome.org/GNOME/libxml2/commit/123234f2cfcd9e9b9f83047eee1dc17b4c3f4407


I also can not say
if CVE-2017-8872 is really mitigated with v2.9.8-26-g123234f2?


Yes, it's the same issue. I just verified that the POC document in bug 775200 
doesn't trigger ASan anymore.


Nick


Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8

2019-01-23 Thread Nick Wellnhofer

On 23/01/2019 16:14, Tomi Belan wrote:
I don't know too much 
about Python's C API, but [2] [3] suggests lxml is using a deprecated macro 
and giving libxml2 a multibyte buffer even though the input would fit into 
pure ASCII. This explains why it behaved differently than xmllint.


Right, if Python passes ASCII codes as, say, 16-bit integers, this will be 
detected as UTF-16 by libxml2 and encoding conversion will happen behind the 
scenes. I'm not sure what would happen with an encoding that isn't Unicode 
compatible. Maybe there's a bug lurking in lxml.
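As background on how 16-bit data can be recognized as UTF-16: the XML 1.0 specification (Appendix F) describes guessing the encoding from the first bytes, where `<` stored as a 16-bit little-endian unit appears as `3C 00`. The sketch below implements only a subset of those rules; it is not libxml2's own detection code, the names are hypothetical, and whether lxml's case goes through byte sniffing or an explicit encoding argument is an assumption not confirmed in this thread.

```c
#include <stddef.h>

typedef enum { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE } Enc;

/* Guess the encoding from the first bytes (subset of XML 1.0
 * Appendix F; the UCS-4 patterns are omitted). */
static Enc detect_encoding(const unsigned char *in, size_t len) {
    if (len >= 3 && in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
        return ENC_UTF8;                  /* UTF-8 BOM */
    if (len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        return ENC_UTF16LE;               /* UTF-16LE BOM */
    if (len >= 2 && in[0] == 0xFE && in[1] == 0xFF)
        return ENC_UTF16BE;               /* UTF-16BE BOM */
    if (len >= 4) {
        if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x3F && in[3] == 0x00)
            return ENC_UTF16LE;           /* "<?" as 16-bit LE units */
        if (in[0] == 0x00 && in[1] == 0x3C && in[2] == 0x00 && in[3] == 0x3F)
            return ENC_UTF16BE;           /* "<?" as 16-bit BE units */
        if (in[0] == 0x3C && in[1] == 0x3F && in[2] == 0x78 && in[3] == 0x6D)
            return ENC_UTF8;              /* "<?xm": ASCII-compatible */
    }
    return ENC_UNKNOWN;
}
```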


It would be good to add some tests to decrease the likelihood that 
this issue or something similar happens again.


Yes, that would be nice. But it was only a short-lived regression that I 
personally don't want to spend more time on. A UTF-16 test case derived from 
either your or the Chromium bug report would probably make most sense.


Nick


Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8

2019-01-23 Thread Nick Wellnhofer

On 23/01/2019 01:47, Tomi Belan wrote:
But even so I still wasn't able to reproduce it in pure C. Could it be 
because xmllint reads ctxt->myDoc, and lxml uses SAX2 event handlers 
(according to parsertarget.pxi)? AFAICT xmllint's --push and --sax options are 
incompatible.


ctxt->myDoc is also built via internal SAX2 handlers, so I'm not sure what's 
going on exactly.


I had more luck with git bisect. Using a dynamically linked build of lxml, and 
pointing LD_LIBRARY_PATH to libxml2/.libs/, I successfully found out that the 
bug was:
- introduced by 
https://github.com/GNOME/libxml2/commit/6e6ae5daa6cd9640c9a83c1070896273e9b30d14
- fixed(?) by 
https://github.com/GNOME/libxml2/commit/7a1bd7f6497ac33a9023d556f6f47a48f01deac0


The first commit was an attempt to fix an (ICU-related?) issue but it turned 
out to be buggy. It's unfortunate that the commit made it into 2.9.8.


https://mail.gnome.org/archives/xml/2018-January/msg3.html
https://bugs.chromium.org/p/chromium/issues/detail?id=820163

I hope that's meaningful to you, because I have no idea what are those commits 
doing and how could it be related to this bug... The commits sound related to 
character encoding, but bad.html is plain ASCII...


The commit obviously also affected documents that didn't need encoding 
conversion. I didn't realize that. At least we know that the issue is isolated 
to 2.9.8. Thanks for your efforts!


Nick



Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8

2019-01-22 Thread Nick Wellnhofer

On 22/01/2019 19:11, Tomi Belan wrote:
I tried to reproduce it with only xmllint as you suggest, but I'm not having 
much luck. It produces correct results with "--html --debug bad.html", "--html 
--debug --stream bad.html", "--html --debug --push bad.html", and "--html 
--debug --sax bad.html".


Maybe I'm just not using the right flags - I don't know if lxml uses SAX mode, 
or streaming, etc. But at this point I wouldn't be too surprised if it 
depended on the size of some internal input buffer that's different in lxml vs 
xmllint. I'd welcome any advice about what else I should try, or how can I 
find out what calls are being made from lxml to libxml2.


From a quick look at the lxml source, it seems that the `feed` method of 
HTMLParser calls htmlParseChunk, so you should pass `--html --push` to 
xmllint. But if it's a buffer boundary issue, you might have to recreate the 
exact chunk sizes to reproduce the problem. lxml seems to split into chunks of 
size INT_MAX, meaning a single chunk in most cases. xmllint first passes a 
chunk of 4 bytes, then splits the remaining data into chunks of 4096 bytes. 
But maybe I'm missing something. To be sure, you could run your Python code 
under a debugger like gdb and set a breakpoint on htmlParseChunk. Also break 
on htmlCtxtUseOptions to see exactly which parser options are used.


You could also start experimenting with feeding chunks of different sizes in 
your Python script or with a small C program that calls htmlParseChunk in the 
same way as lxml, presumably writing a single chunk. You could also try to add 
4 bytes somewhere at the beginning of `bad.html` and see if it helps with 
reproducing the issue using xmllint.
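A minimal sketch of such an experiment, assuming only what is described above (xmllint --push: a 4-byte chunk, then 4096-byte chunks; lxml: apparently a single chunk). The helper only splits a buffer into those sizes and hands each piece to a callback. In a real reproducer the callback would forward to htmlParseChunk(ctxt, data, size, terminate). The final zero-length call with terminate set is an assumption about how input is finished, and all names here are hypothetical. libxml2 is deliberately not called, so the splitting logic can be tested on its own.

```c
#include <stddef.h>

/* Consumer for one chunk. In a real reproducer this would forward to
 * htmlParseChunk(ctxt, data, (int) size, terminate). */
typedef void (*chunk_fn)(const char *data, size_t size, int terminate, void *user);

/* Split buf[0..len) into a first chunk of at most `first` bytes, then chunks
 * of at most `rest` bytes (0 = no limit), and finish with a zero-length call
 * that has terminate = 1. Returns the number of calls made. */
static size_t feed_in_chunks(const char *buf, size_t len, size_t first,
                             size_t rest, chunk_fn fn, void *user) {
    size_t off = 0, calls = 0;
    size_t n = first < len ? first : len;

    if (n > 0) {
        fn(buf, n, 0, user);
        off = n;
        calls++;
    }
    while (off < len) {
        n = len - off;
        if (rest != 0 && n > rest)
            n = rest;
        fn(buf + off, n, 0, user);
        off += n;
        calls++;
    }
    fn(buf + len, 0, 1, user);  /* signal end of input */
    return calls + 1;
}

/* Example consumer that only records chunk sizes, for experimenting. */
#define MAX_LOGGED 64
struct chunk_log {
    size_t sizes[MAX_LOGGED];
    size_t count;
    int terminated;
};

static void log_chunk(const char *data, size_t size, int terminate, void *user) {
    struct chunk_log *cl = user;

    (void) data;
    if (cl->count < MAX_LOGGED)
        cl->sizes[cl->count] = size;
    cl->count++;
    if (terminate)
        cl->terminated = 1;
}
```

For example, feed_in_chunks(data, len, 4, 4096, fn, user) approximates xmllint --push, while feed_in_chunks(data, len, (size_t) -1, 0, fn, user) passes everything in one piece. Changing `first` by a few bytes shifts every later chunk boundary, similar in spirit to adding 4 bytes at the start of bad.html.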


Other than that: It's not ideal, but could you please check if you can also 
reproduce the bug with the first set of commands I posted? Just to verify it's 
not just me.


Yes, I can try.

Nick


Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8

2019-01-22 Thread Nick Wellnhofer

On 22/01/2019 15:43, Tomi Belan via xml wrote:
After a lot of debugging, I determined the problem is in libxml2 and not the 
other libraries in my stack, and that it only seems to happen on version 
2.9.8. But I don't see any related changes in news.html for 2.9.9, nor in the 
diff between them, so I am still worried: I don't know if the bug is really 
fixed, or just dormant. I hope you can find the root cause, and maybe add a 
regression test if you do.


I also don't see any directly related changes in either 2.9.8 or 2.9.9.

This will download 
the manylinux binary build of lxml 4.2.5, which is statically linked to 
libxml2 2.9.8.


Are you sure that a pristine 2.9.8 build was used? Maybe there are additional 
patches added by a distro?


I couldn't shorten the file very much, because if I delete even a single 
character, the bug stops triggering. (Could it be some buffer boundary issue?) 


Yes, a buffer boundary issue seems likely.

I also built my own lxml 4.2.5 with libxml2 2.9.9 and it was not affected. So 
I believe this is a bug in libxml2 2.9.8 specifically, and not in a particular 
version of lxml.


Did you also try your own build with the official libxml2 2.9.8 sources?

I hope you can solve the mystery. Please let me know if I can be of any help. 


It would help if you could reproduce the issue with xmllint and no Python code 
involved. git-bisect might also be useful.


Nick


Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-12-19 Thread Nick Wellnhofer

On 30/11/2018 11:41, Nikolai Weibull via xml wrote:
OK, now I understand why it was working in my copy of the repository and not 
yours.  Something went wrong when you applied the patch, Daniel, as a line was 
elided.  Here’s a fix.  We want to include XML_RELAXNG_TEXT here as well, 
otherwise it won’t work. The second part of the patch below was just to 
reorder the types to be listed in alphabetical order, so you may certainly 
skip that.


Stefan, can you confirm that Nikolai's patch fixes the lxml issue?

Nick



Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-12-01 Thread Nick Wellnhofer

On 29/11/2018 22:50, Daniel Veillard wrote:

On Fri, Nov 23, 2018 at 11:12:13PM +0100, Nick Wellnhofer wrote:

The function now claims to work without preparsed documents, so the
workaround isn't used. But apparently there's a problem with the commit. I'm
CC'ing the author. If we can't get this fixed, let's revert.


I'll let you double-check this post-RC2; we can either delay or just revert.
Tell me what you think.


I just reverted the commit:

https://gitlab.gnome.org/GNOME/libxml2/commit/6fc04d714a019cb3be351bc472f7a64a08f51008

Nick



Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-11-24 Thread Nick Wellnhofer

On 24/11/2018 14:01, Daniel Veillard via xml wrote:

  Nick, there seem to be 7 merge requests; maybe we need to go through those
before I push an RC2.


https://gitlab.gnome.org/GNOME/libxml2/merge_requests/5

This should be kept externally, IMO.

https://gitlab.gnome.org/GNOME/libxml2/merge_requests/7

Also see the discussion in issue #2. I'd simply document that xmlInitGlobals 
shouldn't be called from application code.


https://gitlab.gnome.org/GNOME/libxml2/merge_requests/8

I added a hardcoded newline separator which should be enough for now.

https://gitlab.gnome.org/GNOME/libxml2/merge_requests/9

Should be discussed on this list first. Personally, I'm in favor of adding 
CMake support (as an option). In the long run, it could replace the weird 
Win32 build system with something more standard.


https://gitlab.gnome.org/GNOME/libxml2/merge_requests/10

That's Nikolas's merge request.

https://gitlab.gnome.org/GNOME/libxml2/merge_requests/11

Trivial.

https://gitlab.gnome.org/GNOME/libxml2/merge_requests/12

Looks good but I didn't have time to review properly.

Nick


Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-11-23 Thread Nick Wellnhofer

On 23/11/2018 22:38, Nick Wellnhofer wrote:

On 23/11/2018 20:51, Shlomi Fish wrote:

I am getting a failure in one of the tests of
https://github.com/shlomif/perl-XML-LibXML - it works fine with 2.9.8:


Can you check if it's caused by one of the following commits?


Never mind, it's this commit:

https://gitlab.gnome.org/GNOME/libxml2/commit/bfec41b3de1cbd35e547b57c80ae3a5101f8891c

It seems that XML::LibXML implements its own workaround for 
xmlTextReaderNextSibling only being supported on preparsed documents:


https://github.com/shlomif/perl-XML-LibXML/blob/master/LibXML.xs#L8667

The function now claims to work without preparsed documents, so the workaround 
isn't used. But apparently there's a problem with the commit. I'm CC'ing the 
author. If we can't get this fixed, let's revert.


Nick


Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-11-23 Thread Nick Wellnhofer

On 23/11/2018 20:51, Shlomi Fish wrote:

I am getting a failure in one of the tests of
https://github.com/shlomif/perl-XML-LibXML - it works fine with 2.9.8:


Can you check if it's caused by one of the following commits?

https://gitlab.gnome.org/GNOME/libxml2/commit/d2ef114c6b0d9a840b94cdecf554a873fc6f6df5
https://gitlab.gnome.org/GNOME/libxml2/commit/bfec41b3de1cbd35e547b57c80ae3a5101f8891c
https://gitlab.gnome.org/GNOME/libxml2/commit/39fbfb4fd08eae88d4b0c15f3a8ac33babc740e6

Thanks,
Nick


Re: [xml] Entering freeze for release of libxml2-2.9.9

2018-11-23 Thread Nick Wellnhofer

On 22/11/2018 18:32, Daniel Veillard via xml wrote:

Please give it some testing, if we need to make changes I will likely push
an RC2 mid next week, and if everything goes well I will push 2.9.9 final
end of next week.


Built and tested (with `make check`) successfully on:

- Windows 10
  - MingW64, gcc 7.2.0, both 64 and 32 bits (requires CFLAGS=-posix
    and PRINTF_EXPONENT_DIGITS=2 to make the XPath tests pass)
  - MSVC 14.11
- macOS 10.14 with stock command-line developer tools

Nick



Re: [xml] xmlIO.obj : error LNK2019: unresolved external symbol

2018-11-03 Thread Nick Wellnhofer

On 03/11/2018 01:02, Heng Zhou via xml wrote:
xmlIO.obj : error LNK2019: unresolved external symbol __libxml2_xzopen 
referenced in function xmlXzfileOpen_real


Can you try the attached patch? Untested, but if lzma is enabled, we have to 
compile and link xzlib.c as well.


Nick
diff --git a/win32/Makefile.msvc b/win32/Makefile.msvc
index 491dc880..ee4250af 100644
--- a/win32/Makefile.msvc
+++ b/win32/Makefile.msvc
@@ -244,6 +244,12 @@ XML_OBJS_A_DLL = $(XML_INTDIR_A_DLL)\buf.obj\
$(XML_INTDIR_A_DLL)\xpointer.obj\
$(XML_INTDIR_A_DLL)\xmlstring.obj
 
+!if "$(WITH_LZMA)" == "1"
+XML_OBJS = $(XML_OBJS) $(XML_INTDIR)\xzlib.obj
+XML_OBJS_A = $(XML_OBJS_A) $(XML_INTDIR_A)\xzlib.obj
+XML_OBJS_A_DLL = $(XML_OBJS_A_DLL) $(XML_INTDIR_A_DLL)\xzlib.obj
+!endif
+
 # Xmllint and friends executables.
 UTILS = $(BINDIR)\xmllint.exe\
$(BINDIR)\xmlcatalog.exe\


Re: [xml] Serialization of documents without encoding

2018-09-27 Thread Nick Wellnhofer

On 25/09/2018 14:36, Nick Wellnhofer wrote:
The whole situation is a mess. I'd love to change the code so that non-ASCII 
chars are always encoded as UTF-8, but I'm scared to break things.


This is the change I have in mind:

https://github.com/nwellnhof/libxml2/commit/53551ec2f6a2ef03bfcfb6d73b6fd18dc70ba15d

Nick



Re: [xml] Serialization of documents without encoding

2018-09-27 Thread Nick Wellnhofer

On 27/09/2018 10:59, Roumen Petrov wrote:

Let's consider the "file" mode case.



Let's consider the "stream" mode case.


I'm not only talking about xmllint but the serialization API (xmlSave*, 
xmlNodeDump*) in general.


Now, about the test samples above: if the content is stored in a file, xmllint 
works fine with encoding (= codeset = charset).


$ cat test-noencoding.xml
Käse


No, it doesn't work fine:

$ xmllint test-noencoding.xml

K&#xE4;se

(2) Next, the a-umlaut character is encoded in hexadecimal: a minor 
inconsistency between "stream" and "file" mode.


As shown above, "file" mode can also produce unwanted numeric character 
references.


(3) The problem is that in "stream" mode, xmllint ignores the value of the 
--encode argument:

$ echo 'Käse' | xmllint - --encode UTF-8

K&#xE4;se


Right, there is an inconsistency in xmllint. But that's not my point.

From my point of view, (1) and (2) are minor, unimportant issues. Only (3) 
could be fixed, with low priority.


Unneeded numeric character references in UTF-8 output are not a minor issue. 
If you're working with non-Latin scripts, it makes serialized XML files 
unreadable for humans and blows up the file size.
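To put a number on the size argument, the sketch below rewrites every non-ASCII character of a UTF-8 string as a hexadecimal character reference, the form these messages describe for documents without an encoding declaration. It is not libxml2's serializer. The function name is hypothetical and the UTF-8 decoder is simplified (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy NUL-terminated UTF-8 `in` to `out`, replacing each non-ASCII code
 * point with a hexadecimal character reference such as "&#xE4;".
 * Returns the output length, or (size_t) -1 if the input is malformed or
 * `out` (of size outsz) is too small. */
static size_t escape_non_ascii(const char *in, char *out, size_t outsz) {
    const unsigned char *p = (const unsigned char *) in;
    size_t o = 0;

    if (outsz == 0)
        return (size_t) -1;
    while (*p) {
        unsigned long cp;
        int extra, i, n;
        char tmp[16];

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t) -1;                 /* invalid lead byte */
        p++;
        for (i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)             /* also stops at the NUL */
                return (size_t) -1;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < 0x80) {
            tmp[0] = (char) cp;                  /* ASCII is copied as-is */
            n = 1;
        } else {
            n = snprintf(tmp, sizeof tmp, "&#x%lX;", cp);
        }
        if (o + (size_t) n + 1 > outsz)          /* keep room for the NUL */
            return (size_t) -1;
        memcpy(out + o, tmp, (size_t) n);
        o += (size_t) n;
    }
    out[o] = '\0';
    return o;
}
```

With this, "Käse" grows from 5 to 9 bytes, and a single CJK character such as U+65E5 grows from 3 bytes of UTF-8 to 8 bytes ("&#x65E5;"), which is why the references hurt most with non-Latin scripts.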


Nick



Re: [xml] Serialization of documents without encoding

2018-09-25 Thread Nick Wellnhofer

On 25/09/2018 13:19, Nick Wellnhofer wrote:
libxml2 serializes documents without an encoding declaration differently than 
documents with an explicit UTF-8 encoding:


It seems that this was partially changed in 2005 with the following commit:

https://gitlab.gnome.org/GNOME/libxml2/commit/64354ea7d6b8e0d95f3f9bcfdc98bddd065b65fc

But this change only applies to text nodes, not attribute content. It also 
only applies when serializing with xmlNodeDumpOutput or xmlNodeDump, not when 
using the xmlSave API (which xmllint uses).


The whole situation is a mess. I'd love to change the code so that non-ASCII 
chars are always encoded as UTF-8, but I'm scared to break things.


Nick


[xml] Serialization of documents without encoding

2018-09-25 Thread Nick Wellnhofer
libxml2 serializes documents without an encoding declaration differently than 
documents with an explicit UTF-8 encoding:


$ echo 'Käse' |xmllint -

K&#xE4;se

$ echo 'Käse' |xmllint -

Käse

Since the encoding should default to UTF-8, can anyone explain why this 
decision was made?


Nick


Re: [xml] Is there a solution to have newline delimited output for xmllint ?

2018-09-23 Thread Nick Wellnhofer

On 23/09/2018 15:12, gilles.que...@sputnick.fr wrote:

This is the official libxml repository I guess(?).


Yes.


Do you know if it will be packaged for Debian?

I will contact the Arch Linux maintainers to update the package (last updated in 2016).


All distros will eventually catch up but it probably doesn't make much sense 
to ping maintainers before a new libxml2 version is released.


Nick



Re: [xml] Is there a solution to have newline delimited output for xmllint ?

2018-09-23 Thread Nick Wellnhofer

On 20/03/2018 16:45, Nick Wellnhofer wrote:
I agree that printing text nodes without a separator is rather useless and I 
always found it annoying that the output isn't terminated with a newline at 
all. In this case, I'm not too concerned about backward compatibility and I'd 
simply change the `--xpath` output to always print a newline after each node, 
text or not.


This should be resolved now:


https://gitlab.gnome.org/GNOME/libxml2/commit/da35eeae5b92b88d8ebdb64b4b327ac1c2cf1bce

Also see the discussion here:

https://gitlab.gnome.org/GNOME/libxml2/merge_requests/8

Nick


Re: [xml] [PATCH] variables need 'extern' in static lib on Cygwin

2018-09-22 Thread Nick Wellnhofer

On 17/09/2018 10:59, Michael Haubenwallner wrote:

While the dllimport/dllexport macros now work for Cygwin, using the
static library still requires variables to be declared as 'extern'.
This is a regression of c65c9e8ee07e2dab0647392c2bd1795a5bc99829,
found+fixed by Bruno Haible using static libxml embedded in gettext.


Thanks. Patch applied here:

https://gitlab.gnome.org/GNOME/libxml2/commit/73b2417c5148af1f89708031b4bf96f40d1195e0

And for libxslt:

https://gitlab.gnome.org/GNOME/libxslt/commit/dfa1bdceaef73a404d1c6efe58c3618493b36afb

Nick



Re: [xml] possibility to use xpath searching on xml balanced chunks

2018-09-05 Thread Nick Wellnhofer

On 05/09/2018 10:53, Pavel Stehule via xml wrote:
Is there any way to read a balanced chunk into a form where XPath searching is 
possible?


By design, XPath only works on full documents. All you can do is to insert the 
balanced chunk under a dummy document node.
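A simpler, text-level variant of the same idea is to wrap the chunk in a throwaway root element before parsing, then write XPath expressions relative to that element (for example `/dummy/a`). This sketch assumes the chunk is plain markup, with no undeclared entities and no namespace prefixes declared outside the chunk. The element name `dummy` and the function name are hypothetical; this is not a libxml2 API.

```c
#include <stdlib.h>
#include <string.h>

/* Wrap a well-balanced chunk (e.g. "<a/>text<b/>") in a single root element
 * so the result is a complete document. Returns a malloc'd string that the
 * caller must free, or NULL on allocation failure. */
static char *wrap_balanced_chunk(const char *chunk) {
    static const char head[] = "<dummy>";
    static const char tail[] = "</dummy>";
    size_t hlen = sizeof head - 1;
    size_t clen = strlen(chunk);
    char *doc = malloc(hlen + clen + sizeof tail);  /* sizeof tail counts the NUL */

    if (doc == NULL)
        return NULL;
    memcpy(doc, head, hlen);
    memcpy(doc + hlen, chunk, clen);
    memcpy(doc + hlen + clen, tail, sizeof tail);
    return doc;
}
```

The resulting string could then be parsed with xmlReadMemory and queried as usual; the node-level approach Nick describes avoids re-parsing but needs more code.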


Nick




Re: [xml] [PATCH] always dllexport the singlethreaded variables

2018-09-04 Thread Nick Wellnhofer

On 04/09/2018 11:00, Michael Haubenwallner wrote:

Right now, LIBXML_THREAD_ENABLED is defined in xmlversion.h only if the
*application* does enable threads


OK, I thought that LIBXML_THREAD_ENABLED only depends on the configure switch, 
but it also depends on the following check in xmlversion.h:


#if defined(_REENTRANT) || defined(__MT__) || \
(defined(_POSIX_C_SOURCE) && (_POSIX_C_SOURCE - 0 >= 199506L))

https://gitlab.gnome.org/GNOME/libxml2/blob/master/include/libxml/xmlversion.h.in#L87

This looks wrong to me. If libxml2 was compiled with threads, 
LIBXML_THREAD_ENABLED should always be defined.
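To see why this is surprising, the condition can be compiled in isolation. The sketch copies the check quoted above into a standalone translation unit. Its result depends on how *that* unit is compiled (for example -D_REENTRANT, or typically -pthread with GCC) and on feature-test macros, not on how libxml2 itself was configured. The macro and function names are hypothetical.

```c
/* On glibc, including a system header may define _POSIX_C_SOURCE via
 * <features.h>, which can by itself make the check below true. */
#include <stdio.h>

/* Same condition as the one quoted from xmlversion.h.in. */
#if defined(_REENTRANT) || defined(__MT__) || \
    (defined(_POSIX_C_SOURCE) && (_POSIX_C_SOURCE - 0 >= 199506L))
#define APP_HEADER_SEES_THREADS 1
#else
#define APP_HEADER_SEES_THREADS 0
#endif

/* 1 if an application header compiled like this unit would see threads. */
static int app_header_sees_threads(void) {
    return APP_HEADER_SEES_THREADS;
}

static void report_thread_check(void) {
    printf("thread check in this translation unit: %s\n",
           app_header_sees_threads() ? "true" : "false");
}
```

Compiling the same file with and without -D_REENTRANT (or under a strict -std mode) can flip the result, while the library binary stays the same.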


Nick


Re: [xml] [PATCH] always dllexport the singlethreaded variables

2018-08-31 Thread Nick Wellnhofer

On 25/05/2018 17:46, Michael Haubenwallner wrote:

When an application using libxml2 does not enable multithreaded support
for itself, we provide the singlethreaded variables, eventually tagged
with dllimport.  So even when we build the multithreaded libxml2, our
singlethreaded variables still eventually need the dllexport tag.


I don't understand the issue. Do you mean an application linked with a DLL 
configured without thread support and compiled with single-thread headers, and 
THEN swapping the DLL with a multi-thread version? I'm not sure whether this 
should be supported.


Nick


Re: [xml] [PATCH] really declare dllexport/dllimport for Cygwin

2018-08-31 Thread Nick Wellnhofer

Thanks, patch applied here:

https://gitlab.gnome.org/GNOME/libxml2/commit/c65c9e8ee07e2dab0647392c2bd1795a5bc99829


On 25/05/2018 17:46, Michael Haubenwallner wrote:

Cygwin does not define _WIN32, but still requires dllexport/dllimport
tags for when applications use the --disable-auto-import linker flag,
probably set by the gl_WOE32_DLL autoconf macro in woe32-dll.m4 file.
---
  include/libxml/xmlexports.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/libxml/xmlexports.h b/include/libxml/xmlexports.h
index 2c79f814..bc8a90d4 100644
--- a/include/libxml/xmlexports.h
+++ b/include/libxml/xmlexports.h
@@ -131,8 +131,8 @@
#endif
  #endif
  
-/* Cygwin platform, GNU compiler */
-#if defined(_WIN32) && defined(__CYGWIN__)
+/* Cygwin platform (does not define _WIN32), GNU compiler */
+#if defined(__CYGWIN__)
#undef XMLPUBFUN
#undef XMLPUBVAR
#undef XMLCALL




Re: [xml] performance of parsing docbook with xincludes

2018-06-08 Thread Nick Wellnhofer

On 08/06/2018 03:45, Eric S. Eberhard wrote:
Some very simple things to do:  1) put the DTD hosts into the /etc/hosts file 
(or another if you like and substitute an IP)   2)  set /etc/resolv.conf to 
first look in the hosts file (before DNS)


The discussion is not about caching DTDs loaded over the network but from the 
local file system. In this particular case, the same Docbook DTD (~250 KB) is 
parsed more than 100 times for each XInclude.


If I were to suggest a speed-up of libxml2, I would change it to optionally 
(probably at compile time) never free memory: each node, piece of data, etc. 
that is created and destroyed constantly would just sit there (and slowly 
grow until it levels out).


libxml2 already allows you to use your own memory allocators. It's easy to 
make `free` a no-op.
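A minimal sketch of that idea, assuming a single-threaded program: a bump allocator whose free function does nothing. The signatures match libxml2's xmlFreeFunc, xmlMallocFunc, xmlReallocFunc and xmlStrdupFunc, so the functions could be registered with xmlMemSetup() before any other libxml2 call. The arena size and all names are hypothetical, and memory is never returned to the system.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical arena size; nothing allocated from it is ever reused. */
#define ARENA_SIZE ((size_t) 64 * 1024 * 1024)

/* Header in front of each block so realloc knows how much to copy. Its size
 * is a multiple of max_align_t's alignment, keeping returned pointers aligned. */
typedef union {
    max_align_t align;
    size_t size;
} arena_hdr;

static unsigned char *arena_base;
static size_t arena_used;

static void *arena_malloc(size_t size) {
    const size_t a = _Alignof(max_align_t);
    size_t need = sizeof(arena_hdr) + (size + a - 1) / a * a;
    arena_hdr *h;

    if (arena_base == NULL && (arena_base = malloc(ARENA_SIZE)) == NULL)
        return NULL;
    if (need < size || need > ARENA_SIZE - arena_used)  /* overflow or full */
        return NULL;
    h = (arena_hdr *) (arena_base + arena_used);
    h->size = size;
    arena_used += need;
    return h + 1;
}

static void arena_free(void *p) {
    (void) p;  /* intentionally a no-op: memory is never reclaimed */
}

static void *arena_realloc(void *p, size_t size) {
    size_t old;
    void *q;

    if (p == NULL)
        return arena_malloc(size);
    old = ((arena_hdr *) p - 1)->size;
    if (size <= old)
        return p;               /* shrinking: keep the existing block */
    q = arena_malloc(size);
    if (q != NULL)
        memcpy(q, p, old);      /* the old block is simply abandoned */
    return q;
}

static char *arena_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *d = arena_malloc(n);

    if (d != NULL)
        memcpy(d, s, n);
    return d;
}

/* Registration (needs libxml2; must happen before any other libxml2 call):
 *     xmlMemSetup(arena_free, arena_malloc, arena_realloc, arena_strdup);
 */
```

The trade-off is the one described above: no per-free bookkeeping, at the cost of memory that only ever grows, so it only suits short-lived processes.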


Nick


Re: [xml] performance of parsing docbook with xincludes

2018-06-07 Thread Nick Wellnhofer

On 07/06/2018 00:00, Stefan Sauer wrote:

Another idea is to stop loading external DTDs for XIncludes without an
XPointer expression. This would still change the behavior for some
users but it's much less likely to cause problems.

change the behaviour, as in we would not catch validation errors?


No, nothing related to validation. If you validate a document, the DTDs will 
always be loaded. But parsing with or without XML_PARSE_DTDLOAD will obviously 
produce different results. It's hard to tell whether this will cause problems 
for users. But maybe I'm overly cautious. If someone parses a document without 
DTD flags, why would they assume that XIncluded documents are parsed with 
XML_PARSE_DTDLOAD?



Too bad that xmlXIncludeParseFile() does not get the parent parserCtx;
in that case we could apply the same flags.


I think the original flags are already passed via xmlXIncludeSetFlags.


It seems that xmldict only handles strings as keys and values, right? So
we'll even need our own cache data structure. I'd say it would need to be
on the _xmlXIncludeCtxt level. Global is easier, but then we can't ever
free it :/


xmlHash should work fine:

http://xmlsoft.org/html/libxml-hash.html

But building a DTD cache would be the least of your problems. The hard part is 
to apply a cached DTD to a document. There are some interactions between 
internal and external subsets (see xmlAddElementDecl and xmlAddAttributeDecl 
in valid.c for example), so it looks like you can't simply set 
doc->extSubset to the cached DTD. You'd probably have to replay the calls to 
xmlAddElementDecl etc, maybe even in the original order which might be lost. 
That's why I wouldn't want to go down this route.


Nick


Re: [xml] performance of parsing docbook with xincludes

2018-05-17 Thread Nick Wellnhofer

On 16/05/2018 21:51, Stefan Sauer wrote:

So one solution could be another flag to enable this?


Yes, but it would be rather ugly.


Thanks, reading the code. Need to figure out where we could cache external
subsets and what a suitable key is (ExternalID?).


Note that I'm currently not planning to review and integrate larger patches 
from other developers. I only took over some libxml2 maintenance duties 
because no one else did. So even if you write a high-quality patch, it might 
never get merged.


Caching external subsets for XIncludes certainly sounds like a nice feature 
but I would prefer to find a simpler solution. For example, can't you just 
omit the external DTD from included documents? You wrote:



and gtk-doc will replicate this for the fragments (replacing 'book' with
e.g. 'refentry'). This way one can e.g. inject things like a version.


What do you mean by "inject things like a version"? Why exactly do your 
included documents have to reference an external DTD?


Another idea is to stop loading external DTDs for XIncludes without an 
XPointer expression. This would still change the behavior for some users but 
it's much less likely to cause problems.
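The policy in the last paragraph can be written as a small decision function. This is a hypothetical sketch, not code from xinclude.c: SKETCH_PARSE_DTDLOAD mirrors the value of libxml2's XML_PARSE_DTDLOAD (1 << 2) so the example compiles without libxml2 headers, and `has_xpointer` stands for however the XInclude engine detects an xpointer attribute.

```c
/* Local mirror of XML_PARSE_DTDLOAD from libxml2's xmlParserOption enum. */
#define SKETCH_PARSE_DTDLOAD (1 << 2)

/* Current behaviour as described in this thread: included documents are
 * always parsed with DTD loading, whatever the parent's flags are. */
static int current_include_options(int parent_flags) {
    return parent_flags | SKETCH_PARSE_DTDLOAD;
}

/* Proposed: only force DTD loading when an XPointer may need ID attributes;
 * otherwise reuse the parent's flags unchanged. */
static int proposed_include_options(int parent_flags, int has_xpointer) {
    return has_xpointer ? (parent_flags | SKETCH_PARSE_DTDLOAD) : parent_flags;
}
```

An application that explicitly asks for XML_PARSE_DTDLOAD keeps it in both variants; only the implicit loading for plain includes goes away.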


Nick


Re: [xml] performance of parsing docbook with xincludes

2018-05-15 Thread Nick Wellnhofer
On May 15, 2018, at 21:56 , Stefan Sauer <enso...@hora-obscura.de> wrote:
> 
> On 05/15/2018 08:40 PM, Stefan Sauer wrote:
>> On 05/15/2018 12:42 PM, Nick Wellnhofer wrote:
>>> Can you try to change the line to
>>> 
>>> xmlCtxtUseOptions(pctxt, ctxt->parseFlags);
>>> 
>>> and see if it helps?
>>> 
>> It does not help. I'll experiment further. Thanks for the recommendations.

I think you also have to remove the line at 
https://git.gnome.org/browse/libxml2/tree/xinclude.c#n463

pctxt->loadsubset |= XML_DETECT_IDS;

Looks like the idea is to make sure that ID attributes are detected for 
XIncludes with XPointers. IMO, it should be the application's responsibility to 
set the XML_PARSE_DTDLOAD flag in this case. But changing the behavior might 
break code that relies on this feature.

> Is libxml2 doing that for each file over and over?

Yes.

> Wouldn't it make sense to only load each dtd once?

This would make sense.

> And where exactly is it loaded? (I can only
> see xmlFreeDtd, but can't find an xmlLoadDtd or the like.)

Via xmlParseDocument -> xmlSAX2ExternalSubset -> xmlParseExternalSubset.

Nick



Re: [xml] performance of parsing docbook with xincludes

2018-05-15 Thread Nick Wellnhofer

On 14/05/2018 21:48, Stefan Sauer wrote:

This part looks suspicious:

|--22.98%--0xc2160
|  xmlFreeDoc
|  |
|   --22.42%--xmlFreeDtd



Can I tell it not to load DTDs in the first place? Is it loading the DTD for 
each and every XInclude?


Good catch. It seems that the XInclude engine always parses included docs with 
XML_PARSE_DTDLOAD:


https://git.gnome.org/browse/libxml2/tree/xinclude.c#n450

If you're not using XML catalogs, this will probably cause the DTD to be 
loaded over the network multiple times which could explain the slowdown.


Can you try to change the line to

xmlCtxtUseOptions(pctxt, ctxt->parseFlags);

and see if it helps?

Nick


Re: [xml] performance of parsing docbook with xincludes

2018-05-14 Thread Nick Wellnhofer

On 13/05/2018 20:54, Stefan Sauer wrote:

Lets look at some numbers using glib (https://gitlab.gnome.org/GNOME/glib)

cd glib/docs/reference/glib
xmllint --timing --xinclude --noout glib-docs.xml
Parsing took 0 ms
Xinclude processing took 4560 ms
Freeing took 91 ms

Any idea how I can get more of a breakdown of what's happening in 'Xinclude
processing'?


It seems that "XInclude processing" also contains the time needed to parse the 
included documents, so maybe the XIncludes aren't the issue at all 
(glib-docs.xml is a small document including several larger ones). Can you 
save glib-docs.xml after processing XIncludes and check whether parsing the 
consolidated document is considerably faster?



Running with "perf record -g -- xmllint --timing --xinclude --noout
glib-docs.xml" gets me such a report.

+   17.15%   16.69%  xmllint  libc-2.24.so      [.] _int_malloc
+   11.93%   11.87%  xmllint  libc-2.24.so      [.] malloc_consolidate
+    9.01%    8.97%  xmllint  libxml2.so.2.9.4  [.] xmlDictLookup
+    7.15%    0.00%  xmllint  ld-2.24.so        [.] 0x8021a0022010
+    6.25%    6.21%  xmllint  libxml2.so.2.9.4  [.] xmlHashAddEntry3
+    6.22%    0.00%  xmllint  libxml2.so.2.9.4  [.] xmlSAX2IsStandalone
+    6.22%    0.00%  xmllint  [unknown]         [.] 0x56413c74c0854810
+    3.95%    3.94%  xmllint  libxml2.so.2.9.4  [.] xmlHashLookup2
     3.72%    3.70%  xmllint  libc-2.24.so      [.] _int_free
+    3.28%    0.00%  xmllint  [unknown]         [.]
+    3.06%    3.04%  xmllint  libxml2.so.2.9.4  [.] xmlFreeDocElementContent
+    2.96%    2.91%  xmllint  libc-2.24.so      [.] free


The callgraph based reports (perf report -g or -G) are usually more helpful.


Any ideas? Is there a known issue with using XIncludes here?


It might be quadratic behavior in the XInclude engine or something else 
entirely. How large is glib-docs.xml after processing XIncludes?


Nick


Re: [xml] Is there a solution to have newline delimited output for xmllint ?

2018-03-20 Thread Nick Wellnhofer

On 20/03/2018 14:14, gilles.que...@sputnick.fr wrote:
I post many snippets with xmllint on stackoverflow and unix.stackexchange.com, 
but many times I'm stuck with this nice tool when it comes to retrieving N > 1 
text nodes, because the output is not newline delimited (unlike xmlstarlet).


It's not clear what exactly you're talking about, but I guess this is about 
the `--xpath` option and the bug you already posted:


https://bugzilla.gnome.org/show_bug.cgi?id=740827

I agree that printing text nodes without a separator is rather useless and I 
always found it annoying that the output isn't terminated with a newline at 
all. In this case, I'm not too concerned about backward compatibility and I'd 
simply change the `--xpath` output to always print a newline after each node, 
text or not. But maybe other people want to weigh in.


Nick


Re: [xml] libxml2 2.9.8 build error on AIX, HP-UX and old Visual Studio like 10.0

2018-03-15 Thread Nick Wellnhofer

On 15/03/2018 17:29, Fabrice Manfroi wrote:

The patch works for AIX/HP but with the old Visual Studio 2010 I have
another error:

{quote}
..\xpath.c(501) : error C2124: divide or mod by zero
{quote}


Can you try this updated version of the patch (against master, not on top of 
the previous patch)?


BTW, thanks for testing on some more exotic platforms. I'm curious whether you 
also run the test suite (`make check` or `nmake tests` with MSVC). If yes, are 
there any errors?


Nick

diff --git a/xpath.c b/xpath.c
index f4406967..89fab588 100644
--- a/xpath.c
+++ b/xpath.c
@@ -477,27 +477,28 @@ int wrap_cmp( xmlNodePtr x, xmlNodePtr y );
  * *
  /
 
-#ifndef NAN
-#define NAN (0.0 / 0.0)
-#endif
-
 #ifndef INFINITY
-#define INFINITY HUGE_VAL
+#define INFINITY (DBL_MAX * DBL_MAX)
 #endif
 
-double xmlXPathNAN = NAN;
-double xmlXPathPINF = INFINITY;
-double xmlXPathNINF = -INFINITY;
+#ifndef NAN
+#define NAN (INFINITY / INFINITY)
+#endif
+
+double xmlXPathNAN;
+double xmlXPathPINF;
+double xmlXPathNINF;
 
 /**
  * xmlXPathInit:
  *
  * Initialize the XPath environment
- *
- * Does nothing but must be kept as public function.
  */
 void
 xmlXPathInit(void) {
+xmlXPathNAN = NAN;
+xmlXPathPINF = INFINITY;
+xmlXPathNINF = -INFINITY;
 }
 
 /**
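Both versions of the patch move the special values out of static initializers and into xmlXPathInit(). The second version also avoids the literal 0.0 / 0.0, which VS2010 rejects even inside a function (the C2124 error at xpath.c(501) quoted above). The sketch below repeats the second version's runtime pattern in isolation and checks the results. The names are hypothetical, and it assumes IEEE 754 doubles and a build without -ffast-math.

```c
#include <float.h>

/* Same construction as the patch: overflow to +inf, then inf / inf for NaN. */
#define SKETCH_INFINITY (DBL_MAX * DBL_MAX)
#define SKETCH_NAN      (SKETCH_INFINITY / SKETCH_INFINITY)

/* Plain globals, assigned at run time instead of in static initializers. */
static double sketch_nan;
static double sketch_pinf;
static double sketch_ninf;

static void sketch_init(void) {
    sketch_nan  = SKETCH_NAN;
    sketch_pinf = SKETCH_INFINITY;
    sketch_ninf = -SKETCH_INFINITY;
}

/* NaN is the only value that compares unequal to itself. */
static int sketch_is_nan(double x) {
    return x != x;
}
```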


Re: [xml] libxml2 2.9.8 build error on AIX, HP-UX and old Visual Studio like 10.0

2018-03-15 Thread Nick Wellnhofer

On 15/03/2018 15:50, Fabrice Manfroi wrote:

..\xpath.c(506) : error C2099: initializer is not a constant


Does the attached patch work for you?

Nick

diff --git a/xpath.c b/xpath.c
index f4406967..773e848b 100644
--- a/xpath.c
+++ b/xpath.c
@@ -485,9 +485,9 @@ int wrap_cmp( xmlNodePtr x, xmlNodePtr y );
 #define INFINITY HUGE_VAL
 #endif
 
-double xmlXPathNAN = NAN;
-double xmlXPathPINF = INFINITY;
-double xmlXPathNINF = -INFINITY;
+double xmlXPathNAN;
+double xmlXPathPINF;
+double xmlXPathNINF;
 
 /**
  * xmlXPathInit:
@@ -498,6 +498,9 @@ double xmlXPathNINF = -INFINITY;
  */
 void
 xmlXPathInit(void) {
+xmlXPathNAN = NAN;
+xmlXPathPINF = INFINITY;
+xmlXPathNINF = -INFINITY;
 }
 
 /**


Re: [xml] Time for some releases

2018-01-22 Thread Nick Wellnhofer

On 21/01/2018 07:22, Daniel Veillard wrote:

  I think it's time for a new set of releases.
I failed to push in the last 2 months and a number of patches
have accumulated since November, so I think entering freeze on Mon or
Tuesday, then having rc2 around end of week for a release early
around 29-30 Jan would make sense,

   unless there is something pending,


There's nothing pending from my side.

Nick



Re: [xml] Heap use after free in parser.c

2018-01-22 Thread Nick Wellnhofer

On 08/01/2018 22:43, Jay Civelli wrote:
On Mon, Jan 8, 2018 at 11:27 AM, Nick Wellnhofer <wellnho...@aevum.de> wrote:


On 02/01/2018 20:08, Jay Civelli via xml wrote:

We ran into a heap use after free in Chromium http://crbug.com/793715
that I think I tracked down.

I don't have access to this page.

You should have access now.


I still don't have access to the original Clusterfuzz report. I only found 
your reduced test case "bad_xml" but I couldn't reproduce the issue with 
xmllint. Given the stack trace and Chromium sources, it seems that you're 
using xmlReaderForMemory in recovery mode:



https://chromium.googlesource.com/chromium/src/+/master/third_party/libxml/chromium/libxml_utils.cc

Note that it's discouraged to use XML_PARSE_RECOVER in production code. This 
flag hides errors in invalid XML and exercises some less-tested code paths in 
libxml2.


For future reports, it would be helpful to provide test cases that show the 
problem with xmllint. The following flags should make xmllint behave similar 
to the Chromium code in question:


xmllint --stream --memory --recover file.xml

Good idea, done in the new attached patch. Note that I changed the error from 
the existing XML_ERR_INVALID_ENCODING to XML_ERR_INVALID_CHAR, which seemed to 
make more sense.


I committed a minimal fix that only adds a call to xmlHaltParser.


https://git.gnome.org/browse/libxml2/commit/?id=ab362ab0ad3af54406ae8237a525405c6e2a705b

Nick


Re: [xml] [PATCH] Check hex or decimal entity for overflow

2018-01-22 Thread Nick Wellnhofer

On 09/01/2018 00:55, Joel Hockey wrote:

Updated patch with XML_ERR_INVALID_CHAR.


Should be fixed with


https://git.gnome.org/browse/libxml2/commit/?id=60dded12cbf1705927803c5ed615a7a0132aebbd

As noted previously, this only affects "recovery" mode. The commit addresses 
the issue at an earlier point in the parsing process and makes sure not to 
return invalid entity content in recovery mode at all.


Nick


Re: [xml] Fwd: Patch to fix ICU flush and pivot buffer

2018-01-08 Thread Nick Wellnhofer

On 08/01/2018 02:06, Joel Hockey wrote:
Nick, I have another patch for some additional call sites where flush is being 
incorrectly set on the non-final read.


Applied here:


https://git.gnome.org/browse/libxml2/commit/?id=6e6ae5daa6cd9640c9a83c1070896273e9b30d14

Looks right, but I applied the patch more or less blindly. The iconv code 
seems to ignore the `flush` argument, so this should only affect the ICU code.


Nick
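Some background on the `flush` argument discussed above: ICU's ucnv_convertEx expects `flush` to be true only for the last chunk of input. With flush set, a multi-byte sequence cut off at the end of the buffer is reported as a truncation error instead of being kept for the next call. The toy UTF-8 character counter below (hypothetical `struct utf8_stream` and `utf8_feed`; no ICU and no libxml2 involved) is a minimal sketch of why setting flush on a non-final read breaks input that happens to split a character across two reads. It only checks lead and continuation bytes, not overlong forms or surrogates, and leaves the state undefined after an error.

```c
#include <stddef.h>

/* Hypothetical streaming state: bytes of a character that started in an
 * earlier chunk but is not complete yet. */
struct utf8_stream {
    unsigned char pending[4];
    size_t npending;
};

/* Sequence length implied by a UTF-8 lead byte, 0 for an invalid lead. */
static size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Counts characters completed in this chunk. A partial character at the
 * end is carried over, unless flush says no more input follows, in which
 * case it is an error (-1), like a truncated sequence with flush set. */
static long utf8_feed(struct utf8_stream *st, const unsigned char *buf,
                      size_t len, int flush) {
    long count = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        if (st->npending > 0 && (b & 0xC0) != 0x80)
            return -1;                  /* expected a continuation byte */
        st->pending[st->npending++] = b;
        size_t need = utf8_seq_len(st->pending[0]);
        if (need == 0)
            return -1;                  /* invalid lead byte */
        if (st->npending == need) {     /* character complete */
            count++;
            st->npending = 0;
        }
    }
    if (flush && st->npending > 0)
        return -1;                      /* input ended mid-character */
    return count;
}
```

Feeding `{'h', 0xC3}` with flush off and then `{0xA9, '!'}` with flush on counts "h", "é" and "!" correctly, while feeding the first chunk with flush already on fails, even though the input as a whole is valid.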


Re: [xml] Heap use after free in parser.c

2018-01-08 Thread Nick Wellnhofer

On 02/01/2018 20:08, Jay Civelli via xml wrote:
We ran into a heap use after free in Chromium http://crbug.com/793715
that I think I tracked down.


I don't have access to this page.


I have a tentative patch attached to address it.
In parser.c, if a call to xmlCharEncInput() fails and has grown the buffer, 
the ctxt object could still point to the old deleted buffer.


Maybe it's better to call xmlHaltParser if xmlCharEncInput fails. That's what 
the other code path in xmlParseChunk does.


Nick
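The failure mode described above, a cursor left pointing into a buffer that has since been reallocated, can be illustrated without libxml2. The sketch below uses a hypothetical `struct input_buf` (not libxml2's xmlParserInput). It re-derives the cursor from an offset after growing the buffer, and refuses further input once an error has occurred, loosely in the spirit of the xmlHaltParser suggestion. The only error modeled here is an allocation failure, whereas in the report the failing call was xmlCharEncInput.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified input buffer; not libxml2's xmlParserInput. */
struct input_buf {
    char *base;       /* heap block holding the input */
    size_t used;      /* bytes of input stored so far */
    size_t cap;       /* size of the heap block */
    const char *cur;  /* parse position, always points into base */
    int halted;       /* set after an error; no further input accepted */
};

static int input_init(struct input_buf *in, size_t cap) {
    if (cap == 0)
        cap = 1;
    in->base = malloc(cap);
    if (in->base == NULL)
        return -1;
    in->used = 0;
    in->cap = cap;
    in->cur = in->base;
    in->halted = 0;
    return 0;
}

static void input_free(struct input_buf *in) {
    free(in->base);
    in->base = NULL;
    in->cur = NULL;
}

/* Append data, growing the block when needed. realloc may move the block,
 * so the cursor is saved as an offset before the call and rebuilt from the
 * new base afterwards; keeping the old in->cur would leave it dangling. */
static int input_append(struct input_buf *in, const char *data, size_t len) {
    if (in->halted)
        return -1;
    if (len > in->cap - in->used) {
        size_t off = (size_t)(in->cur - in->base);
        size_t newcap = (in->used + len) * 2;
        char *p = realloc(in->base, newcap);
        if (p == NULL) {
            in->halted = 1;  /* stop for good instead of continuing */
            return -1;
        }
        in->base = p;
        in->cap = newcap;
        in->cur = p + off;
    }
    memcpy(in->base + in->used, data, len);
    in->used += len;
    return 0;
}
```

Saving an offset rather than a pointer is what keeps the cursor valid across the move; once `halted` is set, every later call fails fast instead of touching possibly inconsistent state.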


Re: [xml] [PATCH] Check hex or decimal entity for overflow

2018-01-08 Thread Nick Wellnhofer

On 08/01/2018 02:06, Joel Hockey wrote:
The entity parsing code in tree.c hits an integer overflow when a very 
long, invalid hex (or decimal) entity is used, e.g. &#xabcdefabcdef;


This is probably the same issue as

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3874

Also see

https://bugzilla.gnome.org/show_bug.cgi?id=783052

The issue only arises in "recovery" mode (XML_PARSE_RECOVER). In the past, I 
tried to fix similar issues by not adding nodes containing invalid character 
references at all in an earlier stage of the parsing code, but I'm fine with 
your approach.


For these cases, I am setting the error to XML_TREE_UNTERMINATED_ENTITY. The 
other two existing codes are XML_TREE_INVALID_HEX and XML_TREE_INVALID_DEC. I 
thought "unterminated" was the better choice, but maybe a new code such as 
XML_TREE_INVALID_CHAR could be used.


Regarding the error code, we could simply use XML_ERR_INVALID_CHAR or not 
report an error at all since invalid numeric character references are already 
detected and reported earlier.


Nick
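A standalone sketch of overflow-safe parsing for a numeric character reference like the long hex one in the report, assuming the input has already been matched up to "&#". Digits past U+10FFFF are still consumed, but the value stops growing, so no integer overflow can occur; the result is then checked against the Char production of XML 1.0. `parse_char_ref` and `is_xml_char` are hypothetical names, and this is neither libxml2's parser code nor the committed fix.

```c
#include <stdint.h>

#define MAX_CODE_POINT 0x10FFFFu

/* Char production from XML 1.0. */
static int is_xml_char(uint32_t c) {
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= MAX_CODE_POINT);
}

/* s points just past "&#", e.g. "x41;" or "65;".
 * Returns the code point, or -1 for a malformed or invalid reference. */
static long parse_char_ref(const char *s) {
    uint32_t val = 0;
    unsigned base = 10;
    int ndigits = 0;

    if (*s == 'x') {
        base = 16;
        s++;
    }
    for (;; s++, ndigits++) {
        unsigned d;
        if (*s >= '0' && *s <= '9')
            d = (unsigned)(*s - '0');
        else if (base == 16 && *s >= 'a' && *s <= 'f')
            d = (unsigned)(*s - 'a' + 10);
        else if (base == 16 && *s >= 'A' && *s <= 'F')
            d = (unsigned)(*s - 'A' + 10);
        else
            break;
        /* Saturate: once val exceeds the largest code point it can never
         * become valid again, so stop accumulating. The largest value
         * computed is 0x10FFFF * 16 + 15, well within 32 bits. */
        if (val <= MAX_CODE_POINT)
            val = val * base + d;
    }
    if (ndigits == 0 || *s != ';')
        return -1;
    if (val > MAX_CODE_POINT || !is_xml_char(val))
        return -1;
    return (long)val;
}
```

Rejecting such references at this point means later stages never see an out-of-range value, which matches the idea of not adding nodes with invalid character references at all.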


Re: [xml] Simplify XPath NaN, inf and -0 handling

2017-12-09 Thread Nick Wellnhofer

On 08/12/2017 10:26, Daniel Veillard wrote:

   What do you really gain by this?
Things that were known to be portable across many platforms now become unclear again.
Trio might be deprecated; I would ask Daniel Stenberg first, so he's in copy.


Most of all, I want to get rid of the Trio dependency. I'm aware that this 
change could break some ancient platforms but I'm willing to help with fixes 
in a timely manner.


If you think the change is too dangerous, I can revert it. But at some 
point, we should discuss how to handle compatibility with 20+ year-old platforms.


That said, Trio still seems to be supported. The latest release is 1.16 from 
2014:

https://sourceforge.net/projects/ctrio/files/trio/

Upgrading our copy of the Trio code should fix some compiler warnings, but 
personally I'd prefer not having to deal with third party code at all.


Nick


Re: [xml] Patch suggestion for "fixing" 10 MB limit when using xmlNewTextWriterDoc

2017-12-04 Thread Nick Wellnhofer

On 29/11/2017 22:14, Stian Hvatum wrote:
I am one of those who have been bitten by the 10 MB limit when building an XML 
document using xmlNewTextWriterDoc as the constructor for the xmlTextWriter.


I already mentioned on this list that, personally, I'd completely remove the 
text node size limit. It ostensibly protects against DoS attacks, but there 
are countless other ways to make libxml2 consume even more resources than 
feeding it large text nodes.


Instead of adding even more kludges, I'd prefer a work-around that doesn't 
require additional public functions. Maybe you can simply call 
xmlNewTextWriterPushParser with a custom parser context instead?


Nick


[xml] Update information about contributing

2017-11-14 Thread Nick Wellnhofer

Here's a patch that I'd like to discuss before committing.

Nick

--

https://github.com/nwellnhof/libxml2/commit/cbedb8de41ba260d8cf5a4b9858f43175d01715e

Update information about contributing

The contents of the HACKING file were hopelessly outdated. Remove the
file and start with a CONTRIBUTING document.


[xml] Simplify XPath NaN, inf and -0 handling

2017-11-14 Thread Nick Wellnhofer

Here's a patch that I'd like to discuss before committing.

Nick

--

https://github.com/nwellnhof/libxml2/commit/8813f397f8925f85ffbe9e9fb62bfaa3c1accf11

Simplify XPath NaN, inf and -0 handling

Use C99 macros NAN, INFINITY, isnan, isinf. If they're not available:

- Assume that (0.0 / 0.0) generates a NaN and !(x == x) tests for NaN.
- Use C89's HUGE_VAL for INFINITY.

Remove manual handling of NaN, infinity and negative zero in functions
xmlXPathValueFlipSign and xmlXPathDivValues.

Remove xmlXPathGetSign. All the tests for negative zero can be replaced
with a test for negative or positive zero.

Simplify xmlXPathRoundFunction.

Remove Trio dependency.

This should work on IEEE 754 compliant implementations even if the C99
macros aren't available, but will likely break some ancient platforms.
If problems arise, my plan is to port the relevant trionan.c solution
to xpath.c. Note that non-compliant implementations are impossible
to fully support, anyway, since XPath requires IEEE 754.
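A minimal sketch of the approach described in the commit message, assuming a C99 <math.h> and IEEE 754 doubles. The `#ifndef` fallbacks follow the message (0.0 / 0.0 for NaN, !(x == x) as the NaN test, HUGE_VAL for INFINITY); the isinf fallback is an assumption added here, since the message does not name one. `flip_sign`, `div_values` and `number_special_string` are hypothetical names, not libxml2 functions. The last one illustrates the negative-zero point using XPath 1.0's string() rule, under which both +0 and -0 become "0".

```c
#include <math.h>
#include <stddef.h>

/* Fallbacks for pre-C99 compilers, as described in the commit message.
 * They only make sense on IEEE 754 implementations. */
#ifndef NAN
#define NAN (0.0 / 0.0)
#endif
#ifndef INFINITY
#define INFINITY HUGE_VAL
#endif
#ifndef isnan
#define isnan(x) (!((x) == (x)))
#endif
#ifndef isinf
/* Assumed fallback, not taken from the commit message. */
#define isinf(x) ((x) == INFINITY || (x) == -INFINITY)
#endif

/* Unary minus already handles the special values under IEEE 754:
 * 0.0 -> -0.0, INFINITY -> -INFINITY, NaN stays NaN. */
static double flip_sign(double x) {
    return -x;
}

/* Plain IEEE 754 division covers the XPath special cases:
 * 1/0 = Infinity, 1/-0 = -Infinity, 0/0 = NaN. */
static double div_values(double a, double b) {
    return a / b;
}

/* string() maps both +0 and -0 to "0", so a plain x == 0.0 test, which is
 * true for either zero, replaces a separate negative-zero check.
 * Returns NULL for ordinary numbers, which need real formatting. */
static const char *number_special_string(double x) {
    if (isnan(x))
        return "NaN";
    if (isinf(x))
        return x > 0 ? "Infinity" : "-Infinity";
    if (x == 0.0)
        return "0";
    return NULL;
}
```

This only holds as long as the compiler keeps IEEE semantics; options such as GCC's -ffast-math let it assume that NaN and infinities never occur, which would break both the NaN test and the division cases.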

