I created a branch on my fork with a little extra logging, and I no
longer think this is Xerces.

The issue appears to be in the DaffodilConstructingLoader. In that
constructor, we create an XmlStreamReader and call getEncoding.
Normally that returns UTF-16BE for these tests, but when the tests fail,
it returns UTF-8. So something is racy there, and XmlStreamReader
sometimes fails to detect the encoding correctly. I don't know why yet.
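
To cross-check what getEncoding reports, an independent look at the
first few bytes of the file might help. Below is a rough sketch of that
idea, not XmlStreamReader's actual implementation: the EncodingSniff
class and its guess method are hypothetical names, and the check only
covers the BOM and the leading "<?" byte patterns described in the XML
spec's autodetection appendix (UTF-32 and the encoding declaration in
the prolog are ignored).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Daffodil or Commons IO.
public class EncodingSniff {

    /** Guess an XML document's encoding from its leading bytes only. */
    public static String guess(byte[] b) {
        // Byte order marks take priority.
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // No BOM: look for "<?" encoded as two-byte code units.
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        // Simplification: anything else is treated as UTF-8.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // getBytes with UTF_16BE does not write a BOM.
        byte[] be = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.UTF_16BE);
        byte[] u8 = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(be));
        System.out.println(guess(u8));
    }
}
```

If a check like this says UTF-16BE on the same bytes in a run where
getEncoding says UTF-8, that would point at how the bytes reach
XmlStreamReader rather than at the file itself.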


On 12/19/19 5:55 PM, Steve Lawrence wrote:
> On 12/19/19 12:09 PM, Dave Fisher wrote:
>>
>>
>>> On Dec 18, 2019, at 1:57 PM, Steve Lawrence <slawre...@apache.org> wrote:
>>>
>>> Unfortunately, this error happens from time to time, and we haven't been
>>> able to track it down. Primarily because I don't think anyone has been
>>> able to reliably reproduce it. I know I've never actually seen it
>>> outside of the CI.
>>>
>>> The bug for this is https://issues.apache.org/jira/browse/DAFFODIL-1908
>>>
>>> I think the assumption is there is some kind of non-thread-safe code in
>>> Xerces (or something that parses the XML) and it hits some race condition
>>> that prevents it from detecting that the file is UTF-16, and so can't
>>> parse the file correctly.
>>
>> If you think that this is a Xerces issue then I’d ask on the Xerces dev list.
>>
>> Regards,
>> Dave
>>
> 
> I'm actually not entirely convinced it's Xerces yet. The SDE is
> happening because DaffodilXMLLoader.load is returning null. Looking at
> that function, it can return null in two different ways:
> 
> xercesAdapter.load(inputSource)
> 
>   and
> 
> constructingLoader.load()
> 
> The first is used for validation, the second actually loads the XML.
> Based on the error it's not clear which is failing, but the
> constructingLoader is Daffodil code.
> 
> Interestingly, the DaffodilConstructingLoader constructor is maybe a
> little suspicious:
> 
> https://github.com/apache/incubator-daffodil/blob/master/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala#L75-L87
> 
> That code is using Apache Commons XmlStreamReader to detect the encoding
> in the constructor. Considering the issue appears to be related to not
> detecting UTF-16, the issue might be in there as well.
> 
> So there are lots of places where the issue could be: Xerces, Apache
> Commons, or Daffodil.
> 
