I've created a WIP pull request that switches from XmlStreamReader to
scala-xml's EncodingHeuristics to detect the charset.

https://github.com/apache/incubator-daffodil/pull/306

All tests pass, and I haven't been able to retrigger the failure, but I
also don't really know what caused the issue before, and I've only seen
it once today with the XmlStreamReader.

I think we should periodically trigger this PR to rebuild and see if the
tests ever fail again. If not, we can be somewhat confident that the
issue is somehow related to the XmlStreamReader.
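
For reference, here is roughly the kind of check an encoding heuristic
performs on the first few bytes of an XML file. This is only a minimal
sketch in plain Java, not the scala-xml code and not what the PR does.
The EncodingSniffer class and its sniff method are hypothetical names,
and it assumes only UTF-8 and UTF-16 matter (it ignores UTF-32 and the
encoding declaration in the XML prolog).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Daffodil, scala-xml, or Commons IO. */
public class EncodingSniffer {

    /** Returns true if head begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] head, int... prefix) {
        if (head.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((head[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Guesses the charset from the leading bytes of an XML document. */
    public static Charset sniff(byte[] head) {
        // A byte order mark, if present, is the strongest signal.
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        // No BOM: look for "<?" encoded as two-byte code units.
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return StandardCharsets.UTF_16LE;
        // Assumption for this sketch: anything else is treated as UTF-8.
        return StandardCharsets.UTF_8;
    }
}
```

The result depends only on the bytes passed in, with no shared state, so
the same input always gives the same answer. That is the property I'd
want if the intermittent failure really is a race.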


On 12/20/19 11:25 AM, Steve Lawrence wrote:
> I created a branch on my fork with a little extra logging, and I don't
> think this is Xerces now.
> 
> The issue appears to be in the DaffodilConstructingLoader. In that
> constructor, we're creating an XmlStreamReader and calling getEncoding.
> Normally that returns UTF-16BE for these tests, but when the tests fail,
> it returns UTF-8. So for some reason something is racy there and
> XmlStreamReader isn't detecting the encoding correctly sometimes...
> 
> 
> On 12/19/19 5:55 PM, Steve Lawrence wrote:
>> On 12/19/19 12:09 PM, Dave Fisher wrote:
>>>
>>>
>>>> On Dec 18, 2019, at 1:57 PM, Steve Lawrence <slawre...@apache.org> wrote:
>>>>
>>>> Unfortunately, this error happens from time to time, and we haven't been
>>>> able to track it down. Primarily because I don't think anyone has been
>>>> able to reliably reproduce it. I know I've never actually seen it
>>>> outside of the CI.
>>>>
>>>> The bug for this is https://issues.apache.org/jira/browse/DAFFODIL-1908
>>>>
>>>> I think the assumption is there is some kind of non-thread-safe code in
>>>> Xerces (or something that parses the XML) and it hits some race condition
>>>> that prevents it from detecting that the file is UTF-16, and so can't
>>>> parse the file correctly.
>>>
>>> If you think that this is a Xerces issue then I’d ask on the Xerces dev list.
>>>
>>> Regards,
>>> Dave
>>>
>>
>> I'm actually not entirely convinced it's xerces yet. The SDE is
>> happening because DaffodilXMLLoader.load is returning null. Looking at
>> that function, it can return null in two different ways:
>>
>> xercesAdapter.load(inputSource)
>>
>>   and
>>
>> constructingLoader.load()
>>
>> The first is used for validation, the second actually loads the XML.
>> Based on the error it's not clear which is failing, but the
>> constructingLoader is daffodil stuff.
>>
>> Interestingly, the DaffodilConstructingLoader constructor is maybe a
>> little suspicious:
>>
>> https://github.com/apache/incubator-daffodil/blob/master/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala#L75-L87
>>
>> That code is using Apache Commons IO XmlStreamReader to detect the encoding
>> in the constructor. Considering the issue appears to be related to not
>> detecting UTF-16, the issue might be in there as well.
>>
>> So there are lots of places where the issue could be: Xerces, Apache
>> Commons, or Daffodil.
>>
> 
