Ah!

That could well be it - thanks Mark! I’ll test this out. 

Chris

> On 29 Jan 2016, at 3:26 AM, Mark Hung <mark...@gmail.com> wrote:
> 
> http://opengrok.libreoffice.org/xref/core/ucb/source/ucp/webdav-neon/ContentProperties.cxx#454
> 
> -    else if ( rName == "Content-Type" )
> +    else if ( rName.equalsIgnoreAsciiCaseAscii("Content-Type"))
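[For readers not fluent in the OUString API: the patch matters because HTTP/DAV header field names are case-insensitive, so an exact string compare silently misses e.g. "content-type". A minimal stand-alone sketch of the case-insensitive ASCII comparison, using std::string as a stand-in for OUString::equalsIgnoreAsciiCaseAscii():]

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Compare a header name against an ASCII literal, ignoring ASCII case.
// Hypothetical stand-in for OUString::equalsIgnoreAsciiCaseAscii();
// not the LibreOffice implementation itself.
bool equalsIgnoreAsciiCase(const std::string& a, const char* b)
{
    std::size_t i = 0;
    for (; b[i] != '\0'; ++i)
    {
        if (i >= a.size())
            return false; // a is shorter than b
        unsigned char ca = static_cast<unsigned char>(a[i]);
        unsigned char cb = static_cast<unsigned char>(b[i]);
        if (std::tolower(ca) != std::tolower(cb))
            return false;
    }
    return i == a.size(); // equal only if lengths also match
}
```

With this, a server sending "content-type" (or "CONTENT-TYPE") still matches, which the original `rName == "Content-Type"` comparison did not.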
> 
> 
> 
> 2016-01-28 9:16 GMT+08:00 Chris Sherlock <chris.sherloc...@gmail.com>:
> Hi guys, 
> 
> I’m afraid I’m still a bit stuck on this, any other ideas what might be 
> causing the problem?
> 
> Chris
> 
>> On 6 Jan 2016, at 4:27 AM, Chris Sherlock <chris.sherloc...@gmail.com> wrote:
>> 
>> Thanks Mark, appreciate these code pointers!
>> 
>> (I’m cc’ing in the mailing list so others can comment)
>> 
>> Chris
>> 
>>> On 4 Jan 2016, at 8:21 PM, Mark Hung <mark...@gmail.com> wrote:
>>> 
>>> 
>>> I meant there is a chance for SvParser::GetNextChar() to switch encoding, 
>>> but yes it is less relevant.
>>> 
>>> Grepping for content-type under ucb, there is some suspicious code:
>>> http://opengrok.libreoffice.org/xref/core/ucb/source/ucp/webdav-neon/ContentProperties.cxx#454
>>> http://opengrok.libreoffice.org/xref/core/ucb/source/ucp/webdav/ContentProperties.cxx#471
>>> 
>>> which seems inconsistent with
>>> http://opengrok.libreoffice.org/xref/core/sc/source/filter/html/htmlpars.cxx#264
>>> 
>>> 
>>> 2016-01-04 16:17 GMT+08:00 Chris Sherlock <chris.sherloc...@gmail.com>:
>>> Hi Mark,
>>> 
>>> BOM detection is irrelevant here. The HTTP header states that it should be 
>>> UTF8, but this is not being honoured. 
>>> 
>>> There is something further down the stack that isn’t recording the HTTP 
>>> headers. 
>>> 
>>> Chris
>>> 
>>>> On 4 Jan 2016, at 4:23 PM, Mark Hung <mark...@gmail.com> wrote:
>>>> 
>>>> Hi Chris,
>>>> 
>>>> As I have recently been working on SvParser and HTMLParser:
>>>> 
>>>> There is BOM detection in SvParser::GetNextChar().
>>>> 
>>>> From a quick look at eehtml, EditHTMLParser::EditHTMLParser seems relevant.
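[To make the BOM-detection point concrete: sniffing a byte-order mark only ever works on the first bytes of the stream. A rough, hypothetical sketch of what such detection looks like - SvParser::GetNextChar() does something along these lines before falling back to the caller-supplied encoding; this is not the LibreOffice code itself:]

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sniff a byte-order mark at the start of a byte stream.
// Returns the name of the encoding the BOM implies, or nullptr
// when no BOM is present (the caller must then use another source
// of truth, such as the HTTP Content-Type header, or a default).
const char* detectBom(const char* p, std::size_t len)
{
    const unsigned char* buf = reinterpret_cast<const unsigned char*>(p);
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    return nullptr; // no BOM
}
```

Since web pages rarely carry a BOM, a nullptr result here is the normal case for HTTP content, which is why the headers matter so much.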
>>>> 
>>>> Best regards.
>>>> 
>>>> 
>>>> 2016-01-04 12:02 GMT+08:00 Chris Sherlock <chris.sherloc...@gmail.com>:
>>>> Hey guys, 
>>>> 
>>>> Probably nobody saw this because of the time of year (Happy New Year, 
>>>> incidentally!!!).
>>>> 
>>>> Just a quick ping to the list to see if anyone can give me some pointers. 
>>>> 
>>>> Chris
>>>> 
>>>>> On 30 Dec 2015, at 12:15 PM, Chris Sherlock <chris.sherloc...@gmail.com> wrote:
>>>>> 
>>>>> Hi guys,
>>>>> 
>>>>> In bug 95217 - https://bugs.documentfoundation.org/show_bug.cgi?id=95217 - 
>>>>> Persian text in a webpage encoded as UTF-8 is being corrupted.
>>>>> 
>>>>> If I take the webpage and save it to an HTML file encoded as UTF-8, then 
>>>>> there are no problems and the Persian text comes through fine. However, 
>>>>> when connecting to a webserver directly the text is corrupted, even though 
>>>>> the HTTP header correctly gives the content type as UTF-8.
>>>>> 
>>>>> I did a test using Charles Proxy with its SSL interception feature turned 
>>>>> on and pointed Safari to 
>>>>> https://bugs.documentfoundation.org/attachment.cgi?id=119818
>>>>> 
>>>>> The following headers are gathered:
>>>>> 
>>>>> HTTP/1.1 200 OK
>>>>> Server: nginx/1.2.1
>>>>> Date: Sat, 26 Dec 2015 01:41:30 GMT
>>>>> Content-Type: text/html; name="text.html"; charset=UTF-8
>>>>> Content-Length: 982
>>>>> Connection: keep-alive
>>>>> X-xss-protection: 1; mode=block
>>>>> Content-disposition: inline; filename="text.html"
>>>>> X-content-type-options: nosniff
>>>>> 
>>>>> Some warnings are spat out that editeng's eehtml can't detect the 
>>>>> encoding. I initially thought it was looking for a BOM, which makes no 
>>>>> sense for a webpage, but that's wrong. Instead, for some reason the 
>>>>> headers don't seem to be processed and the HTML parser is falling back to 
>>>>> ISO-8859-1 rather than UTF-8 as the character encoding.
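[For context, the information the parser needs is sitting in the Content-Type value quoted above (`text/html; name="text.html"; charset=UTF-8`). A minimal sketch of extracting the charset parameter from such a value - a hypothetical helper for illustration, not the LibreOffice code path:]

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Pull the charset parameter out of a Content-Type header value, e.g.
//   text/html; name="text.html"; charset=UTF-8   ->   "utf-8"
// The whole value is lower-cased first (parameter names and charset
// tokens are case-insensitive). Returns "" when no charset is given,
// in which case the caller must fall back to some default encoding.
std::string extractCharset(std::string value)
{
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::string key = "charset=";
    std::size_t pos = value.find(key);
    if (pos == std::string::npos)
        return "";
    std::size_t start = pos + key.size();
    std::size_t end = value.find(';', start);
    std::string cs = value.substr(
        start, end == std::string::npos ? std::string::npos : end - start);
    // strip optional quotes and trailing whitespace
    cs.erase(std::remove(cs.begin(), cs.end(), '"'), cs.end());
    while (!cs.empty() && std::isspace(static_cast<unsigned char>(cs.back())))
        cs.pop_back();
    return cs;
}
```

The bug described below is that the parser never sees this header at all, so no amount of parsing at this layer helps - the value has to be carried down the stack first.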
>>>>> 
>>>>> We seem to use Neon to make the GET request to the webserver. A few 
>>>>> observations:
>>>>> 
>>>>> 1. We detect a server OK response as an error.
>>>>> 2. (Probably more to the point) I believe PROPFIND is being used - but 
>>>>> actually, even though the function being used indicates a PROPFIND verb, 
>>>>> a GET is issued as normal, yet the headers aren't being stored. This 
>>>>> means that when the parser looks for the headers to find the encoding 
>>>>> it's not finding anything, resulting in a fallback to ISO-8859-1.
>>>>> 
>>>>> One easy thing (it doesn't solve the root issue): wouldn't it be better 
>>>>> to fall back to UTF-8 rather than ISO-8859-1, given that plain ASCII 
>>>>> content is valid UTF-8 anyway?
>>>>> 
>>>>> Any pointers on how to get to the bottom of this would be appreciated, 
>>>>> I'm honestly not up on webdav or Neon.
>>>>> 
>>>>> Chris Sherlock
>>>> 
>>>> 
>>>> _______________________________________________
>>>> LibreOffice mailing list
>>>> LibreOffice@lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/libreoffice
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Mark Hung
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Mark Hung
>> 
> 
> 
> 
> 
> 
> 
> -- 
> Mark Hung

