Erik Larsson wrote:
> Hi Jean-Pierre,
>
> You are very right, but the upside is that listing the directory at
> least works (with the exception of the files with the bad filenames) as
> opposed to aborting with error as soon as a bad filename is encountered.
>
> So we are more error-tolerant with this patch... I think this is a good
> thing given that chkdsk doesn't appear to make any efforts at repairing
> this filename (it doesn't think there is any corruption on this
> particular volume... tested with WinXP's chkdsk and Win8's).
>
> Manufacturing a fake UTF-8 file name as a handle just to be able to
> access these corrupted UTF-16 filenames seems overly complex for this
> case... taking into account possible name collisions and such.

I agree, this is a slippery road, and your proposal
will save time dealing with rare issues.

Regards

Jean-Pierre

>
> Best regards,
>
> - Erik
>
> On 2016-04-06 18:14, Jean-Pierre André wrote:
>> Hi Erik,
>>
>> Your patch will help for examining the directory, but
>> IMHO you will not be able to read, delete or rename
>> the bad file, because you will have to enter a utf8
>> name which will not translate to the bad Unicode for
>> accessing the file. Even if you use wildcards, ntfs-3g
>> only gets requests with utf8 names.
>>
>> When accessing the directory, you will however get the
>> inode number to retrieve the contents using ntfscat.
>>
>> Regards
>>
>> Jean-Pierre
>>
>> Erik Larsson wrote:
>>> Hi,
>>>
>>> Attached to this email is a patch which does just what I suggested...
>>> emitting a log message but proceeding normally and ignoring the entry
>>> when a bad filename is encountered during readdir. This fixes the
>>> problem for me.
>>>
>>> Jean-Pierre, please review and decide whether this is a good idea.
>>>
>>> Best regards,
>>>
>>> - Erik
>>>
>>> On 2016-04-06 17:27, Erik Larsson wrote:
>>>> Hi,
>>>>
>>>> I looked into this image and noticed that there are 4 filenames in
>>>> /WINDOWS/system32 that cannot be decoded.
>>>>
>>>> One example is the MFT entry 30661 with the filename (as UTF-16
>>>> units): 0xDE5C 0xDC93 0x002E 0x006C 0x006F 0x0067
>>>> The filename ends with '.log' but the first two UTF-16 units are where
>>>> Unicode decoding blows up. 0xDE5C is the low value of a surrogate pair
>>>> according to Wikipedia (range: 0xDC00-0xDFFF). We are expecting the
>>>> high value (0xD800-0xDBFF) to come first.
>>>> It is then followed by another low value of a surrogate pair, 0xDC93.
>>>> This is clearly a corruption... a surrogate pair should consist of a
>>>> high value followed by a low value.
>>>>
>>>> I have no idea how this file was created... if Windows did this, then
>>>> we might need to be able to cope with such corruption better (e.g.
>>>> ignoring the entry during readdir and just emit a log message).
>>>>
>>>> Best regards,
>>>>
>>>> - Erik
>>>>
>>>> On 2016-04-06 13:06, Richard W.M. Jones wrote:
>>>>> The reporter kindly gave me permission to distribute the metadata
>>>>> file.  I've put it up here:
>>>>>
>>>>>    http://oirase.annexia.org/tmp/bz1301593/
>>>>>
>>>>>    $ md5sum ntfsclone_sda2.xz
>>>>>    6cadc64de3196311c8159dc12f84484c  ntfsclone_sda2.xz
>>>>>
>>>>> Rich.
>>>>>
>>>>
>>>
>>
>
>
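
For readers following the surrogate discussion in the 17:27 message, the
check it describes can be sketched roughly as below. This is an
illustrative Python sketch, not the ntfs-3g C code; the function name
first_bad_unit is hypothetical. It only assumes the ranges quoted in the
thread (high 0xD800-0xDBFF, low 0xDC00-0xDFFF).

```python
# Illustrative sketch only; this is not taken from ntfs-3g.
# first_bad_unit is a hypothetical helper name.

def first_bad_unit(units):
    """Return the index of the first unpaired surrogate, or None if none."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i
        if 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that is not preceded by a high surrogate.
            return i
        i += 1
    return None

# UTF-16 units of the filename in MFT entry 30661, as quoted in the thread.
bad_name = [0xDE5C, 0xDC93, 0x002E, 0x006C, 0x006F, 0x0067]
bad_index = first_bad_unit(bad_name)
```

With the units from the thread, the scan stops at the very first unit,
since 0xDE5C is a low surrogate with no high surrogate before it.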

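The behaviour agreed on in this thread (log the bad entry, skip it, and
keep listing the rest of the directory) might look like the following in
outline. Again this is a Python sketch under assumptions, not the actual
patch: decode_name and list_directory are hypothetical names, the
"a.log" entry with inode 100 is made up for illustration, and Python's
strict utf-16-le codec stands in for the ntfs-3g conversion routine.

```python
import struct

# Illustrative sketch only; names and the first directory entry are
# hypothetical, and this is not the patch discussed in the thread.

def decode_name(units):
    """Decode UTF-16 units strictly; raises on unpaired surrogates."""
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le")

def list_directory(entries, warnings):
    """Return (inode, name) pairs, skipping entries that do not decode."""
    names = []
    for inode, units in entries:
        try:
            names.append((inode, decode_name(units)))
        except UnicodeDecodeError:
            # Keep going instead of aborting the whole listing, and
            # record the inode number so the file can still be located.
            warnings.append("skipping undecodable filename, inode %d" % inode)
    return names

entries = [
    (100, [0x0061, 0x002E, 0x006C, 0x006F, 0x0067]),            # "a.log", made up
    (30661, [0xDE5C, 0xDC93, 0x002E, 0x006C, 0x006F, 0x0067]),  # from the thread
]
warnings = []
listing = list_directory(entries, warnings)
```

Recording the inode number in the message matters because, as noted
above, the bad name cannot be typed as UTF-8, so the inode number is the
remaining handle for retrieving the contents with ntfscat.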


------------------------------------------------------------------------------
_______________________________________________
ntfs-3g-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
