Erik Larsson wrote:
> Hi Jean-Pierre,
>
> You are very right, but the upside is that listing the directory at
> least works (with the exception of the files with the bad filenames) as
> opposed to aborting with error as soon as a bad filename is encountered.
>
> So we are more error-tolerant with this patch... I think this is a good
> thing given that chkdsk doesn't appear to make any efforts at repairing
> this filename (it doesn't think there is any corruption on this
> particular volume... tested with WinXP's chkdsk and Win8's).
>
> Manufacturing a fake UTF-8 file name as a handle just to be able to
> access these corrupted UTF-16 filenames seems overly complex for this
> case... taking into account possible name collisions and such.

I agree, this is a slippery road, and your proposal
will save time dealing with rare issues.

Regards

Jean-Pierre

>
> Best regards,
>
> - Erik
>
> On 2016-04-06 18:14, Jean-Pierre André wrote:
>> Hi Erik,
>>
>> Your patch will help for examining the directory, but
>> IMHO you will not be able to read, delete or rename
>> the bad file, because you will have to enter a utf8
>> name which will not translate to the bad Unicode for
>> accessing the file. Even if you use wildcards, ntfs-3g
>> only gets requests with utf8 names.
>>
>> When accessing the directory, you will however get the
>> inode number to retrieve the contents using ntfscat.
>>
>> Regards
>>
>> Jean-Pierre
>>
>> Erik Larsson wrote:
>>> Hi,
>>>
>>> Attached to this email is a patch which does just what I suggested...
>>> emitting a log message but proceeding normally and ignoring the entry
>>> when a bad filename is encountered during readdir. This fixes the
>>> problem for me.
>>>
>>> Jean-Pierre, please review and decide whether this is a good idea.
>>>
>>> Best regards,
>>>
>>> - Erik
>>>
>>> On 2016-04-06 17:27, Erik Larsson wrote:
>>>> Hi,
>>>>
>>>> I looked into this image and noticed that there are 4 filenames in
>>>> /WINDOWS/system32 that cannot be decoded.
>>>>
>>>> One example is the MFT entry 30661 with the filename (as UTF-16
>>>> units): 0xDE5C 0xDC93 0x002E 0x006C 0x006F 0x0067
>>>> The filename ends with '.log' but the first two UTF-16 units are where
>>>> Unicode decoding blows up. 0xDE5C is the low value of a surrogate pair
>>>> according to Wikipedia (range: 0xDC00-0xDFFF). We are expecting the
>>>> high value (0xD800-0xDBFF) to come first.
>>>> It is then followed by another low value of a surrogate pair, 0xDC93.
>>>> This is clearly a corruption... a surrogate pair should consist of a
>>>> high value followed by a low value.
>>>>
>>>> I have no idea how this file was created... if Windows did this, then
>>>> we might need to be able to cope with such corruption better (e.g.
>>>> ignoring the entry during readdir and just emitting a log message).
>>>>
>>>> Best regards,
>>>>
>>>> - Erik
>>>>
>>>> On 2016-04-06 13:06, Richard W.M. Jones wrote:
>>>>> The reporter kindly gave me permission to distribute the metadata
>>>>> file. I've put it up here:
>>>>>
>>>>> http://oirase.annexia.org/tmp/bz1301593/
>>>>>
>>>>> $ md5sum ntfsclone_sda2.xz
>>>>> 6cadc64de3196311c8159dc12f84484c ntfsclone_sda2.xz
>>>>>
>>>>> Rich.

_______________________________________________
ntfs-3g-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
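
For readers following the thread, the surrogate check and the "log and skip" behaviour discussed above can be sketched roughly as follows. This is only an illustration in Python, not the attached patch (which is not reproduced in this thread) and not ntfs-3g code. The names first_bad_unit and list_directory are hypothetical, and apart from MFT entry 30661 and its UTF-16 units, which are quoted from Erik's message, the sample entries are invented.

```python
import logging

log = logging.getLogger("readdir_sketch")  # hypothetical logger name


def first_bad_unit(units):
    """Return the index of the first ill-formed UTF-16 unit, or None."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return i
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no high surrogate before it.
            return i
        else:
            i += 1
    return None


def list_directory(entries):
    """Decode (mft_number, utf16_units) pairs, skipping undecodable names.

    A bad name is logged and ignored instead of aborting the listing.
    """
    names = []
    for mft_no, units in entries:
        bad = first_bad_unit(units)
        if bad is not None:
            log.warning("skipping MFT entry %d: bad UTF-16 unit 0x%04X at index %d",
                        mft_no, units[bad], bad)
            continue
        raw = b"".join(u.to_bytes(2, "little") for u in units)
        names.append((mft_no, raw.decode("utf-16-le")))
    return names


# MFT entry 30661 is the example from the thread; entries 1 and 2 are made up.
entries = [
    (30661, [0xDE5C, 0xDC93, 0x002E, 0x006C, 0x006F, 0x0067]),
    (1, [0x0061, 0x002E, 0x006C, 0x006F, 0x0067]),
    (2, [0xD83D, 0xDE00]),
]
listing = list_directory(entries)
```

With this input the first entry is rejected at index 0 (a low surrogate with no high surrogate before it) and the other two are listed. As Jean-Pierre notes, the skipped file remains unreachable by name, so its inode number from the log line is the only handle left.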
