Hi Jean-Pierre,

You are quite right, but the upside is that listing the directory at least works (except for the files with the bad filenames), as opposed to aborting with an error as soon as a bad filename is encountered.
So we are more error-tolerant with this patch... I think this is a good thing given that chkdsk doesn't appear to make any effort to repair this filename (it doesn't think there is any corruption on this particular volume... tested with WinXP's chkdsk and Win8's).

Manufacturing a fake UTF-8 file name as a handle just to be able to access these corrupted UTF-16 filenames seems overly complex for this case... taking into account possible name collisions and such.

Best regards,

- Erik

On 2016-04-06 18:14, Jean-Pierre André wrote:
> Hi Erik,
>
> Your patch will help for examining the directory, but
> IMHO you will not be able to read, delete or rename
> the bad file, because you will have to enter a utf8
> name which will not translate to the bad Unicode for
> accessing the file. Even if you use wildcards, ntfs-3g
> only gets requests with utf8 names.
>
> When accessing the directory, you will however get the
> inode number to retrieve the contents using ntfscat.
>
> Regards
>
> Jean-Pierre
>
> Erik Larsson wrote:
>> Hi,
>>
>> Attached to this email is a patch which does just what I suggested...
>> emitting a log message but proceeding normally and ignoring the entry
>> when a bad filename is encountered during readdir. This fixes the
>> problem for me.
>>
>> Jean-Pierre, please review and decide whether this is a good idea.
>>
>> Best regards,
>>
>> - Erik
>>
>> On 2016-04-06 17:27, Erik Larsson wrote:
>>> Hi,
>>>
>>> I looked into this image and noticed that there are 4 filenames in
>>> /WINDOWS/system32 that cannot be decoded.
>>>
>>> One example is the MFT entry 30661 with the filename (as UTF-16
>>> units): 0xDE5C 0xDC93 0x002E 0x006C 0x006F 0x0067
>>> The filename ends with '.log' but the first two UTF-16 units are where
>>> Unicode decoding blows up. 0xDE5C is the low value of a surrogate pair
>>> according to Wikipedia (range: 0xDC00-0xDFFF). We are expecting the
>>> high value (0xD800-0xDBFF) to come first.
>>> It is then followed by another low value of a surrogate pair, 0xDC93.
>>> This is clearly a corruption... a surrogate pair should consist of a
>>> high value followed by a low value.
>>>
>>> I have no idea how this file was created... if Windows did this, then
>>> we might need to be able to cope with such corruption better (e.g.
>>> ignoring the entry during readdir and just emitting a log message).
>>>
>>> Best regards,
>>>
>>> - Erik
>>>
>>> On 2016-04-06 13:06, Richard W.M. Jones wrote:
>>>> The reporter kindly gave me permission to distribute the metadata
>>>> file. I've put it up here:
>>>>
>>>> http://oirase.annexia.org/tmp/bz1301593/
>>>>
>>>> $ md5sum ntfsclone_sda2.xz
>>>> 6cadc64de3196311c8159dc12f84484c ntfsclone_sda2.xz
>>>>
>>>> Rich.
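
For reference, the surrogate check described in the quoted analysis, and the "log and skip" behaviour during readdir, could be sketched roughly as below. This is only an illustration in Python, not the attached patch (which is not reproduced in this thread); the names first_bad_unit and list_decodable, the logger name, and the second directory entry (inode 42, "boot.log") are hypothetical.

```python
import logging

log = logging.getLogger("readdir_sketch")  # hypothetical logger name


def first_bad_unit(units):
    """Return the index of the first ill-formed surrogate in a sequence
    of UTF-16 code units, or None if the sequence is well formed."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no high surrogate before it.
            return i
        i += 1
    return None


def list_decodable(entries):
    """Take (inode, units) pairs and return (inode, name) pairs for the
    names that decode; log a warning and skip the ones that do not."""
    result = []
    for inode, units in entries:
        bad = first_bad_unit(units)
        if bad is not None:
            log.warning("inode %d: bad UTF-16 unit 0x%04X at index %d, "
                        "skipping entry", inode, units[bad], bad)
            continue
        raw = b"".join(u.to_bytes(2, "little") for u in units)
        result.append((inode, raw.decode("utf-16-le")))
    return result


# The units quoted above for MFT entry 30661, plus a made-up good entry.
bad_name = [0xDE5C, 0xDC93, 0x002E, 0x006C, 0x006F, 0x0067]
good_name = [ord(c) for c in "boot.log"]
listing = list_decodable([(30661, bad_name), (42, good_name)])
```

With the units quoted above, first_bad_unit reports index 0 (the leading 0xDE5C), so the entry is logged and left out of the listing while the other entry is still returned. The inode number stays in the log message, which matters because, as Jean-Pierre notes, that number is what ntfscat would need to retrieve the contents.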
