On 06/13/2013 12:51 AM, Joseph D. Wagner wrote:
> On 06/11/2013 4:36 pm, Pádraig Brady wrote:
>
>> On 06/11/2013 07:20 AM, Joseph D. Wagner wrote:
>>
>>> Currently, when --remove (-u) is specified, shred overwrites the file
>>> name once for each character, so a file name of 0123456789 would be
>>> overwritten 10 times. While this may be the most secure, it is also
>>> the most time-consuming, as each of the 10 renames has its own fsync.
>>> Also, renaming may not be as effective on some journaled file systems.
>>>
>>> This patch adds the option --wipename (-w), which accepts the values:
>>> * perchar - overwrite the file name once per character; same as current.
>>> * once - overwrite the file name once in total.
>>> * none - skip overwriting the file name entirely; just unlink.
>>>
>>> If --remove is specified but not --wipename, perchar is assumed,
>>> preserving current behavior. Specifying --wipename implies --remove.
>>>
>>> In theory, this should provide improved performance for those who
>>> choose it, especially when deleting many small files. I am currently
>>> testing performance on my system, but I wanted to get the ball
>>> rolling by soliciting your comments and your receptiveness to
>>> accepting this patch. Thoughts?
>>
>> Thanks for the patch.
>> While on the face of it the extra control seems beneficial,
>> I'm not convinced. The main reason is that this gives
>> some extra credence to per-file shreds, which TBH are
>> not guaranteed due to journalling etc.
>>
>> I see performance as an important consideration when
>> shredding large amounts of data, like a disk device.
>> However, single-file performance is less of a concern.
>> The normal use case for shred would be single files,
>> or the device level. Shredding many files is not the
>> normal use case to worry about, IMHO. If one were worried
>> about securely shredding many files, it would probably be best
>> to have those files on a separate file system, and shred
>> that at a lower level.
>>
>> In any case, if you really are OK with just unlinking files
>> after shredding the data, that can be done in a separate operation:
>>
>>   find | xargs shred
>>   find | xargs rm
>>
>> So I'm 60:40 against adding this option.
>>
>> thanks,
>> Pádraig.
>
> I thought about running two separate operations, as you suggested.
> However, my problem with that would be the loss of an atomic
> transaction. What if something happens midway through the shred? I
> would not know which files were shredded, and I would have to start
> over. Worse, if running from a script, it might execute the unlinking
> without having completed the shred. While I could create all sorts of
> sophisticated code to check these things, it would be a lot easier if I
> could simply rely on the mechanisms already built into shred.
Well, you'd use the standard simple idiom of:

  shred && rm

Granted, that would mean no unlinking is done if there is any
I/O error in the shred.

> I can understand your concern about a tool being misused. If adding a
> warning to the usage output would help alleviate your concerns, I
> would be happy to draft one and add it to my patch. However, I do not
> believe people should be denied a tool due to its potential for
> misuse. Would you deny people the use of an iron due to its risk of
> misuse or injury? My personal philosophy is to give them the tool
> with instructions and warnings. If the user disregards this
> information, it is not my problem.
>
> In my case, I am using shred to purge information from file systems
> that cannot be taken offline. Given the specific file system, its
> configuration, and the modest sensitivity of the information, the
> decision was made that this is an acceptable risk. I believe I should
> be able to assume those risks without being denied optimizations just
> because they are not considered best practice for the majority of use
> cases.
>
> As for the performance improvement itself, the result is measurably
> significant. I wrote a script that creates 100,000 files, and then
> measures the performance of shredding those files using the different
> wipename options in my patch. The exact results and the exact script
> are below.
>
> I am hoping these hard numbers and my kind, persuasive arguments will
> convince you to change your mind and accept my patch.

Thanks for the clear and detailed arguments. They're certainly persuasive.

> ## perchar ##
> real 678m33.468s
> user 0m9.450s
> sys 3m20.001s
>
> ## once ##
> real 151m54.655s
> user 0m3.336s
> sys 0m32.357s
>
> ## none ##
> real 107m34.307s
> user 0m2.637s
> sys 0m21.825s

Whoa, so this creates about 23s of CPU work but waits 1 hour 47 mins
on the sync! What file system and backing device are you using here,
as a matter of interest?
>
> perchar: 11 hours 18 minutes 33.468 seconds
> once: 2 hours 31 minutes 54.655 seconds
>   * a 346% improvement over perchar
> none: 1 hour 47 minutes 34.307 seconds
>   * a 530% improvement over perchar
>   * a 41% improvement over once

cheers,
Pádraig.
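
[List-archive editor's note: the two-step find/shred/rm idiom discussed
above can also be applied per file, so that an interrupted run unlinks
only files whose shred completed. A minimal sketch, not from the
thread; the "sensitive/" directory name is just an illustration:]

```shell
# Shred each regular file, then unlink it only if the shred succeeded.
# "sensitive/" is a placeholder directory name for illustration.
find sensitive/ -type f -exec sh -c 'shred -- "$1" && rm -- "$1"' sh {} \;
```

[Because the rm is guarded by && for each file individually, an I/O
error while shredding one file leaves that file in place instead of
silently unlinking unshredded data.]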