Hi, I'm looking into writing a patch for HDFS that adds a new method to securely delete the contents of a block on every node where it exists. By "securely delete" I mean overwrite the block repeatedly, cycling through 1s, 0s and random data, so that the data cannot be recovered forensically.
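To make the overwrite step concrete, here is a rough sketch of what I have in mind. It assumes the block replica is an ordinary file on the DataNode's local disk; the class and method names (SecureOverwriteSketch, overwrite) are hypothetical and not part of HDFS, and it says nothing about how the DataNode would be told to run it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical helper, not an HDFS class: overwrites a local file in place.
public class SecureOverwriteSketch {

    private static final int BUFFER_SIZE = 64 * 1024;

    /**
     * Overwrites the whole file 'passes' times, cycling through
     * all-zero bytes, all-one bytes (0xFF) and random bytes.
     * The file keeps its original length; it is not deleted here.
     */
    public static void overwrite(Path file, int passes) throws IOException {
        SecureRandom random = new SecureRandom();
        byte[] buffer = new byte[BUFFER_SIZE];
        // WRITE without TRUNCATE_EXISTING keeps the existing length.
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            long length = channel.size();
            for (int pass = 0; pass < passes; pass++) {
                channel.position(0);
                long remaining = length;
                while (remaining > 0) {
                    fillPattern(buffer, pass, random);
                    int chunk = (int) Math.min(buffer.length, remaining);
                    ByteBuffer out = ByteBuffer.wrap(buffer, 0, chunk);
                    while (out.hasRemaining()) {
                        channel.write(out);
                    }
                    remaining -= chunk;
                }
                // Ask the OS to flush this pass to the device before the next one.
                channel.force(true);
            }
        }
    }

    // Pass 0 -> zeros, pass 1 -> ones, pass 2 -> random, then repeat.
    private static void fillPattern(byte[] buffer, int pass, SecureRandom random) {
        switch (pass % 3) {
            case 0:
                Arrays.fill(buffer, (byte) 0x00);
                break;
            case 1:
                Arrays.fill(buffer, (byte) 0xFF);
                break;
            default:
                random.nextBytes(buffer);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blk_sketch", ".dat");
        Files.write(tmp, "sensitive block contents".getBytes());
        overwrite(tmp, 3);   // zeros, ones, random
        Files.delete(tmp);   // the normal delete would follow the overwrite
    }
}
```

I realise an in-place overwrite like this only helps if the underlying filesystem and device actually write to the same physical location, which may not hold for journaling or copy-on-write filesystems or for SSDs, so treat it as an illustration of the intent rather than a guarantee.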
I'm not currently aware of any existing code or methods that provide this, so I was going to implement it myself. I figured DataNode.java was probably the place to start looking into how this could be done, so I've read the source, but it hasn't enlightened me much.

I'm assuming I need to tell the NameNode that a particular block ID must be securely deleted from every DataNode holding it. Then, as each DataNode checks in, it would be instructed to securely delete the relevant block, and it would oblige.

Unfortunately I have no idea where to begin and was looking for some pointers. Specifically, I'd like to know:

1. Where the hdfs CLI commands are implemented.
2. How a DataNode identifies a block, and how the NameNode could tell a DataNode to delete a block.
3. Where the existing "delete" is implemented, so I can make sure my secure delete uses it after successfully blanking the block contents.
4. Whether I've got the right idea about this at all.

Kind regards,
Matt Fellows
First Option Software Ltd