On Thu, May 27, 2010 at 6:58 AM, Arv Mistry <a...@kindsight.net> wrote:
> Thanks for responding Ted. I did see that link before but there wasn't enough 
> details there for me to make sense of it. I'm not sure who Owen is ;(

I'm Owen, although I think I've used at least 5 different email
addresses on these lists at various times. *smile*

Since you specify 0.20, you'd probably want to put your keys into
HDFS and read them from the tasks. Note that this is *not* secure:
other users of your cluster can access your data in HDFS with only a
tiny bit of misdirection. (This will be fixed in 0.22, where we are
adding strong authentication based on Kerberos.)

The next step would be to define a compression codec that does the
encryption. So let's say you define a XorEncryption codec that does a
simple xor with a byte. (Obviously, you would use something better
than xor; it is just an example!) XorEncryption would need to
implement org.apache.hadoop.io.compress.CompressionCodec. You'd also
need to add your new class to the list of codecs in the configuration
variable io.compression.codecs.
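The codec class itself needs the Hadoop jars on the classpath, but the transform at its heart is just a pair of stream wrappers. Here is a minimal sketch of that part in plain java.io (the class and method names are mine, not from any Hadoop release); inside a real XorEncryption, createOutputStream/createInputStream would hand back streams along these lines:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class XorStreams {

    // Encrypting stream: XORs every byte with the key on the way out.
    static class XorOutputStream extends FilterOutputStream {
        private final byte key;
        XorOutputStream(OutputStream out, byte key) { super(out); this.key = key; }
        @Override public void write(int b) throws IOException {
            out.write((b ^ key) & 0xff);
        }
    }

    // Decrypting stream: XOR with the same key undoes the transform.
    static class XorInputStream extends FilterInputStream {
        private final byte key;
        XorInputStream(InputStream in, byte key) { super(in); this.key = key; }
        @Override public int read() throws IOException {
            int b = in.read();
            return b < 0 ? b : (b ^ key) & 0xff;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            for (int i = 0; i < n; i++) buf[off + i] ^= key;
            return n;
        }
    }

    // Helper: push a buffer through the XOR stream. XOR is symmetric,
    // so the same call both encrypts and decrypts.
    static byte[] xorBytes(byte[] data, byte key) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            OutputStream s = new XorOutputStream(sink, key);
            s.write(data);
            s.close();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "hello, hdfs".getBytes();
        byte[] cipher = xorBytes(plain, (byte) 0x5a);      // hypothetical key byte
        byte[] round  = xorBytes(cipher, (byte) 0x5a);
        System.out.println(java.util.Arrays.equals(plain, round));  // round trip restores the data
    }
}
```

Swapping in a real cipher just means replacing these wrappers with something like javax.crypto's CipherOutputStream/CipherInputStream; the codec plumbing stays the same.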

For details of how to configure your mapreduce job with compression
(or in this case encryption), look at http://bit.ly/9PMHUA. If
XorEncryption returned ".xor" from getDefaultExtension(), then any
file ending in .xor would automatically be run through the codec, so
input is handled for free. You need to define some configuration
variables to get it applied to the output of MapReduce.
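Concretely, that means registering the codec and switching on output compression. A sketch of the relevant 0.20-era properties, assuming your codec class lives at com.example.XorEncryption (the package name is hypothetical):

```xml
<!-- Register the codec alongside the stock ones. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.example.XorEncryption</value>
</property>

<!-- Run job output through it (old-API property names from 0.20). -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>com.example.XorEncryption</value>
</property>
```

The same values can be set programmatically on the JobConf instead of in the XML files.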

-- Owen
