I implemented an RMI protocol on top of Hadoop IPC and added basic HMAC signing. I believe it is faster than public/private-key signatures because it uses a shared secret key and does not require the key provisioning that a PKI would. Perhaps it could serve as a baseline way to sign the data.
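
For what it's worth, the signing side is just the standard javax.crypto Mac API over a pre-shared secret. A minimal sketch (not the actual code I wrote; the class and method names here are made up for illustration) looks like this:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.MessageDigest;

public class HmacSigner {
    private static final String ALGO = "HmacSHA1";
    private final SecretKeySpec key;

    // The shared secret has to be distributed to every node out of band.
    public HmacSigner(byte[] sharedSecret) {
        this.key = new SecretKeySpec(sharedSecret, ALGO);
    }

    // Compute an HMAC over the serialized call bytes before sending.
    public byte[] sign(byte[] payload) throws Exception {
        Mac mac = Mac.getInstance(ALGO);
        mac.init(key);
        return mac.doFinal(payload);
    }

    // Receiver recomputes the HMAC and compares in constant time.
    public boolean verify(byte[] payload, byte[] signature) throws Exception {
        return MessageDigest.isEqual(sign(payload), signature);
    }
}

The per-call cost is one HMAC computation at each end, which is much cheaper than an RSA signature; the operational cost is getting the shared secret onto every node securely.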
On Thu, Sep 25, 2008 at 7:47 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> Owen O'Malley wrote:
>>
>> On Sep 24, 2008, at 1:50 AM, Trinh Tuan Cuong wrote:
>>
>>> We are developing a project and we intend to use Hadoop to handle the processing of vast amounts of data. But to convince our customers to use Hadoop in our project, we must show them the advantages (and maybe the disadvantages) of deploying the project with Hadoop compared to the Oracle Database platform.
>>
>> The primary advantage of Hadoop is scalability. On an equivalent hardware budget, Hadoop can handle much, much larger databases. We had a process that was run once a week on Oracle that is now run once an hour on Hadoop. Additionally, Hadoop scales out much, much farther. We can store petabytes of data in a single Hadoop cluster and have jobs that read and generate hundreds of terabytes.
>
> That said, what a database gives you -on the right hardware- is very fast responses, especially if the indices are set up right and the data denormalised when appropriate. There is also really good integration with tools and application servers, with things like Java EE designed to make running code against a database easy.
>
> Not using Oracle means you don't have to work with an Oracle DBA, which, in my experience, can only be a good thing. DBAs and developers never seem to see eye to eye.
>
>> Hadoop only has very primitive security at the moment, although I expect that to change in the next 6 months.
>
> Right now you need to trust everyone else on the network where you run Hadoop not to be malicious; the filesystem and job tracker interfaces are insecure. The forthcoming 0.19 release will ask who you are, but the far end trusts you to be who you say you are. In that respect, it's as secure as NFS over UDP.
>
> To secure Hadoop you'd probably need to
> - sign every IPC request, with a CPU time cost at both ends;
> - require some form of authentication for the HTTP-exported parts of the system, such as digest authentication, or issue lots of HTTPS private keys and use those instead, giving everyone a key management problem as well as extra communications overhead.
>
> What would be easier is to lock down remote access to the filesystem/job submission so that only authenticated users would be able to upload jobs and data. The cluster would continue to trust everything else on its network, but the system wouldn't trust people to submit work unless they could prove who they were.
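
On Steve's point about authenticating the HTTP-exported parts of the system: as a rough illustration of what gating an HTTP endpoint behind credentials looks like, here is a sketch using the JDK's built-in HttpServer and BasicAuthenticator (so basic rather than digest auth, purely for illustration; the port, realm, path, and credential check are all made up and a real deployment would check against something like LDAP):

import com.sun.net.httpserver.BasicAuthenticator;
import com.sun.net.httpserver.HttpContext;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

public class AuthedStatusServer {
    public static void main(String[] args) throws Exception {
        // Stand-in for one of the web interfaces a cluster exports.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        HttpContext ctx = server.createContext("/status", new HttpHandler() {
            public void handle(HttpExchange ex) throws IOException {
                byte[] body = "ok\n".getBytes("UTF-8");
                ex.sendResponseHeaders(200, body.length);
                ex.getResponseBody().write(body);
                ex.close();
            }
        });
        // The authenticator runs before the handler is ever invoked.
        ctx.setAuthenticator(new BasicAuthenticator("cluster") {
            public boolean checkCredentials(String user, String password) {
                // Placeholder check only.
                return "hadoop".equals(user) && "secret".equals(password);
            }
        });
        server.start();
    }
}

That only covers the HTTP side, of course; job submission and the filesystem RPC interfaces would still need the signing/authentication discussed above.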