All,

Okay, I was being facetious earlier with the 'COOL' comment.
This is a very bad idea. Well, not so much bad, but think about the ramifications of what you are proposing. Putting a 'comm' code lib together that facilitates comms and 'helps' with architecture issues also creates a SPOF (single point of failure), as another gent pointed out. Moreover, it creates a nice target for exploitation, as the lib will undoubtedly become a repository of embedded passwords, alternate dummy accounts, bypass routes, and all sorts of goop to make things 'easier'. And since it has to be world-readable and easy to get access to, it will be very tough to protect and easy to DoS/DDoS. Anything and everything goes: random timing attacks, substitution spoofs, TOCTOUs, you name it.

This whole thing is already a very nice open highway for distributing embedded and tunneled 'items' of a certain unnatural nature. Don't override what little security you already have by 'punching holes in the firewall' and other silly stuff.

Long run, what might be better is a discovery agent that provides continual validation of paths and service availability specific to Hadoop and its subprograms. That way any outage or problem can be immediately addressed or brought to the attention of the SysAds/Networkers. Like a service monitoring program. Just don't make it simple for the 'hats out there to own you in under five minutes flat (especially with an RPC or SOAP call to some lib or flat file; and ssh/ssl abso-lu-tely does not matter, trust me).

You can disagree, and I really don't mean to be a 'buzz kill', but if you ask your local 'Sheriff', I think you'll be advised not to pursue this path too heavily.

Have a good computational day...

Best,
Hal

> Hadoop has some classes for controlling how sockets are used. See
> org.apache.hadoop.net.StandardSocketFactory, SocksSocketFactory.
>
> The socket factory implementation chosen is controlled by the
> hadoop.rpc.socket.factory.class.default configuration parameter.
> You could probably write your own SocketFactory that gives back socket
> implementations that tee the conversation to another port, or to a file,
> etc.
>
> So, "it's possible," but I don't know that anyone's implemented this. I
> think others may have examined Hadoop's protocols via wireshark or other
> external tools, but those don't have much insight into Hadoop's internals.
> (Neither, for that matter, would the socket factory. You'd probably need to
> be pretty clever to introspect as to exactly what type of message is being
> sent and actually do semantic analysis, etc.)
>
> Allen's suggestion is probably more "correct," but might incur additional
> work on your part.
>
> Cheers,
> - Aaron
>
> On Thu, Jun 10, 2010 at 3:54 PM, Allen Wittenauer <[email protected]> wrote:
>
>> On Jun 10, 2010, at 3:25 AM, Ahmad Shahzad wrote:
>> > Reason for doing that is that i want all the communication to happen
>> > through a communication library that resolves every communication
>> > problem that we can have e.g firewalls, NAT, non routed paths, multi
>> > homing etc etc. By using that library all the headache of communication
>> > will be gone. So, we will be able to use hadoop quite easily and there
>> > will be no communication problems.
>>
>> I know Owen pointed you towards using proxies, but anything remotely
>> complex would probably be better in an interposer library, as then it is
>> application agnostic.
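
For reference, the "tee" socket factory Aaron describes in the quoted text could look roughly like the sketch below. It uses only the plain JDK (javax.net.SocketFactory); the names TeeSocketFactory and TeeSocket are made up for illustration, and none of this has been tried against a real Hadoop cluster. Hadoop's own I/O paths may expect channel-backed sockets, and plugging a class into hadoop.rpc.socket.factory.class.default would presumably also need a no-argument constructor, which is not shown. As Aaron notes, this only captures raw bytes and does no semantic analysis.

```java
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.SocketFactory;

/** Hypothetical factory: hands out sockets that copy all traffic to a log stream. */
final class TeeSocketFactory extends SocketFactory {
    private final OutputStream log;

    TeeSocketFactory(OutputStream log) { this.log = log; }

    /** Socket whose streams copy every byte read or written into the log. */
    static final class TeeSocket extends Socket {
        private final OutputStream log;

        TeeSocket(OutputStream log) { this.log = log; }

        @Override public InputStream getInputStream() throws IOException {
            return new FilterInputStream(super.getInputStream()) {
                @Override public int read() throws IOException {
                    int b = in.read();
                    if (b >= 0) { synchronized (log) { log.write(b); } }
                    return b;
                }
                @Override public int read(byte[] buf, int off, int len) throws IOException {
                    int n = in.read(buf, off, len);
                    if (n > 0) { synchronized (log) { log.write(buf, off, n); } }
                    return n;
                }
            };
        }

        @Override public OutputStream getOutputStream() throws IOException {
            return new FilterOutputStream(super.getOutputStream()) {
                @Override public void write(int b) throws IOException {
                    out.write(b);
                    synchronized (log) { log.write(b); }
                }
                @Override public void write(byte[] buf, int off, int len) throws IOException {
                    out.write(buf, off, len);
                    synchronized (log) { log.write(buf, off, len); }
                }
            };
        }
    }

    // Unconnected socket; the caller connects it later.
    @Override public Socket createSocket() { return new TeeSocket(log); }

    @Override public Socket createSocket(String host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override public Socket createSocket(InetAddress host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override public Socket createSocket(String host, int port,
                                         InetAddress localHost, int localPort) throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localHost, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override public Socket createSocket(InetAddress host, int port,
                                         InetAddress localHost, int localPort) throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localHost, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }
}
```

Passing a FileOutputStream as the log gives the "tee to a file" variant; teeing to another port would mean passing the output stream of a second socket instead. Note that a capture like this contains whatever crosses the wire, credentials included, so the log itself becomes something to protect.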

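
As for the discovery/monitoring agent suggested at the top of this message, a bare-bones version of the "validate paths and service availability" idea might be a periodic TCP reachability check like the sketch below. The class name ServiceProbe and its methods are hypothetical, the endpoints would come from your own cluster configuration, and a successful TCP connect only shows the port is open, not that the Hadoop service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical probe: checks whether TCP endpoints accept connections. */
final class ServiceProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, unroutable, or unresolvable: all count as "down".
            return false;
        }
    }

    /** Probes each endpoint in order and returns a "host:port" -> reachable map. */
    static Map<String, Boolean> checkAll(List<InetSocketAddress> endpoints, int timeoutMillis) {
        Map<String, Boolean> status = new LinkedHashMap<>();
        for (InetSocketAddress ep : endpoints) {
            String host = ep.getHostString();
            status.put(host + ":" + ep.getPort(),
                       isReachable(host, ep.getPort(), timeoutMillis));
        }
        return status;
    }
}
```

Run from a scheduler (for example a ScheduledExecutorService) on each node, the resulting map could feed whatever alerting the SysAds already use. Unlike a shared comm library, a probe like this needs no stored credentials and punches no holes in the firewall; it only reports which paths are currently open.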