Hadoop has a few classes that control how sockets are created. See
org.apache.hadoop.net.StandardSocketFactory and
org.apache.hadoop.net.SocksSocketFactory.

The socket factory implementation chosen is controlled by the
hadoop.rpc.socket.factory.class.default configuration parameter. You could
probably write your own SocketFactory that gives back socket implementations
that tee the conversation to another port, or to a file, etc.
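
To make the tee idea concrete, here is a rough sketch using only the JDK. It
has not been tried against a real Hadoop cluster. The names TeeSocketFactory
and TeeSocket and the tee.socket.log system property are made up for
illustration; only javax.net.SocketFactory and java.net.Socket are real APIs.
The assumption is that you would name a class like this in
hadoop.rpc.socket.factory.class.default.

```java
import java.io.FileOutputStream;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.SocketFactory;

/** Hypothetical factory whose sockets copy all traffic to a sink stream. */
class TeeSocketFactory extends SocketFactory {
    private final OutputStream sink;

    /** No-arg constructor so the class can be instantiated by name. */
    public TeeSocketFactory() throws IOException {
        // "tee.socket.log" is a made-up system property, not a Hadoop setting.
        this(new FileOutputStream(
                System.getProperty("tee.socket.log", "socket-tee.log"), true));
    }

    public TeeSocketFactory(OutputStream sink) { this.sink = sink; }

    /** Unconnected socket; the caller is expected to call connect() itself. */
    @Override public Socket createSocket() { return new TeeSocket(sink); }

    @Override public Socket createSocket(String host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override public Socket createSocket(String host, int port,
            InetAddress localHost, int localPort) throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localHost, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override public Socket createSocket(InetAddress host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override public Socket createSocket(InetAddress host, int port,
            InetAddress localHost, int localPort) throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localHost, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }
}

/** Socket whose streams copy every byte sent or received into the sink. */
class TeeSocket extends Socket {
    private final OutputStream sink;

    TeeSocket(OutputStream sink) { this.sink = sink; }

    @Override public OutputStream getOutputStream() throws IOException {
        return new FilterOutputStream(super.getOutputStream()) {
            @Override public void write(int b) throws IOException {
                out.write(b);                      // real socket first
                synchronized (sink) { sink.write(b); sink.flush(); }
            }
            @Override public void write(byte[] buf, int off, int len) throws IOException {
                out.write(buf, off, len);
                synchronized (sink) { sink.write(buf, off, len); sink.flush(); }
            }
        };
    }

    @Override public InputStream getInputStream() throws IOException {
        return new FilterInputStream(super.getInputStream()) {
            @Override public int read() throws IOException {
                int b = super.read();
                if (b >= 0) synchronized (sink) { sink.write(b); sink.flush(); }
                return b;
            }
            @Override public int read(byte[] buf, int off, int len) throws IOException {
                int n = super.read(buf, off, len);
                if (n > 0) synchronized (sink) { sink.write(buf, off, n); sink.flush(); }
                return n;
            }
        };
    }
}
```

Both directions land in the same sink as raw, unframed bytes, so you would
still have to decode the RPC framing yourself. Also note that a plain
java.net.Socket subclass has no NIO channel (getChannel() returns null), so
any code path that relies on the channel may behave differently with it.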

So, "it's possible," but I don't know of anyone who has implemented this. I
think others have examined Hadoop's protocols with Wireshark or other
external tools, but those don't have much insight into Hadoop's internals.
(Neither, for that matter, would the socket factory, which only sees raw
bytes. You'd need to be pretty clever to work out exactly what type of
message is being sent and do any semantic analysis on it.)

Allen's suggestion is probably more "correct," but might incur additional
work on your part.

Cheers,
- Aaron

On Thu, Jun 10, 2010 at 3:54 PM, Allen Wittenauer
<awittena...@linkedin.com> wrote:

>
> On Jun 10, 2010, at 3:25 AM, Ahmad Shahzad wrote:
> > The reason for doing that is that I want all the communication to happen
> > through a communication library that resolves every communication
> > problem we can have, e.g. firewalls, NAT, non-routed paths, multihoming,
> > etc. By using that library, all the headache of communication will be
> > gone. So we will be able to use Hadoop quite easily and there will be no
> > communication problems.
>
> I know Owen pointed you towards using proxies, but anything remotely
> complex would probably be better in an interposer library, as then it is
> application agnostic.
