Recently I spent some time hacking the contrib/ec2 scripts to install
and configure OpenVPN on top of the other installed packages.  Our use
case required that all the slaves running mappers be able to connect
back to our primary MySQL database (firewalled, as you can imagine).
At the same time, our webservers had to be able to connect via Thrift
to HBase running atop the same Hadoop cluster.

The scheme I eventually settled on was a server cert/key plus a single
"client" cert/key shared by all the clients.  The master node ran the
OpenVPN server, and all the slave nodes connected to it as clients.
Any other box that needed access to the cluster (like our firewalled
database and webservers) would also connect to the master Hadoop node,
whose EC2 security group had UDP port 1194 open to 0.0.0.0/0.  Such a
client could then address any Hadoop node by its tunneled VPN IP
(10.8.0.x), derived from the node's AMI instance start ID.
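As a rough sketch of that layout (not the actual contrib/ec2 change), the Python below renders one OpenVPN config for the master (server) and one for every client.  The certificate and dh.pem file names and the 10.8.0.0/24 subnet are assumptions; 10.8.0.x is simply the range mentioned above.  Because every client presents the same shared certificate, the server needs OpenVPN's duplicate-cn option: without it, OpenVPN disconnects an existing client when another one with the same common name connects.  How each node's address was derived from its start ID isn't described, so the sketch leaves addressing to the server's default pool.

```python
"""Hypothetical config generator for the layout above: master = OpenVPN
server, every slave (and any outside box) = client with one shared cert.

Assumed, not from the original setup: the file names ca.crt, server.crt,
server.key, client.crt, client.key, dh.pem and the 10.8.0.0/24 subnet.
"""

VPN_NETWORK = "10.8.0.0"
VPN_NETMASK = "255.255.255.0"
VPN_PORT = 1194  # UDP; the only port the master's EC2 group opens to the world


def server_conf() -> str:
    """Config for the Hadoop master node, which runs the OpenVPN server."""
    lines = [
        f"port {VPN_PORT}",
        "proto udp",
        "dev tun",
        "topology subnet",
        "ca ca.crt",
        "cert server.crt",
        "key server.key",
        "dh dh.pem",
        f"server {VPN_NETWORK} {VPN_NETMASK}",  # master takes 10.8.0.1
        # All clients share one certificate (one common name); without this,
        # each new connection would kick off the previous client.
        "duplicate-cn",
        # Route traffic between clients (e.g. webserver -> slave) inside OpenVPN.
        "client-to-client",
        "keepalive 10 120",
        "persist-key",
        "persist-tun",
    ]
    return "\n".join(lines) + "\n"


def client_conf(master_host: str) -> str:
    """Config for a slave node or any other box joining the cluster VPN."""
    lines = [
        "client",
        "dev tun",
        "proto udp",
        f"remote {master_host} {VPN_PORT}",
        "nobind",
        "ca ca.crt",
        "cert client.crt",  # the same cert/key pair on every client
        "key client.key",
        "persist-key",
        "persist-tun",
    ]
    return "\n".join(lines) + "\n"
```

Written out as server.conf on the master and, say, client_conf("master.example.com") (a hypothetical hostname) as client.conf on each slave, these would be started with `openvpn --config server.conf` and `openvpn --config client.conf` respectively.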

I almost had it all working.  The only piece still giving me trouble
was making the slaves connect back to the master at instance boot
time.  I could have figured it out, but got pulled off the project
when we decided to move away from EC2 for the time being :/

-- Jim R. Wilson (jimbojw)

On Wed, May 28, 2008 at 4:23 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> That doesn't work because the various web pages have links or redirects to
> other pages on other machines.
>
> Also, you would need to ssh to ALL of your cluster to get the file browser
> to work.
>
> Better to do the proxy thing.
>
>
> On 5/28/08 2:16 PM, "Chris Anderson" <[EMAIL PROTECTED]> wrote:
>
>> Andreas,
>>
>> If you can ssh into the nodes, you can always set up port-forwarding
>> with ssh -L to bring those ports to your local machine.
>>
>> On Wed, May 28, 2008 at 1:51 PM, Andreas Kostyrka <[EMAIL PROTECTED]>
>> wrote:
>>> What I wonder is: which ports do I need to access?
>>>
>>> 50060 on all nodes.
>>> 50030 on the jobtracker.
>>>
>>> Any other ports?
>>>
>>> Andreas
>>>
>>> On Wednesday, 28.05.2008, at 13:37 -0700, Allen Wittenauer wrote:
>>>>
>>>>
>>>> On 5/28/08 1:22 PM, "Andreas Kostyrka" <[EMAIL PROTECTED]> wrote:
>>>>> I just wondered what other people use to access the hadoop webservers,
>>>>> when running on EC2?
>>>>
>>>>     While we don't run on EC2 :), we do protect the Hadoop web processes by
>>>> putting a proxy in front of them.  A user connects to the proxy,
>>>> authenticates, and then gets the output from the Hadoop process.  All of
>>>> the redirection magic happens via a localhost connection, so no data is
>>>> leaked unprotected.
>>>>
>>>
>>
>>
>
>
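For the ssh -L approach Chris describes, one variant (a minimal sketch, not something proposed in the thread) is to forward every web UI port through a single gateway such as the master instead of sshing into each node: with -L, the target host is resolved and connected to from the remote end of the ssh session.  The Python below only builds the command line.  The 50070/50075 HDFS ports are assumed to be the defaults of Hadoop releases of this era, and the hostnames are hypothetical.

```python
import shlex

# Web UI ports named in the thread.
JOBTRACKER_UI = 50030   # on the JobTracker
TASKTRACKER_UI = 50060  # on every node running a TaskTracker
# Assumed defaults for the HDFS pages (not overridden in hadoop-site.xml).
# The NameNode's file browser redirects to DataNode pages.
NAMENODE_UI = 50070
DATANODE_UI = 50075


def ssh_forward_args(gateway, targets, first_local_port=8000, user=None):
    """Build an `ssh -N -L ...` command that tunnels every (host, port) in
    `targets` through a single gateway node.

    Each host is resolved on the gateway side, so internal EC2 hostnames
    work as long as the gateway can reach them. Returns (argv, mapping),
    where mapping[(host, port)] is the local port to open in a browser.
    """
    argv = ["ssh", "-N"]  # -N: only forward ports, run no remote command
    mapping = {}
    for offset, (host, port) in enumerate(targets):
        local_port = first_local_port + offset
        argv += ["-L", f"{local_port}:{host}:{port}"]
        mapping[(host, port)] = local_port
    argv.append(f"{user}@{gateway}" if user else gateway)
    return argv, mapping


if __name__ == "__main__":
    # Hypothetical hosts; "localhost" here means the gateway itself.
    targets = [("localhost", JOBTRACKER_UI), ("localhost", NAMENODE_UI)]
    targets += [(host, TASKTRACKER_UI) for host in ("slave1", "slave2")]
    argv, mapping = ssh_forward_args("master.example.com", targets)
    print(shlex.join(argv))
    for (host, port), local in mapping.items():
        print(f"http://localhost:{local}/  ->  {host}:{port}")
```

Ted's objection still applies: the pages link and redirect to the nodes' own hostnames and ports, not to the local tunnel ports, so every hop has to be followed by hand.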

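The proxy Allen describes can be sketched with the Python standard library.  This is an illustration, not his setup: it checks HTTP Basic credentials (a hard-coded, hypothetical login), then fetches the page from one Hadoop web UI over 127.0.0.1 and relays it.  Basic auth only base64-encodes the password, so a real deployment would put this behind HTTPS.

```python
import base64
import hmac
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical settings: one local Hadoop web UI and a single shared login.
UPSTREAM = "http://127.0.0.1:50030"  # JobTracker UI, reached only via localhost
USERNAME = "hadoop"
PASSWORD = "change-me"


def credentials_ok(auth_header, username=USERNAME, password=PASSWORD):
    """Return True if an HTTP Basic `Authorization` header matches the login."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True)
        user, sep, pw = decoded.decode("utf-8").partition(":")
    except ValueError:  # bad base64 (binascii.Error) or bad UTF-8
        return False
    # Constant-time comparisons, so timing does not reveal partial matches.
    user_ok = hmac.compare_digest(user.encode(), username.encode())
    pw_ok = hmac.compare_digest(pw.encode(), password.encode())
    return bool(sep) and user_ok and pw_ok


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hand upstream redirects back to the browser instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # makes urllib raise HTTPError for the 3xx response


_opener = urllib.request.build_opener(_NoRedirect)


class AuthProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not credentials_ok(self.headers.get("Authorization")):
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="hadoop"')
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        location = None
        try:
            with _opener.open(UPSTREAM + self.path) as resp:
                status, headers, body = resp.status, resp.headers, resp.read()
        except urllib.error.HTTPError as err:  # 3xx/4xx/5xx from upstream
            status, headers, body = err.code, err.headers, err.read()
            location = headers.get("Location")
        except urllib.error.URLError as err:  # upstream not reachable
            status, headers, body = 502, {}, str(err.reason).encode()
        self.send_response(status)
        self.send_header("Content-Type", headers.get("Content-Type", "text/plain"))
        if location:
            self.send_header("Location", location)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Only this port needs to be reachable by users; Hadoop's stays on localhost.
    HTTPServer(("0.0.0.0", 8080), AuthProxyHandler).serve_forever()
```

Upstream redirects are passed back unchanged rather than followed, so the proxy itself only ever talks to localhost.  Rewriting the links that point at other nodes, the problem Ted raises, is left out of this sketch.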