[ 
https://issues.apache.org/jira/browse/CASSANDRA-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224527#comment-14224527
 ] 

Olivier Michallat commented on CASSANDRA-7431:
----------------------------------------------

I'm running into this issue with the Java driver as well. Context: if both the
client and C* are deployed on EC2, the client should use private addresses for
C* nodes in the same region, and public addresses for C* nodes in another
region. EC2's DNS resolves the "right" address automatically if you look up
the node's public hostname; to get this public hostname, we do a reverse DNS
lookup on the IP exposed in {{system.peers.rpc_address}}.

When run from an EC2 instance, a reverse lookup with
{{InetAddress.getByName(publicIp).getHostName()}} works correctly for instances
_in another EC2 region_, but fails for instances in the same region. As
mentioned by Paulo, it returns the unresolved IP, whereas command-line tools
like {{host}} or {{dig}} correctly resolve to the public hostname. I have no
explanation as to why it fails with Java, but it appears to be a JDK bug.

The lookup via JNDI (as done in Paulo's patch) works, but the fact that we
initialize the factory with {{com.sun.jndi.dns.DnsContextFactory}} makes me
wonder whether this is portable to other JDK implementations. Another approach
is to use [dnsjava|http://www.xbill.org/dnsjava/] (that's what they did in
Whirr).
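
For reference, here is a minimal sketch of what a JNDI-based reverse lookup can
look like. This is not the code from Paulo's patch: the class name
{{JndiReverseDns}} and both helper methods are hypothetical, it handles IPv4
only, and it assumes the JDK ships the {{com.sun.jndi.dns.DnsContextFactory}}
provider (which is exactly the portability concern above). It queries the PTR
record for the address and falls back to the IP string when nothing is found.

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper, for illustration only (not taken from the patch).
class JndiReverseDns {

    // Builds the reverse-lookup name for an IPv4 address,
    // e.g. "1.2.3.4" becomes "4.3.2.1.in-addr.arpa".
    // Only the number of dot-separated parts is validated.
    static String toReverseLookupName(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0]
                + ".in-addr.arpa";
    }

    // Queries the PTR record through the JNDI DNS provider, using the
    // name servers configured on the host. Returns the IP unchanged if
    // the lookup fails or yields no PTR record.
    static String reverseLookup(String ipv4) {
        Hashtable<String, String> env = new Hashtable<>();
        // Assumption: this provider class is present in the running JDK.
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = null;
        try {
            ctx = new InitialDirContext(env);
            Attributes attrs =
                    ctx.getAttributes(toReverseLookupName(ipv4), new String[] {"PTR"});
            Attribute ptr = attrs.get("PTR");
            if (ptr == null || ptr.size() == 0) {
                return ipv4;
            }
            String host = ptr.get(0).toString();
            // PTR values are fully qualified; drop the trailing dot if present.
            return host.endsWith(".") ? host.substring(0, host.length() - 1) : host;
        } catch (NamingException e) {
            return ipv4;
        } finally {
            if (ctx != null) {
                try {
                    ctx.close();
                } catch (NamingException ignored) {
                    // Nothing useful to do if closing the context fails.
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("reverseLookup: " + reverseLookup(args[0]));
    }
}
```

It can be run the same way as the {{DnsResolver}} example quoted below, passing
the IP as the only argument. What it prints depends on the resolver
configuration of the machine it runs on.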

> Hadoop integration does not perform reverse DNS lookup correctly on EC2
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-7431
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7431
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>         Attachments: 2.0-CASSANDRA-7431.txt
>
>
> The split assignment on AbstractColumnFamilyInputFormat:247 performs a reverse
> DNS lookup of Cassandra IPs in order to preserve locality in Hadoop (task
> trackers are identified by hostnames).
> However, the reverse lookup of an EC2 IP does not yield the EC2 hostname of
> that endpoint when running from an EC2 instance due to the use of
> InetAddress.getHostName().
> In order to show this, consider the following piece of code:
> {code:title=DnsResolver.java|borderStyle=solid}
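> import java.net.InetAddress;
>
> public class DnsResolver {
>     public static void main(String[] args) throws Exception {
>         // args[0] is the IP address to reverse-resolve.
>         InetAddress namenodePublicAddress = InetAddress.getByName(args[0]);
>         System.out.println("getHostAddress: " + namenodePublicAddress.getHostAddress());
>         // getHostName() triggers the reverse lookup; it returns the textual IP if that fails.
>         System.out.println("getHostName: " + namenodePublicAddress.getHostName());
>     }
> }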
> {code}
> When this code is run from my machine to perform a reverse lookup of an EC2
> IP, the output is:
> {code:none}
> ➜  java DnsResolver 54.201.254.99
> getHostAddress: 54.201.254.99
> getHostName: ec2-54-201-254-99.compute-1.amazonaws.com
> {code}
> When this code is executed from inside an EC2 machine, the output is:
> {code:none}
> ➜  java DnsResolver 54.201.254.99
> getHostAddress: 54.201.254.99
> getHostName: 54.201.254.99
> {code}
> However, when using Linux tools such as "host" or "dig", the EC2 hostname is
> properly resolved from the EC2 instance, so there's some problem with Java's
> InetAddress.getHostName() and EC2.
> Two consequences of this bug during AbstractColumnFamilyInputFormat split 
> definition are:
> 1) If the Hadoop cluster is configured to use EC2 public DNS, the locality 
> will be lost, because Hadoop will try to match the CFIF split location 
> (public IP) with the task tracker location (public DNS), so no matches will 
> be found.
> 2) If the Cassandra nodes' broadcast_address is set to public IPs, all Hadoop
> communication will be done via the public IP, which will incur additional
> data transfer charges. If the public IP is mapped to the EC2 DNS during split
> definition, when the task is executed, ColumnFamilyRecordReader will resolve
> the public DNS to the private IP of the instance, so there will be no
> additional charges.
> A similar bug was filed in the WHIRR project: 
> https://issues.apache.org/jira/browse/WHIRR-128



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
