[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783962#action_12783962
 ] 

Dick King commented on MAPREDUCE-1222:
--------------------------------------

After I wrote my comment of 24/Nov/09 07:59 PM, I looked at the Java API 
because I came to wonder whether unescaping and using the Java API could be 
made to work by itself.  I did look for alternatives before I created my big 
regular expression.

The big problem is that Java doesn't present any API that distinguishes 
numeric IP addresses from symbolic addresses.  Although 
InetAddress.getByName(String) must have some means of parsing IPV4 and IPV6 
literal numeric addresses, this functionality is not exposed to java.net.* 
users.  InetAddress.getByName(String) will parse either a numeric address or a 
symbolic name and produce indistinguishable results, so that piece of the API 
gives us no means to distinguish the two.  I was unable to find any 
other API that did make the distinction.
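
To illustrate the point above, here is a minimal sketch (not code from any 
attached patch; the class and method names are made up for this example).  It 
passes both literals and a name to InetAddress.getByName(String) and reports 
only what the address accessors used here return.  "localhost" is assumed to 
be resolvable on the machine running it; if it is not, the sketch just reports 
that.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical demo class, for illustration only.
class GetByNameDemo {
    // Reports the address family that getByName produced for the input.
    // The same call path is used for literals and for symbolic names.
    static String family(String input) {
        try {
            InetAddress addr = InetAddress.getByName(input);
            if (addr instanceof Inet4Address) {
                return "IPv4";
            }
            if (addr instanceof Inet6Address) {
                return "IPv6";
            }
            return "other";
        } catch (UnknownHostException e) {
            return "unresolved";
        }
    }

    public static void main(String[] args) {
        // Two numeric literals and one symbolic name (assumed resolvable).
        String[] inputs = {"127.0.0.1", "::1", "localhost"};
        for (String input : inputs) {
            // Nothing in the reported family says whether the input
            // string was a numeric literal or a name.
            System.out.println(input + " -> " + family(input));
        }
    }
}
```

The family tells us what kind of address came back, not what kind of string 
went in, which is the distinction we actually need.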

The formats of numeric literal IPV4 and IPV6 internet addresses are fixed in 
RFCs and are extremely unlikely to change in the foreseeable future, so 
matching them with a regular expression carries little future-proofing risk.  
The only exposure we have is a possible future IPV8, but ICANN is doing its 
best to make that unnecessary for a very long time.

Since Apache already owns this regular expression, we should consider 
using it.
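
The regular expression under discussion lives in the attached patch and is 
not reproduced here.  As a much smaller illustration of the same idea, the 
sketch below matches only strict dotted-quad IPV4 literals; the class name, 
method name and pattern are my own assumptions for this example, and the IPV6 
part of a real predicate is considerably longer.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not the patch's predicate.
class Ipv4LiteralSketch {
    // One decimal octet in the range 0..255, without leading zeros.
    private static final String OCTET =
        "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])";

    // Four octets separated by dots.
    private static final Pattern DOTTED_QUAD =
        Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    // True only if the whole string is a dotted-quad IPV4 literal.
    static boolean isIpv4Literal(String s) {
        return DOTTED_QUAD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"10.0.0.5", "256.1.1.1", "node17.example.com"};
        for (String input : inputs) {
            System.out.println(input + " -> " + isIpv4Literal(input));
        }
    }
}
```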

I considered the simpler approach of treating any address that contains a 
colon character as a numeric IPV6 address, but colons are also used as other 
punctuation, e.g., to separate an IP address from a port number.  That solution 
felt to me to be too brittle and accident-prone, and it doesn't solve the IPV8 
problem.  There is a continuum of IPV6 solutions ranging from "look for a 
colon" to the correct regular expression you see here, and no principled way to 
decide where to stop.
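
A minimal sketch of the "look for a colon" end of that continuum shows the 
problem.  The class name, method name and sample strings are hypothetical, 
chosen only for this example.

```java
// Hypothetical demo class, for illustration only.
class ColonHeuristic {
    // Naive predicate: any colon means "numeric IPV6 address".
    static boolean looksLikeIpv6(String s) {
        return s.indexOf(':') >= 0;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "fe80::1",                  // an IPV6 literal
            "node17.example.com:50010", // host name plus port
            "10.0.0.5:50010",           // IPV4 literal plus port
            "node17.example.com"        // plain host name
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + looksLikeIpv6(input));
        }
    }
}
```

Both host:port strings are reported as IPV6 literals, which is the kind of 
accident I would like to avoid.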

> [Mumak] We should not include nodes with numeric ips in cluster topology.
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1222
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1222
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/mumak
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: IPv6-predicate.patch, mapreduce-1222-20091119.patch, 
> mapreduce-1222-20091121.patch
>
>
> Rumen infers cluster topology by parsing input split locations from job 
> history logs. Due to HDFS-778, a cluster node may appear either as a numeric 
> ip or as a host name in job history logs. We should exclude nodes that appear 
> as numeric ips from the cluster topology when we run mumak, until a solution 
> is found so that numeric ips never appear in input split locations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
