[jira] Created: (HADOOP-6876) Path.suffix(...) on a path url with no path results in NullPointerException

2010-07-22 Thread Jordan Sissel (JIRA)
Path.suffix(...) on a path url with no path results in NullPointerException
-

 Key: HADOOP-6876
 URL: https://issues.apache.org/jira/browse/HADOOP-6876
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 0.20.2
Reporter: Jordan Sissel


I can most briefly demo this using the hbase jruby shell (as it has hadoop in 
the classpath):

{noformat}

hbase(main):001:0> path = org.apache.hadoop.fs.Path.new("hdfs://testing/")
=> #<Java::OrgApacheHadoopFs::Path:0x6128453c @java_object=#<Java::JavaObject:0x1ad997f9>>
hbase(main):002:0> path.suffix("Test")
NativeException: java.lang.NullPointerException: null
{noformat}

I expected instead that path.suffix("Test") would simply generate a new path 
with "Test" appended. For the example above, that would be the path hdfs://testing/Test.

Workaround: instead of using path.suffix(), this should work (Java):
{noformat}
   Path newpath = new Path(oldpath.toString() + mysuffix);
{noformat}
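
For completeness, here is that workaround wrapped in a small helper (the helper name is mine, not part of Hadoop). My guess is the NPE happens because suffix() builds its result via getParent(), and getParent() is null for a root path like hdfs://testing/; the string-based construction below sidesteps that:

{noformat}
import org.apache.hadoop.fs.Path;

public class PathSuffixWorkaround {
  // Hypothetical helper, not a Hadoop API: append a suffix via the string
  // form, avoiding the getParent()-based code path that NPEs on root paths.
  public static Path suffixSafe(Path path, String suffix) {
    return new Path(path.toString() + suffix);
  }

  public static void main(String[] args) {
    Path root = new Path("hdfs://testing/");
    // root.suffix("Test") throws NullPointerException on 0.20.2
    System.out.println(suffixSafe(root, "Test"));  // prints hdfs://testing/Test
  }
}
{noformat}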



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6876) Path.suffix(...) on a path url with no path results in NullPointerException

2010-07-22 Thread Jordan Sissel (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891390#action_12891390
 ] 

Jordan Sissel commented on HADOOP-6876:
---

Updated my own code and confirmed that my suggested workaround does work.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-6867) Using socket address for datanode registry breaks multihoming

2010-07-19 Thread Jordan Sissel (JIRA)
Using socket address for datanode registry breaks multihoming
-

 Key: HADOOP-6867
 URL: https://issues.apache.org/jira/browse/HADOOP-6867
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 0.20.2
 Environment: hadoop-0.20-0.20.2+228-1, centos 5, distcp
Reporter: Jordan Sissel


Related: 
* https://issues.apache.org/jira/browse/HADOOP-985
* https://issues.apache.org/jira/secure/attachment/12350813/HADOOP-985-1.patch
* http://old.nabble.com/public-IP-for-datanode-on-EC2-td19336240.html
* http://www.cloudera.com/blog/2008/12/securing-a-hadoop-cluster-through-a-gateway/
 

Datanodes register with the Namenode using their DNS name (configurable via 
dfs.datanode.dns.interface). However, when telling clients where to write HDFS 
blocks, the Namenode only uses the source address that the registration came 
from.

Specific environment that causes this problem:
* Datanode and Namenode multihomed on two networks.
* Datanode registers to namenode using dns name on network #1
* Client (distcp) connects to namenode on network #2 (*) and is told to write 
to datanodes on network #1, which doesn't work for us.

(*) Allowing contact to the namenode on multiple networks was achieved with a 
socat proxy hack that tunnels network #2 to port 8020 on network #1. This is 
unrelated to the issue at hand.
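
For reference, that proxy hack was a one-liner roughly like this (illustrative only; the hostname is made up), run on a host reachable from network #2, with clients pointing fs.default.name at it:

{noformat}
# forward namenode RPC (port 8020) from network #2 into network #1
socat TCP-LISTEN:8020,fork,reuseaddr TCP:namenode-net1.example.com:8020
{noformat}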


The cloudera link above recommends proxying for reasons other than multihoming. 
It would work here too, but it doesn't sound like it would scale well 
(bandwidth, multiplicity, multi-tenancy, etc).

Our specific scenario is wanting to distcp over a different network interface 
than the one the datanodes register on, but it would be nice if both (all) 
interfaces worked. Internally we are going to patch hadoop to roll back parts 
of the patch mentioned above so that we rely on the datanode's registered name 
rather than the socket address it uses to talk to the namenode. The alternative 
is to push config changes to all nodes that force them to listen/register on 
one specific interface only. That works around our specific problem, but 
doesn't really help with multihoming.
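
For concreteness, the force-one-interface alternative amounts to roughly this per-node hdfs-site.xml fragment (a sketch; the interface name and address are placeholders):

{noformat}
<!-- hdfs-site.xml (sketch): pin the datanode to one interface -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <!-- register under the hostname resolved from this interface -->
  <value>eth1</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <!-- listen for block transfers only on the network #1 address -->
  <value>10.1.0.5:50010</value>
</property>
{noformat}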

I would propose that datanodes register all of their interface addresses during 
registration/heartbeat (whichever process handles this), and that hdfs clients 
be given every address for a given datanode to perform operations against. 
Clients could then select among them (or just use whichever worked first), much 
like round-robin dns.
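
To illustrate the "whichever worked first" idea, here is a rough client-side sketch (hypothetical, not an existing Hadoop API), assuming the namenode returned every address a datanode registered:

{noformat}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

public class DatanodeAddressPicker {
  // Hypothetical: given all addresses a datanode registered, connect to
  // whichever one is reachable from this client's network, in the spirit
  // of round-robin DNS.
  public static Socket connectToAny(List<InetSocketAddress> addresses,
                                    int timeoutMs) throws IOException {
    for (InetSocketAddress addr : addresses) {
      try {
        Socket s = new Socket();
        s.connect(addr, timeoutMs);
        return s;  // first reachable interface wins
      } catch (IOException e) {
        // not reachable from this network; try the next registered address
      }
    }
    throw new IOException("no registered address was reachable");
  }
}
{noformat}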


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.