Re: Improving locality of table access...

2008-10-22 Thread Billy Pearson

generate a patch and post it here
https://issues.apache.org/jira/browse/HBASE-675

Billy

"Arthur van Hoff" <[EMAIL PROTECTED]> wrote in 
message news:[EMAIL PROTECTED]

Hi,

Below is some code for improving the read performance of large tables by
processing each region on the host holding that region. We measured 50-60%
lower network bandwidth.

To use this class instead of the org.apache.hadoop.hbase.mapred.TableInputFormat
class, use:

   jobconf.setInputFormat(ellerdale.mapreduce.TableInputFormatFix.class);

Please send me feedback if you can think of better ways to do this.

--
Arthur van Hoff - Grand Master of Alphabetical Order
The Ellerdale Project, Menlo Park, CA
[EMAIL PROTECTED], 650-283-0842


-- TableInputFormatFix.java --

/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
// Author: Arthur van Hoff, [EMAIL PROTECTED]

package ellerdale.mapreduce;

import java.io.*;
import java.util.*;

import org.apache.hadoop.io.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.mapred.*;

import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.mapred.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.*;
import org.apache.hadoop.hbase.util.*;

//
// Attempt to fix the localized nature of table segments.
// Compute table splits so that they are processed locally.
// Combine multiple splits to avoid the number of splits exceeding numSplits.
// Sort the resulting splits so that the shortest ones are processed last.
// The resulting savings in network bandwidth are significant (we measured 60%).
//
public class TableInputFormatFix extends TableInputFormat
{
   public static final int ORIGINAL= 0;
   public static final int LOCALIZED= 1;
   public static final int OPTIMIZED= 2;// not yet functional

   //
   // A table split with a location.
   //
   static class LocationTableSplit extends TableSplit implements Comparable
   {
   String location;

   public LocationTableSplit()
   {
   }
   public LocationTableSplit(byte [] tableName, byte [] startRow, byte []
endRow, String location)
   {
   super(tableName, startRow, endRow);
   this.location = location;
   }
   public String[] getLocations()
   {
   return new String[] {location};
   }
   public void readFields(DataInput in) throws IOException
   {
   super.readFields(in);
   this.location = Bytes.toString(Bytes.readByteArray(in));
   }
   public void write(DataOutput out) throws IOException
   {
   super.write(out);
   Bytes.writeByteArray(out, Bytes.toBytes(location));
   }
   public int compareTo(Object other)
   {
   LocationTableSplit otherSplit = (LocationTableSplit)other;
   int result = Bytes.compareTo(getStartRow(),
otherSplit.getStartRow());
   return result;
   }
   public String toString()
   {
   return location.substring(0, location.indexOf('.')) + ": " +
Bytes.toString(getStartRow()) + "-" + Bytes.toString(getEndRow());
   }
   }

   //
   // A table split with a location that covers multiple regions.
   //
   static class MultiRegionTableSplit extends LocationTableSplit
   {
   byte[][] regions;

   public MultiRegionTableSplit()
   {
   }
   public MultiRegionTableSplit(byte[] tableName, String location, byte[][] regions) throws IOException
   {
   super(tableName, regions[0], regions[regions.length-1], location);
   this.location = location;
   this.regions = regions;
   }
   public void readFields(DataInput in) throws IOException
   {
   super.readFields(in);
   int n = in.readInt();
   regions = new byte[n][];
   for (int i = 0 ; i < n ; i++) {
   regions[i] = Bytes.readByteArray(in);
   }
   }
   public void write(DataOutput out) throws IOException
   {
   super.write(out);
   out.writeInt(regions.length);
   for (int i = 0 ; i < regions.length ; i++) {
   Bytes.writeByteArray(out, regions[i]);
   }
   }
   public String toString()
   {
   String str = location.substring(0, location.indexOf('.')) + ": ";
   for (int i = 0 ; i < regions.length ; i += 2) {
   if (i > 0) {
   str += ", ";
   }
   str += Bytes.toString(regi

Is there a way to know the input filename at Hadoop Streaming?

2008-10-22 Thread Steve Gao
I am using Hadoop Streaming. The inputs are multiple files.
Is there a way to get the current filename in the mapper?

For example:
$HADOOP_HOME/bin/hadoop  \
jar $HADOOP_HOME/hadoop-streaming.jar \
-input file1 \
-input file2 \
-output myOutputDir \
-mapper mapper \
-reducer reducer

In mapper:
while (){
  // how to tell whether the current line is from file1 or file2?
}
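
One pointer that may help: Streaming in this Hadoop generation exports job
configuration values to the mapper's environment with dots replaced by
underscores, so the current split's path should be readable as the
map_input_file environment variable, whatever language the mapper is written
in. A minimal Java sketch of such a mapper (the class name is made up):

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical streaming mapper: reads lines from stdin and tags each one
// with the file it came from, taken from the map_input_file environment
// variable that Streaming sets for the child process.
public class FileTaggingMapper {
    public static void main(String[] args) throws Exception {
        String inputFile = System.getenv("map_input_file"); // e.g. file1 or file2
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(inputFile + "\t" + line);
        }
    }
}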




  

Re: distcp port for 0.17.2

2008-10-22 Thread bzheng

Thanks.  The fs.default.name is "file:///" and dfs.http.address is
"0.0.0.0:50070".  I tried:

hadoop dfs -ls /path/file to make sure file exists on cluster1
hadoop distcp file:///cluster1_master_node_ip:50070/path/file
file:///cluster2_master_node_ip:50070/path/file

It gives this error message:
08/10/22 15:43:47 INFO util.CopyFiles:
srcPaths=[file:/cluster1_master_node_ip:50070/path/file]
08/10/22 15:43:47 INFO util.CopyFiles:
destPath=file:/cluster2_master_node_ip:50070/path/file
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source
file:/cluster1_master_node_ip:50070/path/file does not exist.
at org.apache.hadoop.util.CopyFiles.checkSrcPath(CopyFiles.java:578)
at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:594)
at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)


If I use hdfs:// instead of file:///, I get:
Copy failed: java.net.SocketTimeoutException: timed out waiting for rpc
response
at org.apache.hadoop.ipc.Client.call(Client.java:559)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:313)
at
org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:102)
at org.apache.hadoop.dfs.DFSClient.(DFSClient.java:178)
at
org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:68)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1280)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1291)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:203)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.util.CopyFiles.checkSrcPath(CopyFiles.java:572)
at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:594)
at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)



s29752-hadoopuser wrote:
> 
> Hi,
> 
> There is no such thing called distcp port.  distcp uses (generic) file
> system API and so it does not care about the file system implementation
> details like port number.
> 
> It is common to use distcp with HDFS or HFTP.  The urls will look like
> hdfs://namenode:port/path and hftp://namenode:port/path for HDFS and HFTP,
> respectively.   The HDFS and HFTP ports are specified by fs.default.name
> and dfs.http.address, respectively.
> 
> Nicholas Sze
> 
> 
> 
> 
> - Original Message 
>> From: bzheng <[EMAIL PROTECTED]>
>> To: core-user@hadoop.apache.org
>> Sent: Wednesday, October 22, 2008 11:57:43 AM
>> Subject: distcp port for 0.17.2
>> 
>> 
>> What's the port number for distcp in 0.17.2?  I can't find any
>> documentation
>> on distcp for version 0.17.2.  For version 0.18, the documentation says
>> it's
>> 8020.  
>> 
>> I'm using a standard install and the only open ports associated with
>> hadoop
>> are 50030, 50070, and 50090.  None of them work with distcp.  So, how do
>> you
>> use distcp in 0.17.2?  are there any extra setup/configuration needed?  
>> 
>> Thanks in advance for your help.
> 
> 
> 




RE: Improving locality of table access...

2008-10-22 Thread Jim Kellerman (POWERSET)
In the future, you should send HBase questions to the HBase user
mailing list: [EMAIL PROTECTED] if you want to get
a more timely response. HBase development is disconnected from
Hadoop development for the most part.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -Original Message-
> From: Arthur van Hoff [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 22, 2008 3:14 PM
> To: core-user@hadoop.apache.org
> Subject: Improving locality of table access...
>
> Hi,
>
> Below is some code for improving the read performance of large tables by
> processing each region on the host holding that region. We measured 50-60%
> lower network bandwidth.
>
> To use this class instead of the org.apache.hadoop.hbase.mapred.TableInputFormat
> class, use:
>
> jobconf.setInputFormat(ellerdale.mapreduce.TableInputFormatFix.class);
>
> Please send me feedback if you can think of better ways to do this.
>
> --
> Arthur van Hoff - Grand Master of Alphabetical Order
> The Ellerdale Project, Menlo Park, CA
> [EMAIL PROTECTED], 650-283-0842
>
>
> -- TableInputFormatFix.java --
>
> /**
>  * Licensed to the Apache Software Foundation (ASF) under one
>  * or more contributor license agreements.  See the NOTICE file
>  * distributed with this work for additional information
>  * regarding copyright ownership.  The ASF licenses this file
>  * to you under the Apache License, Version 2.0 (the
>  * "License"); you may not use this file except in compliance
>  * with the License.  You may obtain a copy of the License at
>  *
>  * http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> // Author: Arthur van Hoff, [EMAIL PROTECTED]
>
> package ellerdale.mapreduce;
>
> import java.io.*;
> import java.util.*;
>
> import org.apache.hadoop.io.*;
> import org.apache.hadoop.fs.*;
> import org.apache.hadoop.util.*;
> import org.apache.hadoop.conf.*;
> import org.apache.hadoop.mapred.*;
>
> import org.apache.hadoop.hbase.*;
> import org.apache.hadoop.hbase.mapred.*;
> import org.apache.hadoop.hbase.client.*;
> import org.apache.hadoop.hbase.client.Scanner;
> import org.apache.hadoop.hbase.io.*;
> import org.apache.hadoop.hbase.util.*;
>
> //
> // Attempt to fix the localized nature of table segments.
> // Compute table splits so that they are processed locally.
> // Combine multiple splits to avoid the number of splits exceeding numSplits.
> // Sort the resulting splits so that the shortest ones are processed last.
> // The resulting savings in network bandwidth are significant (we measured 60%).
> //
> public class TableInputFormatFix extends TableInputFormat
> {
> public static final int ORIGINAL= 0;
> public static final int LOCALIZED= 1;
> public static final int OPTIMIZED= 2;// not yet functional
>
> //
> // A table split with a location.
> //
> static class LocationTableSplit extends TableSplit implements Comparable
> {
> String location;
>
> public LocationTableSplit()
> {
> }
> public LocationTableSplit(byte [] tableName, byte [] startRow, byte []
> endRow, String location)
> {
> super(tableName, startRow, endRow);
> this.location = location;
> }
> public String[] getLocations()
> {
> return new String[] {location};
> }
> public void readFields(DataInput in) throws IOException
> {
> super.readFields(in);
> this.location = Bytes.toString(Bytes.readByteArray(in));
> }
> public void write(DataOutput out) throws IOException
> {
> super.write(out);
> Bytes.writeByteArray(out, Bytes.toBytes(location));
> }
> public int compareTo(Object other)
> {
> LocationTableSplit otherSplit = (LocationTableSplit)other;
> int result = Bytes.compareTo(getStartRow(),
> otherSplit.getStartRow());
> return result;
> }
> public String toString()
> {
> return location.substring(0, location.indexOf('.')) + ": " +
> Bytes.toString(getStartRow()) + "-" + Bytes.toString(getEndRow());
> }
> }
>
> //
> // A table split with a location that covers multiple regions.
> //
> static class MultiRegionTableSplit extends LocationTableSplit
> {
> byte[][] regions;
>
> public MultiRegionTableSplit()
> {
> }
> public MultiRegionTableSplit(byte[] tableName, String location, byte[][] regions) throws IOException
> {
> super(tableName, regions[0], regions[regions.length-1], location);
> this.location = location;
> this.regions = regions;
> }
> public void readFields(DataInput in) throws IOException
> {
> super

Improving locality of table access...

2008-10-22 Thread Arthur van Hoff
Hi,

Below is some code for improving the read performance of large tables by
processing each region on the host holding that region. We measured 50-60%
lower network bandwidth.

To use this class instead of the org.apache.hadoop.hbase.mapred.TableInputFormat
class, use:

jobconf.setInputFormat(ellerdale.mapreduce.TableInputFormatFix.class);
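
For example, a driver could plug it in like this. This is only an illustrative
sketch (MyTableScan is a made-up class name); the table, column, mapper and
output settings stay exactly as they would for the stock TableInputFormat:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

import ellerdale.mapreduce.TableInputFormatFix;

// Hypothetical driver: the only change from a stock table-scan job is the
// input format class.
public class MyTableScan {
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(MyTableScan.class);
        job.setJobName("table scan with locality-aware splits");

        // Swap in the locality-aware input format.
        job.setInputFormat(TableInputFormatFix.class);

        // ... table name, columns, mapper and output configured as before ...

        JobClient.runJob(job);
    }
}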

Please send me feedback if you can think of better ways to do this.

-- 
Arthur van Hoff - Grand Master of Alphabetical Order
The Ellerdale Project, Menlo Park, CA
[EMAIL PROTECTED], 650-283-0842


-- TableInputFormatFix.java --

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// Author: Arthur van Hoff, [EMAIL PROTECTED]

package ellerdale.mapreduce;

import java.io.*;
import java.util.*;

import org.apache.hadoop.io.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.mapred.*;

import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.mapred.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.*;
import org.apache.hadoop.hbase.util.*;

//
// Attempt to fix the localized nature of table segments.
// Compute table splits so that they are processed locally.
// Combine multiple splits to avoid the number of splits exceeding numSplits.
// Sort the resulting splits so that the shortest ones are processed last.
// The resulting savings in network bandwidth are significant (we measured 60%).
//
public class TableInputFormatFix extends TableInputFormat
{
public static final int ORIGINAL= 0;
public static final int LOCALIZED= 1;
public static final int OPTIMIZED= 2;// not yet functional

//
// A table split with a location.
//
static class LocationTableSplit extends TableSplit implements Comparable
{
String location;

public LocationTableSplit()
{
}
public LocationTableSplit(byte [] tableName, byte [] startRow, byte []
endRow, String location)
{
super(tableName, startRow, endRow);
this.location = location;
}
public String[] getLocations()
{
return new String[] {location};
}
public void readFields(DataInput in) throws IOException
{
super.readFields(in);
this.location = Bytes.toString(Bytes.readByteArray(in));
}
public void write(DataOutput out) throws IOException
{
super.write(out);
Bytes.writeByteArray(out, Bytes.toBytes(location));
}
public int compareTo(Object other)
{
LocationTableSplit otherSplit = (LocationTableSplit)other;
int result = Bytes.compareTo(getStartRow(),
otherSplit.getStartRow());
return result;
}
public String toString()
{
return location.substring(0, location.indexOf('.')) + ": " +
Bytes.toString(getStartRow()) + "-" + Bytes.toString(getEndRow());
}
}

//
// A table split with a location that covers multiple regions.
//
static class MultiRegionTableSplit extends LocationTableSplit
{
byte[][] regions;

public MultiRegionTableSplit()
{
}
public MultiRegionTableSplit(byte[] tableName, String location, byte[][]
regions) throws IOException
{
super(tableName, regions[0], regions[regions.length-1], location);
this.location = location;
this.regions = regions;
}
public void readFields(DataInput in) throws IOException
{
super.readFields(in);
int n = in.readInt();
regions = new byte[n][];
for (int i = 0 ; i < n ; i++) {
regions[i] = Bytes.readByteArray(in);
}
}
public void write(DataOutput out) throws IOException
{
super.write(out);
out.writeInt(regions.length);
for (int i = 0 ; i < regions.length ; i++) {
Bytes.writeByteArray(out, regions[i]);
}
}
public String toString()
{
String str = location.substring(0, location.indexOf('.')) + ": ";
for (int i = 0 ; i < regions.length ; i += 2) {
if (i > 0) {
str += ", ";
}
str += Bytes.toString(regions[i]) + "-" +
Bytes.toString(regions[i+1]);
}
return str;
 

Re: Passing Constants from One Job to the Next

2008-10-22 Thread Yih Sun Khoo
Are you saying that I can pass, say, a single integer constant with any of
these three: a JobConf? An HDFS file? The DistributedCache?
Or are you asking whether I can pass one given the context of a JobConf, an
HDFS file, or the DistributedCache?
I'm thinking of how to pass a single int from one JobConf to the next.

On Wed, Oct 22, 2008 at 2:57 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

>
> On Oct 22, 2008, at 2:52 PM, Yih Sun Khoo wrote:
>
>  I like to hear some good ways of passing constants from one job to the
>> next.
>>
>
> Unless I'm missing something: JobConf? A HDFS file? DistributedCache?
>
> Arun
>
>
>
>> These are some ways that I can think of:
>> 1)  The obvious solution is to carry the constant as part of your value
>> from
>> one job to the next, but that would mean every value would hold that
>> constant
>> 2)  Use the reporter as a hack so that you can set the status message and
>> then get the status message back when u need the constant
>>
>> Any other ideas?  (Also please do not include code)
>>
>
>
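
For concreteness, a minimal sketch of the JobConf route Arun mentions (the
property name my.app.constant and the class name are made up): the driver of
the second job calls setInt() before submitting, and the mapper reads the
value back in configure(). A DistributedCache entry or a small HDFS file would
work the same way for anything bigger than a handful of scalars.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper for the second job; the driver has already done
//   secondJob.setInt("my.app.constant", valueFromFirstJob);
public class SecondJobMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private int myConstant;

    public void configure(JobConf job) {
        // Read the constant back out of the job configuration.
        myConstant = job.getInt("my.app.constant", 0);
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Use the constant without carrying it in every record.
        output.collect(new Text(Integer.toString(myConstant)), value);
    }
}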


Re: Passing Constants from One Job to the Next

2008-10-22 Thread Arun C Murthy


On Oct 22, 2008, at 2:52 PM, Yih Sun Khoo wrote:

I like to hear some good ways of passing constants from one job to  
the next.


Unless I'm missing something: JobConf? A HDFS file? DistributedCache?

Arun



These are some ways that I can think of:
1)  The obvious solution is to carry the constant as part of your  
value from

one job to the next, but that would mean every value would hold that
constant
2)  Use the reporter as a hack so that you can set the status  
message and

then get the status message back when u need the constant

Any other ideas?  (Also please do not include code)




Passing Constants from One Job to the Next

2008-10-22 Thread Yih Sun Khoo
I'd like to hear some good ways of passing constants from one job to the next.
These are some ways that I can think of:
1)  The obvious solution is to carry the constant as part of your value from
one job to the next, but that would mean every value would hold that
constant.
2)  Use the reporter as a hack so that you can set the status message and
then get the status message back when you need the constant.

Any other ideas?  (Also please do not include code)


Re: distcp port for 0.17.2

2008-10-22 Thread Tsz Wo (Nicholas), Sze
Hi,

There is no such thing as a distcp port.  distcp uses the (generic) file system
API, so it does not care about file system implementation details like the
port number.

It is common to use distcp with HDFS or HFTP.  The URLs will look like
hdfs://namenode:port/path and hftp://namenode:port/path for HDFS and HFTP,
respectively.  The HDFS and HFTP ports are specified by fs.default.name and
dfs.http.address, respectively.
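
For what it's worth, a small sketch of that point (the class name is made up;
the property names are the ones above): the port to use in an hdfs:// distcp
path is the namenode RPC port embedded in fs.default.name, while a 50070-style
port from dfs.http.address belongs only in hftp:// paths.

import org.apache.hadoop.conf.Configuration;

// Hypothetical helper: shows which URI prefixes to feed to distcp.
public class ShowDistcpPrefixes {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // fs.default.name is already a full URI, e.g. hdfs://namenode:9000
        System.out.println("hdfs:// paths should start with " + conf.get("fs.default.name"));
        // dfs.http.address is host:port, e.g. 0.0.0.0:50070
        System.out.println("hftp:// paths should start with hftp://" + conf.get("dfs.http.address"));
    }
}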

Nicholas Sze




- Original Message 
> From: bzheng <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Wednesday, October 22, 2008 11:57:43 AM
> Subject: distcp port for 0.17.2
> 
> 
> What's the port number for distcp in 0.17.2?  I can't find any documentation
> on distcp for version 0.17.2.  For version 0.18, the documentation says it's
> 8020.  
> 
> I'm using a standard install and the only open ports associated with hadoop
> are 50030, 50070, and 50090.  None of them work with distcp.  So, how do you
> use distcp in 0.17.2?  are there any extra setup/configuration needed?  
> 
> Thanks in advance for your help.



distcp port for 0.17.2

2008-10-22 Thread bzheng

What's the port number for distcp in 0.17.2?  I can't find any documentation
on distcp for version 0.17.2.  For version 0.18, the documentation says it's
8020.  

I'm using a standard install and the only open ports associated with hadoop
are 50030, 50070, and 50090.  None of them work with distcp.  So, how do you
use distcp in 0.17.2?  are there any extra setup/configuration needed?  

Thanks in advance for your help.



Re: Is it possible to change parameters using org.apache.hadoop.conf.Configuration API?

2008-10-22 Thread Alex Loddengaard
Just to be clear: you want to persist a configuration change to your entire
cluster without bringing it down, and you're hoping to use the Configuration
API to do so.  Did I understand your question correctly?

I don't know of a way to do this without restarting the cluster, because I'm
pretty sure Configuration changes only affect the current job.  Does anyone
else have a suggestion?

Alex
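
For illustration, here is a minimal sketch of that point, reusing the paths
and keys from the quoted post below: the setStrings() call only changes the
in-memory Configuration of the client JVM, so the running daemons never see
it; persisting the change means editing conf/hadoop-site.xml on the nodes and
restarting the affected daemons.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Sketch only: setStrings() updates this in-memory Configuration, nothing more.
public class ChangeConfDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/hadoop-0.18.1/conf/hadoop-default.xml"));
        conf.addResource(new Path("/hadoop-0.18.1/conf/hadoop-site.xml"));

        conf.setStrings("dfs.data.dir", "/hadoop-data/data.dir", "/data4/hadoop-data");

        // Prints the new value -- but only this JVM ever sees it.
        System.out.println("dfs.data.dir = " + conf.get("dfs.data.dir"));
    }
}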

On Wed, Oct 22, 2008 at 12:44 AM, Jinyeon Lee <[EMAIL PROTECTED]> wrote:

> I have a running Hadoop/HBase cluster.
> When I want to change hadoop parameters without stopping the cluster, can I
> use org.apache.hadoop.conf.Configuration API?
>
> I wrote following java source, but it didn't do anything.
> -
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> public class ChangeConf
> {
>public static void main(String[] args) throws Exception
>{
>Configuration conf = new Configuration();
>conf.addResource(new
> Path("/hadoop-0.18.1/conf/hadoop-default.xml"));
>conf.addResource(new Path("/hadoop-0.18.1/conf/hadoop-site.xml"));
>conf.setStrings("dfs.data.dir",
> "/hadoop-data/data.dir","/data4/hadoop-data");
>return;
>}
> }
> 
>
> Actually, I know I can use "decommission" to add more directory without
> stopping whole cluster.
> But I'm very confused "org.apache.hadoop.conf.Configuration" API.
>
> Could anyone please let me know clearly?
>


Task Random Fail

2008-10-22 Thread Zhou, Yunqing
Recently, tasks on our cluster have been failing at random (both map tasks and
reduce tasks). When rerun, they are all OK.
The whole job is an IO-bound job (250G input, 500G map output, and 10G final
output).
From the jobtracker, I can see the failed task says:

task_200810220830_0004_m_000653_0 (tip_200810220830_0004_m_000653, on vidi-005): FAILED
java.io.IOException: Task process exit with nonzero status of 65. at
org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:479) at
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:391)

and the task log says:

 Task Logs: 'task_200810220830_0004_m_000653_0'

*stdout logs*

--


*stderr logs*

--


*syslog logs*

2008-10-22 19:59:51,640 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2008-10-22 19:59:59,507 INFO org.apache.hadoop.mapred.MapTask:
numReduceTasks: 26
2008-10-22 20:12:25,968 INFO org.apache.hadoop.mapred.TaskRunner:
Communication exception: java.net.SocketTimeoutException: timed out
waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:559)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at org.apache.hadoop.mapred.$Proxy0.statusUpdate(Unknown Source)
at org.apache.hadoop.mapred.Task$1.run(Task.java:316)
at java.lang.Thread.run(Thread.java:619)

2008-10-22 20:13:29,015 INFO org.apache.hadoop.mapred.TaskRunner:
Communication exception: java.net.SocketTimeoutException: timed out
waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:559)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at org.apache.hadoop.mapred.$Proxy0.statusUpdate(Unknown Source)
at org.apache.hadoop.mapred.Task$1.run(Task.java:316)
at java.lang.Thread.run(Thread.java:619)

2008-10-22 20:14:32,030 INFO org.apache.hadoop.mapred.TaskRunner:
Communication exception: java.net.SocketTimeoutException: timed out
waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:559)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at org.apache.hadoop.mapred.$Proxy0.statusUpdate(Unknown Source)
at org.apache.hadoop.mapred.Task$1.run(Task.java:316)
at java.lang.Thread.run(Thread.java:619)

2008-10-22 20:14:32,781 INFO org.apache.hadoop.mapred.TaskRunner:
Process Thread Dump: Communication exception
9 active threads
Thread 13 (Comm thread for task_200810220830_0004_m_000653_0):
  State: RUNNABLE
  Blocked count: 2
  Waited count: 430
  Stack:
sun.management.ThreadImpl.getThreadInfo0(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123)

org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:114)

org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:168)
org.apache.hadoop.mapred.Task$1.run(Task.java:338)
java.lang.Thread.run(Thread.java:619)
Thread 12 ([EMAIL PROTECTED]):
  State: TIMED_WAITING
  Blocked count: 0
  Waited count: 872
  Stack:
java.lang.Thread.sleep(Native Method)
org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:763)
java.lang.Thread.run(Thread.java:619)
Thread 11 (IPC Client connection to hadoop5/192.168.4.105:9000):
  State: WAITING
  Blocked count: 0
  Waited count: 2
  Waiting on [EMAIL PROTECTED]
  Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:247)
org.apache.hadoop.ipc.Client$Connection.run(Client.java:286)
Thread 9 (IPC Client connection to /127.0.0.1:49078):
  State: RUNNABLE
  Blocked count: 5
  Waited count: 214
  Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)

org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:237)
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:149)
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:122)
java.io.FilterInputStream.read(FilterInputStream.java:116)
org.apache.hadoop.ipc.Clien

Re: Still find this problem! -_-!!!

2008-10-22 Thread David Wei

Steve Loughran wrote:

David Wei wrote:
  

Steve Loughran wrote:


David Wei wrote:

  

Error initializing attempt_200810220716_0004_m_01_0:
java.lang.IllegalArgumentException: Wrong FS:
hdfs://192.168.52.129:9000/tmp/hadoop-root/mapred/system/job_200810220716_0004/job.xml,

 expected: hdfs://datacenter5:9000 at
org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320)

And if I configure the masters/slaves with hostname, datanode will
not be able to connect to the master!

No one knows how to solve this?




that implies DNS problems. what does

nslookup datacenter5


return on the command line of all the slaves?



  

Steve,




  

the output is:
Server: 192.168.18.10
Address: 192.168.18.10#53

Non-authoritative answer:
Name: datacenter5.papa
Address: 61.140.3.66

And none of them is the address of datacenter5(192.168.52.129). How can
I solve this problem?



well, your datacenter5 hostname may be the public IP address; the
192.168. addresses are the addresses within the cluster. Because they
are different, and because the namenode thinks it is called datacenter5,
things are breaking

Some options
-bring up your own DNS server and add the entries for the hostnames
-edit /etc/hosts in the different servers (this becomes expensive after
a while, and if one machine gets inconsistent, you will have a hard time
tracking down the problem)
-somehow get the namenode to come up with a filesystem called
hdfs://192.168.52.129:9000/. That should be something you can do in the
site fs.default.name option.



  

I have already tried the last two options and found that:

With the 2nd option, the SAME problem. I edited every node's hosts file and
appended the IP/hostname mappings, so that all the nodes can reach each
other by hostname (they can ping and ssh to each other).

With the 3rd option, I removed all the entries from the hosts file and found
that running the reduce phase is almost impossible. We always get this kind
of error:

java.lang.NullPointerException
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1777)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:254)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
---
Task attempt_200810220833_0001_r_00_0 failed to report status for 601
seconds. Killing!

I googled for Hadoop setup advice and read that if the configuration is based
entirely on IP addresses, the reduce phase will run into this kind of problem.
Is this a bug in Hadoop?

BTW, datacenter5 is the only master in my cluster.

thx




Re: Still find this problem! -_-!!!

2008-10-22 Thread Steve Loughran

David Wei wrote:

Steve Loughran wrote:

David Wei wrote:
 

Error initializing attempt_200810220716_0004_m_01_0:
java.lang.IllegalArgumentException: Wrong FS: 
hdfs://192.168.52.129:9000/tmp/hadoop-root/mapred/system/job_200810220716_0004/job.xml, 

 expected: hdfs://datacenter5:9000 at 
org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320)


And if I configure the masters/slaves with hostname, datanode will 
not be able to connect to the master!


No one knows how to solve this?




that implies DNS problems. what does

nslookup datacenter5


return on the command line of all the slaves?


  

Steve,




the output is:
Server: 192.168.18.10
Address: 192.168.18.10#53

Non-authoritative answer:
Name: datacenter5.papa
Address: 61.140.3.66

And none of them is the address of datacenter5(192.168.52.129). How can 
I solve this problem?


well, your datacenter5 hostname may be the public IP address; the 
192.168. addresses are the addresses within the cluster. Because they 
are different, and because the namenode thinks it is called datacenter5, 
things are breaking


Some options
-bring up your own DNS server and add the entries for the hostnames
-edit /etc/hosts in the different servers (this becomes expensive after 
a while, and if one machine gets inconsistent, you will have a hard time 
tracking down the problem)
-somehow get the namenode to come up with a filesystem called 
hdfs://192.168.52.129:9000/. That should be something you can do in the 
site fs.default.name option.




Re: Still find this problem! -_-!!!

2008-10-22 Thread David Wei

Steve Loughran wrote:

David Wei wrote:
  

Error initializing attempt_200810220716_0004_m_01_0:
java.lang.IllegalArgumentException: Wrong FS: 
hdfs://192.168.52.129:9000/tmp/hadoop-root/mapred/system/job_200810220716_0004/job.xml,
 expected: hdfs://datacenter5:9000 at 
org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320)

And if I configure the masters/slaves with hostname, datanode will not be able 
to connect to the master!

No one knows how to solve this?




that implies DNS problems. what does

nslookup datacenter5


return on the command line of all the slaves?


  

Steve,

the output is:
Server: 192.168.18.10
Address: 192.168.18.10#53

Non-authoritative answer:
Name: datacenter5.papa
Address: 61.140.3.66

And none of them is the address of datacenter5(192.168.52.129). How can 
I solve this problem?

BTW, I can ssh to datacenter5 (the master) without a password.

thx!




Re: Still find this problem! -_-!!!

2008-10-22 Thread Steve Loughran

David Wei wrote:

Error initializing attempt_200810220716_0004_m_01_0:
java.lang.IllegalArgumentException: Wrong FS: 
hdfs://192.168.52.129:9000/tmp/hadoop-root/mapred/system/job_200810220716_0004/job.xml,
 expected: hdfs://datacenter5:9000 at 
org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320)

And if I configure the masters/slaves with hostname, datanode will not be able 
to connect to the master!

No one knows how to solve this?



that implies DNS problems. what does

nslookup datacenter5


return on the command line of all the slaves?


Still find this problem! -_-!!!

2008-10-22 Thread David Wei
Error initializing attempt_200810220716_0004_m_01_0:
java.lang.IllegalArgumentException: Wrong FS: 
hdfs://192.168.52.129:9000/tmp/hadoop-root/mapred/system/job_200810220716_0004/job.xml,
 expected: hdfs://datacenter5:9000 at 
org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320)

And if I configure the masters/slaves with hostnames, the datanodes are not
able to connect to the master!

Does no one know how to solve this?





Re: adding more datanode

2008-10-22 Thread David Wei
I think I will only do this when I need the node to be a secondary namenode.


Jinyeon Lee wrote:
> Konstantin is right.
> Anyway, did you add namenode address to file "masters" under conf
> directory?
>




Is it possible to change parameters using org.apache.hadoop.conf.Configuration API?

2008-10-22 Thread Jinyeon Lee
I have a running Hadoop/HBase cluster.
When I want to change Hadoop parameters without stopping the cluster, can I
use the org.apache.hadoop.conf.Configuration API?

I wrote the following Java source, but it didn't do anything.
-
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
public class ChangeConf
{
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
conf.addResource(new
Path("/hadoop-0.18.1/conf/hadoop-default.xml"));
conf.addResource(new Path("/hadoop-0.18.1/conf/hadoop-site.xml"));
conf.setStrings("dfs.data.dir",
"/hadoop-data/data.dir","/data4/hadoop-data");
return;
}
}


Actually, I know I can use "decommission" to add more directories without
stopping the whole cluster.
But I'm quite confused by the "org.apache.hadoop.conf.Configuration" API.

Could anyone please explain it clearly?


Re: adding more datanode

2008-10-22 Thread Jinyeon Lee
Konstantin is right.
Anyway, did you add the namenode address to the "masters" file under the conf directory?