Re: What is the class that launches the reducers?

2016-08-26 Thread xeon Mailinglist
Right now the map and reduce tasks produce digests of their output. This
logic is inside the map and reduce functions. I need to pause the execution
when all maps finish, because an external program is synchronizing several
MapReduce runtimes. When the map tasks of all the jobs have finished, the
map output will be verified. Then this external program will resume the
execution.

I really want to add a knob to MapReduce by modifying the source code,
because with this knob I could skip the identity map execution and boost
performance. I think the devs should create this feature.

Anyway, I am looking in the source code for the place where reduce tasks
are scheduled to launch. Does anyone know which class launches the reduce
tasks in MapReduce v2?
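
In case it helps whoever answers: from a quick look at the sources, the
decision seems to be taken in the MR ApplicationMaster, apparently in
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator, driven by the
reduce slow-start threshold. So the closest existing knob I have found is
the one below (mapred-site.xml or per job), which should already keep
reducers from launching before 100% of the maps are done:

<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <!-- fraction of maps that must complete before reduces are scheduled -->
  <value>1.0</value>
</property>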

On Aug 26, 2016 02:07, "Daniel Templeton" <dan...@cloudera.com> wrote:

> How are you intending to verify the map output?  It's only partially
> dumped to disk.  None of the intermediate data goes into HDFS.
>
> Daniel
>
> On Aug 25, 2016 4:10 PM, "xeon Mailinglist" <xeonmailingl...@gmail.com>
> wrote:
>
>> But then I need to set identity maps to run the reducers. If I suspend a
>> job after the maps finish, I don't need to set identity maps up. I want to
>> suspend a job so that I don't run identity maps and get better
>> performance.
>>
>> On Aug 25, 2016 10:12 PM, "Haibo Chen" <haiboc...@cloudera.com> wrote:
>>
>> One thing you can try is to write a map-only job first and then verify the
>> map out.
>>
>> On Thu, Aug 25, 2016 at 1:18 PM, xeon Mailinglist <
>> xeonmailingl...@gmail.com
>> > wrote:
>>
>> > I am using Mapreduce v2.
>> >
>> > On Aug 25, 2016 8:18 PM, "xeon Mailinglist" <xeonmailingl...@gmail.com>
>> > wrote:
>> >
>> > > I am trying to implement a mechanism in MapReduce v2 that allows to
>> > > suspend and resume a job. I must suspend a job when all the mappers
>> > finish,
>> > > and resume the job from that point after some time. I do this,
>> because I
>> > > want to verify the integrity of the map output before executing the
>> > > reducers.
>> > >
>> > > I am looking for the class that tells when the Reduce tasks should
>> start.
>> > > Does anyone know where is this?
>> > >
>> >
>>
>


Re: What is the class that launches the reducers?

2016-08-25 Thread xeon Mailinglist
But then I need to set up identity maps to run the reducers. If I could
suspend a job after the maps finish, I wouldn't need to set up identity
maps. I want to suspend a job so that I don't run identity maps and get
better performance.

On Aug 25, 2016 10:12 PM, "Haibo Chen" <haiboc...@cloudera.com> wrote:

One thing you can try is to write a map-only job first and then verify the
map out.

On Thu, Aug 25, 2016 at 1:18 PM, xeon Mailinglist <xeonmailingl...@gmail.com
> wrote:

> I am using Mapreduce v2.
>
> On Aug 25, 2016 8:18 PM, "xeon Mailinglist" <xeonmailingl...@gmail.com>
> wrote:
>
> > I am trying to implement a mechanism in MapReduce v2 that allows to
> > suspend and resume a job. I must suspend a job when all the mappers
> finish,
> > and resume the job from that point after some time. I do this, because I
> > want to verify the integrity of the map output before executing the
> > reducers.
> >
> > I am looking for the class that tells when the Reduce tasks should start.
> > Does anyone know where is this?
> >
>


Re: What is the class that launches the reducers?

2016-08-25 Thread xeon Mailinglist
I am using Mapreduce v2.

On Aug 25, 2016 8:18 PM, "xeon Mailinglist" <xeonmailingl...@gmail.com>
wrote:

> I am trying to implement a mechanism in MapReduce v2 that allows to
> suspend and resume a job. I must suspend a job when all the mappers finish,
> and resume the job from that point after some time. I do this, because I
> want to verify the integrity of the map output before executing the
> reducers.
>
> I am looking for the class that tells when the Reduce tasks should start.
> Does anyone know where is this?
>


What is the class that launches the reducers?

2016-08-25 Thread xeon Mailinglist
I am trying to implement a mechanism in MapReduce v2 that allows me to
suspend and resume a job. I must suspend a job when all the mappers finish,
and resume the job from that point some time later. I do this because I
want to verify the integrity of the map output before executing the
reducers.

I am looking for the class that decides when the reduce tasks should start.
Does anyone know where this is?


Fwd: Submit, suspend and resume a mapreduce job execution

2016-08-21 Thread xeon Mailinglist
I know that it is not possible to suspend and resume a MapReduce job, but I
really need to find a workaround. I have looked at ChainedJobs and at the
CapacityScheduler, but I am really clueless about what to do.

The main goal is to suspend a job when the map tasks finish and before the
reduce tasks start. I know that this is not possible, so I have created two
jobs: one that executes all the map tasks (Job 1), and another that executes
all the reduce tasks (Job 2). Since I can't start a job with only reduce
tasks, it was necessary to add an identity mapper before the reducers. So in
the end I have Job 1, which just executes all the map tasks, and Job 2,
which executes the identity mappers and the reduce tasks. But this really
kills performance, and I wish I could find a way to do better. I have
thought about piping the output of Job 1 into Job 2, but in the end I really
need to stop the execution between these two jobs.
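
To make the workaround concrete, this is roughly the shape of the driver I
ended up with (a sketch only; MyMap, MyReduce and WordCountIdentityMapper
stand for my own wordcount classes, and verifyMapOutput() is a placeholder
for the external check that has to run between the two jobs):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoJobDriver {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Job 1: only the real map tasks. With zero reducers the map output is
    // written directly to HDFS instead of staying as intermediate data.
    Job job1 = Job.getInstance(conf, "job1-maps-only");
    job1.setJarByClass(TwoJobDriver.class);
    job1.setMapperClass(MyMap.class);              // the real map logic
    job1.setNumReduceTasks(0);
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job1, new Path(args[0]));
    FileOutputFormat.setOutputPath(job1, new Path(args[1]));
    if (!job1.waitForCompletion(true)) {
      System.exit(1);
    }

    // "Suspension point": nothing else is submitted until the external
    // verification of the map output in args[1] allows it.
    verifyMapOutput(conf, new Path(args[1]));

    // Job 2: identity maps re-read Job 1's output and feed the real reducers.
    Job job2 = Job.getInstance(conf, "job2-identity-plus-reduces");
    job2.setJarByClass(TwoJobDriver.class);
    job2.setMapperClass(WordCountIdentityMapper.class);  // identity pass
    job2.setReducerClass(MyReduce.class);                // the real reduce logic
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job2, new Path(args[1]));
    FileOutputFormat.setOutputPath(job2, new Path(args[2]));
    System.exit(job2.waitForCompletion(true) ? 0 : 1);
  }

  // Placeholder for whatever form the external digest check takes; the idea
  // is simply to block here until the map output has been validated.
  private static void verifyMapOutput(Configuration conf, Path mapOutput) {
  }
}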

I have looked at the ChainedJobs and CapacityScheduler classes to see if I
could implement a way to suspend and resume a job, but I haven't managed to
do anything successfully. Any idea how to emulate suspending a job?

Sorry to say this, but I am really desperate to find a solution.

Thanks,


On Wed, Feb 18, 2015 at 6:53 PM, Steve Loughran 
wrote:

> Afraid not.
>
> When we suspend/resume a slider application, what we are doing is shutting
> down the entire application, releasing all its YARN resources and killing
> the "Application Master". The  MapReduce engine runs its AM for the
> duration of the job; building up lots of state in that AM as to what is
> happening. Tez runs for longer, but it can dynamically change cluster size
> based on load.
>
> "Hadoop pre-emption" is a mechanism by which your cluster can be set up so
> that higher priority workloads can cause containers of lower-priority jobs
> to get killed, "pre-empted". Maybe that could be useful.
>
> -Steve
>
>
>
> On 18 February 2015 at 17:22:57, xeonmailinglist (
> xeonmailingl...@gmail.com) wrote:
>
> Hi,
>
> I noticed that YARN does not suspend or resume a mapreduce job that it
> is executing. Then, I have found Apache Slider.
> Is it possible to submit a mapreduce job with slider, and suspend and
> resume the job while executing?
>
> Thanks,
>
>


Improve IdentityMapper code for wordcount

2016-08-21 Thread xeon Mailinglist
Hi,

I have created a map method that reads the map output of the wordcount
example [1]. This moves away from the IdentityMapper class that MapReduce
offers, but it is the only way I have found to make a working identity
mapper for the wordcount. The only problem is that this mapper is taking
much more time than I expected. I am starting to think that I am doing some
redundant work. Any help to improve my identity mapper code?

[1] Identity mapper

public class WordCountIdentityMapper extends MyMapper {
    private Text word = new Text();

    // Re-parses each line of the previous job's text output (word and count)
    // and re-emits it as a (Text, IntWritable) pair.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        word.set(itr.nextToken());
        Integer val = Integer.valueOf(itr.nextToken());
        context.write(word, new IntWritable(val));
    }

    // Custom run() that skips setup()/cleanup() and just loops over the input.
    public void run(Context context) throws IOException, InterruptedException {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }
}


[2] Map class that generated the mapoutput

public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Standard wordcount map: emit (word, 1) for every token in the line.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());

        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }

    // run() without setup(); cleanup() is still called in the finally block.
    public void run(Context context) throws IOException, InterruptedException {
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}
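
One idea I am still testing, in case the slowness comes from re-parsing
text: if the first job wrote its map output as a SequenceFile of (Text,
IntWritable), the identity pass would not need a StringTokenizer at all,
and the stock new-API Mapper (which is a pass-through by default) could
replace WordCountIdentityMapper. The driver changes would be roughly the
lines below, with job1 being the job that produces the map output and job2
the one that runs the identity maps (untested sketch):

// import org.apache.hadoop.mapreduce.Mapper;
// import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
// import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Job 1: keep the (Text, IntWritable) pairs in binary form instead of plain text.
job1.setOutputFormatClass(SequenceFileOutputFormat.class);

// Job 2: read the pairs back without any parsing and forward them unchanged.
job2.setInputFormatClass(SequenceFileInputFormat.class);
job2.setMapperClass(Mapper.class);            // identity by default in the new API
job2.setMapOutputKeyClass(Text.class);
job2.setMapOutputValueClass(IntWritable.class);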



Thanks,


Pause between tasks or jobs?

2016-08-11 Thread xeon Mailinglist
I am looking for a way to pause a chained job or a chained task. I want to
do this because I want to validate the output of each map or reduce phase,
or between each job execution. Is it possible to pause the execution of
chained jobs, or of chained mappers or reducers, in MapReduce v2? I was
looking at the ChainMapper and ChainReducer classes, but I haven't found
anything that would allow me to pause the execution.


Where is the temp output data of a map or reduce tasks

2016-08-11 Thread xeon Mailinglist
With MapReduce v2 (YARN), the output data that comes out of a map or a
reduce task only shows up on the local disk or in HDFS when all the tasks
finish.

Since tasks end at different times, I was expecting the data to be written
as each task finishes. For example, task 0 finishes and its output is
written, while task 1 and task 2 are still running. Then task 2 finishes
and its output is written, while task 1 is still running. Finally, task 1
finishes and the last output is written. But this does not happen: the
outputs only appear on the local disk or in HDFS when all the tasks finish.

I want to access the task output as the data is being produced. Where is
the output data before all the tasks finish?


After I have set these params in `mapred-site.xml`:

<property>
  <name>mapreduce.task.files.preserve.failedtasks</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.task.files.preserve.filepattern</name>
  <value>*</value>
</property>

I still can't find where the intermediate output or the final output is
saved while it is being produced by the tasks.

I have listed all directories with `hdfs dfs -ls -R /`, and in the `/tmp`
dir I have only found the job configuration files.

drwx--   - root supergroup  0 2016-08-11 16:17
/tmp/hadoop-yarn/staging/root/.staging/job_1470912033891_0002
-rw-r--r--   1 root supergroup  0 2016-08-11 16:17
/tmp/hadoop-yarn/staging/root/.staging/job_1470912033891_0002/COMMIT_STARTED
-rw-r--r--   1 root supergroup  0 2016-08-11 16:17
/tmp/hadoop-yarn/staging/root/.staging/job_1470912033891_0002/COMMIT_SUCCESS
-rw-r--r--  10 root supergroup 112872 2016-08-11 16:14
/tmp/hadoop-yarn/staging/root/.staging/job_1470912033891_0002/job.jar
-rw-r--r--  10 root supergroup   6641 2016-08-11 16:14
/tmp/hadoop-yarn/staging/root/.staging/job_1470912033891_0002/job.split
-rw-r--r--   1 root supergroup797 2016-08-11 16:14
/tmp/hadoop-yarn/staging/root/.staging/job_1470912033891_0002/job.splitmetainfo
-rw-r--r--   1 root supergroup  88675 2016-08-11 16:14
/tmp/hadoop-yarn/staging/root/.staging/job_1470912033891_0002/job.xml
-rw-r--r--   1 root supergroup 439848 2016-08-11 16:17
/tmp/hadoop-yarn/staging/root/.staging/job_1470912033891_0002/job_1470912033891_0002_1.jhist
-rw-r--r--   1 root supergroup 105176 2016-08-11 16:14
/tmp/hadoop-yarn/staging/root/.staging/job_1470912033891_0002/job_1470912033891_0002_1_conf.xml

Where is the output saved? I am talking about the output as it is being
produced by the tasks, not the final output that appears when all map or
reduce tasks finish.
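
In case it helps to reproduce what I am seeing, these are the places I
understand I should be checking (based on what I have read about the
defaults; the paths depend on yarn.nodemanager.local-dirs and on the output
committer, so the placeholders below have to be filled in):

# Map-side intermediate data (spills, file.out) stays on each node's local disks
# under the NodeManager local dirs; it never goes into HDFS.
ls -lR <yarn.nodemanager.local-dirs>/usercache/<user>/appcache/<application-id>/

# Reduce/job output is staged under _temporary inside the output directory and is
# only renamed into place when the OutputCommitter commits the task and the job.
hdfs dfs -ls -R <job-output-dir>/_temporary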


Re: Copy data from local disc with WebHDFS?

2015-03-02 Thread xeon Mailinglist
1. I am using the two commands below to try to copy data from the local
disk to HDFS. Unfortunately these commands are not working, and I don't
understand why. I have configured HDFS to use the WebHDFS protocol. How do
I copy data from the local disk to HDFS using the WebHDFS protocol?

xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1 webhdfs://192.168.56.101:8080/
Java HotSpot(TM) Client VM warning: You have loaded library
/home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might
have disabled stack guard. The VM will try to fix the stack guard now. It's
highly recommended that you fix the library with 'execstack -c libfile',
or link it with '-z noexecstack'.
 15/03/02 11:50:16 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000
failed on connection exception: java.net.ConnectException: Connection
refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000
failed on connection exception: java.net.ConnectException: Connection
refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused

xubuntu@hadoop-coc-1:~/Programs/hadoop$ curl -i -X PUT -T ~/input1 "http://192.168.56.101:8080/?op=CREATE"
HTTP/1.1 100 Continue

HTTP/1.1 405 HTTP method PUT is not supported by this URL
Date: Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache
Date: Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache
Content-Length: 0
Server: Jetty(6.1.26)
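
From what I have read since, WebHDFS uploads go through a two-step CREATE
against the /webhdfs/v1 path on the NameNode's HTTP port, which is probably
why the plain PUT above is rejected with 405. A sketch of what I believe
the correct calls look like, assuming 192.168.56.101:8080 really is the
NameNode HTTP address and dfs.webhdfs.enabled is true (the target path is
just an example):

# Step 1: ask the NameNode where to write; the answer is a 307 redirect to a DataNode.
curl -i -X PUT "http://192.168.56.101:8080/webhdfs/v1/user/xubuntu/input1?op=CREATE&user.name=xubuntu"

# Step 2: send the file body to the Location header returned by step 1.
curl -i -X PUT -T ~/input1 "<Location-URL-from-step-1>"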


2. Every time I launch a command in YARN I get a Java HotSpot warning
(shown below). How do I remove the Java HotSpot warning?

xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal
~/input1 webhdfs://192.168.56.101:8080/
Java  HotSpot(TM) Client VM warning: You have loaded library
/home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which
might  have disabled stack guard. The VM will try to fix the stack
guard now.
It's highly recommended that you fix the library with 'execstack -c
libfile', or link it with '-z noexecstack'.

Thanks,

On Monday, March 2, 2015, xeonmailinglist xeonmailingl...@gmail.com wrote:

  Hi,

 1 - I have HDFS running with WebHDFS protocol. I want to copy data from
 local disk to HDFS, but I get the error below. How I copy data from the
 local disk to HDFS?

 xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1 
 webhdfs://192.168.56.101:8080/
 Java HotSpot(TM) Client VM warning: You have loaded library 
 /home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might have 
 disabled stack guard. The VM will try to fix the stack guard now.
 It's highly recommended that you fix the library with 'execstack -c 
 libfile', or link it with '-z noexecstack'.
 15/03/02 11:50:16 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 
 failed on connection exception: java.net.ConnectException: Connection 
 refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
 copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 
 failed on connection exception: java.net.ConnectException: Connection 
 refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused


 xubuntu@hadoop-coc-1:~/Programs/hadoop$ curl -i -X PUT -T ~/input1 
 http://192.168.56.101:8080/?op=CREATE; 
 http://192.168.56.101:8080/?op=CREATE
 HTTP/1.1 100 Continue

 HTTP/1.1 405 HTTP method PUT is not supported by this URL
 Date: Mon, 02 Mar 2015 16:50:36 GMT
 Pragma: no-cache
 Date: Mon, 02 Mar 2015 16:50:36 GMT
 Pragma: no-cache
 Content-Length: 0
 Server: Jetty(6.1.26)

 $ netstat -plnet
 tcp0  0 192.168.56.101:8080 0.0.0.0:*   LISTEN
   1000   587397  8229/java
 tcp0  0 0.0.0.0:43690.0.0.0:*   LISTEN
   1158049-
 tcp0  0 127.0.0.1:530.0.0.0:*   LISTEN
   0  8336-
 tcp0  0 0.0.0.0:22  0.0.0.0:*   LISTEN
   0  7102-
 tcp0  0 127.0.0.1:631   0.0.0.0:*   LISTEN
   0  104794  -
 tcp0  0 0.0.0.0:50010   0.0.0.0:*   LISTEN
   1000   588404  8464/java
 tcp0  0 0.0.0.0:50075   0.0.0.0:*   LISTEN
   1000   589155  8464/java
 tcp0  0 0.0.0.0:50020   0.0.0.0:*   LISTEN
   1000   589169  8464/java
 tcp0  0 192.168.56.101:6600 0.0.0.0:*   LISTEN
   1000   587403  8229/java
 tcp6   0  

Re: 1 job with Input data from 2 HDFS?

2015-02-27 Thread xeon Mailinglist
Hi,

I don't understand this part of your answer: "read the other as a
side-input directly by creating a client".

If I pass both inputs through the InputFormat, the job will contain both
input paths in its configuration, and that should be enough to work. So,
what is "the other"? Is it the second input? Can you please explain what
you meant?

On Friday, February 27, 2015, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 It is entirely possible. You should treat one of them as the primary
 inputs through the InputFormat/Mapper and read the other as a side-input
 directly by creating a client.

 +Vinod

 On Feb 27, 2015, at 7:22 AM, xeonmailinglist xeonmailingl...@gmail.com
 javascript:; wrote:

  Hi,
 
  I would like to have a mapreduce job that reads input data from 2 HDFS.
 Is this possible?
 
  Thanks,




Copy data between clusters during the job execution.

2015-02-02 Thread xeon Mailinglist
Hi

I want a job that copies the map output, or the reduce output, to another
HDFS. Is this possible?

E.g., the job runs in cluster 1 and takes its input from this cluster.
Then, before the job finishes, it copies the map output or the reduce
output to the HDFS in cluster 2.
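
For the reduce output at least, I am wondering whether it is enough to
point the job's output path at the second cluster with a fully qualified
URI (untested sketch, made-up addresses):

// Input is read from cluster 1, where the job runs...
FileInputFormat.addInputPath(job, new Path("hdfs://namenode1:9000/input"));
// ...while the final output is written directly into cluster 2's HDFS.
FileOutputFormat.setOutputPath(job, new Path("hdfs://namenode2:9000/job-output"));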

Thanks,


Set Hadoop MRv2 behind a NAT

2014-02-18 Thread xeon Mailinglist
I am trying to set up Hadoop MapReduce (MRv2) behind a NAT, but when I try
to connect the Datanode, I get the error below.

The hosts have 2 interfaces, one with a private address and another with
the NAT address. To access a host with SSH, I must use an external IP that
the NAT server redirects.

I want to access MRv2 from outside, and for that I tried to set a NAT'd IP,
but the Datanode doesn't start. How do I set the MRv2 addresses in the
configuration files so that I can get Hadoop running?

2014-02-18 11:55:43,105 FATAL
org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for
block pool Block pool
 BP-1302615141-172.16.100.1-1392724171451 (storage id
DS-1964144366-172.16.100.2-50010-1392724135477) service to /10.103.0.11:
9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException):
Datanode denied com
munication with namenode: DatanodeRegistration(0.0.0.0,
storageID=DS-1964144366-172.16.100.2-50010-1392724135477, infoPort=500
75, ipcPort=50020,
storageInfo=lv=-40;cid=CID-f45cf960-4e55-420b-a20c-43f6edf1a847;nsid=1117538035;c=0)
at
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:631)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3398)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
at
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
at
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
:
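
The only related setting I have found so far is the NameNode's registration
check; I have not confirmed that it fixes the NAT case, but relaxing it in
hdfs-site.xml might at least get the Datanode registered:

<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <!-- default true; false relaxes the hostname/IP check on DataNode registration -->
  <value>false</value>
</property>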


Unable to load native-hadoop library for your platform

2014-02-12 Thread xeon Mailinglist
I am trying to run an example and I get the following error:

HadoopMaster-nh:~# /root/Programs/hadoop/bin/hdfs dfs -count /wiki
OpenJDK 64-Bit Server VM warning: You have loaded library
/root/Programs/hadoop-2.0.5-alpha/lib/native/libhadoop.so.1.0.0 which might
have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c
libfile', or link it with '-z noexecstack'.
14/02/13 05:24:48 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable


I tried to run execstack -c, but the problem stays the same. Any help?
HadoopMaster-nh:~# execstack -c
/root/Programs/hadoop-2.0.5-alpha/lib/native/libhadoop.so.1.0.0
HadoopMaster-nh:~# /root/Programs/hadoop/bin/hdfs dfs -count /wiki
OpenJDK 64-Bit Server VM warning: You have loaded library
/root/Programs/hadoop-2.0.5-alpha/lib/native/libhadoop.so.1.0.0 which might
have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c
libfile', or link it with '-z noexecstack'.
14/02/13 05:26:45 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
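
From what I have read, the stack-guard message and the NativeCodeLoader
warning are separate issues, and the second one usually just means the
bundled libhadoop.so does not match this platform (for example 32-bit
natives on a 64-bit JVM), so Hadoop falls back to the builtin-java classes.
If that fallback is acceptable, the warning can apparently be silenced in
etc/hadoop/log4j.properties:

# Assumes the stock log4j.properties layout shipped with the Hadoop tarball.
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR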


DisallowedDatanodeException: Datanode denied communication with namenode

2014-02-06 Thread xeon Mailinglist
I am trying to launch the Datanodes in Hadoop MRv2, and I get the error
below. I have looked at the Hadoop conf files and at /etc/hosts, and
everything looks OK. What is wrong in my configuration?


org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException:
Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0,
storageID=DS-1286267910-172.16.XXX.XXX-50010-1391710467907, infoPort=50075,
ipcPort=50020,
storageInfo=lv=-40;cid=CID-86007361-15b7-4022-ac5f-52ca83d98373;nsid=1884118048;c=0)
at
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:631)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3398)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
at
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
at
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)


The /etc/hosts file is well configured, and so is Hadoop's slaves file.
Here are my conf files:


172:~/Programs/hadoop/etc/hadoop# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.16.YYY.YYY:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-temp</value>
  </property>
  <!--
  <property><name>hadoop.proxyuser.xeon.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.xeon.groups</name><value>*</value></property>
  -->
</configuration>


172:~/Programs/hadoop/etc/hadoop# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/tmp/data/dfs/name/</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/tmp/data/dfs/data/</value>
  </property>
</configuration>


172:~/Programs/hadoop/etc/hadoop# cat mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/root/Programs/hadoop/logs/history/done</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/root/Programs/hadoop/logs/history/intermediate-done-dir</value>
  </property>

  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
</configuration>


I don't use any dfs_hosts_allow.txt file. I say that /etc/hosts is OK
because I can access all the nodes with ssh. Just out of curiosity, the
hostnames are the IP addresses themselves. Here is /etc/hosts:

127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

172.16.XXX.XX1 172.16.XXX.XX1
172.16.XXX.XX2 172.16.XXX.XX2
172.16.XXX.XX3 172.16.XXX.XX3
172.16.XXX.XX4 172.16.XXX.XX4


org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException

2014-01-08 Thread xeon Mailinglist
When I try to launch the Namenode and the Datanode in MRv2, the Datanode
can't connect to the Namenode, giving me the error below. I also include
the core-site.xml file that I use below.

The firewall on the hosts is disabled, and I don't have any excluded nodes
defined. Why can't the Datanodes connect to the Namenode? Any help solving
this problem?


org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException):
Datanode denied communication with namenode: DatanodeRegistrati
on(0.0.0.0, storageID=DS-1449645935-172.16.1.10-50010-1389224474955,
infoPort=50075, ipcPort=50020,
storageInfo=lv=-40;cid=CID-9a8571a3-17ae-49b2-b957-b009e88b9f9a;nsid=9
34416283;c=0)
at
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:631)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3398)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
at
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
at
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

at org.apache.hadoop.ipc.Client.call(Client.java:1235)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
at java.lang.Thread.run(Thread.java:701)

I set the core-site.xml as follows:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.103.0.17:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-temp</value>
  </property>

  <property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>

  <property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
</configuration>


is it possible to list jobs that are waiting to run?

2014-01-03 Thread xeon Mailinglist
Hi,

Is it possible that submitted jobs stay waiting before they start to run?

Is there a command that lists the jobs that have been submitted and are
waiting to start running?
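
For context, the closest commands I know of are these two, although I have
not confirmed that they show jobs that are still waiting in this version:

# MapReduce view: submitted jobs that have not completed; state PREP = not yet running.
mapred job -list

# YARN view: applications accepted by the ResourceManager but not yet running
# (the -appStates filter may not exist in older 2.x releases).
yarn application -list -appStates ACCEPTED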


-- 
Thanks,


java.net.SocketTimeoutException in the Datanode

2014-01-03 Thread xeon Mailinglist
I am running a wordcount example in MRv2, but I get this error in a
Datanode. It looks like a problem in the network between the Namenode and
the Datanode, but I am not sure.

What is this error? How can I fix this problem?

2014-01-03 16:46:29,319 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock
BP-570096904-155.99.144.100-1388771741297:blk_-3952564661572372834_1072
received exception java.net.SocketTimeoutException: 6 millis timeout
while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/155.99.144.101:50010remote=/
155.99.144.101:44937]
2014-01-03 16:46:30,177 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
pcvm3-3.utahddc.geniracks.net:50010:DataXceiver error processing
WRITE_BLOCK operation  src: /155.99.144.101:44937 dest: /
155.99.144.101:50010
java.net.SocketTimeoutException: 6 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/155.99.144.101:50010remote=/
155.99.144.101:44937]
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:159)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:644)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
at java.lang.Thread.run(Thread.java:701)
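
In case this is just a timeout that is too tight for my network, these are
the two hdfs-site.xml settings I am planning to raise (a guess on my side,
not a confirmed fix; values are in milliseconds):

<property>
  <name>dfs.client.socket-timeout</name>
  <value>120000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>120000</value>
</property>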