Re: How to set AM attempt interval?

2015-03-02 Thread Nur Kholis Majid
Hi Vinod,

Here is Diagnostics message from RM Web UI page:
Application application_1424919411720_0878 failed 10 times due to
Error launching appattempt_1424919411720_0878_10. Got exception:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at 
org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:209)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:226)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:198)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:108)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
. Failing the application.

The log link only shows the following message and doesn't produce any
stdout or stderr files:
Logs not available for container_1424919411720_0878_08_01_14.
Aggregation may not be complete, Check back later or try the
nodemanager at hadoopdn01:8041

Here is the screenshot:
https://dl.dropboxusercontent.com/u/33705885/2015-03-02_163138.png

Thank you.

On Sat, Feb 28, 2015 at 2:56 AM, Vinod Kumar Vavilapalli
 wrote:
> That's an old JIRA. The right solution is not an AM-retry interval but
> launching the AM somewhere else.
>
> Why is your AM failing in the first place? If it is due to full-disk, the
> situation should be better with YARN-1781 - can you use the configuration
> (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage)
> added at YARN-1781?
>
> +Vinod
>
> On Feb 27, 2015, at 7:31 AM, Ted Yu  wrote:
>
> Looks like this is related:
> https://issues.apache.org/jira/browse/YARN-964
>
> On Fri, Feb 27, 2015 at 4:29 AM, Nur Kholis Majid
>  wrote:
>>
>> Hi All,
>>
>> I have many jobs failing because the AM tries to rerun the job at a very
>> short interval (only 6 seconds). How can I increase the interval to a bigger
>> value?
>>
>> https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png
>>
>> Thank you.
>
>
>


Re: How to find bottlenecks of the cluster ?

2015-03-02 Thread Adrien Mogenet
That question makes no sense on its own; you have to tell us under which
conditions you want to find a bottleneck.

Regardless of the workload, we mostly use OpenTSDB to check CPU time (iowait
/ user / sys / idle), disk usage (await, I/Os in progress, ...) and memory
(NUMA allocations, buffers, cache, dirty pages, ...).
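For a quick look at the same numbers without a metrics system, a minimal sketch using standard Linux tools (assumes the sysstat package is installed; the 5-second / 3-sample intervals are only illustrative):

# CPU: compare %iowait against %user / %sys / %idle
iostat -c 5 3

# Disk: per-device utilization, await and average queue size
iostat -dx 5 3

# Memory: free / buffers / cache plus swap and paging activity
free -m
vmstat 5 3

# Network: raw byte and packet counters per interface
cat /proc/net/dev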

On 2 March 2015 at 08:20, Krish Donald  wrote:

> Basically we have 4 points to consider: CPU, memory, IO and network.
>
> So how do we see which one is causing the bottleneck?
> What parameters should we consider?
>
> On Sun, Mar 1, 2015 at 10:57 PM, Nishanth S 
> wrote:
>
>> This is a vast topic. Can you tell us what components are in your
>> data pipeline, how data flows into the system, and the way it is
>> processed? There are several built-in tests, like TestDFSIO and TeraSort,
>> that you can run (a sketch follows after this quoted thread).
>>
>> -Nishan
>>
>> On Sun, Mar 1, 2015 at 9:45 PM, Krish Donald 
>> wrote:
>>
>>> Hi,
>>>
>>> I wanted to understand, how should we find out the bottleneck of the
>>> cluster?
>>>
>>> Thanks
>>> Krish
>>>
>>
>>
>
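A minimal sketch of the built-in benchmarks mentioned in the quoted reply above, assuming a stock Hadoop 2.x layout under $HADOOP_HOME; jar names, file counts, sizes and output paths are only illustrative, and flag names can differ slightly between releases:

# HDFS I/O throughput: write, then read, 10 files of ~1000 MB each
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

# End-to-end MapReduce: generate 10 million 100-byte rows, then sort them
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10000000 /tmp/teragen
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /tmp/teragen /tmp/terasort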


-- 

*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
(+33)6.59.16.64.22
http://www.contentsquare.com
4, avenue Franklin D. Roosevelt - 75008 Paris


AW: Hadoop 2.6.0 - No DataNode to stop

2015-03-02 Thread Daniel Klinger
Thanks for your help. But unfortunately this didn't do the job. Here's the
shell script I've written to start my cluster (the scripts on the other node
only contain the command to start the DataNode and the command to
start the NodeManager on that node, with the right user (hdfs / yarn)):

 

 

#!/bin/bash

# Start 
HDFS-

# Start Namenode

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR 
--script hdfs start namenode"

wait

 

# Start all Datanodes

export HADOOP_SECURE_DN_USER=hdfs

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR 
--script hdfs start datanode"

wait

ssh root@hadoop-data.klinger.local 'bash startDatanode.sh'

wait

 

# Start Resourcemanager

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR 
start resourcemanager"

wait

 

# Start Nodemanager on all Nodes

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR 
start nodemanager"

wait

ssh root@hadoop-data.klinger.local 'bash startNodemanager.sh'

wait

 

# Start Proxyserver

#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn start proxyserver --config 
$HADOOP_CONF_DIR"

#wait

 

# Start Historyserver

su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver 
--config $HADOOP_CONF_DIR"

wait

 

This script generates the following output:

 

starting namenode, logging to 
/var/log/cluster/hadoop/hadoop-hdfs-namenode-hadoop.klinger.local.out

starting datanode, logging to 
/var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop.klinger.local.out

starting datanode, logging to 
/var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop-data.klinger.local.out

starting resourcemanager, logging to 
/var/log/cluster/yarn/yarn-yarn-resourcemanager-hadoop.klinger.local.out

starting nodemanager, logging to 
/var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop.klinger.local.out

starting nodemanager, logging to 
/var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop-data.klinger.local.out

starting historyserver, logging to 
/var/log/cluster/mapred/mapred-mapred-historyserver-hadoop.klinger.local.out

 

Following is my stop script and its output:

 

#!/bin/bash

# Stop 
HDFS

# Stop Namenode

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR 
--script hdfs stop namenode"

 

# Stop all Datanodes

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR 
--script hdfs stop datanode"

ssh root@hadoop-data.klinger.local 'bash stopDatanode.sh'

 

# Stop Resourcemanager

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR 
stop resourcemanager"

 

#Stop Nodemanager on all Hosts

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR 
stop nodemanager"

ssh root@hadoop-data.klinger.local 'bash stopNodemanager.sh'

 

#Stop Proxyserver

#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn stop proxyserver --config 
$HADOOP_CONF_DIR"

 

#Stop Historyserver

su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver 
--config $HADOOP_CONF_DIR"

 

stopping namenode

no datanode to stop

no datanode to stop

stopping resourcemanager

stopping nodemanager

stopping nodemanager

nodemanager did not stop gracefully after 5 seconds: killing with kill -9

stopping historyserver

 

Is there maybe anything wrong with my commands?

 

Greets

DK

 

From: Varun Kumar [mailto:varun@gmail.com]
Sent: Monday, 2 March 2015 05:28
To: user
Subject: Re: Hadoop 2.6.0 - No DataNode to stop

 

1. Stop the service.

2. Change the permissions for the log and pid directories once again to hdfs.

3. Start the service as hdfs.

 

This will resolve the issue

 

On Sun, Mar 1, 2015 at 6:40 PM, Daniel Klinger <d...@web-computing.de> wrote:

Thanks for your answer. 

 

I put the FQDN of the DataNodes in the slaves file on each node (one FQDN per 
line). Here’s the full DataNode log after the start (the log of the other 
DataNode is exactly the same):

 

2015-03-02 00:29:41,841 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
registered UNIX signal handlers for [TERM, HUP, INT]

2015-03-02 00:29:42,207 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
loaded properties from hadoop-metrics2.properties

2015-03-02 00:29:42,312 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Scheduled snapshot period at 10 second(s).

2015-03-02 00:29:42,313 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
DataNode metrics system started

2015-03-02 00:29:42,319 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Configured hostname is hadoop.klinger.local

2015-03-02 00:29:42,327 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Starting DataNode with maxLockedMemory = 0

2015-03-02 00:29:42,350 INFO org.apach

Re: Permission Denied

2015-03-02 Thread David Patterson
David,

Thanks for the information. I've issued those two commands in my hadoop
shell and still get the same error when I try to initialize accumulo in
*its* shell. :

2015-03-02 13:30:41,175 [init.Initialize] FATAL: Failed to initialize
filesystem
   org.apache.hadoop.security.AccessControlException: Permission denied:
user=accumulo, access=WRITE, inode="/accumulo":
   accumulo.supergroup:supergroup:drwxr-xr-x

My comment that I had 3 users was meant in a linux sense, not in a hadoop
sense. So (to borrow terminology from RDF or XML) is there something I have
to do in my hadoop setup (running under linux:hadoop) or my accumulo setup
(running under linux:accumulo) so that the accumulo I/O gets processed as
from someone in the hadoop:supergroup?


I tried running the accumulo init from the linux:hadoop user and it worked.
I'm not sure if any permissions/etc were hosed by doing it there. I'll see.

Thanks for your help.

(By the way, is it wrong or a bad idea to split the work into three
linux:users, or should it all be done in one linux:user space?)

Dave Patterson

On Sun, Mar 1, 2015 at 8:35 PM, dlmarion  wrote:

> hadoop fs -mkdir /accumulo
> hadoop fs -chown accumulo:supergroup /accumulo
>
>
>
>  Original message 
> From: David Patterson 
> Date:03/01/2015 7:04 PM (GMT-05:00)
> To: user@hadoop.apache.org
> Cc:
> Subject: Re: Permission Denied
>
> David,
>
> Thanks for the reply.
>
> Taking the questions in the opposite order, my accumulo-site.xml does not
> have volumes specified.
>
> I edited the accumulo-site.xml so it now has
>   
> instance.volumes
> hdfs://localhost:9000/accumulo
> comma separated list of URIs for volumes. example:
> hdfs://localhost:9000/accumulo
>   
>
> and got the same error.
>
> How can I precreate /accumulo ?
>
> Dave Patterson
>
> On Sun, Mar 1, 2015 at 3:50 PM, david marion  wrote:
>
>>  It looks like / is owned by hadoop.supergroup and the perms are 755. You
>> could precreate /accumulo and chown it appropriately, or set the perms for
>> / to 775. Init is trying to create /accumulo in hdfs as the accumulo user
>> and your perms don't allow it.
>>
>>  Do you have instance.volumes set in accumulo-site.xml?
>>
>>
>>  Original message 
>> From: David Patterson 
>> Date:03/01/2015 3:36 PM (GMT-05:00)
>> To: user@hadoop.apache.org
>> Cc:
>> Subject: Permission Denied
>>
>>I'm trying to create an Accumulo/Hadoop/Zookeeper configuration
>> on a single (Ubuntu) machine, with Hadoop 2.6.0, Zookeeper 3.4.6 and
>> Accumulo 1.6.1.
>>
>>  I've got 3 userids for these components that are in the same group and
>> no other users are in that group.
>>
>>  I have zookeeper running, and hadoop as well.
>>
>>  Hadoop's core-site.xml file has the hadoop.tmp.dir set to
>> /app/hadoop/tmp. The /app/hadoop/tmp directory is owned by the hadoop user
>> and has permissions that allow other members of the group to write
>> (drwxrwxr-x).
>>
>>  When I try to initialize Accumulo, with bin/accumulo init, I get FATAL:
>> Failed to initialize filesystem.
>>  org.apache.hadoop.security.AccessControlException: Permission denied:
>> user=accumulo, access=WRITE, inode="/":hadoop:supergroup:drwxr-xr-x
>>
>>  So, my main question is which directory do I need to give group-write
>> permission so the accumulo user can write as needed so it can initialize?
>>
>>  The second problem is that the Accumulo init reports
>> [Configuration.deprecation] INFO : fs.default.name is deprecated.
>> Instead use fs.defaultFS. However, the hadoop core-site.xml file contains:
>> fs.defaultFS
>> hdfs://localhost:9000
>>
>>  Is there somewhere else that this value (fs.default.name) is specified?
>> Could it be due to Accumulo having a default value and not getting the
>> override from hadoop because of the problem listed above?
>>
>>  Thanks
>>
>>  Dave Patterson
>>  patt...@gmail.com
>>
>
>


share the same namespace in 2 YARN instances?

2015-03-02 Thread xeonmailinglist

Hi,

I was reading about HDFS Federation, which is possible in YARN
(http://www.devx.com/opensource/enhance-existing-hdfs-architecture-with-hadoop-federation.html),
and I started to wonder: is it possible to have 2 YARN runtimes that
share the same HDFS namespace?


Thanks,


how to catch exception when data cannot be replication to any datanode

2015-03-02 Thread Chen Song
Hey

I got the following error in the application logs when trying to put a file
to DFS.

015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb
could only be replicated to 0 nodes instead of minReplication (=1).
There are 317 datanode(s) running and no node(s) are excluded in this
operation.
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)

at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)


This results in an empty file in HDFS. I did some searching through this mailing
list and found that this could be caused by a full disk, or a data node
being unreachable.

However, this exception was only logged at WARN level when FileSystem.close
is called, and never thrown visibly to the client. My question is: at the
client level, how can I catch this exception and handle it?

Chen

-- 
Chen Song


Re: Permission Denied

2015-03-02 Thread Sean Busbey
Splitting into three unix users is a good idea. Generally, none of the
linux users should need access to any of the local resources owned by the
others. (that is, the user running the accumulo processes shouldn't be able
to interfere with the backing files used by the HDFS processes).

By default, the linux user that drives a particular process will be
resolved to a Hadoop user by the NameNode process. Presuming your Accumulo
services are running under the linux user "accumulo", you should ensure
that user exists on the linux node that runs the NameNode.

The main issue with running init as the hadoop user is that by default it's
likely going to write the accumulo directories as owned by the user that
created them. Presuming you are using Accumulo because you have security
requirements, the common practice is to make sure only the user that runs
Accumulo processes can write to /accumulo and that only that user can read
/accumulo/tables and /accumulo/wal. This ensures that other users with
access to the HDFS cluster won't be able to bypass the cell-level access
controls provided by Accumulo.

While you are setting up HDFS directories, you should also create a home
directory for the user that runs Accumulo processes. If your HDFS instance
is set to use the trash feature (either in server configs or the client
configs made available to Accumulo), then by default Accumulo will attempt
to use it. Without a home directory, this will result in failures.
Alternatively, you can ensure Accumulo doesn't rely on the trash feature by
setting gc.trash.ignore in your accumulo-site.xml.
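A minimal sketch of that setup, to be run as an HDFS superuser; the supergroup name and the exact permission bits are only an illustration of the "only the user running Accumulo can write /accumulo" idea:

# Pre-create the Accumulo root directory and hand it to the accumulo user
hadoop fs -mkdir /accumulo
hadoop fs -chown accumulo:supergroup /accumulo
hadoop fs -chmod 700 /accumulo

# Home directory for the accumulo user (needed if the HDFS trash feature is on)
hadoop fs -mkdir -p /user/accumulo
hadoop fs -chown accumulo:supergroup /user/accumulo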

One other note:

> I edited the accumulo-site.xml so it now has
>  
>instance.volumes
>hdfs://localhost:9000/accumulo
>comma separated list of URIs for volumes. example:
hdfs://localhost:9000/accumulo
>  

You will save yourself headache later if you stick with fully qualified
domain names for all HDFS, ZooKeeper, and Accumulo connections.

-- 
Sean

On Mon, Mar 2, 2015 at 8:13 AM, David Patterson  wrote:

> David,
>
> Thanks for the information. I've issued those two commands in my hadoop
> shell and still get the same error when I try to initialize accumulo in
> *its* shell. :
>
> 2015-03-02 13:30:41,175 [init.Initialize] FATAL: Failed to initialize
> filesystem
>org.apache.hadoop.security.AccessControlException: Permission denied:
> user=accumulo, access=WRITE, inode="/accumulo":
>accumulo.supergroup:supergroup:drwxr-xr-x
>
> My comment that I had 3 users was meant in a linux sense, not in a hadoop
> sense. So (to borrow terminoloy from RDF or XML) is there something I have
> to do in my hadoop setup (running under linix:hadoop) or my accumulo setup
> (running under linux:accumulo) so that the accumuulo I/O gets processed as
> from someone in the hadoop:supergroup?
>
>
> I tried running the accumulo init from the linux:hadoop user and it
> worked. I'm not sure if any permissions/etc were hosed by doing it there.
> I'll see.
>
> Thanks for you help.
>
> (By the way, is it wrong or a bad idea to split the work into three
> linux:users, or should it all be done in one linux:user space?)
>
> Dave Patterson
>
> On Sun, Mar 1, 2015 at 8:35 PM, dlmarion  wrote:
>
>> hadoop fs -mkdir /accumulo
>> hadoop fs -chown accumulo:supergroup /accumulo
>>
>>
>>
>>  Original message 
>> From: David Patterson 
>> Date:03/01/2015 7:04 PM (GMT-05:00)
>> To: user@hadoop.apache.org
>> Cc:
>> Subject: Re: Permission Denied
>>
>> David,
>>
>> Thanks for the reply.
>>
>> Taking the questions in the opposite order, my accumulo-site.xml does not
>> have volumes specified.
>>
>> I edited the accumulo-site.xml so it now has
>>   
>> instance.volumes
>> hdfs://localhost:9000/accumulo
>> comma separated list of URIs for volumes. example:
>> hdfs://localhost:9000/accumulo
>>   
>>
>> and got the same error.
>>
>> How can I precreate /accumulo ?
>>
>> Dave Patterson
>>
>> On Sun, Mar 1, 2015 at 3:50 PM, david marion 
>> wrote:
>>
>>>  It looks like / is owned by hadoop.supergroup and the perms are 755.
>>> You could precreate /accumulo and chown it appropriately, or set the perms
>>> for / to 775. Init is trying to create /accumulo in hdfs as the accumulo
>>> user and your perms dont allow it.
>>>
>>>  Do you have instance.volumes set in accumulo-site.xml?
>>>
>>>
>>>  Original message 
>>> From: David Patterson 
>>> Date:03/01/2015 3:36 PM (GMT-05:00)
>>> To: user@hadoop.apache.org
>>> Cc:
>>> Subject: Permission Denied
>>>
>>>I'm trying to create an Accumulo/Hadoop/Zookeeper configuration
>>> on a single (Ubuntu) machine, with Hadoop 2.6.0, Zookeeper 3.4.6 and
>>> Accumulo 1.6.1.
>>>
>>>  I've got 3 userids for these components that are in the same group and
>>> no other users are in that group.
>>>
>>>  I have zookeeper running, and hadoop as well.
>>>
>>>  Hadoop's core-site.xml file has the hadoop.tmp.dir set to
>>> /app/hadoop/tmp.The /app/hadoop/tmp directory is owned by the hadoop user
>>> a

Copy data from local disc with WebHDFS?

2015-03-02 Thread xeonmailinglist

Hi,

1 - I have HDFS running with the WebHDFS protocol. I want to copy data from
the local disk to HDFS, but I get the error below. How do I copy data from the
local disk to HDFS?


|xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1 
webhdfs://192.168.56.101:8080/
Java HotSpot(TM) Client VM warning: You have loaded library 
/home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might have 
disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', 
or link it with '-z noexecstack'.
15/03/02 11:50:16 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 
failed on connection exception: java.net.ConnectException: Connection refused; 
For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 
failed on connection exception: java.net.ConnectException: Connection refused; 
For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused


xubuntu@hadoop-coc-1:~/Programs/hadoop$ curl -i -X PUT -T ~/input1 
"http://192.168.56.101:8080/?op=CREATE";
HTTP/1.1 100 Continue

HTTP/1.1 405 HTTP method PUT is not supported by this URL
Date: Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache
Date: Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache
Content-Length: 0
Server: Jetty(6.1.26)
|

|$ netstat -plnet
tcp0  0 192.168.56.101:8080 0.0.0.0:*   LISTEN  
1000   587397  8229/java
tcp0  0 0.0.0.0:43690.0.0.0:*   LISTEN  
1158049-
tcp0  0 127.0.0.1:530.0.0.0:*   LISTEN  
0  8336-
tcp0  0 0.0.0.0:22  0.0.0.0:*   LISTEN  
0  7102-
tcp0  0 127.0.0.1:631   0.0.0.0:*   LISTEN  
0  104794  -
tcp0  0 0.0.0.0:50010   0.0.0.0:*   LISTEN  
1000   588404  8464/java
tcp0  0 0.0.0.0:50075   0.0.0.0:*   LISTEN  
1000   589155  8464/java
tcp0  0 0.0.0.0:50020   0.0.0.0:*   LISTEN  
1000   589169  8464/java
tcp0  0 192.168.56.101:6600 0.0.0.0:*   LISTEN  
1000   587403  8229/java
tcp6   0  0 :::22   :::*LISTEN  
0  7086-
tcp6   0  0 ::1:631 :::*LISTEN  
0  104793  -
|
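For reference, a sketch of the usual WebHDFS REST sequence, assuming 192.168.56.101:8080 really is the NameNode HTTP address, dfs.webhdfs.enabled is true, and the target path /user/xubuntu/input1 is only an example; note the /webhdfs/v1 prefix and the two-step PUT (the first call returns a 307 redirect instead of accepting the data):

# Step 1: ask the NameNode where to write; it answers with a redirect to a DataNode
curl -i -X PUT "http://192.168.56.101:8080/webhdfs/v1/user/xubuntu/input1?op=CREATE"

# Step 2: send the file body to the URL from the Location header of step 1
curl -i -X PUT -T ~/input1 "<Location-URL-from-step-1>"

# Or let the hdfs client do both steps, giving a full webhdfs URI including the path
hdfs dfs -copyFromLocal ~/input1 webhdfs://192.168.56.101:8080/user/xubuntu/input1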

2 - How do I remove the warning that I always get every time I launch
a command in YARN?


|xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1 
webhdfs://192.168.56.101:8080/
Java  HotSpot(TM) Client VM warning: You have loaded library  
/home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might  have 
disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', 
or link it with '-z noexecstack'.
|
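On the stack-guard warning, a minimal sketch of what the message itself suggests, assuming the library really lives at the path printed in the warning (the execstack tool comes from the prelink package on most distributions):

# Clear the executable-stack flag on the bundled native library
execstack -c /home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so

# Verify: a leading '-' in the output means the flag is now cleared
execstack -q /home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so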





Re: how to catch exception when data cannot be replication to any datanode

2015-03-02 Thread Ted Yu
Which hadoop release are you using ?

In branch-2, I see this IOE in BlockManager :

if (targets.length < minReplication) {
  throw new IOException("File " + src + " could only be replicated to "
  + targets.length + " nodes instead of minReplication (="
  + minReplication + ").  There are "

Cheers

On Mon, Mar 2, 2015 at 8:44 AM, Chen Song  wrote:

> Hey
>
> I got the following error in the application logs when trying to put a
> file to DFS.
>
> 015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 317 datanode(s) running and no node(s) are excluded in this operation.
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1409)
> at org.apache.hadoop.ipc.Client.call(Client.java:1362)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
> at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>
>
> This results in empty file in HDFS. I did some search through this email
> thread and found that this could be caused by disk full, or data node
> unreachable.
>
> However, this exception was only logged as WARN level when
> FileSystem.close is called, and never thrown visible to client. My question
> is, on the client level, How can I catch this exception and handle it?
>
> Chen
>
> --
> Chen Song
>
>


how to check hdfs

2015-03-02 Thread Shengdi Jin
Hi all,
I just started to learn Hadoop, and I have a naive question.

I used
hdfs dfs -ls /home/cluster
to check the content inside.
But I get the error:
ls: No FileSystem for scheme: hdfs

My configuration file core-site.xml is like

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

hdfs-site.xml is like

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:/home/cluster/mydata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:/home/cluster/mydata/hdfs/datanode</value>
  </property>
</configuration>

Is there anything wrong?

Thanks a lot.


Monitor data transformation

2015-03-02 Thread Fei Hu
Hi All,

I developed a scheduler for data locality. Now I want to test the performance
of the scheduler, so I need to monitor how much data is read remotely. Is
there any tool for monitoring the volume of data moved around the cluster?

Thanks,
Fei

QUERY

2015-03-02 Thread supriya pati
Hello sir/madam,
I am doing research on Hadoop job scheduling and want to modify the Hadoop job
scheduling algorithm. I have downloaded the hadoop-2.2.0 source code from the
Apache Hadoop website and built it on Fedora using the "mvn package
-Pdist,native,docs,src -DskipTests -Dtar" command. Since you have worked on the
same, I would request you to kindly guide me on a few queries:

1) What are the steps to modify the source code?
2) How do I compile and test the modifications I have made to the source code?
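
A minimal sketch of the usual edit-compile-test loop, assuming the scheduling change lives in the YARN ResourceManager module; the module path follows the stock source tree and TestCapacityScheduler is only an example of a test in that module:

# Rebuild just the module you changed
cd hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
mvn clean install -DskipTests

# Run a single unit test from that module
mvn test -Dtest=TestCapacityScheduler

# Back at the source root, rebuild the full distribution tarball for a test cluster
cd -
mvn package -Pdist -DskipTests -Dtar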

Thanking you.


Re: how to catch exception when data cannot be replication to any datanode

2015-03-02 Thread Chen Song
I am using CDH5.1.0, which is hadoop 2.3.0.

On Mon, Mar 2, 2015 at 12:23 PM, Ted Yu  wrote:

> Which hadoop release are you using ?
>
> In branch-2, I see this IOE in BlockManager :
>
> if (targets.length < minReplication) {
>   throw new IOException("File " + src + " could only be replicated to "
>   + targets.length + " nodes instead of minReplication (="
>   + minReplication + ").  There are "
>
> Cheers
>
> On Mon, Mar 2, 2015 at 8:44 AM, Chen Song  wrote:
>
>> Hey
>>
>> I got the following error in the application logs when trying to put a
>> file to DFS.
>>
>> 015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
>> /tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb
>>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
>> are 317 datanode(s) running and no node(s) are excluded in this operation.
>> at 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
>> at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
>> at 
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
>> at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>> at 
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>> at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>> at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
>> at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
>> at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
>> at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>> at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>> at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
>> at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
>> at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
>> at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>>
>>
>> This results in empty file in HDFS. I did some search through this email
>> thread and found that this could be caused by disk full, or data node
>> unreachable.
>>
>> However, this exception was only logged as WARN level when
>> FileSystem.close is called, and never thrown visible to client. My question
>> is, on the client level, How can I catch this exception and handle it?
>>
>> Chen
>>
>> --
>> Chen Song
>>
>>
>


-- 
Chen Song


Re: how to catch exception when data cannot be replication to any datanode

2015-03-02 Thread Chen Song
Also, it could be thrown in BlockManager, but on the DFSClient side it just
catches that exception and logs it as a warning.

The problem here is that the caller has no way to detect this error and
only sees an empty file (0 bytes) after the fact.

Chen

On Mon, Mar 2, 2015 at 2:41 PM, Chen Song  wrote:

> I am using CDH5.1.0, which is hadoop 2.3.0.
>
> On Mon, Mar 2, 2015 at 12:23 PM, Ted Yu  wrote:
>
>> Which hadoop release are you using ?
>>
>> In branch-2, I see this IOE in BlockManager :
>>
>> if (targets.length < minReplication) {
>>   throw new IOException("File " + src + " could only be replicated to
>> "
>>   + targets.length + " nodes instead of minReplication (="
>>   + minReplication + ").  There are "
>>
>> Cheers
>>
>> On Mon, Mar 2, 2015 at 8:44 AM, Chen Song  wrote:
>>
>>> Hey
>>>
>>> I got the following error in the application logs when trying to put a
>>> file to DFS.
>>>
>>> 015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
>>> /tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb
>>>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
>>> are 317 datanode(s) running and no node(s) are excluded in this operation.
>>> at 
>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
>>> at 
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
>>> at 
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
>>> at 
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>> at 
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>> at 
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at 
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>>> at 
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>> at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
>>> at 
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
>>> at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
>>> at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at 
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>> at 
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>> at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
>>> at 
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
>>> at 
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
>>> at 
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>>>
>>>
>>> This results in empty file in HDFS. I did some search through this email
>>> thread and found that this could be caused by disk full, or data node
>>> unreachable.
>>>
>>> However, this exception was only logged as WARN level when
>>> FileSystem.close is called, and never thrown visible to client. My question
>>> is, on the client level, How can I catch this exception and handle it?
>>>
>>> Chen
>>>
>>> --
>>> Chen Song
>>>
>>>
>>
>
>
> --
> Chen Song
>
>


-- 
Chen Song


Data locality

2015-03-02 Thread Fei Hu
Hi All,

I developed a scheduler for data locality. Now I want to test the performance
of the scheduler, so I need to monitor how much data is read remotely. Is
there any tool for monitoring the volume of data moved around the cluster?

Thanks,
Fei
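
One low-tech way to get at this for MapReduce jobs, a sketch assuming job_xxx stands for a real job id: the per-job counters already split local from non-local map tasks.

# How many map tasks ran data-local, rack-local, or neither
mapred job -counter job_xxx org.apache.hadoop.mapreduce.JobCounter DATA_LOCAL_MAPS
mapred job -counter job_xxx org.apache.hadoop.mapreduce.JobCounter RACK_LOCAL_MAPS
mapred job -counter job_xxx org.apache.hadoop.mapreduce.JobCounter OTHER_LOCAL_MAPS

# Total HDFS bytes read by the job, for comparison
mapred job -counter job_xxx org.apache.hadoop.mapreduce.FileSystemCounter HDFS_BYTES_READ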

Re: AW: Hadoop 2.6.0 - No DataNode to stop

2015-03-02 Thread Ulul

Hi
The hadoop-daemon.sh script prints "no $command to stop" if it doesn't
find the pid file.
You should echo the $pid variable and see if you have a correct pid file
there.
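
For example, a quick check along those lines; the file name below is just the usual $HADOOP_PID_DIR pattern and may differ on your install:

# Where hadoop-daemon.sh expects the pid files (it falls back to /tmp if unset)
echo $HADOOP_PID_DIR
ls -l $HADOOP_PID_DIR

# The datanode pid file should exist and point at a live process
cat $HADOOP_PID_DIR/hadoop-hdfs-datanode.pid
ps -p $(cat $HADOOP_PID_DIR/hadoop-hdfs-datanode.pid)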

Ulul

On 02/03/2015 13:53, Daniel Klinger wrote:


Thanks for your help. But unfortunately this didn't do the job. Here's
the shell script I've written to start my cluster (the scripts on the
other node only contain the command to start the DataNode and
the command to start the NodeManager on the other node
(with the right user (hdfs / yarn)):


#!/bin/bash

# Start 
HDFS-


# Start Namenode

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config 
$HADOOP_CONF_DIR --script hdfs start namenode"


wait

# Start all Datanodes

export HADOOP_SECURE_DN_USER=hdfs

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config 
$HADOOP_CONF_DIR --script hdfs start datanode"


wait

ssh root@hadoop-data.klinger.local 'bash startDatanode.sh'

wait

# Start Resourcemanager

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config 
$HADOOP_CONF_DIR start resourcemanager"


wait

# Start Nodemanager on all Nodes

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config 
$HADOOP_CONF_DIR start nodemanager"


wait

ssh root@hadoop-data.klinger.local 'bash startNodemanager.sh'

wait

# Start Proxyserver

#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn start proxyserver --config 
$HADOOP_CONF_DIR"


#wait

# Start Historyserver

su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start 
historyserver --config $HADOOP_CONF_DIR"


wait

This script generates the following output:

starting namenode, logging to 
/var/log/cluster/hadoop/hadoop-hdfs-namenode-hadoop.klinger.local.out


starting datanode, logging to 
/var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop.klinger.local.out


starting datanode, logging to 
/var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop-data.klinger.local.out


starting resourcemanager, logging to 
/var/log/cluster/yarn/yarn-yarn-resourcemanager-hadoop.klinger.local.out


starting nodemanager, logging to 
/var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop.klinger.local.out


starting nodemanager, logging to 
/var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop-data.klinger.local.out


starting historyserver, logging to 
/var/log/cluster/mapred/mapred-mapred-historyserver-hadoop.klinger.local.out


Following is my stop script and its output:

#!/bin/bash

# Stop 
HDFS


# Stop Namenode

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config 
$HADOOP_CONF_DIR --script hdfs stop namenode"


# Stop all Datanodes

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config 
$HADOOP_CONF_DIR --script hdfs stop datanode"


ssh root@hadoop-data.klinger.local 'bash stopDatanode.sh'

# Stop Resourcemanager

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config 
$HADOOP_CONF_DIR stop resourcemanager"


#Stop Nodemanager on all Hosts

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config 
$HADOOP_CONF_DIR stop nodemanager"


ssh root@hadoop-data.klinger.local 'bash stopNodemanager.sh'

#Stop Proxyserver

#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn stop proxyserver --config 
$HADOOP_CONF_DIR"


#Stop Historyserver

su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop 
historyserver --config $HADOOP_CONF_DIR"


stopping namenode

no datanode to stop

no datanode to stop

stopping resourcemanager

stopping nodemanager

stopping nodemanager

nodemanager did not stop gracefully after 5 seconds: killing with kill -9

stopping historyserver

Is there maybe anything wrong with my commands?

Greets

DK

*From:* Varun Kumar [mailto:varun@gmail.com]
*Sent:* Monday, 2 March 2015 05:28
*To:* user
*Subject:* Re: Hadoop 2.6.0 - No DataNode to stop

1.Stop the service

2.Change the permissions for log and pid directory once again to hdfs.

3.Start service with hdfs.

This will resolve the issue

On Sun, Mar 1, 2015 at 6:40 PM, Daniel Klinger wrote:


Thanks for your answer.

I put the FQDN of the DataNodes in the slaves file on each node
(one FQDN per line). Here’s the full DataNode log after the start
(the log of the other DataNode is exactly the same):

2015-03-02 00:29:41,841 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX
signal handlers for [TERM, HUP, INT]

2015-03-02 00:29:42,207 INFO
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties
from hadoop-metrics2.properties

2015-03-02 00:29:42,312 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled
snapshot period at 10 second(s).

2015-03-02 00:29:42,313 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode
metrics system started

2015-03-02 00:29:42,319 INFO
org.apa

Cloudera Manager Installation is failing

2015-03-02 Thread Krish Donald
Hi,

I am trying to install Cloudera manager but it is failing and below is the
log file:
I have uninstalled postgres and tried again but still the same error.

[root@nncloudera cloudera-manager-installer]# more 5.start-embedded-db.log
mktemp: failed to create file via template `/tmp/': Permission
denied
/usr/share/cmf/bin/initialize_embedded_db.sh: line 393: $PASSWORD_TMP_FILE:
ambiguous redirect
The files belonging to this database system will be owned by user
"cloudera-scm".
This user must also own the server process.
The database cluster will be initialized with locale en_US.UTF8.
The default text search configuration will be set to "english".
fixing permissions on existing directory
/var/lib/cloudera-scm-server-db/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /var/lib/cloudera-scm-server-db/data/base/1
... ok
initializing pg_authid ... ok
initdb: could not open file "" for reading: No such file or directory
initdb: removing contents of data directory
"/var/lib/cloudera-scm-server-db/data"
Could not initialize database server.
  This usually means that your PostgreSQL installation failed or isn't
working properly.
  PostgreSQL is installed using the set of repositories found on this
machine. Please
  ensure that PostgreSQL can be installed. Please also uninstall any other
instances of
  PostgreSQL and then try again., giving up


Please suggest.

Thanks
Krish


Re: Cloudera Manager Installation is failing

2015-03-02 Thread Rich Haase
Try posting this question on the Cloudera forum. http://community.cloudera.com/

On Mar 2, 2015, at 3:21 PM, Krish Donald 
mailto:gotomyp...@gmail.com>> wrote:

Hi,

I am trying to install Cloudera manager but it is failing and below is the log 
file:
I have uninstalled postgres and tried again but still the same error.

[root@nncloudera cloudera-manager-installer]# more 5.start-embedded-db.log
mktemp: failed to create file via template `/tmp/': Permission denied
/usr/share/cmf/bin/initialize_embedded_db.sh: line 393: $PASSWORD_TMP_FILE: 
ambiguous redirect
The files belonging to this database system will be owned by user 
"cloudera-scm".
This user must also own the server process.
The database cluster will be initialized with locale en_US.UTF8.
The default text search configuration will be set to "english".
fixing permissions on existing directory /var/lib/cloudera-scm-server-db/data 
... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /var/lib/cloudera-scm-server-db/data/base/1 ... 
ok
initializing pg_authid ... ok
initdb: could not open file "" for reading: No such file or directory
initdb: removing contents of data directory 
"/var/lib/cloudera-scm-server-db/data"
Could not initialize database server.
  This usually means that your PostgreSQL installation failed or isn't working 
properly.
  PostgreSQL is installed using the set of repositories found on this machine. 
Please
  ensure that PostgreSQL can be installed. Please also uninstall any other 
instances of
  PostgreSQL and then try again., giving up


Please suggest.

Thanks
Krish



Re: Cloudera Manager Installation is failing

2015-03-02 Thread Krish Donald
Thanks Rich

On Mon, Mar 2, 2015 at 2:23 PM, Rich Haase  wrote:

>  Try posting this question on the Cloudera forum.
> http://community.cloudera.com/
>
>  On Mar 2, 2015, at 3:21 PM, Krish Donald  wrote:
>
>  Hi,
>
>  I am trying to install Cloudera manager but it is failing and below is
> the log file:
> I have uninstalled postgres and tried again but still the same error.
>
>  [root@nncloudera cloudera-manager-installer]# more
> 5.start-embedded-db.log
> mktemp: failed to create file via template `/tmp/': Permission
> denied
> /usr/share/cmf/bin/initialize_embedded_db.sh: line 393:
> $PASSWORD_TMP_FILE: ambiguous redirect
> The files belonging to this database system will be owned by user
> "cloudera-scm".
> This user must also own the server process.
> The database cluster will be initialized with locale en_US.UTF8.
> The default text search configuration will be set to "english".
> fixing permissions on existing directory
> /var/lib/cloudera-scm-server-db/data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 32MB
> creating configuration files ... ok
> creating template1 database in /var/lib/cloudera-scm-server-db/data/base/1
> ... ok
> initializing pg_authid ... ok
> initdb: could not open file "" for reading: No such file or directory
> initdb: removing contents of data directory
> "/var/lib/cloudera-scm-server-db/data"
> Could not initialize database server.
>   This usually means that your PostgreSQL installation failed or isn't
> working properly.
>   PostgreSQL is installed using the set of repositories found on this
> machine. Please
>   ensure that PostgreSQL can be installed. Please also uninstall any other
> instances of
>   PostgreSQL and then try again., giving up
>
>
>  Please suggest.
>
>  Thanks
> Krish
>
>
>


AW: AW: Hadoop 2.6.0 - No DataNode to stop

2015-03-02 Thread Daniel Klinger
Hi,

 

thanks for your help. The HADOOP_PID_DIR variable is pointing to
/var/run/cluster/hadoop (which has hdfs:hadoop as its owner). 3 PID files are
created there (datanode, namenode and secure_dn). It looks like the PID was
written but there was a read problem.

I did chmod -R 777 on the folder and now the DataNodes are stopped correctly.
It only works when I'm running the start and stop commands as the hdfs user. If
I try to start and stop as root (like it's documented in the documentation) I
still get the "no datanode to stop" error.

Is it important to start the DN as root? The only thing I noticed is that the
secure_dn PID file is not created when I'm starting the DataNode as the hdfs
user. Is this a problem?

 

Greets

DK

From: Ulul [mailto:had...@ulul.org]
Sent: Monday, 2 March 2015 21:50
To: user@hadoop.apache.org
Subject: Re: AW: Hadoop 2.6.0 - No DataNode to stop

 

Hi
The hadoop-daemon.sh script prints "no $command to stop" if it doesn't find
the pid file.
You should echo the $pid variable and see if you have a correct pid file there.
Ulul

On 02/03/2015 13:53, Daniel Klinger wrote:

Thanks for your help. But unfortunately this didn't do the job. Here's the
shell script I've written to start my cluster (the scripts on the other node
only contain the command to start the DataNode and the command to
start the NodeManager on the other node (with the right user (hdfs / yarn)):

 

 

#!/bin/bash

# Start 
HDFS-

# Start Namenode

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR 
--script hdfs start namenode"

wait

 

# Start all Datanodes

export HADOOP_SECURE_DN_USER=hdfs

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR 
--script hdfs start datanode"

wait

ssh root@hadoop-data.klinger.local   
'bash startDatanode.sh'

wait

 

# Start Resourcemanager

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR 
start resourcemanager"

wait

 

# Start Nodemanager on all Nodes

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR 
start nodemanager"

wait

ssh root@hadoop-data.klinger.local   
'bash startNodemanager.sh'

wait

 

# Start Proxyserver

#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn start proxyserver --config 
$HADOOP_CONF_DIR"

#wait

 

# Start Historyserver

su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver 
--config $HADOOP_CONF_DIR"

wait

 

This script generates the following output:

 

starting namenode, logging to 
/var/log/cluster/hadoop/hadoop-hdfs-namenode-hadoop.klinger.local.out

starting datanode, logging to 
/var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop.klinger.local.out

starting datanode, logging to 
/var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop-data.klinger.local.out

starting resourcemanager, logging to 
/var/log/cluster/yarn/yarn-yarn-resourcemanager-hadoop.klinger.local.out

starting nodemanager, logging to 
/var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop.klinger.local.out

starting nodemanager, logging to 
/var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop-data.klinger.local.out

starting historyserver, logging to 
/var/log/cluster/mapred/mapred-mapred-historyserver-hadoop.klinger.local.out

 

Following is my stop script and its output:

 

#!/bin/bash

# Stop 
HDFS

# Stop Namenode

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR 
--script hdfs stop namenode"

 

# Stop all Datanodes

su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR 
--script hdfs stop datanode"

ssh root@hadoop-data.klinger.local   
'bash stopDatanode.sh'

 

# Stop Resourcemanager

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR 
stop resourcemanager"

 

#Stop Nodemanager on all Hosts

su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR 
stop nodemanager"

ssh root@hadoop-data.klinger.local   
'bash stopNodemanager.sh'

 

#Stop Proxyserver

#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn stop proxyserver --config 
$HADOOP_CONF_DIR"

 

#Stop Historyserver

su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver 
--config $HADOOP_CONF_DIR"

 

stopping namenode

no datanode to stop

no datanode to stop

stopping resourcemanager

stopping nodemanager

stopping nodemanager

nodemanager did not stop gracefully after 5 seconds: killing with kill -9

stopping historyserver

 

Is there maybe anything wrong with my commands?

 

Greets

DK

 

Von: Varun Kumar [mailto:varun@gmail.com] 
Gesendet: Montag, 2. März 2015 05:28
An: user
Betreff: Re: 

Re: Data locality

2015-03-02 Thread Demai Ni
hi, folks,

I have a similar question. Is there an easy way to tell (from a user
perspective) whether short-circuit reads are enabled? Thanks.

Demai
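
One quick check, a sketch assuming the client reads the same hdfs-site.xml as the cluster (this only shows what is configured, not what a particular read actually used):

# Short-circuit local reads require this to be true...
hdfs getconf -confKey dfs.client.read.shortcircuit

# ...and a domain socket path that exists on the DataNodes
hdfs getconf -confKey dfs.domain.socket.path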

On Mon, Mar 2, 2015 at 11:46 AM, Fei Hu  wrote:

> Hi All,
>
> I developed a scheduler for data locality. Now I want to test the
> performance of the scheduler, so I need to monitor how many data are read
> remotely. Is there any tool for monitoring the volume of data moved around
> the cluster?
>
> Thanks,
> Fei


Push or pull in yarn

2015-03-02 Thread Павел Мезенцев
Hi All!


Recently I read an article about Facebook's Corona:
https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920

It addresses MR1's pull-based task assignment:
task trackers send heartbeats to the JobTracker and receive new tasks in
response. This approach wasted 10-20 seconds per task.

My question is about YARN: has this problem been solved in YARN or not?

Best regards,
Mezentsev Pavel


Re: Copy data from local disc with WebHDFS?

2015-03-02 Thread xeon Mailinglist
1. I am using the 2 commands below to try to copy data from the local disk to
HDFS. Unfortunately these commands are not working, and I don't understand
why. I have configured HDFS to use the WebHDFS
protocol. How do I copy data from the local disk to HDFS using the WebHDFS
protocol?

xubuntu@hadoop-coc-1:~/Programs/hadoop$ *hdfs dfs -copyFromLocal ~/input1
webhdfs://192.168.56.101:8080/  *
Java HotSpot(TM) Client VM warning: You have loaded library
/home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might
have disabled stack guard. The VM will try to fix the stack guard now. It's
highly recommended that you fix the library with 'execstack -c ',
or link it with '-z noexecstack'.
 15/03/02 11:50:16 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000
failed on connection exception: java.net.ConnectException: Connection
refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000
failed on connection exception: java.net.ConnectException: Connection
refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused

 xubuntu@hadoop-coc-1:~/Programs/hadoop$ *curl -i -X PUT -T ~/input1
"http://192.168.56.101:8080/?op=CREATE";
*
HTTP/1.1 100 Continue HTTP/1.1 405 HTTP method PUT is not supported by this
URL Date:
 Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache Date:
Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache
Content-Length: 0
Server: Jetty(6.1.26)


2. Every time I launch a command in YARN I get a Java HotSpot warning
(see below). How do I remove the Java HotSpot warning?

xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal
~/input1 webhdfs://192.168.56.101:8080/
Java  HotSpot(TM) Client VM warning: You have loaded library
/home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which
might  have disabled stack guard. The VM will try to fix the stack
guard now.
It's highly recommended that you fix the library with 'execstack -c
', or link it with '-z noexecstack'.

Thanks,

On Monday, March 2, 2015, xeonmailinglist  wrote:

>  Hi,
>
> 1 - I have HDFS running with WebHDFS protocol. I want to copy data from
> local disk to HDFS, but I get the error below. How I copy data from the
> local disk to HDFS?
>
> xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1 
> webhdfs://192.168.56.101:8080/
> Java HotSpot(TM) Client VM warning: You have loaded library 
> /home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might have 
> disabled stack guard. The VM will try to fix the stack guard now.
> It's highly recommended that you fix the library with 'execstack -c 
> ', or link it with '-z noexecstack'.
> 15/03/02 11:50:16 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 
> failed on connection exception: java.net.ConnectException: Connection 
> refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 
> failed on connection exception: java.net.ConnectException: Connection 
> refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>
>
> xubuntu@hadoop-coc-1:~/Programs/hadoop$ curl -i -X PUT -T ~/input1 
> "http://192.168.56.101:8080/?op=CREATE"; 
> 
> HTTP/1.1 100 Continue
>
> HTTP/1.1 405 HTTP method PUT is not supported by this URL
> Date: Mon, 02 Mar 2015 16:50:36 GMT
> Pragma: no-cache
> Date: Mon, 02 Mar 2015 16:50:36 GMT
> Pragma: no-cache
> Content-Length: 0
> Server: Jetty(6.1.26)
>
> $ netstat -plnet
> tcp0  0 192.168.56.101:8080 0.0.0.0:*   LISTEN
>   1000   587397  8229/java
> tcp0  0 0.0.0.0:43690.0.0.0:*   LISTEN
>   1158049-
> tcp0  0 127.0.0.1:530.0.0.0:*   LISTEN
>   0  8336-
> tcp0  0 0.0.0.0:22  0.0.0.0:*   LISTEN
>   0  7102-
> tcp0  0 127.0.0.1:631   0.0.0.0:*   LISTEN
>   0  104794  -
> tcp0  0 0.0.0.0:50010   0.0.0.0:*   LISTEN
>   1000   588404  8464/java
> tcp0  0 0.0.0.0:50075   0.0.0.0:*   LISTEN
>   1000   589155  8464/java
> tcp0  0 0.0.0.0:50020   0.0.0.0:*   LISTEN
>   1000   589169  8464/java
> tcp0  0 192.168.56.101:6600 0.0.0.0:*   LISTEN
>   1000   587403  8229

Re: how to check hdfs

2015-03-02 Thread Vikas Parashar
Hi,

Kindly install the hadoop-hdfs RPM on your machine.

Rg:
Vicky
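
If the packages are already installed, a quick sanity check (a sketch; "No FileSystem for scheme: hdfs" usually means the hdfs client jars are missing from the classpath the command uses):

# DistributedFileSystem lives in hadoop-hdfs-*.jar; it must show up here
hadoop classpath | tr ':' '\n' | grep hadoop-hdfs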

On Mon, Mar 2, 2015 at 11:19 PM, Shengdi Jin  wrote:

> Hi all,
> I just start to learn hadoop, I have a naive question
>
> I used
> hdfs dfs -ls /home/cluster
> to check the content inside.
> But I get error
> ls: No FileSystem for scheme: hdfs
>
> My configuration file core-site.xml is like
> 
> 
>   fs.defaultFS
>   hdfs://master:9000
> 
> 
>
> hdfs-site.xml is like
> 
> 
>dfs.replication
>2
> 
> 
>dfs.name.dir
>file:/home/cluster/mydata/hdfs/namenode
> 
> 
>dfs.data.dir
>file:/home/cluster/mydata/hdfs/datanode
> 
> 
>
> is there any thing wrong ?
>
> Thanks a lot.
>