cp command in webhdfs (and Filesystem Java Object)

2016-06-28 Thread Jérôme BAROTIN
Hello,

I'm writing this email because I spent an hour looking for a cp command
in the WebHDFS API (in fact, I'm using HttpFS, but I think it's the same).

This command is implemented in the "hdfs dfs" command line client (and I'm
using that command), but I can't find it in the WebHDFS REST API. I thought
that WebHDFS was an implementation of the FileSystem object (
https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileSystem.html).
I checked the Java API and haven't found any cp command there either. The only
Java cp command is on the FileUtil object (
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileUtil.html),
and I'm not sure it works identically to the "hdfs dfs -cp" command.

I also checked the Hadoop JIRA and found nothing:
https://issues.apache.org/jira/browse/HADOOP-9417?jql=project%20%3D%20HADOOP%20AND%20(text%20~%20%22webhdfs%20copy%22%20OR%20text%20~%20%22webhdfs%20cp%22)

Is there a way to execute a cp command through a REST API?

All my best,


Jérôme
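
For reference, a rough client-side workaround (as far as I know there is no
server-side cp operation in the WebHDFS REST API): read the file back with
op=OPEN and write it again with op=CREATE. The host, port, user and paths
below are made-up placeholders, and Kerberos/authentication parameters are
left out, so treat this as an untested sketch rather than a recipe.

# 1) Download the source file (the namenode redirects the read to a datanode).
curl -sL "http://namenode.example.com:50070/webhdfs/v1/user/jerome/src.txt?op=OPEN&user.name=jerome" \
  -o /tmp/webhdfs-cp.tmp

# 2) Ask the namenode for a write location; it answers with a 307 Location header.
LOCATION=$(curl -si -X PUT \
  "http://namenode.example.com:50070/webhdfs/v1/user/jerome/dst.txt?op=CREATE&overwrite=true&user.name=jerome" \
  | awk 'tolower($1) == "location:" {print $2}' | tr -d '\r')

# 3) Upload the data to the datanode URL returned above.
curl -s -X PUT -T /tmp/webhdfs-cp.tmp "$LOCATION"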


Re: Setting up secure Multi-Node cluster

2016-06-28 Thread Aneela Saleem
Thanks Rakesh.

On Tue, Jun 28, 2016 at 8:28 AM, Rakesh Radhakrishnan 
wrote:

> Hi Aneela,
>
> IIUC, the Namenode and Datanode use the _HOST pattern in their principals,
> and you need to create separate principals for the NN and the DN if they run
> on different machines. I hope the explanation below helps.
>
> "dfs.namenode.kerberos.principal" is typically set to nn/_HOST@REALM.
> Each Namenode will substitute _HOST with its own fully qualified hostname
> at startup. The _HOST placeholder allows using the same configuration
> setting on both the Active and Standby NameNodes in an HA setup.
>
> Similarly, "dfs.datanode.kerberos.principal" is typically set to dn/_HOST@REALM.
> Each DataNode will substitute _HOST with its own fully qualified hostname at
> startup. The _HOST placeholder allows using the same configuration setting
> on all DataNodes.
>
> Again, if you are using an HA setup with QJM,
> "dfs.journalnode.kerberos.principal" is typically set to jn/_HOST@REALM.
>
> > Do I need to copy all the Kerberos configuration files, like kdc.conf
> > and krb5.conf, to the default locations on every node?
> Yes, you need to place these in the appropriate paths on all the machines.
>
> Regards,
> Rakesh
>
> On Tue, Jun 28, 2016 at 3:15 AM, Aneela Saleem 
> wrote:
>
>> Hi all,
>>
>> I have configured Kerberos for a single-node cluster successfully. I used
>> this documentation for the configuration. Now I'm enabling security for a
>> multi-node cluster and I have a few questions about it:
>>
>> How should principals be managed for the namenode and datanode? Until now
>> I had only one principal, *hdfs/_HOST@platalyticsrealm*, used for both the
>> namenode and the datanode. Do I need to add separate principals for the
>> namenode and the datanode, each with its own hostname? For example:
>> if my namenode hostname is *hadoop-master*, there should be a principal
>> *nn/hadoop-master@platalyticsrealm* added (with an appropriate keytab file);
>> if my datanode hostname is *hadoop-slave*, there should be a principal
>> *dn/hadoop-slave@platalyticsrealm* added (with an appropriate keytab file).
>>
>> Do I need to copy all the Kerberos configuration files, like kdc.conf and
>> krb5.conf, to the default locations on every node?
>>
>> A little guidance would be highly appreciated. Thanks
>>
>
>
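
For anyone following this thread later, a hedged sketch of the per-host
principals described above, assuming an MIT Kerberos KDC and the realm and
hostnames mentioned in this thread; the keytab paths are just examples.

# kadmin.local -q "addprinc -randkey nn/hadoop-master@platalyticsrealm"
# kadmin.local -q "addprinc -randkey dn/hadoop-slave@platalyticsrealm"
# kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/hadoop-master@platalyticsrealm"
# kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab dn/hadoop-slave@platalyticsrealm"

The hdfs-site.xml values can stay host-independent on every node
(dfs.namenode.kerberos.principal = nn/_HOST@platalyticsrealm,
dfs.datanode.kerberos.principal = dn/_HOST@platalyticsrealm), since _HOST is
substituted at startup as Rakesh explains.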


datanode is unable to connect to namenode

2016-06-28 Thread Aneela Saleem
Hi all,

I have set up a two-node cluster with security enabled. Everything is
running successfully: namenode, datanode, resourcemanager, nodemanager,
jobhistoryserver, etc. But the datanode is unable to connect to the
namenode, and I can see only one node on the web UI. Checking the datanode
logs gives the following warning:

*WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting
to server: hadoop-master/192.168.23.206:8020*

The rest looks fine. Please help me in this regard; what could be the issue?
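
A couple of hedged first checks (standard Hadoop commands; the log path below
is a placeholder for wherever your install writes its logs):

$ hdfs dfsadmin -report        # on hadoop-master: which datanodes has the namenode actually registered?
$ tail -n 100 /path/to/hadoop/logs/hadoop-*-datanode-*.log   # full stack trace behind the warning above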


Re: datanode is unable to connect to namenode

2016-06-28 Thread sreebalineni .
Are you able to telnet or ping it? Check the firewalls as well.
On Jun 29, 2016 12:39 AM, "Aneela Saleem"  wrote:

> Hi all,
>
> I have set up a two-node cluster with security enabled. Everything is
> running successfully: namenode, datanode, resourcemanager, nodemanager,
> jobhistoryserver, etc. But the datanode is unable to connect to the
> namenode, and I can see only one node on the web UI. Checking the datanode
> logs gives the following warning:
>
> *WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting
> to server: hadoop-master/192.168.23.206:8020*
>
> The rest looks fine. Please help me in this regard; what could be the issue?
>


Re: datanode is unable to connect to namenode

2016-06-28 Thread Aneela Saleem
Thanks Sreebalineni for the response.

This is the result of the *netstat -a | grep 8020* command

tcp        0      0 hadoop-master:8020      *:*                     LISTEN
tcp        0      0 hadoop-master:33356     hadoop-master:8020      ESTABLISHED
tcp        0      0 hadoop-master:8020      hadoop-master:33356     ESTABLISHED
tcp        0      0 hadoop-master:55135     hadoop-master:8020      TIME_WAIT

And this is my */etc/hosts* file

#127.0.0.1  localhost
#127.0.1.1  vm6-VirtualBox
192.168.23.206  hadoop-master platalytics.com vm6-VirtualBox
192.168.23.207  hadoop-slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters


Can you please tell me what's wrong with the above configuration, and how can
I check whether it is a firewall issue?

Thanks
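
One hedged way to rule the firewall in or out, using the hostnames and port
from this thread (run the first check from hadoop-slave, the others on
hadoop-master):

$ telnet hadoop-master 8020        # or: nc -vz hadoop-master 8020
# iptables -L -n | grep 8020       # is anything filtering port 8020?
# firewall-cmd --list-all          # if firewalld is in use
# ufw status                       # if ufw is in use

If the connection from hadoop-slave is refused or times out while the same
test works locally on hadoop-master, a firewall rule is the likely culprit.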

On Wed, Jun 29, 2016 at 12:11 AM, sreebalineni . 
wrote:

> Are you able to telnet or ping it? Check the firewalls as well.
> On Jun 29, 2016 12:39 AM, "Aneela Saleem"  wrote:
>
>> Hi all,
>>
>> I have set up a two-node cluster with security enabled. Everything is
>> running successfully: namenode, datanode, resourcemanager, nodemanager,
>> jobhistoryserver, etc. But the datanode is unable to connect to the
>> namenode, and I can see only one node on the web UI. Checking the datanode
>> logs gives the following warning:
>>
>> *WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting
>> to server: hadoop-master/192.168.23.206:8020*
>>
>> The rest looks fine. Please help me in this regard; what could be the issue?
>>
>


Avg time per map reduced but the result did not get better

2016-06-28 Thread Fu, Yong
Hi guys,
I am running the TeraSort benchmark on CDH 5.4.3. I have seen the job split
into 1840 maps, and I scheduled 70*20 map containers/tasks via YARN, so I
thought there should be an improvement when scheduling fewer map tasks
(approximately half the number of maps, meaning all maps should finish in two
waves). But the final result shows a slight performance degradation, even
though I saw the average map time drop from 40s to 30s. Why?
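
For what it's worth, a hedged back-of-the-envelope using only the figures
quoted above (1840 splits, 70*20 concurrent map containers, 30s average map
time), just to make the wave arithmetic explicit:

$ maps=1840; slots=$((70 * 20))              # 1400 concurrent map containers
$ waves=$(( (maps + slots - 1) / slots ))    # ceil(1840 / 1400) = 2 map waves
$ echo "map waves: $waves, rough map-phase lower bound: $(( waves * 30 ))s"
map waves: 2, rough map-phase lower bound: 60s

Note the second wave carries only 1840 - 1400 = 440 maps, so the cluster is
only partially utilized while it runs.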


Re: datanode is unable to connect to namenode

2016-06-28 Thread Aneela Saleem
Following is the result of telnet:

Trying 192.168.23.206...
Connected to hadoop-master.
Escape character is '^]'.

On Wed, Jun 29, 2016 at 3:57 AM, Aneela Saleem 
wrote:

> Thanks Sreebalineni for the response.
>
> This is the result of the *netstat -a | grep 8020* command
>
> tcp        0      0 hadoop-master:8020      *:*                     LISTEN
> tcp        0      0 hadoop-master:33356     hadoop-master:8020      ESTABLISHED
> tcp        0      0 hadoop-master:8020      hadoop-master:33356     ESTABLISHED
> tcp        0      0 hadoop-master:55135     hadoop-master:8020      TIME_WAIT
>
> And this is my */etc/hosts* file
>
> #127.0.0.1  localhost
> #127.0.1.1  vm6-VirtualBox
> 192.168.23.206  hadoop-master platalytics.com vm6-VirtualBox
> 192.168.23.207  hadoop-slave
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
>
>
> Can you please tell me what's wrong with the above configuration, and how
> can I check whether it is a firewall issue?
>
> Thanks
>
> On Wed, Jun 29, 2016 at 12:11 AM, sreebalineni . 
> wrote:
>
>> Are you able to telnet or ping it? Check the firewalls as well.
>> On Jun 29, 2016 12:39 AM, "Aneela Saleem"  wrote:
>>
>>> Hi all,
>>>
>>> I have set up a two-node cluster with security enabled. Everything is
>>> running successfully: namenode, datanode, resourcemanager, nodemanager,
>>> jobhistoryserver, etc. But the datanode is unable to connect to the
>>> namenode, and I can see only one node on the web UI. Checking the datanode
>>> logs gives the following warning:
>>>
>>> *WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting
>>> to server: hadoop-master/192.168.23.206:8020*
>>>
>>> The rest looks fine. Please help me in this regard; what could be the issue?
>>>
>>
>


Re: unsubscribe

2016-06-28 Thread chandu banavaram
please unsubscribe me.

On Fri, Jun 24, 2016 at 12:41 PM, Anand Sharma 
wrote:

>
> --
> Thanks
> Anand
>


Re: unsubscribe

2016-06-28 Thread Anand Tigadikar
please unsubscribe me.

On Tue, Jun 28, 2016 at 10:13 PM, chandu banavaram <
chandu.banava...@gmail.com> wrote:

> please unsubscribe me.
>
> On Fri, Jun 24, 2016 at 12:41 PM, Anand Sharma 
> wrote:
>
>>
>> --
>> Thanks
>> Anand
>>
>
>


-- 
Cheers,
Anand


Re: cp command in webhdfs (and Filesystem Java Object)

2016-06-28 Thread Rohan Rajeevan
Maybe look at this:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE
If you are interested in an intra-cluster copy, you may want to look at DistCp.
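
For the intra-cluster case, a hedged example of what a DistCp run could look
like (the paths are made-up placeholders; distcp runs as a MapReduce job, so
YARN has to be up):

$ hadoop distcp /user/jerome/source-dir /user/jerome/dest-dir

It launches a job per invocation, so it pays off for large directories rather
than single small files.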

On Tue, Jun 28, 2016 at 9:36 AM, Jérôme BAROTIN  wrote:

> Hello,
>
> I'm writing this email because I spent an hour looking for a cp command
> in the WebHDFS API (in fact, I'm using HttpFS, but I think it's the same).
>
> This command is implemented in the "hdfs dfs" command line client (and I'm
> using that command), but I can't find it in the WebHDFS REST API. I thought
> that WebHDFS was an implementation of the FileSystem object (
> https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileSystem.html).
> I checked the Java API and haven't found any cp command there either. The only
> Java cp command is on the FileUtil object (
> https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileUtil.html),
> and I'm not sure it works identically to the "hdfs dfs -cp" command.
>
> I also checked the Hadoop JIRA and found nothing:
> https://issues.apache.org/jira/browse/HADOOP-9417?jql=project%20%3D%20HADOOP%20AND%20(text%20~%20%22webhdfs%20copy%22%20OR%20text%20~%20%22webhdfs%20cp%22)
>
> Is there a way to execute a cp command through a REST API?
>
> All my best,
>
>
> Jérôme
>


Re: Manual Installation: CentOS 7 + SystemD + Unit Files + Hadoop at boot

2016-06-28 Thread Rohan Rajeevan
Can you post the namenode logs?
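
For what it's worth, a hedged sketch of where those logs usually end up given
the layout in the steps quoted below (HADOOP_COMMON_HOME=/usr/local/hdfs/hadoop,
user hdfs, a unit file named hadoop-namenode.service but enabled as "dfs");
exact file names may differ on your box:

# journalctl -u hadoop-namenode --no-pager -n 200   # or: -u dfs, whichever unit name was actually enabled
# tail -n 200 /usr/local/hdfs/hadoop/logs/hadoop-hdfs-namenode-*.log
# cat /usr/local/hdfs/hadoop/logs/hadoop-hdfs-namenode-*.out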

On Sat, Jun 25, 2016 at 9:32 AM, Juliano Atanazio 
wrote:

> Hi.
>
> I'm a novice with Hadoop.
> I'm trying to install Hadoop (single node) manually on CentOS 7, with
> systemd unit files to start Hadoop at boot.
> But when I start the DFS service (systemctl start dfs), it starts and
> then dies seconds later...
> I have "googled" for days and found nothing about systemd...
> Below are the steps I have done (with the OpenJDK Java environment
> pre-installed):
>
> Excuse me for my bad English :(
>
>
>
>
>
> =
> # yum erase NetworkManager{,-{libnm,tui,wifi}}
>
> # groupadd -r hdfs
>
> # useradd -r -g hdfs -d /usr/local/hdfs -s /bin/bash -k /etc/skel -c 'HDFS
> System User' -m hdfs
>
> # mkdir /usr/local/hdfs/data
>
> # yum install ssh rsync
>
> # cat << EOF >> ~hdfs/.bashrc
>
> export HADOOP_COMMON_HOME='/usr/local/hdfs/hadoop'
> export HADOOP_MAPRED_HOME="\${HADOOP_COMMON_HOME}"
> export HADOOP_HDFS_HOME="\${HADOOP_COMMON_HOME}"
> export YARN_HOME="\${HADOOP_COMMON_HOME}"
> export JAVA_HOME='/usr/local/openjdk'
> export JRE_HOME="${JAVA_HOME}/jre"
> export
> PATH="\${PATH}:\${HADOOP_COMMON_HOME}/bin:\${HADOOP_COMMON_HOME}/sbin:\${JAVA_HOME}/bin"
> EOF
>
> # chown -R hdfs: ~hdfs/
>
> # su - hdfs
>
> $ tar xf /usr/src/hadoop-2.7.2.tar.gz
>
> $ mv hadoop-2.7.2/ hadoop/
>
> $ rm -f ${HADOOP_COMMON_HOME}/{{,s}bin,etc/hadoop,libexec}/*.cmd
>
> $ cat << EOF > ${HADOOP_COMMON_HOME}/etc/hadoop/core-site.xml
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://0.0.0.0:9000</value>
>     <description>NameNode URI</description>
>   </property>
> </configuration>
> EOF
>
>
> $ cat << EOF > ${HADOOP_COMMON_HOME}/etc/hadoop/yarn-site.xml
> <configuration>
>   <property>
>     <name>yarn.resourcemanager.hostname</name>
>     <value>hadoop</value>
>     <description>The hostname of the ResourceManager</description>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>     <description>shuffle service for MapReduce</description>
>   </property>
> </configuration>
> EOF
>
>
> $ cat << EOF > ${HADOOP_COMMON_HOME}/etc/hadoop/hdfs-site.xml
> <configuration>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:///usr/local/hdfs/data/data</value>
>     <description>DataNode directory for storing data chunks.</description>
>   </property>
>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:///usr/local/hdfs/data/name</value>
>     <description>NameNode directory for namespace and transaction logs storage.</description>
>   </property>
>
>   <property>
>     <name>dfs.replication</name>
>     <value>3</value>
>     <description>Number of replication for each chunk.</description>
>   </property>
>
>   <property>
>     <name>dfs.webhdfs.enabled</name>
>     <value>true</value>
>     <description>Enable or disable webhdfs. Defaults to false</description>
>   </property>
> </configuration>
> EOF
>
> $ cat << EOF > ${HADOOP_COMMON_HOME}/etc/hadoop/mapred-site.xml
> <configuration>
>   <property>
>     <name>mapreduce.framework.name</name>
>     <value>yarn</value>
>     <description>Execution framework.</description>
>   </property>
> </configuration>
> EOF
>
> $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
>
> $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
>
> $ chmod 0600 ~/.ssh/authorized_keys
>
> $ ssh localhost 'echo test'
> test
>
> $ hdfs namenode -format
>
> $ start-dfs.sh
>
> http://localhost:50070/
>
> $ hdfs dfs -mkdir -p /user/${USER}
>
>
> $ cat << EOF > /tmp/foo.txt
> linha 1
> linha 2
> linha 3
> EOF
>
> $ hdfs dfs -put /tmp/foo.txt /user/hdfs/
>
> $ hdfs dfs -cat /user/hdfs/foo.txt
>
> $ jps
>
>
> # cat << EOF > /lib/systemd/system/hadoop-namenode.service
> [Unit]
> Description=DFS
> After=syslog.target network.target
> DefaultDependencies=true
>
> [Service]
> Type=simple
> User=hdfs
> Group=hdfs
> Environment=YARN_HOME=/usr/local/hdfs/hadoop
> Environment=HADOOP_HDFS_HOME=/usr/local/hdfs/hadoop
> Environment=HADOOP_COMMON_HOME=/usr/local/hdfs/hadoop
> Environment=JAVA_HOME=/usr/local/openjdk
> Environment=HADOOP_MAPRED_HOME=/usr/local/hdfs/hadoop
> OOMScoreAdjust=-1000
> ExecStart=/usr/local/hdfs/hadoop/sbin/hadoop-daemon.sh start namenode
> ExecStop=/usr/local/hdfs/hadoop/sbin/hadoop-daemon.sh stop namenode
> TimeoutSec=300
> [Install]
> WantedBy=multi-user.target
> EOF
>
> # systemctl enable dfs
>
> # systemctl start dfs
>
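
One hedged observation on the unit quoted above: hadoop-daemon.sh starts the
JVM in the background and then exits, and with Type=simple systemd treats the
service as finished once ExecStart returns, so the freshly started namenode
can be cleaned up along with the unit's cgroup, which would match "it starts
and then seconds later dies". A possible variant to try (untested; the
PIDFile path assumes hadoop-daemon.sh's default HADOOP_PID_DIR of /tmp):

[Unit]
Description=Hadoop HDFS NameNode
After=network.target

[Service]
Type=forking
User=hdfs
Group=hdfs
Environment=JAVA_HOME=/usr/local/openjdk
Environment=HADOOP_COMMON_HOME=/usr/local/hdfs/hadoop
Environment=HADOOP_HDFS_HOME=/usr/local/hdfs/hadoop
Environment=HADOOP_MAPRED_HOME=/usr/local/hdfs/hadoop
Environment=YARN_HOME=/usr/local/hdfs/hadoop
# hadoop-daemon.sh writes its pid under HADOOP_PID_DIR, which defaults to /tmp
PIDFile=/tmp/hadoop-hdfs-namenode.pid
ExecStart=/usr/local/hdfs/hadoop/sbin/hadoop-daemon.sh start namenode
ExecStop=/usr/local/hdfs/hadoop/sbin/hadoop-daemon.sh stop namenode
TimeoutSec=300

[Install]
WantedBy=multi-user.target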