RE: Hadoop 2.6.0 Error

2015-03-25 Thread hadoop.support
Hello Anand,

 

Set your Java home (JAVA_HOME) in hadoop-env.sh (/usr/local/hadoop/etc/hadoop/hadoop-env.sh):

 

export JAVA_HOME='/usr/lib/jvm/java-7-openjdk-amd64'

 

That should resolve your error.
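
For example (a rough sketch only; the JDK path below is an assumption, use whatever is installed on your machine):

# edit the env file and set JAVA_HOME explicitly
nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# change the default line (usually "export JAVA_HOME=${JAVA_HOME}") to e.g.
#   export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# then restart the HDFS daemons so the change is picked up
/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh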

 

Thanks,

S.RagavendraGanesh

ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: http://www.visolve.com

email: servi...@visolve.com | Phone: 408-850-2243

 

 

From: Alexandru Pacurar [mailto:alexandru.pacu...@propertyshark.com] 
Sent: Wednesday, March 25, 2015 3:17 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop 2.6.0 Error

 

Hello,

 

I had a similar problem and my solution to this was setting JAVA_HOME in 
/etc/environment.

 

The problem, from what I remember, is that the start-dfs.sh script calls hadoop-daemons.sh with the necessary options to start the Hadoop daemons. hadoop-daemons.sh in turn calls hadoop-daemon.sh with the necessary options via ssh, in a non-interactive fashion. When you execute a command via ssh non-interactively (e.g. ssh host1 'ls -la') you get a minimal environment: .profile and other environment-related files are not sourced. But /etc/environment is read, so you could set JAVA_HOME there. Technically you should set BASH_ENV there, pointing it to a file containing the environment variables you need.
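
A rough sketch of how to check and fix this (the JDK path is an assumption, adapt it to your install):

# a non-interactive ssh command only gets a minimal environment
ssh localhost 'echo $JAVA_HOME'      # likely prints nothing
# add the variable to /etc/environment, which ssh sessions do pick up
echo 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' | sudo tee -a /etc/environment
# re-run the test (a re-login may be needed) and verify
ssh localhost 'echo $JAVA_HOME'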

 

For more info see
http://stackoverflow.com/questions/216202/why-does-an-ssh-remote-command-get-fewer-environment-variables-then-when-run-man
or man bash

 

Thank you,

Alex

 

From: Olivier Renault [mailto:orena...@hortonworks.com]
Sent: Wednesday, March 25, 2015 10:44 AM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

 

It should be : 

export JAVA_HOME=…

 

Olivier

 

 

From: Brahma Reddy Battula
Reply-To: user@hadoop.apache.org
Date: Wednesday, 25 March 2015 08:28
To: user@hadoop.apache.org, Anand Murali
Subject: RE: Hadoop 2.6.0 Error

 

Hi,

Ideally it should take effect if you configure it in .profile or hadoop-env.sh.

Since you said you set it in .profile (I hope you did source ~/.profile), did you verify that it took effect? (e.g. by checking echo $JAVA_HOME, or jps, etc.)
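
For example, something along these lines (output is only indicative):

source ~/.profile
echo $JAVA_HOME     # should print the JDK path, e.g. /usr/lib/jvm/java-7-openjdk-amd64
java -version       # should run the JDK that JAVA_HOME points to
jps                 # lists the running Hadoop daemons once they are started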

 

Thanks & Regards

Brahma Reddy Battula

 

 

  _  

From: Anand Murali [mailto:anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 1:30 PM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

Dear All:

 

Even after setting JAVA_HOME in .profile I get

 

the "JAVA_HOME is not set and could not be found" error.

 

 

If any of you know of a more stable version, please do let me know.

 

Thanks,

 

Anand Murali  

11/7, 'Anand Vihar', Kandasamy St, Mylapore

Chennai - 600 004, India

Ph: (044)- 28474593/ 43526162 (voicemail)

 

 

On Wednesday, March 25, 2015 12:57 PM, Anand Murali anand_vi...@yahoo.com wrote:

 

Dear Mr. Brahma Reddy:

 

Should I type

 

SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

 

 

at the system level (/etc/profile) or at the user level (.profile)? Reply most welcome.

 

Thanks

 

Regards

 

 

Anand Murali  

11/7, 'Anand Vihar', Kandasamy St, Mylapore

Chennai - 600 004, India

Ph: (044)- 28474593/ 43526162 (voicemail)

 

 

On Wednesday, March 25, 2015 12:37 PM, Anand Murali anand_vi...@yahoo.com wrote:

 

Dear All:

 

I get this error; I shall try setting JAVA_HOME in .profile:

 

Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$

 

Thanks

 

Anand Murali  

11/7, 'Anand Vihar', Kandasamy St, Mylapore

Chennai - 600 004, India

Ph: (044)- 28474593/ 43526162 (voicemail)

 

 

On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote:

 

Instead of exporting JAVA_HOME, please set JAVA_HOME at the system level (e.g. by putting it in /etc/profile...).

For more details please check the following jira.

 
https://issues.apache.org/jira/browse/HADOOP-11538
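
For instance, a minimal sketch (the JDK path is only an example; use the one installed on your system):

# as root, append to /etc/profile (or to a file under /etc/profile.d/)
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
# re-read it in the current shell and check
source /etc/profile
echo $JAVA_HOME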

 

Thanks & Regards

 Brahma Reddy Battula

 

  _  

From: Anand Murali [ 

RE: Intermittent BindException during long MR jobs

2015-02-26 Thread hadoop.support
Hello Krishna,

 

The exception seems to be IP specific. It may have occurred because the IP address was not available on the system to bind to. Double-check the IP address availability and re-run the job.
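
A quick sanity check might look like this (a sketch only; the address is the one from your stack trace):

# does the hostname resolve to the address the client is trying to bind to?
hostname -f
getent hosts back10
# is that address actually configured on an interface of this box?
ip addr | grep 10.4.2.10
# are ephemeral ports running low (lots of connections stuck in TIME_WAIT)?
netstat -ant | grep -c TIME_WAIT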

 

Thanks,

S.RagavendraGanesh

ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: http://www.visolve.com

email: servi...@visolve.com | Phone: 408-850-2243

 

 

From: Krishna Rao [mailto:krishnanj...@gmail.com] 
Sent: Thursday, February 26, 2015 9:48 PM
To: u...@hive.apache.org; user@hadoop.apache.org
Subject: Intermittent BindException during long MR jobs

 

Hi,

 

We occasionally run into a BindException that causes long-running jobs to fail.

 

The stacktrace is below.

 

Any ideas what this could be caused by?

 

Cheers,

 

Krishna

 

 

Stacktrace:

379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task  - Job Submission failed with exception 'java.net.BindException(Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException)'

java.net.BindException: Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException

at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)

at org.apache.hadoop.ipc.Client.call(Client.java:1242)

at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)

at com.sun.proxy.$Proxy10.create(Unknown Source)

at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)

at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)

at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)

at com.sun.proxy.$Proxy11.create(Unknown Source)

at 
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1376)

at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)

at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)

at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)

at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)

at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)

at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)

at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)

at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)

at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)

at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)

at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)

at 
org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)

at 
org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)

at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)

at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)

at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)

at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)

at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)

at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283)

at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)

at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)

at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)

at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)

at 
org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)

at 
org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)

at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)

at 

RE: Hadoop - HTTPS communication between nodes - How to Confirm ?

2015-02-22 Thread hadoop.support
Thanks for the help Ulul/Daemeon,

 

The problem has been sorted out using the lsof -p <datanode pid> command.

 

Thanks,

S.RagavendraGanesh

ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: http://www.visolve.com

Email: servi...@visolve.com | Phone: 408-850-2243

 

From: Ulul [mailto:had...@ulul.org] 
Sent: Sunday, February 22, 2015 5:23 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop - HTTPS communication between nodes - How to Confirm ?

 

Hi

Be careful: HTTPS secures WebHDFS. If you want to protect all network streams you need more than that:
https://s3.amazonaws.com/dev.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_reference/content/reference_chap-wire-encryption.html

If you're just interested in HTTPS, an lsof -p <datanode pid> | grep TCP will show you the DN listening on 50075 for HTTP and 50475 for HTTPS. For the namenode that would be 50070 and 50470.
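
A minimal sketch of that check (the PID lookup via jps is an assumption; use whatever fits your layout):

# find the datanode process id, then look at its listening sockets
DN_PID=$(jps | awk '/DataNode/ {print $1}')
lsof -p $DN_PID | grep LISTEN
# expect 50475 (HTTPS) instead of, or in addition to, 50075 (HTTP);
# the same check on the namenode host should show 50470 for HTTPS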

Ulul
 

On 21/02/2015 19:53, hadoop.supp...@visolve.com wrote:

Hello Everyone,

 

We are trying to measure the performance difference between the HTTP and HTTPS versions of Hadoop DFS, MapReduce and other related modules.

 

As of now, we have tested several metrics in Hadoop HTTP mode. Similarly, we are trying to test the same metrics on the HTTPS platform. Basically, our test-suite cluster consists of one master node and two slave nodes.

We have configured the HTTPS connection and now we need to verify whether nodes are communicating through HTTPS. We tried checking logs, the cluster's WebHDFS UI, health-check information, and the dfsadmin report, but to no avail. Since there is only limited documentation available on HTTPS, we are unable to verify whether nodes are communicating through HTTPS.

 

Hence, can any experts around here shed some light on how to confirm HTTPS communication status between nodes (perhaps with MapReduce/DFS)?

 

Note: If I have missed any information, feel free to check with me for the
same. 

 

Thanks,

S.RagavendraGanesh

ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: http://www.visolve.com

email: servi...@visolve.com | Phone: 408-850-2243

 


RE: Copy data between clusters during the job execution.

2015-02-02 Thread hadoop.support
It seems that in your first error message you missed the source directory argument. One common usage of distcp is:

 

Distcp (solution to your problem)

hadoop distcp hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/some1

 

It is also wise to use the latest tool:

distcp2

hadoop distcp hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/some1

 

Where hdfs://hadoop-coc-2:50070/some1 represents the destination directory on the other node.

 

Optional:

If you need to, you can provide multiple source directories in the command by listing them before the destination, as in the sketch below (the "\" there is just the shell line-continuation character).
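
For example, a sketch with two source directories (the cluster names and ports simply follow your earlier command and are assumptions):

hadoop distcp hdfs://hadoop-coc-1:50070/input1 \
              hdfs://hadoop-coc-1:50070/input2 \
              hdfs://hadoop-coc-2:50070/some1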

 

Thanks and Regards, 
S.RagavendraGanesh 
Hadoop Support Team 
ViSolve Inc. | http://www.visolve.com

 

 

From: dbis...@gmail.com [mailto:dbis...@gmail.com] On Behalf Of Artem Ervits
Sent: Tuesday, February 03, 2015 6:49 AM
To: user@hadoop.apache.org
Subject: Re: Copy data between clusters during the job execution.

 

Take a look at Oozie; once the first job completes you can distcp to another server.

Artem Ervits

On Feb 2, 2015 5:46 AM, Daniel Haviv danielru...@gmail.com wrote:

It should run after your job finishes.

You can create the flow using a simple bash script

Daniel


On 2 Feb 2015, at 12:31, xeonmailinglist xeonmailingl...@gmail.com wrote:

But can I use distcp inside my job, or do I need to program something that executes distcp after my job finishes?



On 02-02-2015 10:20, Daniel Haviv wrote:

You can use distcp

 

Daniel

 

On 2 Feb 2015, at 11:12,

 



RE: Multiple separate Hadoop clusters on same physical machines

2015-02-02 Thread hadoop.support
Hello Harun,

 

Your question is very interesting and will be useful for future Hadoop setups by startups/individuals too.

 

Normally, for testing purposes, we suggest using a pseudo-distributed environment (i.e. installing all cluster components on a single node). You can refer to the links below, which will guide you through the whole process:

 

https://districtdatalabs.silvrback.com/creating-a-hadoop-pseudo-distributed-environment

 

Individual Pseudo Distributed Cluster Implementation:

 

http://www.thegeekstuff.com/2012/02/hadoop-pseudo-distributed-installation/

http://hbase.apache.org/book.html#quickstart_pseudo

and please check for others.
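
As a very rough sketch of what such a setup boils down to (paths, ports and property values below are made-up examples, not recommendations), each extra instance on the same machines just needs its own configuration directory with non-conflicting ports and data directories:

# give the second cluster its own config dir
cp -r /opt/hadoop/etc/hadoop /opt/hadoop/etc/hadoop-cluster2
# in hadoop-cluster2, change at least: fs.defaultFS (e.g. hdfs://host:9100),
# dfs.namenode.http-address, dfs.datanode.address / dfs.datanode.http.address,
# and dfs.datanode.data.dir, so nothing collides with the first cluster
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop-cluster2 /opt/hadoop/sbin/start-dfs.sh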

 

From our 20 years of server & related industry experience, we recommend using VMs/instances for production & business-critical environments. On the other hand, if you are developing products related to Hadoop, you can use Docker & other related resources for development; shipping to production becomes stress-free when these tools are used together with a cluster environment setup.

 

Feel free to ask for further queries.

 

Thanks and Regards,
S.RagavendraGanesh

Hadoop Support Team

ViSolve Inc. | http://www.visolve.com

 

 

 

From: Alexander Pivovarov [mailto:apivova...@gmail.com] 
Sent: Monday, February 02, 2015 12:56 PM
To: user@hadoop.apache.org
Subject: Re: Multiple separate Hadoop clusters on same physical machines

 

Start several VMs and install Hadoop on each VM.

Keywords: KVM, QEMU

 

On Mon, Jan 26, 2015 at 1:18 AM, Harun Reşit Zafer harun.za...@tubitak.gov.tr wrote:

Hi everyone,

We have set up and been playing with Hadoop 1.2.x and its friends (HBase, Pig, Hive, etc.) on 7 physical servers. We want to test Hadoop (maybe different versions) and its ecosystem on physical machines (virtualization is not an option) from different perspectives.

As a bunch of developers we would like to work in parallel. We want every team member to play with his/her own cluster. However, we have a limited number of servers (strong machines though).

So the question is: by changing port numbers, environment variables and other configuration parameters, is it possible to set up several independent clusters on the same physical machines? Are there any constraints? What are the possible difficulties we will face?

Thanks in advance

-- 
Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W http://www.hrzafer.com

 



RE: About HDFS's single-writer, multiple-reader model, any use case?

2015-02-01 Thread hadoop.support
Hello Dongzhe Ma,

 

Yes, HDFS employs a single-writer, multiple-reader model. This means (a small command-line sketch follows the lists below):

 

WRITE

•   The HDFS client maintains a lease on files it has opened for write (for the entire file, not per block)

•   Only one client can hold a lease on a single file

•   For each block of data, a pipeline of DataNodes is set up to write to

•   A written file cannot be modified, but it can be appended to

•   The client periodically renews the lease by sending heartbeats to the NameNode

•   Lease timeout/expiration:

•   Soft limit: the client has exclusive access to the file and can extend the lease

•   Until the soft limit expires, the client has exclusive access to the file

•   After the soft limit, any other client can claim the lease

•   Hard limit: 1 hour - the client continues to have access unless some other client pre-empts it

•   Also, after the hard limit, the file is closed

READ

•   Get the list of DataNodes from the NameNode in topologically sorted order, then read data directly from the DataNodes

•   During a read, the checksum is validated; if it differs, this is reported to the NameNode, which marks that replica for deletion

•   On an error while reading a block, the next replica in the list is used to read it
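
A tiny command-line sketch of the single-writer rule (file names are made up and the exact error text varies by version):

# terminal 1: start writing a large file (the client holds the lease while the write is open)
hdfs dfs -put big-input.log /tmp/lease-demo.log
# terminal 2, while the first write is still in progress: a second writer is rejected,
# typically with a lease-related error such as AlreadyBeingCreatedException
hdfs dfs -appendToFile extra.log /tmp/lease-demo.log
# readers, on the other hand, can still read the data flushed so far
hdfs dfs -cat /tmp/lease-demo.log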

 

You can refer to the link below for creating test cases:

 

http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
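
For instance, a typical TestDFSIO run (as described on pages like the one above) looks roughly like this; the jar path differs between versions, so treat it as an assumption:

# write test: 10 files of 128 MB each
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -write -nrFiles 10 -fileSize 128MB
# matching read test, then clean up the generated data
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -read -nrFiles 10 -fileSize 128MB
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -clean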

 

Hope this helps!!!

 

Thanks and Regards,
S.RagavendraGanesh

Hadoop Support Team

ViSolve Inc. | http://www.visolve.com

 

 

 

From: Dongzhe Ma [mailto:mdzfi...@gmail.com] 
Sent: Monday, February 02, 2015 10:50 AM
To: user@hadoop.apache.org
Subject: About HDFS's single-writer, multiple-reader model, any use case?

 

We know that HDFS employs a single-writer, multiple-reader model, which means that only one process can be writing to a file at a time, while multiple readers can work in parallel and new readers can even observe the new content. The reason for this design is to simplify concurrency control. But is it necessary to support reading during writing? Can anyone bring up some use cases? Why not just lock the whole file, like other POSIX file systems do (in terms of locking granularity)?



RE: Enable symlinks

2015-01-22 Thread hadoop.support
Hello Tang,

 

There are a few resolved issues which might help you with this:

 

https://issues.apache.org/jira/browse/HDFS-6629

https://issues.apache.org/jira/browse/HDFS-245

 

Hope it helps!!!

 

Thanks and Regards,
S.RagavendraGanesh

Hadoop Support Team

ViSolve Inc. | http://www.visolve.com

 

From: Rich Haase [mailto:rha...@pandora.com] 
Sent: Thursday, January 22, 2015 9:22 PM
To: user@hadoop.apache.org
Subject: Re: Enable symlinks 

 

Hadoop does not currently support symlinks, hence the "Symlinks not supported" exception message.

 

You can follow progress on making symlinks production ready via this JIRA:
https://issues.apache.org/jira/browse/HADOOP-10019

 

Cheers,

 

Rich

 

From: Tang shawndow...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Wednesday, January 21, 2015 at 10:04 PM
To: user@hadoop.apache.org
Subject: Enable symlinks 

 

org.apache.hadoop.ipc.RemoteException: Symlinks not supported







Re: Hadoop testing with 72GB HDD?

2015-01-04 Thread hadoop.support
Yes Amjad,

 

Virtualization is just a layer that lets you run different instances of operating systems on the same machine. I don't see any difference between using real hardware and using virtualization. One advantage of virtualization is that you can have multiple OS instances on the same machine and use them for various purposes, whereas with dedicated hardware you are limited to 2 nodes (in the case of 2 machines used).

 

Thanks and Regards,
S.RagavendraGanesh

Hadoop Support Team

ViSolve Inc. | http://www.visolve.com

 

From: Amjad Syed [mailto:amjad...@gmail.com] 
Sent: Monday, January 05, 2015 11:30 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop testing with 72GB HDD?

 

Thanks Jeffrey. But if I want to do it using CentOS 7.0 without virtualization on my machines, is it possible?

 

On Mon, Jan 5, 2015 at 8:45 AM, Jeffrey Rodriguez jeffrey...@gmail.com wrote:

Yes, it can be done through virtualization, provided you can set up VMware on your hardware, for example (you may find other options for virtualizing). If you needed to do a proof of concept, though, I would advise going with a cloud vendor like IBM Bluemix or Amazon EC2. You may have a better performance experience using cloud instances that you can provision, use and discard.

 

On Sun, Jan 4, 2015 at 9:29 PM, Amjad Syed amjad...@gmail.com wrote:

Hello,

 

My manager has asked me to do a proof of concept on old HP DL380 G6 quad-core systems with 12 GB RAM and 4 x 72 GB SAS drives.

 

My question: is it possible to make a 2-node Hadoop cluster with this hardware?

 

I have read that the minimum HDD size should be 1 TB.