RE: Hadoop 2.6.0 Error
Hello Anand,

Set JAVA_HOME in hadoop-env.sh (/usr/local/hadoop/etc/hadoop/hadoop-env.sh):

    export JAVA_HOME='/usr/lib/jvm/java-7-openjdk-amd64'

That should resolve your error.

Thanks,
S.RagavendraGanesh
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com | email: servi...@visolve.com | Phone: 408-850-2243

From: Alexandru Pacurar [mailto:alexandru.pacu...@propertyshark.com]
Sent: Wednesday, March 25, 2015 3:17 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop 2.6.0 Error

Hello,

I had a similar problem, and my solution was to set JAVA_HOME in /etc/environment. The problem, from what I remember, is that the start-dfs.sh script calls hadoop-daemons.sh with the options needed to start the Hadoop daemons. hadoop-daemons.sh in turn calls hadoop-daemon.sh via ssh, in a non-interactive fashion. When you execute a command via ssh non-interactively (e.g. ssh host1 'ls -la') you get a minimal environment: .profile and the other shell startup files are not sourced. /etc/environment is read, however, so you can set JAVA_HOME there. Technically you could instead set BASH_ENV to point to a file containing the environment variables you need.

For more info see http://stackoverflow.com/questions/216202/why-does-an-ssh-remote-command-get-fewer-environment-variables-then-when-run-man, or man bash.

Thank you,
Alex

From: Olivier Renault [mailto:orena...@hortonworks.com]
Sent: Wednesday, March 25, 2015 10:44 AM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

It should be:

    export JAVA_HOME=…

Olivier

From: Brahma Reddy Battula
Reply-To: user@hadoop.apache.org
Date: Wednesday, 25 March 2015 08:28
To: user@hadoop.apache.org, Anand Murali
Subject: RE: Hadoop 2.6.0 Error

Hi,

Ideally it should take effect if you configure it in .profile or hadoop-env.sh. As you said you set it in .profile (hope you did source ~/.profile), did you verify that it took effect, for example by checking echo $JAVA_HOME, or jps, etc.?

Thanks & Regards
Brahma Reddy Battula

From: Anand Murali [mailto:anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 1:30 PM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

Dear All:

Even after setting JAVA_HOME in .profile I get the "JAVA_HOME is not set and could not be found" error. If any of you know of a more stable version, please let me know.

Thanks,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593 / 43526162 (voicemail)

On Wednesday, March 25, 2015 12:57 PM, Anand Murali anand_vi...@yahoo.com wrote:

Dear Mr. Brahma Reddy:

Should I type SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64 at root level (profile) or at user level (.profile)?
Reply most welcome.

Thanks & Regards,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593 / 43526162 (voicemail)

On Wednesday, March 25, 2015 12:37 PM, Anand Murali anand_vi...@yahoo.com wrote:

Dear All:

I get this error; I shall try setting JAVA_HOME in .profile.

    Starting namenodes on [localhost]
    localhost: Error: JAVA_HOME is not set and could not be found.
    cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: Error: JAVA_HOME is not set and could not be found.
    anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$

Thanks

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593 / 43526162 (voicemail)

On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote:

Instead of exporting JAVA_HOME, please set JAVA_HOME at the system level (for example in /etc/profile).

For more details please check the following JIRA: https://issues.apache.org/jira/browse/HADOOP-11538

Thanks & Regards
Brahma Reddy Battula

From: Anand Murali [
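A minimal sketch of the two fixes discussed in this thread, assuming the /usr/local/hadoop layout and the OpenJDK 7 path from the first reply (adjust JAVA_HOME to whatever readlink -f $(which java) reports on your nodes):

    # 1. Hard-code JAVA_HOME in hadoop-env.sh so the non-interactive ssh sessions
    #    spawned by start-dfs.sh see it:
    echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh

    # 2. Or set it system-wide in /etc/environment, which is applied even to
    #    non-login, non-interactive sessions:
    echo 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' | sudo tee -a /etc/environment

    # 3. Verify the variable is visible to a non-interactive ssh session
    #    (the same path start-dfs.sh uses), then retry:
    ssh localhost 'echo $JAVA_HOME'
    /usr/local/hadoop/sbin/start-dfs.sh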
RE: Intermittent BindException during long MR jobs
Hello Krishna,

The exception appears to be IP-specific. It may occur because no IP address is available on the system to assign for the connection. Double-check IP address availability and re-run the job.

Thanks,
S.RagavendraGanesh
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com | email: servi...@visolve.com | Phone: 408-850-2243

From: Krishna Rao [mailto:krishnanj...@gmail.com]
Sent: Thursday, February 26, 2015 9:48 PM
To: u...@hive.apache.org; user@hadoop.apache.org
Subject: Intermittent BindException during long MR jobs

Hi,

we occasionally run into a BindException causing long running jobs to fail. The stack trace is below. Any ideas what this could be caused by?

Cheers,

Krishna

Stacktrace:

    379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task - Job Submission failed with exception 'java.net.BindException(Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException)'
    java.net.BindException: Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
        at org.apache.hadoop.ipc.Client.call(Client.java:1242)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at com.sun.proxy.$Proxy10.create(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)
        at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at com.sun.proxy.$Proxy11.create(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream.init(DFSOutputStream.java:1376)
        at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)
        at org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)
        at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
        at
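"Cannot assign requested address" on a bind to port 0 usually means either that the local hostname resolves to an address the machine does not own, or that the client has exhausted its ephemeral ports during a long job. A minimal sketch of checks on the submitting host (back10 and 10.4.2.10 are taken from the trace above; everything else is generic):

    # 1. Does the hostname in the error resolve to an address this machine owns?
    hostname -f
    getent hosts back10
    ip addr | grep 10.4.2.10

    # 2. Is the client running out of ephemeral ports during long jobs?
    cat /proc/sys/net/ipv4/ip_local_port_range
    ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c    # large TIME-WAIT counts are a red flag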
RE: Hadoop - HTTPS communication between nodes - How to Confirm ?
Thanks for the help Ulul/Daemeon,

The problem has been sorted out using the lsof -p <datanode pid> command.

Thanks,
S.RagavendraGanesh
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com | Email: servi...@visolve.com | Phone: 408-850-2243

From: Ulul [mailto:had...@ulul.org]
Sent: Sunday, February 22, 2015 5:23 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop - HTTPS communication between nodes - How to Confirm ?

Hi

Be careful, HTTPS only secures WebHDFS. If you want to protect all network streams you need more than that:
https://s3.amazonaws.com/dev.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_reference/content/reference_chap-wire-encryption.html

If you're just interested in HTTPS, an "lsof -p <datanode pid> | grep TCP" will show you the DN listening on 50075 for HTTP and 50475 for HTTPS. For the namenode that would be 50070 and 50470.

Ulul

On 21/02/2015 19:53, hadoop.supp...@visolve.com wrote:

Hello Everyone,

We are trying to measure performance between the HTTP and HTTPS versions of Hadoop DFS, MapReduce and other related modules. So far we have tested several metrics in Hadoop HTTP mode, and we are now trying the same metrics on the HTTPS platform. Our test cluster consists of one master node and two slave nodes. We have configured the HTTPS connection and now need to verify whether the nodes are communicating directly over HTTPS. We tried checking logs, the cluster's WebHDFS UI, health check information, and the dfsadmin report, but to no avail. Since only limited documentation is available for HTTPS, we are unable to verify whether the nodes are communicating over HTTPS. Can any experts here shed some light on how to confirm the HTTPS communication status between nodes (for MapReduce/DFS)?

Note: If I have missed any information, feel free to check with me for the same.

Thanks,
S.RagavendraGanesh
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com | email: servi...@visolve.com | Phone: 408-850-2243
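A minimal sketch of the checks Ulul describes, plus a TLS handshake test (the host name is a placeholder; the port numbers are the defaults mentioned above):

    # 1. Which TCP ports is the DataNode actually listening on?
    DN_PID=$(pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode')
    lsof -p "$DN_PID" | grep TCP | grep LISTEN
    # expect 50075 for HTTP and/or 50475 for HTTPS; on the NameNode, 50070 / 50470

    # 2. Confirm the HTTPS port really speaks TLS:
    openssl s_client -connect datanode.example.com:50475 </dev/null | head -20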
RE: Copy data between clusters during the job execution.
It seems that in your first error message you missed the source directory argument. Common usage of distcp (the solution to your problem) is:

    hadoop distcp hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/some1

where hdfs://hadoop-coc-1:50070/input1 is the source directory and hdfs://hadoop-coc-2:50070/some1 is the destination directory on the other cluster. It is also worth using the newer DistCp implementation (distcp2) where available; the command line is the same. Optionally, you can supply multiple source directories in a single command.

Thanks and Regards,
S.RagavendraGanesh
Hadoop Support Team
ViSolve Inc. | www.visolve.com

From: dbis...@gmail.com [mailto:dbis...@gmail.com] On Behalf Of Artem Ervits
Sent: Tuesday, February 03, 2015 6:49 AM
To: user@hadoop.apache.org
Subject: Re: Copy data between clusters during the job execution.

Take a look at Oozie; once the first job completes you can distcp to another server.

Artem Ervits

On Feb 2, 2015 5:46 AM, Daniel Haviv danielru...@gmail.com wrote:

It should run after your job finishes. You can create the flow using a simple bash script.

Daniel

On 2 Feb 2015, at 12:31, xeonmailinglist xeonmailingl...@gmail.com wrote:

But can I use distcp inside my job, or do I need to program something that executes distcp after executing my job?

On 02-02-2015 10:20, Daniel Haviv wrote:

You can use distcp.

Daniel

On 2 Feb 2015, at 11:12,
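As Daniel suggests, a simple bash wrapper is enough to chain the job and the copy. A minimal sketch follows; the jar, class, paths and the 8020 port are assumptions, and note that distcp with hdfs:// URIs normally targets the NameNode RPC port rather than the 50070 HTTP port shown in the example above:

    #!/usr/bin/env bash
    # Run the MR job, and only if it succeeds copy the output to the second cluster.
    set -euo pipefail

    hadoop jar my-job.jar com.example.MyJob /input1 /output1

    hadoop distcp hdfs://hadoop-coc-1:8020/output1 hdfs://hadoop-coc-2:8020/output1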
RE: Multiple separate Hadoop clusters on same physical machines
Hello Harun,

Your question is very interesting and will be useful for future Hadoop setups for startups and individuals too.

Normally, for testing purposes, we suggest using a pseudo-distributed environment (i.e. installing all cluster daemons on a single node). The links below walk through the whole process:

https://districtdatalabs.silvrback.com/creating-a-hadoop-pseudo-distributed-environment
http://www.thegeekstuff.com/2012/02/hadoop-pseudo-distributed-installation/
http://hbase.apache.org/book.html#quickstart_pseudo

From our 20 years of server and related industry experience, we recommend using VMs/instances for business-critical production environments. Alternatively, if you are developing products related to Hadoop, you can use Docker and other related tools for development; shipping to production becomes much less stressful with these tools in a cluster environment.

Feel free to ask further questions.

Thanks and Regards,
S.RagavendraGanesh
Hadoop Support Team
ViSolve Inc. | www.visolve.com

From: Alexander Pivovarov [mailto:apivova...@gmail.com]
Sent: Monday, February 02, 2015 12:56 PM
To: user@hadoop.apache.org
Subject: Re: Multiple separate Hadoop clusters on same physical machines

Start several VMs and install Hadoop on each VM. Keywords: KVM, QEMU.

On Mon, Jan 26, 2015 at 1:18 AM, Harun Reşit Zafer harun.za...@tubitak.gov.tr wrote:

Hi everyone,

We have set up and been playing with Hadoop 1.2.x and its friends (HBase, Pig, Hive etc.) on 7 physical servers. We want to test Hadoop (maybe different versions) and the ecosystem on physical machines (virtualization is not an option) from different perspectives. As a bunch of developers we would like to work in parallel, with every team member playing with his/her own cluster. However, we have a limited number of servers (strong machines though). So the question is: by changing port numbers, environment variables and other configuration parameters, is it possible to set up several independent clusters on the same physical machines? Are there any constraints? What are the possible difficulties we are likely to face?

Thanks in advance

--
Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W http://www.hrzafer.com
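On the original question: several independent clusters can coexist on the same machines as long as each one gets its own configuration directory, local data directories and ports. A minimal sketch with hypothetical paths, cluster names and port numbers, using Hadoop 2.x property names (Hadoop 1.x uses different names for some of these):

    # Give each cluster its own conf/log/pid locations (repeat per cluster, per user):
    export HADOOP_CONF_DIR=/opt/clusters/alice/conf
    export HADOOP_LOG_DIR=/opt/clusters/alice/logs
    export HADOOP_PID_DIR=/opt/clusters/alice/pids

    # Inside each conf dir, give the daemons non-overlapping ports and data dirs,
    # e.g. in core-site.xml / hdfs-site.xml (cluster "alice" vs cluster "bob"):
    #   fs.defaultFS               hdfs://host1:9001   vs. hdfs://host1:9002
    #   dfs.namenode.http-address  host1:51070         vs. host1:52070
    #   dfs.datanode.address       0.0.0.0:51010       vs. 0.0.0.0:52010
    #   dfs.datanode.http.address  0.0.0.0:51075       vs. 0.0.0.0:52075
    #   dfs.datanode.ipc.address   0.0.0.0:51020       vs. 0.0.0.0:52020
    #   dfs.namenode.name.dir / dfs.datanode.data.dir -> separate local paths
    # (YARN/MapReduce daemons need the same treatment for their ports.)

    # Then start each cluster under its own environment:
    "$HADOOP_HOME"/sbin/start-dfs.sh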
RE: About HDFS's single-writer, multiple-reader model, any use case?
Hello Dongzhe Ma,

Yes, HDFS employs a single-writer, multiple-reader model. This means:

WRITE
• The HDFS client maintains a lease on files it has opened for write (for the entire file, not per block).
• Only one client can hold the lease on a single file.
• For each block of data, a pipeline of DataNodes is set up to write to.
• A file once written cannot be modified, but it can be appended to.
• The client periodically renews the lease by sending heartbeats to the NameNode.
• Lease timeout/expiration:
  • Soft limit: until the soft limit expires, the client has exclusive access to the file and can extend the lease.
  • After the soft limit, any other client can claim the lease.
  • Hard limit (1 hour): the client continues to have access unless some other client pre-empts it; after the hard limit the file is closed on its behalf.

READ
• Get the list of DataNodes from the NameNode in topologically sorted order, then read data directly from the DataNodes.
• During a read the checksum is validated; if it differs, this is reported to the NameNode, which marks the replica for deletion.
• On an error while reading a block, the next replica in the pipeline is used to read it.

You can refer to the link below for creating test cases:
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

Hope this helps!

Thanks and Regards,
S.RagavendraGanesh
Hadoop Support Team
ViSolve Inc. | www.visolve.com

From: Dongzhe Ma [mailto:mdzfi...@gmail.com]
Sent: Monday, February 02, 2015 10:50 AM
To: user@hadoop.apache.org
Subject: About HDFS's single-writer, multiple-reader model, any use case?

We know that HDFS employs a single-writer, multiple-reader model, which means that only one process can write to a file at a time, but multiple readers can work in parallel and new readers can even observe the new content. The reason for this design is to simplify concurrency control. But is it necessary to support reading during writing? Can anyone bring up some use cases? Why not just lock the whole file, like other POSIX file systems do (in terms of locking granularity)?
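A minimal command-line illustration of the model, assuming a running cluster (the /tmp/lease-demo path is arbitrary): one writer holds the lease while readers remain free to read.

    # Terminal 1: hold a writer open on the file by streaming from stdin
    #             (the client keeps the lease until the stream is closed)
    hdfs dfs -put - /tmp/lease-demo

    # Terminal 2: a second writer on the same path is rejected while the lease
    #             is held (typically with AlreadyBeingCreatedException)
    echo again | hdfs dfs -appendToFile - /tmp/lease-demo

    # Terminal 2: readers are not blocked; they can read whatever data has
    #             already been flushed to the DataNodes
    hdfs dfs -cat /tmp/lease-demo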
RE: Enable symlinks
Hello Tang,

There are a few resolved issues which might help you with this:

https://issues.apache.org/jira/browse/HDFS-6629
https://issues.apache.org/jira/browse/HDFS-245

Hope it helps!

Thanks and Regards,
S.RagavendraGanesh
Hadoop Support Team
ViSolve Inc. | www.visolve.com

From: Rich Haase [mailto:rha...@pandora.com]
Sent: Thursday, January 22, 2015 9:22 PM
To: user@hadoop.apache.org
Subject: Re: Enable symlinks

Hadoop does not currently support symlinks, hence the "Symlinks not supported" exception message. You can follow the progress on making symlinks production-ready via this JIRA: https://issues.apache.org/jira/browse/HADOOP-10019

Cheers,

Rich

From: Tang shawndow...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Wednesday, January 21, 2015 at 10:04 PM
To: user@hadoop.apache.org
Subject: Enable symlinks

org.apache.hadoop.ipc.RemoteException: Symlinks not supported
Re: Hadoop testing with 72GB HDD?
Yes Amjad,

Virtualization is simply a layer that lets different operating system instances run on the same machine, so I don't see any problem in using your real hardware with virtualization. One advantage of virtualization is that you can run multiple OS instances on the same machine and use them for various purposes, whereas with dedicated hardware you are limited to 2 nodes (in the case of 2 machines).

Thanks and Regards,
S.RagavendraGanesh
Hadoop Support Team
ViSolve Inc. | www.visolve.com

From: Amjad Syed [mailto:amjad...@gmail.com]
Sent: Monday, January 05, 2015 11:30 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop testing with 72GB HDD?

Thanks Jeffrey,

But if I want to do it using CentOS 7.0 without virtualization on my machines, is it possible?

On Mon, Jan 5, 2015 at 8:45 AM, Jeffrey Rodriguez jeffrey...@gmail.com wrote:

Yes, it can be done through virtualization, provided you can set up VMware (for example) on your hardware; you may find other options for virtualizing. If you need to do a proof of concept, though, I would advise going with a cloud vendor like IBM Bluemix or Amazon EC2. You may have a better performance experience using cloud instances that you can provision, use and discard.

On Sun, Jan 4, 2015 at 9:29 PM, Amjad Syed amjad...@gmail.com wrote:

Hello,

My manager has asked me to do a proof of concept on old HP DL380 G6 quad-core systems with 12GB RAM and 4 x 72GB SAS drives. My question is: is it possible to make a 2-node Hadoop cluster with this hardware? I have read that the minimum HDD should be 1TB.