Re: Hadoop Multi-node cluster on Windows

2014-06-05 Thread Wadood . Chaudhary
Thanks, but that is an almost five-year-old tutorial, written when Hadoop was
not natively supported on Windows. As of Hadoop 2.x, Windows is supported
and setup is much simpler. No Cygwin is required.
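
For readers landing on this thread: per the Apache Hadoop2OnWindows wiki
(linked later in the thread), a Hadoop 2.x single node on Windows can be
brought up with the bundled .cmd scripts alone. A minimal sketch, assuming
JAVA_HOME is set and HADOOP_HOME points at the extracted 2.x distribution:

    cd %HADOOP_HOME%
    rem One-time format of the NameNode storage directory
    bin\hdfs namenode -format
    rem Start the HDFS and YARN daemons
    sbin\start-dfs.cmd
    sbin\start-yarn.cmd
    rem Smoke test: list the root of HDFS
    bin\hdfs dfs -ls /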



From: Ted Yu yuzhih...@gmail.com
To: common-u...@hadoop.apache.org, user@hadoop.apache.org
Date: 06/04/2014 10:05 PM
Subject: Re: Hadoop Multi-node cluster on Windows



See also
http://ebiquity.umbc.edu/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html


On Wed, Jun 4, 2014 at 6:43 PM, wadood.chaudh...@instinet.com wrote:
  Thanks. A Linux cluster requires SSH etc. for everything, including a
  single node. On Windows, so far I have been able to get to the
  pseudo-distributed level without SSH or Cygwin. Someone was saying that
  some Windows service needs to be installed for communication across
  machines. Is that true?

  - Message from Olivier Renault on 06/04/2014 09:36:17 PM -
  To: user@hadoop.apache.org
  Subject: RE: Hadoop Multi-node cluster on Windows

  HDP runs natively on Linux and Windows. It's the same code base. You can
  install a Windows cluster in the same way as you do for Linux.


  Olivier


  On 4 Jun 2014 18:23, wadood.chaudh...@instinet.com wrote:
   Yes, but Hortonworks runs Hadoop under a Linux virtual machine. I am
   talking about pure and simple Hadoop.

   - Message from Ted Yu on 06/04/2014 09:18:51 PM -
   To: common-u...@hadoop.apache.org
   Subject: Re: Hadoop Multi-node cluster on Windows

   Have you seen this?

   http://hortonworks.com/blog/install-hadoop-windows-hortonworks-data-platform-2-0/

   Cheers


   On Wed, Jun 4, 2014 at 5:47 PM, wadood.chaudh...@instinet.com wrote:
 Has anyone successfully implemented a multi-node cluster under Windows
 using the latest version of Hadoop? We were able to run
 pseudo-distributed. Does it still need SSH under Windows, or does
 something need to be installed as a service?


 From: ch huang justlo...@gmail.com
 To: user@hadoop.apache.org
 Date: 06/04/2014 08:40 PM
 Subject: issue about moving data between two Hadoop clusters

 hi, mailing list:
   my company signed with a new IDC; I need to move 50 TB of data in total
 from the old Hadoop cluster to the new cluster in the new location. How
 do I do it?
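
 For reference, the usual tool for this kind of bulk inter-cluster copy is
 DistCp, which runs the copy as a MapReduce job. A minimal sketch, with
 hypothetical NameNode hosts and paths:

    # Copies /data from the old cluster's HDFS into the new cluster's HDFS
    hadoop distcp hdfs://old-nn:8020/data hdfs://new-nn:8020/data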


 

Re: Hadoop Multi-node cluster on Windows

2014-06-05 Thread Nitin Pawar
Currently, as far as I know, only Hortonworks' distribution ships for
Windows.

Hadoop as such does not need SSH to work, but SSH on Linux was needed for
the automated deployments.

If you want to avoid that, then I would say set up a single machine with a
full installation, where all packages and correct configurations are
present, then make an image out of it and build the other nodes from that
image.

But then you will need to log in to all nodes and start all services
manually.
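
A sketch of what that manual start might look like on a Hadoop 2.x Windows
node; each command runs its daemon in the foreground, and which daemon
belongs on which machine is an assumption about the target layout:

    rem On the NameNode machine
    %HADOOP_HOME%\bin\hdfs namenode
    rem On each DataNode machine
    %HADOOP_HOME%\bin\hdfs datanode
    rem On the ResourceManager machine
    %HADOOP_HOME%\bin\yarn resourcemanager
    rem On each NodeManager machine
    %HADOOP_HOME%\bin\yarn nodemanager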

Hortonworks' HDP 2.1 on Windows step-by-step manual install guide is here:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-Win-latest/bk_releasenotes_HDP-Win/content/index.html


On Thu, Jun 5, 2014 at 11:29 AM, wadood.chaudh...@instinet.com wrote:

 Thanks, but that is an almost five-year-old tutorial, written when Hadoop
 was not natively supported on Windows. As of Hadoop 2.x, Windows is
 supported and setup is much simpler. No Cygwin is required.


Re: Hadoop Multi-node cluster on Windows

2014-06-05 Thread Nitin Pawar
If you have installed a single-node setup on Windows, then it should be
easy to move to multi-node.

I have never tried this myself on Windows, but this is what I do on Linux,
and it should work on Windows as well:

- On the current node, instead of having anything set to localhost or
  127.0.0.1, point it to the real hostname or a static IP address.
- Make an image out of that node and launch the other nodes from the image.
- Disable Windows security (firewall) on all nodes.
- Make sure all nodes can communicate with each other over the network.
- After that, log in to each node, start the services there, and it should
  come up as a distributed node.
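
To make the first step concrete, the edit meant there might look like this
in core-site.xml on every node (a sketch; the hostname and port are
placeholders):

    <!-- etc/hadoop/core-site.xml: point the filesystem at the real
         NameNode host instead of localhost -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nn-host.example.com:8020</value>
      </property>
    </configuration>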


On Thu, Jun 5, 2014 at 11:59 AM, wadood.chaudh...@instinet.com wrote:

 Thanks for looking into it. I have used Hortonworks, but that is a
 third-party tool and should not be required now that Apache has started
 supporting Windows. Here is the wiki article on Windows from Apache:
 https://wiki.apache.org/hadoop/Hadoop2OnWindows
 I am able to set it up in pseudo-distributed mode using the instructions
 above. Unfortunately, the article ends there, saying that instructions
 for multi-node clusters will be added later. There is no mention that you
 cannot do it under Windows or that you must use Hortonworks.

 Wadood



Re: (Very) newbie questions

2014-06-05 Thread Raj K Singh
Are you able to compile it using mvn compile -Pnative
-Dmaven.test.skip=true?


Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile & Tel: +91 (0)9899821370


On Wed, Jun 4, 2014 at 4:01 AM, Christian Convey christian.con...@gmail.com
 wrote:

 I'm completely new to Hadoop, and I'm trying to build it for the first
 time. I cloned the Git repository and I'm building the tag
 release-2.4.0.

 I get a clean mvn compile -Pnative, but I'm getting a few unit tests
 failing when I run mvn test.

 Does anyone know to what extent failing unit tests in Hadoop (Common) are
 a serious issue? I.e., can I have a healthy Hadoop build while still
 having had a few unit tests fail?
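
 For triage, a couple of stock Maven invocations help narrow this down
 (standard Maven/Surefire flags, nothing Hadoop-specific; TestFoo is a
 placeholder for a real test class name):

    # Re-run a single suspect test class in isolation
    mvn test -Dtest=TestFoo
    # Build and install the artifacts without running any tests
    mvn install -DskipTests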



Re: how can i monitor Decommission progress?

2014-06-05 Thread Raj K Singh
use

$hadoop dfsadmin -report
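
To pull just the decommissioning state out of that report, something like
this works (a sketch; the grep context size may need adjusting to also
show the node name):

    hadoop dfsadmin -report | grep -B 2 "Decommission Status"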



Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile & Tel: +91 (0)9899821370


On Sat, May 31, 2014 at 11:26 AM, ch huang justlo...@gmail.com wrote:

 hi, mailing list:
   I decommissioned three nodes out of my cluster, but the question is:
 how can I see the decommission progress? I can only see the admin state
 from the web UI.



Re: how can i monitor Decommission progress?

2014-06-05 Thread Suresh Srinivas
The NameNode web UI provides that information. On the main web UI, click
the link associated with decommissioned nodes.

Sent from phone



Re: how can i monitor Decommission progress?

2014-06-05 Thread ch huang
But it cannot show me how much has already been done.
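
One rough way to watch progress (a sketch; decommissioning completes once
the node's blocks have enough replicas elsewhere) is to track the
replication counters in the fsck summary over time:

    hdfs fsck / | grep -i "under-replicated"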

On Fri, Jun 6, 2014 at 2:56 AM, Suresh Srinivas sur...@hortonworks.com
wrote:

 The NameNode web UI provides that information. On the main web UI, click
 the link associated with decommissioned nodes.



issue about change NN node

2014-06-05 Thread ch huang
hi, mailing list:
  I want to replace the NameNode in my Hadoop cluster (no NN HA, no
secondary NN). How can I do this?


NN recovery from metadata backup

2014-06-05 Thread Vikas Ranjan
Hi,

I am trying NN metadata recovery. I have taken a backup of the NameNode
and JournalNode metadata; it contains edit logs and fsimages.

There are two NNs in my system. I take a backup of the metadata on both
NNs (HDFS metadata and QJM metadata) at a regular frequency. I want to
test the recovery procedure in a worst-case scenario: assume both NNs and
the JournalNode are down with the metadata completely deleted.

I want to recover the NN metadata from backup and start the NN. I know
that there could be data loss, as the latest changes made after the backup
would be missing.

Do you think such a scenario is possible/feasible? I am facing some issues
related to a txn id mismatch and the committed txn id. Please tell me if
there is a solution for this.

Steps tried:
- Take a metadata backup of NN and QJM.
- Do some HDFS file operations (create new files).
- Stop the NN and JournalNode on both machines.
- Delete the metadata from the /data/hdfs and journal directories.
- Restore the fsimages from the backup (taken some time back). Start the
  NN. It fails with the exception below.
- Alternative: restore all the edit logs and fsimages to both the hdfs and
  qjm directories and start the NN, but it still fails.

Both NNs are down and I can't bring them up. I don't want to format HDFS,
as that would change the Cluster ID and the backup would no longer be
usable.


Exceptions:
  There appears to be a gap in the edit log. We expected txid 71453, but
  got txid 71466
  Client trying to move committed txid backward from 71599 to 71453
  recoverUnfinalizedSegments failed for required journal. Decided to
  synchronize log to startTxId: 71453 but logger 10.204.64.26:8485 had
  seen txid 71599 committed
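
For readers following this thread: one way to take a consistent NN
metadata snapshot, which can avoid exactly this kind of txid gap, is to
force a fresh checkpoint before copying (a sketch, assuming Hadoop 2.x and
HDFS superuser rights; the backup path is a placeholder):

    # Quiesce the namespace and force a fresh fsimage checkpoint
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    # Pull the latest fsimage from the NN into a local backup directory
    hdfs dfsadmin -fetchImage /backup/nn-meta
    hdfs dfsadmin -safemode leave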

Re: How can a task know if its running as a MR1 or MR2 job?

2014-06-05 Thread Raj K Singh
Hi

I guess that when we call the submit() method, it in turn calls the
setUseNewAPI() method to set the MR2-relevant settings. Here is what I
found after a code walk-through of Hadoop:

private void setUseNewAPI() throws IOException
{
    int numReduces = conf.getNumReduceTasks();
    String oldMapperClass = "mapred.mapper.class";
    String oldReduceClass = "mapred.reducer.class";
    conf.setBooleanIfUnset("mapred.mapper.new-api",
            conf.get(oldMapperClass) == null);
    if (numReduces != 0)
        conf.setBooleanIfUnset("mapred.reducer.new-api",
                conf.get(oldReduceClass) == null);
    if (conf.getUseNewMapper())
    {
        String mode = "new map API";
        ensureNotSet("mapred.input.format.class", mode);
        ensureNotSet(oldMapperClass, mode);
        if (numReduces != 0)
            ensureNotSet("mapred.partitioner.class", mode);
        else
            ensureNotSet("mapred.output.format.class", mode);
    } else
    {
        String mode = "map compatability";
        ensureNotSet("mapreduce.inputformat.class", mode);
        ensureNotSet("mapreduce.map.class", mode);
        if (numReduces != 0)
            ensureNotSet("mapreduce.partitioner.class", mode);
        else
            ensureNotSet("mapreduce.outputformat.class", mode);
    }
    if (numReduces != 0)
    {
        if (conf.getUseNewReducer())
        {
            String mode = "new reduce API";
            ensureNotSet("mapred.output.format.class", mode);
            ensureNotSet(oldReduceClass, mode);
        } else
        {
            String mode = "reduce compatability";
            ensureNotSet("mapreduce.outputformat.class", mode);
            ensureNotSet("mapreduce.reduce.class", mode);
        }
    }
}
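
If what's needed is a runtime check from inside the task itself, one
option is the mapreduce.framework.name property. A sketch, on the
assumption (true for stock Hadoop 2.x) that the property is "yarn" under
MR2 and unset, "classic", or "local" under MR1:

    import org.apache.hadoop.conf.Configuration;

    public final class FrameworkCheck {
        // Returns true when the job appears to be running on MR2/YARN.
        // Pass in the Configuration obtained from the task context.
        public static boolean isRunningOnYarn(Configuration conf) {
            return "yarn".equals(conf.get("mapreduce.framework.name", "classic"));
        }
    }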


Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


On Tue, Jun 3, 2014 at 5:34 PM, Michael Segel msegel_had...@hotmail.com
wrote:

 Just a quick question...

 Suppose you have an M/R job running.
 How does the Mapper or Reducer task know or find out whether it's running
 as an M/R 1 or M/R 2 job?

 I would suspect the job context would hold that information... but at
 first glance I didn't see it.
 So what am I missing?

 Thx

 -Mike