Re: Unable to start namenode : Address already in use

2015-06-06 Thread Chandrashekhar Kotekar
1) Check whether the NameNode process is still running by using the jps -V command.
2) If it is running, kill it with sudo kill -9 <proc-id>.
3) Execute the NameNode start command again (example commands below).
4) Go to the bottom of the NameNode log file and post it here.
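
A minimal command sequence for the steps above; the start command and log path assume a Hadoop 2 tarball layout under $HADOOP_HOME, so adjust them for your installation:

# 1) list running JVM processes and look for a leftover NameNode
jps -V

# 2) kill the stale process if one shows up (replace 12345 with the pid from jps)
sudo kill -9 12345

# 3) start the NameNode again (sbin/ layout assumed; older installs use bin/hadoop-daemon.sh)
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode

# 4) check the tail of the NameNode log (exact file name depends on user and host)
tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log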

Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455


Re: Map Reduce Help

2015-05-05 Thread Chandrashekhar Kotekar
Technically yes, you can keep all MapReduce jobs in a single jar file,
because MapReduce jobs are nothing but Java classes. However, I think it is
better to keep each MapReduce job isolated so that you can modify them
independently in the future.
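
If you do bundle several jobs into one jar, a common pattern is a small dispatcher class built on Hadoop's ProgramDriver (this is how the bundled examples jar works). A minimal sketch; WordCountJob and BinaryFileJob here are placeholders for your own driver classes:

import org.apache.hadoop.util.ProgramDriver;

public class JobRunner {
  public static void main(String[] args) {
    int exitCode = -1;
    ProgramDriver driver = new ProgramDriver();
    try {
      // Register each job's driver class under a short alias
      // (WordCountJob and BinaryFileJob are placeholders for your own classes).
      driver.addClass("wordcount", WordCountJob.class, "Counts words in text input");
      driver.addClass("parsebinary", BinaryFileJob.class, "Processes binary input files");
      // Dispatch to whichever alias is given as the first command-line argument.
      driver.driver(args);
      exitCode = 0;
    } catch (Throwable t) {
      t.printStackTrace();
    }
    System.exit(exitCode);
  }
}

You would then run a particular job with something like: hadoop jar all-jobs.jar wordcount <input> <output>.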


Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455

On Tue, May 5, 2015 at 9:18 PM, Nishanth S chinchu2...@gmail.com wrote:

 Hello,

 I am very new to MapReduce. We need to write a few MapReduce jobs to
 process different binary files. Can all the different MapReduce programs
 be packaged into a single jar file?


 Thanks,
 Chinchu



Re: Can we control data distribution and load balancing in Hadoop Cluster?

2015-05-03 Thread Chandrashekhar Kotekar
Your question is very vague. Can you give us more details about the problem
you are trying to solve?


Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455

On Sun, May 3, 2015 at 11:59 PM, Answer Agrawal yrsna.tse...@gmail.com
wrote:

 Hi

 As I understand it, data distribution, load balancing, and fault tolerance
 are implicit in Hadoop. I need to customize this behavior; is that possible?

 Thanks




journal node shared edits directory should be present on HDFS or NAS or anything else?

2015-02-12 Thread Chandrashekhar Kotekar
Hi,

I am trying to configure NameNode HA and I also want to configure
automatic failover. I am confused about the '*dfs.namenode.shared.edits.dir*'
configuration.

The documentation says that the active NameNode writes to shared storage. I
would like to know whether this means that the NameNodes write the edits to
HDFS, or whether they require shared storage such as NAS, SAN, or something else.


Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455


Re: journal node shared edits directory should be present on HDFS or NAS or anything else?

2015-02-12 Thread Chandrashekhar Kotekar
Hi Brahma Reddy,

Thanks for the quick answer. It explains a lot, but I have one more
question. Maybe it is a stupid question, but does "required shared storage"
mean that the active NameNode will write to its local disk? Or do I need to
configure and use shared storage such as a NAS or SAN array or S3 for this
purpose?


Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455

On Thu, Feb 12, 2015 at 5:08 PM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:

  Hello Chandrashekhar,

 The active NameNode will write to the required shared storage, and it will
 not write to HDFS. Please check the following docs for reference:



 *When shared storage is JournalNodes:*

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster</value>
  </property>



 http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html



 *When shared storage is NFS:*

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>file:///mnt/filer1/dfs/ha-name-dir-shared</value>
  </property>




 http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html
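
 Since the original question also mentions automatic failover, the same QJM
 document describes the ZooKeeper-based failover controller and fencing keys
 as well. A minimal sketch, with placeholder hostnames and key path:

 *For automatic failover, in hdfs-site.xml:*

  <property>
    <!-- enables ZKFC-based automatic failover -->
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- fencing method invoked during failover -->
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <!-- private key used by sshfence (placeholder path) -->
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hdfs/.ssh/id_rsa</value>
  </property>

 *and in core-site.xml:*

  <property>
    <!-- ZooKeeper quorum used by the failover controllers (placeholder hosts) -->
    <name>ha.zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>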





  Thanks & Regards

  Brahma Reddy Battula


   --
 *From:* Chandrashekhar Kotekar [shekhar.kote...@gmail.com]
 *Sent:* Thursday, February 12, 2015 5:01 PM
 *To:* user@hadoop.apache.org
 *Subject:* journal node shared edits directory should be present on HDFS
 or NAS or anything else?

   Hi,

  I am trying to configure NameNode HA and I also want to configure
 automatic failover. I am confused about the '*dfs.namenode.shared.edits.dir*'
 configuration.

  The documentation says that the active NameNode writes to shared storage. I
 would like to know whether this means that the NameNodes write the edits to
 HDFS, or whether they require shared storage such as NAS, SAN, or something else.


   Regards,
 Chandrash3khar Kotekar
 Mobile - +91 8600011455



Re: What happens to data nodes when name node has failed for long time?

2014-12-14 Thread Chandrashekhar Kotekar
Hi Mark,

Thanks for giving detailed information about name node failure and High
availability feature.

Wish you all the best in your job search.

Thanks again...


Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455

On Mon, Dec 15, 2014 at 6:29 AM, mark charts mcha...@yahoo.com wrote:


 Prior to the Hadoop 2.x series, the NameNode was a single point of failure
 in an HDFS cluster — in other words, if the machine on which the single
 NameNode was configured became unavailable, the entire cluster would be
 unavailable until the NameNode could be restarted. This was bad news,
 especially in the case of unplanned outages, which could result in
 significant downtime if the cluster administrator weren’t available to
 restart the NameNode.

 The solution to this problem is addressed by the HDFS High Availability
 feature. The idea is to run two NameNodes in the same cluster — one active
 NameNode and one hot standby NameNode. If the active NameNode crashes or
 needs to be stopped for planned maintenance, it can be quickly failed over
 to the hot standby NameNode, which now becomes the active NameNode.

 The key is to keep the standby node synchronized with the active node; this
 action is now accomplished by having both nodes access a shared NFS
 directory. All namespace changes on the active node are logged in the
 shared directory. The standby node picks up those changes from the
 directory and applies them to its own namespace. In this way, the standby
 NameNode acts as a current backup of the active NameNode. The standby node
 also has current block location information, because DataNode heartbeats
 are routinely sent to both active and standby NameNodes.

 To ensure that only one NameNode is the “active” node at any given time,
 configure a fencing process for the shared storage directory; then, during
 a failover, if it appears that the failed NameNode still carries the active
 state, the configured fencing process prevents that node from accessing the
 shared directory and permits the newly active node (the former standby
 node) to complete the failover.

 The machines that will serve as the active and standby NameNodes in your
 High Availability cluster should have equivalent hardware. The shared NFS
 storage directory, which must be accessible to both active and standby
 NameNodes, is usually located on a separate machine and can be mounted on
 each NameNode machine. To prevent this directory from becoming a single
 point of failure, configure multiple network paths to the storage
 directory, and ensure that there’s redundancy in the storage itself. Use a
 dedicated network-attached storage (NAS) appliance to contain the shared
 storage directory.
   *sic*

 Courtesy of Dirk deRoos, Paul C. Zikopoulos, Bruce Brown,
 Rafael Coss, and Roman B. Melnyk.


 Ps. I am looking for work as a Hadoop Admin/Developer (I am an Electrical
 Engineer with an MSEE). A few months ago I successfully implemented a 6-node
 cluster at work for productivity purposes (that's my claim to fame). I was
 laid off shortly afterwards; no correlation, I suspect. But I am in FL and
 willing to go anywhere for contract or permanent work. If anyone knows of a
 position for a tenacious Hadoop engineer, I am interested.


 Thank you.


 Mark Charts



   On Sunday, December 14, 2014 5:30 PM, daemeon reiydelle 
 daeme...@gmail.com wrote:


 I found the terminology of primary and secondary to be a bit confusing in
 describing operation after a failure scenario. Perhaps it is helpful to
 think that the Hadoop instance is guided to select a node as primary for
 normal operation. If that node fails, then the backup becomes the new
 primary. In analyzing traffic it appears that the restored node does not
 become primary again until the whole instance restarts. I myself would
 welcome clarification on this observed behavior.



 *...*






 *“Life should not be a journey to the grave with the intention of arriving
 safely in a pretty and well preserved body, but rather to skid in broadside
 in a cloud of smoke, thoroughly used up, totally worn out, and loudly
 proclaiming “Wow! What a Ride!” - Hunter Thompson*

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Fri, Dec 12, 2014 at 7:56 AM, Rich Haase rha...@pandora.com wrote:

   The remaining cluster services will continue to run.  That way when the
 namenode (or other failed processes) is restored the cluster will resume
 healthy operation.  This is part of hadoop’s ability to handle network
 partition events.

  *Rich Haase* | Sr. Software Engineer | Pandora
 m 303.887.1146 | rha...@pandora.com

   From: Chandrashekhar Kotekar shekhar.kote...@gmail.com
 Reply-To: user@hadoop.apache.org user@hadoop.apache.org
 Date: Friday, December 12, 2014 at 3:57 AM
 To: user@hadoop.apache.org user@hadoop.apache.org
 Subject: What happens to data nodes when name node has failed for long
 time

What happens to data nodes when name node has failed for long time?

2014-12-12 Thread Chandrashekhar Kotekar
Hi,

What happens if the name node has crashed for more than one hour but the
secondary name node, all the data nodes, the job tracker, and the task
trackers are running fine? Do those daemon services also shut down
automatically after some time, or do they keep running, waiting for the name
node to come back?

Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455


Fwd: Multiple ways to write Hadoop program driver - Which one to choose?

2013-04-23 Thread Chandrashekhar Kotekar
Hi,


I have observed that there are multiple ways to write the driver method of a
Hadoop program.

The following method is given in the Hadoop Tutorial by Yahoo
(http://developer.yahoo.com/hadoop/tutorial/module4.html):

public void run(String inputPath, String outputPath) throws Exception {
  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("wordcount");

  // the keys are words (strings)
  conf.setOutputKeyClass(Text.class);
  // the values are counts (ints)
  conf.setOutputValueClass(IntWritable.class);

  conf.setMapperClass(MapClass.class);
  conf.setReducerClass(Reduce.class);

  FileInputFormat.addInputPath(conf, new Path(inputPath));
  FileOutputFormat.setOutputPath(conf, new Path(outputPath));

  JobClient.runJob(conf);
}

and this method is given in the book Hadoop: The Definitive Guide (O'Reilly,
2012):

public static void main(String[] args) throws Exception {
  if (args.length != 2) {
    System.err.println("Usage: MaxTemperature <input path> <output path>");
    System.exit(-1);
  }
  Job job = new Job();
  job.setJarByClass(MaxTemperature.class);
  job.setJobName("Max temperature");
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setMapperClass(MaxTemperatureMapper.class);
  job.setReducerClass(MaxTemperatureReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

While trying the program given in the O'Reilly book, I found that the
constructors of the Job class are deprecated. As the O'Reilly book is based
on Hadoop 2 (YARN), I was surprised to see that it uses a deprecated
constructor.

I would like to know which approach everyone uses.
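
For what it's worth, the deprecation warning seems to go away if the Job is
created through Job.getInstance(Configuration, String) instead of the
constructor. A minimal sketch along those lines, reusing the class names from
the book example (untested):

public static void main(String[] args) throws Exception {
  if (args.length != 2) {
    System.err.println("Usage: MaxTemperature <input path> <output path>");
    System.exit(-1);
  }
  // Job.getInstance(...) replaces the deprecated Job constructors.
  Configuration conf = new Configuration();
  Job job = Job.getInstance(conf, "Max temperature");
  job.setJarByClass(MaxTemperature.class);
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setMapperClass(MaxTemperatureMapper.class);
  job.setReducerClass(MaxTemperatureReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Many drivers also implement the Tool interface and launch through ToolRunner,
so that generic options such as -D and -files are parsed automatically.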







Regards,
Chandrash3khar K0tekar
Mobile - 8884631122