In Cygwin, Hadoop throws an exception when running wordcount.

2012-03-28 Thread Tim.Wu
Env: Windows 7 + Cygwin 1.7.11-1 + JDK 1.6.0_31 + Hadoop 1.0.0

So far, standalone mode is OK. But in pseudo-distributed or cluster mode, the
wordcount example always throws errors.

HDFS works fine, but the tasktracker cannot spawn child JVMs for new
jobs. The directory under /logs/userlogs/job-/attempt-/ is empty.

The tasktracker error log looks like:


==
12/03/28 14:35:13 INFO mapred.JvmManager: In JvmRunner constructed JVM ID:
jvm_201203280212_0005_m_-1386636958
12/03/28 14:35:13 INFO mapred.JvmManager: JVM Runner
jvm_201203280212_0005_m_-1386636958 spawned.
12/03/28 14:35:17 INFO mapred.JvmManager: JVM Not killed
jvm_201203280212_0005_m_-1386636958 but just removed
12/03/28 14:35:17 INFO mapred.JvmManager: JVM :
jvm_201203280212_0005_m_-1386636958 exited with exit code -1. Number of
tasks it ran: 0
12/03/28 14:35:17 WARN mapred.TaskRunner:
attempt_201203280212_0005_m_02_0 : Child Error
java.io.IOException: Task process exit with nonzero status of -1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/03/28 14:35:21 INFO mapred.TaskTracker: addFreeSlot : current free slots
: 2
12/03/28 14:35:24 INFO mapred.TaskTracker: LaunchTaskAction (registerTask):
attempt_201203280212_0005_m_02_1 task's state:UNASSIGNED
12/03/28 14:35:24 INFO mapred.TaskTracker: Trying to launch :
attempt_201203280212_0005_m_02_1 which needs 1 slots
12/03/28 14:35:24 INFO mapred.TaskTracker: In TaskLauncher, current free
slots : 2 and trying to launch attempt_201203280212_0005_m_02_1 which
needs 1 slots
12/03/28 14:35:24 WARN mapred.TaskLog: Failed to retrieve stdout log for
task: attempt_201203280212_0005_m_02_0
java.io.FileNotFoundException:
D:\cygwin\home\timwu\hadoop-1.0.0\logs\userlogs\job_201203280212_0005\attempt_201203280212_0005_m_02_0\log.index
(The system cannot find the path specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
at org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:188)
at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:423)
at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
12/03/28 14:35:24 WARN mapred.TaskLog: Failed to retrieve stderr log for
task: attempt_201203280212_0005_m_02_0
java.io.FileNotFoundException:
D:\cygwin\home\timwu\hadoop-1.0.0\logs\userlogs\job_201203280212_0005\attempt_201203280212_0005_m_02_0\log.index
(The system cannot find the path specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
at org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:188)
at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:423)
at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)

Re: Cannot renew lease for DFSClient_977492582. Name node is in safe mode in AWS

2012-03-28 Thread madhu phatak
Hi Mohit,
 HDFS is in safe mode, which is read-only mode. Run the following command to
get out of safe mode:

 bin/hadoop dfsadmin -safemode leave
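
To check the current state, or to block until the namenode leaves safe mode
on its own, dfsadmin also supports:

 bin/hadoop dfsadmin -safemode get
 bin/hadoop dfsadmin -safemode wait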

On Thu, Mar 15, 2012 at 5:54 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

  When I run a client to create files in HDFS on Amazon, I get this error. Does
 anyone know what it really means and how to resolve it?

 ---


 2012-03-14 23:16:21,414 INFO org.apache.hadoop.ipc.Server (IPC Server
 handler 46 on 9000): IPC Server handler 46 on 9000, call
 renewLease(DFSClient_977492582) from 10.70.150.119:47240: error:
 org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew
 lease for DFSClient_977492582. Name node is in safe mode.

 The ratio of reported blocks 1. has reached the threshold 0.9990. Safe
 mode will be turned off automatically in 0 seconds.

 org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew
 lease for DFSClient_977492582. Name node is in safe mode.

 The ratio of reported blocks 1. has reached the threshold 0.9990. Safe
 mode will be turned off automatically in 0 seconds.

 at

 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewLease(FSNamesystem.java:2296)

 at

 org.apache.hadoop.hdfs.server.namenode.NameNode.renewLease(NameNode.java:814)

 at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)

 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)

 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)

 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:396)

 at

 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)

 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)




-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: dynamic mapper?

2012-03-28 Thread madhu phatak
Hi,
 You can use the Java compiler API to compile custom Java code and create
jars at runtime. For example, look at this code from Sqoop:

/**
 * Licensed to Cloudera, Inc. under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  Cloudera, Inc. licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.cloudera.sqoop.orm;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.mapred.JobConf;

import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.util.FileListing;

import com.cloudera.sqoop.util.Jars;

/**
 * Manages the compilation of a bunch of .java files into .class files
 * and eventually a jar.
 *
 * Also embeds this program's jar into the lib/ directory inside the compiled
 * jar to ensure that the job runs correctly.
 */
public class CompilationManager {

  /** If we cannot infer a jar name from a table name, etc., use this. */
  public static final String DEFAULT_CODEGEN_JAR_NAME =
      "sqoop-codegen-created.jar";

  public static final Log LOG = LogFactory.getLog(
      CompilationManager.class.getName());

  private SqoopOptions options;
  private List<String> sources;

  public CompilationManager(final SqoopOptions opts) {
    options = opts;
    sources = new ArrayList<String>();
  }

  public void addSourceFile(String sourceName) {
    sources.add(sourceName);
  }

  /**
   * locate the hadoop-*-core.jar in $HADOOP_HOME or --hadoop-home.
   * If that doesn't work, check our classpath.
   * @return the filename of the hadoop-*-core.jar file.
   */
  private String findHadoopCoreJar() {
    String hadoopHome = options.getHadoopHome();

    if (null == hadoopHome) {
      LOG.info("$HADOOP_HOME is not set");
      return Jars.getJarPathForClass(JobConf.class);
    }

    if (!hadoopHome.endsWith(File.separator)) {
      hadoopHome = hadoopHome + File.separator;
    }

    File hadoopHomeFile = new File(hadoopHome);
    LOG.info("HADOOP_HOME is " + hadoopHomeFile.getAbsolutePath());
    File [] entries = hadoopHomeFile.listFiles();

    if (null == entries) {
      LOG.warn("HADOOP_HOME appears empty or missing");
      return Jars.getJarPathForClass(JobConf.class);
    }

    for (File f : entries) {
      if (f.getName().startsWith("hadoop-")
          && f.getName().endsWith("-core.jar")) {
        LOG.info("Found hadoop core jar at: " + f.getAbsolutePath());
        return f.getAbsolutePath();
      }
    }

    return Jars.getJarPathForClass(JobConf.class);
  }

  /**
   * Compile the .java files into .class files via embedded javac call.
   * On success, move .java files to the code output dir.
   */
  public void compile() throws IOException {
    List<String> args = new ArrayList<String>();

    // ensure that the jar output dir exists.
    String jarOutDir = options.getJarOutputDir();
    File jarOutDirObj = new File(jarOutDir);
    if (!jarOutDirObj.exists()) {
      boolean mkdirSuccess = jarOutDirObj.mkdirs();
      if (!mkdirSuccess) {
        LOG.debug("Warning: Could not make directories for " + jarOutDir);
      }
    } else if (LOG.isDebugEnabled()) {
      LOG.debug("Found existing " + jarOutDir);
    }

    // Make sure jarOutDir ends with a '/'.
    if (!jarOutDir.endsWith(File.separator)) {
      jarOutDir = jarOutDir + File.separator;
    }

    // find hadoop-*-core.jar for classpath.
    String coreJar = findHadoopCoreJar();
    if (null == coreJar) {
      // Couldn't find a core jar to insert into the CP for compilation. If,
      // however, we're running this from a unit test, then the path to the
      // .class files might be set via the hadoop.alt.classpath property
      // instead. Check there first.
      String coreClassesPath = System.getProperty("hadoop.alt.classpath");
      if (null == coreClassesPath) {
        // no -- we're out of options. Fail.
        throw new IOException("Could not find hadoop core jar!");
      } else {
        coreJar = coreClassesPath;
      }
    }

    // find sqoop jar for compilation
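
The listing above is truncated before the actual compiler invocation. Based
on the javax.tools imports it declares, a minimal self-contained sketch of
the same compile-then-jar idea (all file, directory, and class names below
are hypothetical, not Sqoop's) would be:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class DynamicCompileSketch {
  public static void main(String[] args) throws IOException {
    // ToolProvider needs a JDK at runtime; it returns null under a plain JRE.
    JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
    int rc = javac.run(null, null, null,
        "-classpath", System.getProperty("java.class.path"),
        "-d", "/tmp/classes",          // hypothetical class output dir
        "/tmp/src/MyMapper.java");     // hypothetical generated source file
    if (rc != 0) {
      throw new IOException("javac failed with exit code " + rc);
    }

    // Package the compiled class into a jar that a job can use.
    JarOutputStream jar =
        new JarOutputStream(new FileOutputStream("/tmp/myjob.jar"));
    jar.putNextEntry(new ZipEntry("MyMapper.class"));
    FileInputStream in =
        new FileInputStream(new File("/tmp/classes", "MyMapper.class"));
    byte[] buf = new byte[4096];
    int n;
    while ((n = > 0) {
      jar.write(buf, 0, n);
    }
    in.close();
    jar.close();
  }
}

The resulting jar can then be handed to the job, e.g. via JobConf.setJar()
or DistributedCache.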

Why does only the tasktracker run under the cyg_server account in Cygwin?

2012-03-28 Thread Tim.Wu
Hi all

I noticed that when I run bin/start-all.sh in Cygwin, the namenode, datanode,
and jobtracker all run under the login account, e.g. timwu in my case. Only
the tasktracker runs under cyg_server, the account created by ssh-host-config
when I set up sshd in Cygwin.

Since I installed Cygwin in d:\cygwin, the namenode, datanode, and jobtracker
all use d:\\tmp\\hadoop-timwu as their tmp folder, and only the tasktracker
uses d:\\tmp\\hadoop-cyg_server.

Have I configured Hadoop correctly?

Best
Tim


MapReduce on autocomplete

2012-03-28 Thread Tony Burton
So I have a lot of small files on S3 that I need to consolidate, so I headed to
Google to see the best way to do it in a MapReduce job. Looks like someone's
got a different idea, according to Google's autocomplete:

[cid:image001.jpg@01CD0D09.CDEB9E90]




Re: where are my logging output files going to?

2012-03-28 Thread Michael Segel
You don't want users actually running anything directly on the cluster. 
You would set up some machine to launch jobs. 
Essentially any sort of Linux machine where you can install Hadoop, but that
you don't run any jobs on...

Sent from my iPhone

On Mar 28, 2012, at 3:30 AM, Jane Wayne jane.wayne2...@gmail.com wrote:

 what do you mean by an edge node? do you mean any node that is not the
 master node (or NameNode or JobTracker node)?
 
 On Wed, Mar 28, 2012 at 3:51 AM, Michel Segel 
 michael_se...@hotmail.comwrote:
 
 First you really don't want to launch the job from the cluster but from an
 edge node.
 
 To answer your question, in a word, yes: you should have as consistent a set
 of configuration files as possible, noting that over time this may not be
 achievable as hardware configs may change.
 
 
 Sent from a remote device. Please excuse any typos...
 
 Mike Segel
 
 On Mar 27, 2012, at 8:42 PM, Jane Wayne jane.wayne2...@gmail.com wrote:
 
 if i have a hadoop cluster of 10 nodes, do i have to modify the
 /hadoop/conf/log4j.properties files on ALL 10 nodes to be the same?
 
 currently, i ssh into the master node to execute a job. this node is the
 only place where i have modified the log4j.properties file. i notice that
 although my log files are being created, nothing is being written to
 them.
 when i test on cygwin, the logging works; however, when i go to a live
 cluster (i.e. amazon elastic mapreduce), the logging output on the master
 node no longer works. i wonder if logging is happening at each slave/task
 node?
 
 could someone explain logging or point me to the documentation discussing
 this issue?
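
For reference: in Hadoop 1.x, each TaskTracker's child JVMs pick up the
log4j.properties from that node's local conf directory, which is why editing
only the master's copy does not change task-side logging. A minimal
per-package file-appender sketch (the com.example.myjob package name is
hypothetical) would be:

# log4j 1.x syntax: route one package's logging to its own file
log4j.logger.com.example.myjob=INFO, myjobfile
log4j.appender.myjobfile=org.apache.log4j.FileAppender
log4j.appender.myjobfile.File=${hadoop.log.dir}/myjob.log
log4j.appender.myjobfile.layout=org.apache.log4j.PatternLayout
log4j.appender.myjobfile.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

This file would need to be present on every node that runs tasks, which is
the point of the "keep the configuration files consistent" advice above.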
 


Re: MapReduce on autocomplete

2012-03-28 Thread Harsh J
Looks like your mail client (or the list) stripped away your image
attachment. Could you post the image as a link from imageshack/etc.
instead?

On Wed, Mar 28, 2012 at 10:10 PM, Tony Burton tbur...@sportingindex.com wrote:

 So I have a lot of small files on S3 that I need to consolidate, so headed to 
 Google to see the best way to do it in a MapReduce job. Looks like someone’s 
 got a different idea, according to Google’s autocomplete:











--
Harsh J


Re: Hbase RegionServer stalls on initialization

2012-03-28 Thread N Keywal
It must be waiting for the master. Have you launched the master?

On Wed, Mar 28, 2012 at 7:40 PM, Nabib El-Rahman 
nabib.elrah...@tubemogul.com wrote:

 Hi Guys,

 I'm starting up a region server and it stalls on initialization.  I took
 a thread dump and found it hanging on this spot:

 "regionserver60020" prio=10 tid=0x7fa90c5c4000 nid=0x4b50 in Object.wait() [0x7fa9101b4000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0xbc63b2b8> (a org.apache.hadoop.hbase.MasterAddressTracker)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:122)
 - locked <0xbc63b2b8> (a org.apache.hadoop.hbase.MasterAddressTracker)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:516)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:493)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.initialize(HRegionServer.java:461)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:560)
 at java.lang.Thread.run(Thread.java:662)



 Any idea on who or what it's being blocked on?

 *Nabib El-Rahman *|  Senior Software Engineer

 *M:* 734.846.2529
 www.tubemogul.com | *twitter: @nabiber*




Re: Hbase RegionServer stalls on initialization

2012-03-28 Thread N Keywal
Then you should have an error in the master logs.
If not, it's worth checking that the master and the region servers speak to
the same ZK...
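
For example, compare the ZK settings in hbase-site.xml on the master and on
each region server (the host names below are hypothetical):

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>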

As it's hbase related, I redirect the question to hbase user mailing list
(hadoop common is in bcc).

On Wed, Mar 28, 2012 at 8:03 PM, Nabib El-Rahman 
nabib.elrah...@tubemogul.com wrote:

 The master is up. Is it possible that ZooKeeper might not know about it?


  *Nabib El-Rahman *|  Senior Software Engineer

 *M:* 734.846.2529
 www.tubemogul.com | *twitter: @nabiber*

 On Mar 28, 2012, at 10:42 AM, N Keywal wrote:

 It must be waiting for the master. Have you launched the master?

 On Wed, Mar 28, 2012 at 7:40 PM, Nabib El-Rahman 
 nabib.elrah...@tubemogul.com wrote:

 Hi Guys,

 I'm starting up a region server and it stalls on initialization.  I took
 a thread dump and found it hanging on this spot:

 "regionserver60020" prio=10 tid=0x7fa90c5c4000 nid=0x4b50 in Object.wait() [0x7fa9101b4000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0xbc63b2b8> (a org.apache.hadoop.hbase.MasterAddressTracker)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:122)
 - locked <0xbc63b2b8> (a org.apache.hadoop.hbase.MasterAddressTracker)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:516)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:493)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.initialize(HRegionServer.java:461)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:560)
 at java.lang.Thread.run(Thread.java:662)



 Any idea on who or what it's being blocked on?

  *Nabib El-Rahman *|  Senior Software Engineer

 *M:* 734.846.2529
 www.tubemogul.com | *twitter: @nabiber*






Re: activity on IRC .

2012-03-28 Thread Todd Lipcon
Hey Jay,

That's the only one I know of. Not a lot of idle chatter, but when
people have questions, discussions do start up. Much more active
during PST working hours, of course :)

-Todd

On Wed, Mar 28, 2012 at 8:05 AM, Jay Vyas jayunit...@gmail.com wrote:
 Hi guys: I notice the IRC activity is a little low.  Just wondering if
 there's a better chat channel for hadoop other than the official one
 (#hadoop on freenode)?
 In any case... I'm on there :)   come say hi.

 --
 Jay Vyas
 MMSB/UCHC



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: activity on IRC .

2012-03-28 Thread Russell Jurney
I get good answers on Twitter.

Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com

On Mar 28, 2012, at 12:27 PM, Todd Lipcon t...@cloudera.com wrote:

 Hey Jay,

 That's the only one I know of. Not a lot of idle chatter, but when
 people have questions, discussions do start up. Much more active
 during PST working hours, of course :)

 -Todd

 On Wed, Mar 28, 2012 at 8:05 AM, Jay Vyas jayunit...@gmail.com wrote:
 Hi guys: I notice the IRC activity is a little low.  Just wondering if
 there's a better chat channel for hadoop other than the official one
 (#hadoop on freenode)?
 In any case... I'm on there :)   come say hi.

 --
 Jay Vyas
 MMSB/UCHC



 --
 Todd Lipcon
 Software Engineer, Cloudera


Hadoop Roadmap

2012-03-28 Thread Edmon Begoli
I have seen several articles, including a recent one in SD Times, on changes
that are about to happen in Hadoop and on significant architectural upgrades.

Does anyone have a good, detailed resource to recommend where these changes
are outlined, including the long-term roadmap?

I am interested in this because I would like to see how data-parallel
algorithms other than M/R, and parallel file system approaches other than
HDFS, could be fitted into this vision.
My team might have the resources and interest to contribute.

Regards,
Edmon


Re: Hadoop Roadmap

2012-03-28 Thread Arun C Murthy
Edmon,

 Here is a brief overview:
 http://hadoop.apache.org/common/docs/r0.23.1/

 Ping me if you want more collateral.

thanks,
Arun

On Mar 28, 2012, at 3:35 PM, Edmon Begoli wrote:

 I have seen several articles, including a recent one in SD Times, on changes
 that are about to happen in Hadoop and on significant architectural upgrades.
 
 Does anyone have a good, detailed resource to recommend where these changes
 are outlined, including the long-term roadmap?
 
 I am interested in this because I would like to see how data-parallel
 algorithms other than M/R, and parallel file system approaches other than
 HDFS, could be fitted into this vision.
 My team might have the resources and interest to contribute.
 
 Regards,
 Edmon

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Possible to poll JobTracker for information from any language?

2012-03-28 Thread Ryan Cole
Hello,

I'm interested in writing a library, to be used with Node.js, that can ask
the JobTracker for information about jobs. I see that this is possible
using the Java API, with the JobClient interface [1]. I also saw that the
wiki mentions that clients can poll the JobTracker for information, but it
does not go into detail [2]. Is it possible to get information about
jobs from the JobTracker using C, or C++, or Thrift, or something else?

Thanks,
Ryan


1.
http://hadoop.apache.org/common/docs/r1.0.1/mapred_tutorial.html#Job+Submission+and+Monitoring
2. http://wiki.apache.org/hadoop/JobTracker
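
For the Java side, a minimal sketch that polls the JobTracker through
JobClient (the JobTracker host and port below are assumptions; use the value
of mapred.job.tracker from your cluster) could look like:

import java.net.InetSocketAddress;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;

public class PollJobTracker {
  public static void main(String[] args) throws Exception {
    // Connect directly to the JobTracker's RPC address.
    JobClient client = new JobClient(
        new InetSocketAddress("jobtracker.example.com", 9001), new JobConf());
    // getAllJobs() returns the status of every job the JobTracker knows about.
    for (JobStatus status : client.getAllJobs()) {
      System.out.println(status.getJobID()
          + " state=" + status.getRunState()
          + " map=" + (status.mapProgress() * 100) + "%");
    }
    client.close();
  }
}

Absent an official C/C++/Thrift binding, a Node.js library could also shell
out to bin/hadoop job -list or bin/hadoop job -status <id> and parse the
output.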