Re: Hadoop YARN Birds of a Feather (BOF) Session at Hadoop Summit San Jose 2015

2015-06-03 Thread Karthik Kambatla
Also, are Hadoop Summit registrations required to attend the BoF?

On Wed, Jun 3, 2015 at 10:52 AM, Karthik Kambatla ka...@cloudera.com
wrote:

 Going through all Yarn umbrella JIRAs
 https://issues.apache.org/jira/issues/?jql=project%20in%20(Yarn)%20AND%20summary%20~%20umbrella%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC
  could
 be useful. Maybe this is an opportunity to clean up that list. I looked
 at all New Features
 https://issues.apache.org/jira/issues/?jql=project%20in%20(Yarn)%20AND%20type%20%3D%20%22New%20Feature%22%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC,
 but that list is too long to go through.

 On Wed, Jun 3, 2015 at 10:12 AM, Vinod Kumar Vavilapalli 
 vino...@apache.org wrote:

 Hi all,

 We had a blast of a BOF session on Hadoop YARN at last year's Hadoop
 Summit. We had lots of fruitful discussions led by many developers about
 various features and their contributions; it was a great session overall.

 I am coordinating this year's BOF as well and garnering topics of
 discussion. A BOF, by definition, involves on-the-spot, unplanned
 discussions, but it doesn't hurt to have a bunch of pre-planned topics for
 starters.

 YARN developers/committers, if you are attending, please feel free to send
 me topics that you want to discuss at the BOF session.

 Hadoop users, you are welcome to attend and join the discussion around
 Hadoop YARN. The meetup link is here:
 http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/222465938/

 Thanks all,
 +Vinod




 --
 Karthik Kambatla
 Software Engineer, Cloudera Inc.
 
 http://five.sentenc.es




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Hadoop YARN Birds of a Feather (BOF) Session at Hadoop Summit San Jose 2015

2015-06-03 Thread Karthik Kambatla
Going through all Yarn umbrella JIRAs
https://issues.apache.org/jira/issues/?jql=project%20in%20(Yarn)%20AND%20summary%20~%20umbrella%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC
could
be useful. Maybe this is an opportunity to clean up that list. I looked
at all New Features
https://issues.apache.org/jira/issues/?jql=project%20in%20(Yarn)%20AND%20type%20%3D%20%22New%20Feature%22%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC,
but that list is too long to go through.

On Wed, Jun 3, 2015 at 10:12 AM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

 Hi all,

 We had a blast of a BOF session on Hadoop YARN at last year's Hadoop
 Summit. We had lots of fruitful discussions led by many developers about
 various features and their contributions; it was a great session overall.

 I am coordinating this year's BOF as well and garnering topics of
 discussion. A BOF, by definition, involves on-the-spot, unplanned
 discussions, but it doesn't hurt to have a bunch of pre-planned topics for
 starters.

 YARN developers/committers, if you are attending, please feel free to send
 me topics that you want to discuss at the BOF session.

 Hadoop users, you are welcome to attend and join the discussion around
 Hadoop YARN. The meetup link is here:
 http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/222465938/

 Thanks all,
 +Vinod




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: FileSystem Vs ZKStateStore for RM recovery

2015-02-11 Thread Karthik Kambatla
We recommend ZK-store, particularly if you plan to deploy multiple
ResourceManagers with failover. ZK-store ensures a single RM has write
access and thus is better protected against split-brain cases where both
RMs think they are active.
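
For reference, a minimal sketch of the properties involved, set programmatically here
although they normally live in yarn-site.xml. Property names are from Hadoop 2.6 and
the ZooKeeper quorum address is a placeholder:

    // A minimal sketch, not a full HA setup: enable RM recovery and select the
    // ZooKeeper-based state store. The ZK quorum below is a placeholder.
    import org.apache.hadoop.conf.Configuration;

    public class RmZkStoreConfigSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Keep running applications across an RM restart or failover.
        conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
        // The ZK-based store fences writes so only the active RM can modify state.
        conf.set("yarn.resourcemanager.store.class",
            "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
        // ZooKeeper quorum used by the store (placeholder hosts).
        conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
        System.out.println("RM store: " + conf.get("yarn.resourcemanager.store.class"));
      }
    }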

On Tue, Feb 10, 2015 at 9:59 PM, Suma Shivaprasad 
sumasai.shivapra...@gmail.com wrote:

 We are planning to deploy Hadoop 2.6.0 with a default configuration to
 cache 1 entries in the state store. With a workload of 150-250
 concurrent applications at any time, which state store is better to use
 and for what reasons?

 Thanks
 Suma




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Missing Snapshots for 2.5.0

2014-08-27 Thread Karthik Kambatla
There was an issue with the infrastructure. It is now fixed and the 2.5.0
artifacts are available.

Mark, can you please retry now?

Thanks
Karthik


On Tue, Aug 26, 2014 at 6:54 AM, Karthik Kambatla ka...@cloudera.com
wrote:

 Thanks for reporting this, Mark.

 It appears the artifacts are published to
 https://repository.apache.org/content/repositories/releases/org/apache/hadoop/hadoop-common/2.5.0/,
 but haven't propagated to
 http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/

 I am following up on this, and will report back once I know more.


 On Mon, Aug 25, 2014 at 6:40 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com
 wrote:

 Hi Mark,

 Thanks for your report. I also confirmed that we cannot access the jars
 of Hadoop 2.5.0.

 Karthik, could you check this problem?

 Thanks,
 - Tsuyoshi

 On Thu, Aug 21, 2014 at 2:08 AM, Campbell, Mark mark.campb...@xerox.com
 wrote:
  It seems that all the needed archives (yarn, mapreduce, etc.) are missing
  the 2.5.0 build folders.
 
 
 
  My Hadoop 2.5.0 build fails at the final stage because none of the
  dependencies can be found.
 
 
 
  Version 2.6.0 does seem to be in the list; however, no binaries are
  available that I can see.
 
 
 
  Please advise.
 
 
  Cheers,
  Mark
 
 
 
 
 
  Path /org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0-SNAPSHOT/ not
  found in local storage of repository Snapshots [id=snapshots]
 
 
 
  Downloading:
 
 http://repository.jboss.org/nexus/content/groups/public/org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0/hadoop-mapreduce-client-app-2.5.0.pom
 
  Downloading:
 
 http://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0/hadoop-mapreduce-client-app-2.5.0.pom
 
  [WARNING] The POM for
  org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.5.0 is missing, no
  dependency information available
 
  Downloading:
 
 https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-yarn-api/2.5.0/hadoop-yarn-api-2.5.0.pom
 
  Downloading:
 
 http://repository.jboss.org/nexus/content/groups/public/org/apache/hadoop/hadoop-yarn-api/2.5.0/hadoop-yarn-api-2.5.0.pom
 
  Downloading:
 
 http://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-yarn-api/2.5.0/hadoop-yarn-api-2.5.0.pom
 
  [WARNING] The POM for org.apache.hadoop:hadoop-yarn-api:jar:2.5.0 is
  missing, no dependency information available
 
  Downloading:
 
 https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/2.5.0/hadoop-common-2.5.0.jar
 
  Downloading:
 
 https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0/hadoop-mapreduce-client-app-2.5.0.jar



 --
 - Tsuyoshi





Re: Missing Snapshots for 2.5.0

2014-08-26 Thread Karthik Kambatla
Thanks for reporting this, Mark.

It appears the artifacts are published to
https://repository.apache.org/content/repositories/releases/org/apache/hadoop/hadoop-common/2.5.0/,
but haven't propagated to
http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/

I am following up on this, and will report back once I know more.


On Mon, Aug 25, 2014 at 6:40 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com
wrote:

 Hi Mark,

 Thanks for your report. I also confirmed that we cannot access the jars
 of Hadoop 2.5.0.

 Karthik, could you check this problem?

 Thanks,
 - Tsuyoshi

 On Thu, Aug 21, 2014 at 2:08 AM, Campbell, Mark mark.campb...@xerox.com
 wrote:
  It seems that all the needed archives (yarn, mapreduce, etc.) are missing
  the 2.5.0 build folders.
 
 
 
  My Hadoop 2.5.0 build fails at the final stage because none of the
  dependencies can be found.
 
 
 
  Version 2.6.0 does seem to be in the list; however, no binaries are
  available that I can see.
 
 
 
  Please advise.
 
 
  Cheers,
  Mark
 
 
 
 
 
  Path /org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0-SNAPSHOT/ not
  found in local storage of repository Snapshots [id=snapshots]
 
 
 
  Downloading:
 
 http://repository.jboss.org/nexus/content/groups/public/org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0/hadoop-mapreduce-client-app-2.5.0.pom
 
  Downloading:
 
 http://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0/hadoop-mapreduce-client-app-2.5.0.pom
 
  [WARNING] The POM for
  org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.5.0 is missing, no
  dependency information available
 
  Downloading:
 
 https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-yarn-api/2.5.0/hadoop-yarn-api-2.5.0.pom
 
  Downloading:
 
 http://repository.jboss.org/nexus/content/groups/public/org/apache/hadoop/hadoop-yarn-api/2.5.0/hadoop-yarn-api-2.5.0.pom
 
  Downloading:
 
 http://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-yarn-api/2.5.0/hadoop-yarn-api-2.5.0.pom
 
  [WARNING] The POM for org.apache.hadoop:hadoop-yarn-api:jar:2.5.0 is
  missing, no dependency information available
 
  Downloading:
 
 https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/2.5.0/hadoop-common-2.5.0.jar
 
  Downloading:
 
 https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-mapreduce-client-app/2.5.0/hadoop-mapreduce-client-app-2.5.0.jar



 --
 - Tsuyoshi



Re: Why resource requests are normalized in RM ?

2014-06-12 Thread Karthik Kambatla
I believe they are normalized to be multiples of
yarn.scheduler.increment-allocation-mb.
yarn.scheduler.minimum-allocation-mb can be set to as low as zero. Llama
does this.

As to why we normalize at all, I think it is to make sure there is no external
fragmentation; it is similar to why memory is paged.
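
A rough sketch of the rounding involved, for illustration only (this is not the actual
SchedulerUtils code, and the numbers in main are just an example):

    // A rough sketch of rounding an ask up to a non-zero multiple of the increment;
    // illustrative only, not the actual SchedulerUtils implementation.
    public class NormalizeSketch {
      static int normalizeMemory(int askMb, int minimumMb, int incrementMb, int maximumMb) {
        int normalized = Math.max(askMb, minimumMb);
        // Round up to the next multiple of the increment, never allowing zero.
        normalized = ((normalized + incrementMb - 1) / incrementMb) * incrementMb;
        return Math.min(Math.max(normalized, incrementMb), maximumMb);
      }

      public static void main(String[] args) {
        // e.g. a 1200 MB ask with a 512 MB increment is rounded up to 1536 MB.
        System.out.println(normalizeMemory(1200, 0, 512, 8192));
      }
    }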


On Wed, Jun 11, 2014 at 5:52 PM, Ashwin Shankar ashwinshanka...@gmail.com
wrote:

 Hi,
 Does anyone know why resource requests from AMs are normalized to
 be multiples of yarn.scheduler.minimum-allocation-mb, which is 1G
 by default?
 Also, is there any problem with reducing yarn.scheduler.minimum-allocation-mb
 to less than 1G?

  /**
   * Utility method to normalize a list of resource requests, by insuring that
   * the memory for each request is a multiple of minMemory and is not zero.
   */
  SchedulerUtils.normalizeRequests()
 --
 Thanks,
 Ashwin





Re: HA Jobtracker failure

2014-01-27 Thread Karthik Kambatla
(Redirecting to cdh-user, moving user@hadoop to bcc).

Hi Oren

Can you attach slightly longer versions of the log files on both the JTs?
Also, if this is something recurring, it would be nice to monitor the JT
heap usage and GC timeouts using jstat -gcutil <jt-pid>.

Thanks
Karthik




On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor or...@infolinks.com wrote:

 Hi.
 We have two HA JobTrackers in active/standby mode (CDH4.2 on Ubuntu
 Server).
 We had a problem during which the active node suddenly became standby and
 the standby server attempted to start, resulting in a Java heap space
 failure.
 Any ideas why the active node turned to standby?

 logs attached:
 on (original) active node:
 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
 Initializing job_201401041634_5858
 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
 Initializing job_201401041634_5858
 *2014-01-22 06:50:27,386 INFO
 org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
 standby*
 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
 pluginDispatcher
 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
 infoServer
 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
 waiting to send params to server
 java.lang.InterruptedException
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at
 java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
 at java.util.concurrent.FutureTask.get(FutureTask.java:83)
 at
 org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
 at org.apache.hadoop.ipc.Client.call(Client.java:1198)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at $Proxy9.getFileInfo(Unknown Source)
 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
 at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at $Proxy10.getFileInfo(Unknown Source)
 at
 org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
 at
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
 at
 org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
 at
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
 SelectChannelConnector@0.0.0.0:50031

 on standby node
 2014-01-22 06:50:05,010 INFO
 org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
 active
 2014-01-22 06:50:05,010 INFO
 org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
 JobTrackerHAHttpRedirector on port 50030
 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
 SelectChannelConnector@0.0.0.0:50030
 2014-01-22 06:50:05,198 INFO
 org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
 2014-01-22 06:50:05,201 INFO
 org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
 system directory hdfs://***/tmp/mapred/system/seq-0022 to
 hdfs://taykey/tmp/mapred/system/seq-0023
 2014-01-22 06:50:05,244 INFO
 org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
 Updating the current master key for generating delegation tokens
 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
 Scheduler configured with (memSizeForMapSlotOnJT, 

Re: Unclear Hadoop 2.1X documentation

2013-09-16 Thread Karthik Kambatla
Moving general@ to bcc and redirecting this to the appropriate list -
user@hadoop.apache.org


On Mon, Sep 16, 2013 at 2:18 AM, Jagat Singh jagatsi...@gmail.com wrote:

 Hello Mahmoud

 You can run it on your machine also.

 I learnt everything on my 3GB, 2GHz machine and recently got a better machine.

 If you follow the post below, you should be able to install and run Hadoop
 in 30 minutes.

 If your machine is not Linux, then I suggest you download VirtualBox,
 give it 1400MB of RAM, and start Ubuntu in it.

 Then just follow steps here.


 http://jugnu-life.blogspot.com.au/2012/05/hadoop-20-install-tutorial-023x.html

 Thanks,

 Jagat
 On 16/09/2013 7:07 PM, Mahmoud Al-Ewiwi mew...@gmail.com wrote:

  Thanks Ted,
 
  For now I just need to learn the basics of Hadoop before going to ask
  my university for more powerful machines.
  I just want to know how to install it and write some simple programs so I
  can ask my supervisor for more server machines.
 
  Best Regards
 
 
  On Mon, Sep 16, 2013 at 3:57 AM, Ted Dunning tdunn...@maprtech.com
  wrote:
 
   This is a very small amount of memory for running Hadoop + user
 programs.
  
   You might consider running your tests on a cloud provider like Amazon.
That will give you access to decent sized machines for a relatively
  small
   cost.
  
  
   On Sun, Sep 15, 2013 at 11:27 AM, Mahmoud Al-Ewiwi mew...@gmail.com
   wrote:
  
 Thanks to all. I've tried to use some of these sandboxes, but unfortunately
 most of them require a high amount of memory (3GB) for the guest machine, and
 I have only 3GB on my machine (an old machine), so I'm going to go along with
 the normal installation (I have no choice).
   
Thanks
   
   
On Sun, Sep 15, 2013 at 9:13 AM, Roman Shaposhnik r...@apache.org
   wrote:
   
 On Sat, Sep 14, 2013 at 10:54 AM, Mahmoud Al-Ewiwi 
 mew...@gmail.com
  
 wrote:
  Hello,

  I'm new to Hadoop and I want to learn it in order to do a project.
  I've started reading the documentation at this site:

  http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-project-dist/hadoop-common/SingleCluster.html

  for setting up a single node, but I could not figure out a lot of things
  in this documentation.

 For a first-timer like yourself, perhaps using a Hadoop distribution
 would be the best way to get started. Bigtop offers a 100% community-driven
 distro, but there are, of course, vendor choices as well.

 Here's the info on Bigtop:


   
  
 
 https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.6.0

 Thanks,
 Roman.

   
  
 



Re: SequenceFile output in Wordcount example

2013-09-16 Thread Karthik Kambatla
Moving general@ to bcc



On Mon, Sep 16, 2013 at 1:20 PM, xeon xeonmailingl...@gmail.com wrote:

 Hi,

 - I want the wordcount example to produce a SequenceFile output with
 the result. How do I do this?

 - I also want to cat the SequenceFile and read the result. Is a
 simple hdfs dfs -cat sequencefile enough?


 Thanks,
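
For the first question, a minimal sketch of the job wiring, assuming the new mapreduce
API and the usual wordcount mapper/reducer classes (names here are illustrative). For
the second question, 'hadoop fs -text <file>' decodes SequenceFiles, whereas a plain
-cat prints the binary container:

    // A minimal sketch (new mapreduce API): run wordcount but write the output as a
    // SequenceFile. The mapper/reducer lines are commented out because those classes
    // live elsewhere in the usual example.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class WordCountSeqFileSketch {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "wordcount-seqfile");
        job.setJarByClass(WordCountSeqFileSketch.class);
        // job.setMapperClass(TokenizerMapper.class);  // the usual wordcount mapper
        // job.setCombinerClass(IntSumReducer.class);
        // job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // The only change from the stock example: emit a binary SequenceFile.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }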



Re: How to shuffle (Key,Value) pair from mapper to multiple reducer

2013-03-13 Thread Karthik Kambatla
How about sending (0,x) to reducer 0 and (1,x) to reducer 1? Reducer 0 can act
based on the value of x.
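
A minimal sketch of that suggestion: partition only on the first dimension of a
"d1,d2" key, so (0,x) lands on reducer 0 and (1,x) on reducer 1, and the reducer
branches on the second dimension. The class name is illustrative, and it would be
wired in with job.setPartitionerClass(FirstDimensionPartitioner.class):

    // A minimal sketch: route a "d1,d2" key purely on its first dimension.
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class FirstDimensionPartitioner extends Partitioner<Text, Text> {
      @Override
      public int getPartition(Text key, Text value, int numPartitions) {
        // Key is expected to look like "d1,d2"; everything after the comma is ignored here.
        String firstDim = key.toString().split(",", 2)[0];
        return Integer.parseInt(firstDim.trim()) % numPartitions;
      }
    }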

On Wed, Mar 13, 2013 at 2:29 AM, Vikas Jadhav vikascjadha...@gmail.com wrote:

 Hello, I am not talking about a custom partitioner (a custom partitioner is
 involved, but I want to write the same pair multiple times);
 I want it to go to two reducers.
 For example, I have a two-dimensional partitioning attribute
 (x1,x2):

 signature  reducer
 0,0        0
 0,1        1
 1,0        2
 1,1        3

 For 1,0 it will go to reducer 2.
 For 1,null it should go to reducers 2 and 3.
 For 0,null it should go to reducers 0 and 1.

 On Wed, Mar 13, 2013 at 2:32 PM, Viral Bajaria viral.baja...@gmail.com wrote:

 Do you want the pair to go to both reducers, or do you want it to go to
 only one but in a random fashion?

 AFAIK, the 1st is not possible. Someone on the list can correct me if I am wrong.
 The 2nd is possible by just implementing your own partitioner, which
 randomizes where each key goes (not sure what you gain by that).


 On Wed, Mar 13, 2013 at 1:59 AM, Vikas Jadhav 
  vikascjadha...@gmail.com wrote:


  Hi,
  I am specifying my requirement again with an example.

  I have a use case where I need to shuffle the same (key,value) pair to
  multiple reducers.

  For example, we have the pair (1,ABC) and two reducers (reducer0 and
  reducer1); by default this pair will go to reducer1 (because
  (key % numOfReducers) = (1 % 2)).

  How should I shuffle this pair to both reducers?

  Also, I am willing to change the code of the Hadoop framework if necessary.

   Thank you

 On Wed, Mar 13, 2013 at 12:51 PM, feng lu amuseme...@gmail.com wrote:

 Hi

  You can use the Job#setNumReduceTasks(int tasks) method to set the number
  of reducers for the job.


 On Wed, Mar 13, 2013 at 2:15 PM, Vikas Jadhav vikascjadha...@gmail.com
  wrote:

 Hello,

  By default the Hadoop framework shuffles a (key,value) pair to only
  one reducer.

  I have a use case where I need to shuffle the same (key,value) pair to
  multiple reducers.

  Also, I am willing to change the code of the Hadoop framework if necessary.


 Thank you

 --
  Thanx and Regards,
  Vikas Jadhav




 --
 Don't Grow Old, Grow Up... :-)




 --
  Thanx and Regards,
  Vikas Jadhav





 --
  Thanx and Regards,
  Vikas Jadhav



Re: MRv2 jobs fail when run with more than one slave

2012-07-17 Thread Karthik Kambatla
Forwarding your email to the cdh-user group.

Thanks
Karthik

On Tue, Jul 17, 2012 at 2:24 PM, Trevor tre...@scurrilous.com wrote:

 Hi all,

 I recently upgraded from CDH4b2 (0.23.1) to CDH4 (2.0.0). Now for some
 strange reason, my MRv2 jobs (TeraGen, specifically) fail if I run with
 more than one slave. For every slave except the one running the Application
 Master, I get the following failed tasks and warnings repeatedly:

 12/07/13 14:21:55 INFO mapreduce.Job: Running job: job_1342207265272_0001
 12/07/13 14:22:17 INFO mapreduce.Job: Job job_1342207265272_0001 running
 in uber mode : false
 12/07/13 14:22:17 INFO mapreduce.Job:  map 0% reduce 0%
 12/07/13 14:22:46 INFO mapreduce.Job:  map 1% reduce 0%
 12/07/13 14:22:52 INFO mapreduce.Job:  map 2% reduce 0%
 12/07/13 14:22:55 INFO mapreduce.Job:  map 3% reduce 0%
 12/07/13 14:22:58 INFO mapreduce.Job:  map 4% reduce 0%
 12/07/13 14:23:04 INFO mapreduce.Job:  map 5% reduce 0%
 12/07/13 14:23:07 INFO mapreduce.Job:  map 6% reduce 0%
 12/07/13 14:23:07 INFO mapreduce.Job: Task Id :
 attempt_1342207265272_0001_m_04_0, Status : FAILED
 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server
 returned HTTP response code: 400 for URL:
 http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_04_0&filter=stdout
 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server
 returned HTTP response code: 400 for URL:
 http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_04_0&filter=stderr
 12/07/13 14:23:08 INFO mapreduce.Job: Task Id :
 attempt_1342207265272_0001_m_03_0, Status : FAILED
 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server
 returned HTTP response code: 400 for URL:
 http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_03_0&filter=stdout
 ...
 12/07/13 14:25:12 INFO mapreduce.Job:  map 25% reduce 0%
 12/07/13 14:25:12 INFO mapreduce.Job: Job job_1342207265272_0001 failed
 with state FAILED due to:
 ...
 Failed map tasks=19
 Launched map tasks=31

 The HTTP 400 error appears to be generated by the ShuffleHandler, which is
 configured to run on port 8080 of the slaves, and doesn't understand that
 URL. What I've been able to piece together so far is that /tasklog is
 handled by the TaskLogServlet, which is part of the TaskTracker. However,
 isn't this an MRv1 class that shouldn't even be running in my
 configuration? Also, the TaskTracker appears to run on port 50060, so I
 don't know where port 8080 is coming from.

 Though it could be a red herring, this warning seems to be related to the
 job failing, despite the fact that the job makes progress on the slave
 running the AM. The Node Manager logs on both AM and non-AM slaves appear
 fairly similar, and I don't see any errors in the non-AM logs.

 Another strange data point: These failures occur running the slaves on ARM
 systems. Running the slaves on x86 with the same configuration works. I'm
 using the same tarball on both, which means that the native-hadoop library
 isn't loaded on ARM. The master/client is the same x86 system in both
 scenarios. All nodes are running Ubuntu 12.04.

 Thanks for any guidance,
 Trevor




Re: Re: How To Distribute One Map Data To All Reduce Tasks?

2012-07-05 Thread Karthik Kambatla
One way to achieve this would be to:

   1. Emit the same value multiple times, each time with a different key.
   2. Use these different keys, in conjunction with the partitioner, to
   achieve the desired distribution.

Hope that helps!

Karthik
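
A minimal sketch of the two steps above, with illustrative names: the mapper emits each
record once per reduce task, tagging the key with the target partition, and a custom
partitioner (set via job.setPartitionerClass) routes on that tag.

    // A minimal sketch of steps 1 and 2: emit each record once per reducer with the
    // target partition as the key, and route on that tag. Names are illustrative.
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class BroadcastSketch {
      public static class BroadcastMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          int numReducers = context.getNumReduceTasks();
          for (int target = 0; target < numReducers; target++) {
            // One copy per reducer; a real job would usually append the original join
            // key after the target index to keep per-key grouping.
            context.write(new Text(Integer.toString(target)), value);
          }
        }
      }

      public static class TargetPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numPartitions) {
          // The target reducer index is the first (tab-separated) field of the key.
          String prefix = key.toString().split("\t", 2)[0];
          return Integer.parseInt(prefix) % numPartitions;
        }
      }
    }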

On Thu, Jul 5, 2012 at 12:19 AM, 静行 xiaoyong.den...@taobao.com wrote:

  I have different key values to join two tables, but only a few key
 values have large data to join and cost the most time, so I want to
 distribute these key values to every reducer to join.


 *From:* Devaraj k [mailto:devara...@huawei.com]
 *Sent:* July 5, 2012 14:06
 *To:* mapreduce-user@hadoop.apache.org
 *Subject:* RE: How To Distribute One Map Data To All Reduce Tasks?


 Can you explain your use case in some more detail?

  

 Thanks

 Devaraj
  --

 *From:* 静行 [xiaoyong.den...@taobao.com]
 *Sent:* Thursday, July 05, 2012 9:53 AM
 *To:* mapreduce-user@hadoop.apache.org
 *Subject:* Re: How To Distribute One Map Data To All Reduce Tasks?

 Thanks!

 But what I really want to know is how I can distribute one map's data to
 every reduce task, not just one of the reduce tasks.

 Do you have any ideas?

  

 *From:* Devaraj k [mailto:devara...@huawei.com]
 *Sent:* July 5, 2012 12:12
 *To:* mapreduce-user@hadoop.apache.org
 *Subject:* RE: How To Distribute One Map Data To All Reduce Tasks?

  

 You can distribute the map data to the reduce tasks using a Partitioner. By
 default, the Job uses the HashPartitioner. You can use a custom Partitioner
 according to your need.

  


 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Partitioner.html
 

  

 Thanks

 Devaraj
  --

 *From:* 静行 [xiaoyong.den...@taobao.com]
 *Sent:* Thursday, July 05, 2012 9:00 AM
 *To:* mapreduce-user@hadoop.apache.org
 *Subject:* How To Distribute One Map Data To All Reduce Tasks?

 Hi all:

  How can I distribute one map data to all reduce tasks?

  
  --


 This email (including any attachments) is confidential and may be legally
 privileged. If you received this email in error, please delete it
 immediately and do not copy it or use it for any purpose or disclose its
 contents to any other person. Thank you.




Re:

2012-07-05 Thread Karthik Kambatla
Hi Nishan

Let me forward this to the right list -


Thanks
Karthik

On Thu, Jul 5, 2012 at 6:43 AM, Nishan Shetty nishan.she...@huawei.com wrote:

 Hi

 In the CDH security guide it is mentioned that:

 “Important

 Remember that the user who launches the job must exist on every node.”


 But I am actually able to successfully submit a job as a user that does not
 exist on all the nodes.

 Am I missing something, or is this a bug?


 Thanks in advance

 Nishan



Re: multiple input splits from single file

2012-06-11 Thread Karthik Kambatla
Hi Sharat

A couple of questions/comments:

   1. Is your input graph complete?
   2. If it is not complete, it might make sense to use adjacency lists of
   graph nodes as the input to each map function. (Multiple adjacency lists
   for the map task)
   3. Even if it is complete, using the adjacency lists - or a partition of
   the edges - as input for each map task might help.

Karthik
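
A minimal sketch of point 2, assuming the graph is rewritten as one adjacency list per
line in the form "nodeId<TAB>neighbor1,neighbor2,..." (the layout and names are
illustrative). With such a file, the default TextInputFormat hands each map() call one
node's list, and the file splits naturally across map tasks:

    // A minimal sketch for point 2: one adjacency list per input line, so the file can
    // be split across many map tasks. The line layout and emitted pairs are illustrative.
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class AdjacencyListMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // Expected layout: "nodeId<TAB>neighbor1,neighbor2,..."
        String[] parts = line.toString().split("\t", 2);
        String nodeId = parts[0];
        String[] neighbors = parts.length > 1 ? parts[1].split(",") : new String[0];
        // Emit one (neighbor, node) pair per edge; what the reducer does with these
        // is problem-specific.
        for (String neighbor : neighbors) {
          context.write(new Text(neighbor), new Text(nodeId));
        }
      }
    }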

On Sun, Jun 10, 2012 at 8:02 AM, sharat attupurath shara...@hotmail.com wrote:

  Hi,

 We are trying to solve the travelling salesman problem using Hadoop. Our
 input files contain just a single line that has the Euclidean coordinates
 of the cities. We need to pass this single line to each mapper, which will
 then process it. How can we do this so that we can achieve parallelism in
 a Hadoop cluster? Is there any way to generate multiple input splits from
 the single input file?

 Thanks

 Sharat



Re: Is the mapper output type must the same as reducer if combiner is used ?

2009-11-22 Thread Karthik Kambatla
I have never tried it, but the following must also be possible.

map: (k1,v1) -> list(k2,v2)
combine: (k2,list(v2)) -> list(k3,v3)
reduce: (k3,list(v3)) -> list(k4,v4)

Karthik Kambatla
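
For the common case described in the quoted reply below, where the combiner is the
reducer class itself, a minimal sketch of the wiring (Hadoop 2.x Job API; the
Mapper/Reducer class names are illustrative and assumed to exist elsewhere):

    // A minimal sketch of the common case where the combiner is the reducer class:
    // the map output types (Text, IntWritable here) must then match the reducer's
    // input types. MyMapper/MyReducer are illustrative.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class CombinerTypesSketch {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combiner-types");
        // job.setMapperClass(MyMapper.class);     // Mapper<K1, V1, Text, IntWritable>
        // job.setCombinerClass(MyReducer.class);  // Reducer<Text, IntWritable, Text, IntWritable>
        // job.setReducerClass(MyReducer.class);
        job.setMapOutputKeyClass(Text.class);           // k2
        job.setMapOutputValueClass(IntWritable.class);  // v2
        job.setOutputKeyClass(Text.class);              // k3
        job.setOutputValueClass(IntWritable.class);     // v3
      }
    }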


On Sun, Nov 22, 2009 at 4:16 AM, Y G gymi...@gmail.com wrote:

 If your combiner is the same as the reducer, the output type of the mapper
 must be the same as the input type of the reducer.

 map: (k1,v1) -> list(k2,v2)
 combine: (k2,list(v2)) -> list(k2,v2)
 reduce: (k2,list(v2)) -> list(k3,v3)
 -
 Happy every day
 Good health
 Sent from Guangzhou, Guangdong, China
 Stephen Leacock
 http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html
 - I detest life-insurance agents: they always argue that I shall some
 day
 die, which is not so.

 2009/11/22 Jeff Zhang zjf...@gmail.com

  Hi all,
 
  As I understand, the Combiner is used in the map task, and most of the time
  the combiner is the same as the reducer.

  So if a combiner is used, the output type of the map task must be the same
  as that of the reduce task, is that right?
 
 
  Thank you
 
  Jeff Zhang
 



Re: Re: Questions About Passing Parameters to Hadoop Job

2009-11-22 Thread Karthik Kambatla
Though it is recommended for large files, DistributedCache might be a good
alternative for you.

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
Karthik Kambatla
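
A minimal sketch of the DistributedCache approach (the old-API filecache package, as in
the link above): ship the samples file to every node and read it once per mapper in
setup(). The HDFS path and class names here are illustrative:

    // A minimal sketch: cache the samples file on every node and load it once per mapper.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SamplesCacheSketch {
      // In the driver, before submitting the job (path is a placeholder):
      public static void addSamples(Configuration conf) throws Exception {
        DistributedCache.addCacheFile(new URI("/user/boyu/samples.txt"), conf);
      }

      public static class CompareMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final List<String> samples = new ArrayList<String>();

        @Override
        protected void setup(Context context) throws IOException {
          // The cached file has been copied to local disk on every task node.
          Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
          BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
          String line;
          while ((line = reader.readLine()) != null) {
            samples.add(line);
          }
          reader.close();
        }

        @Override
        protected void map(LongWritable offset, Text dataLine, Context context)
            throws IOException, InterruptedException {
          // Placeholder comparison: emit the data line once per matching sample line.
          for (String sample : samples) {
            if (sample.equals(dataLine.toString())) {
              context.write(dataLine, new Text(sample));
            }
          }
        }
      }
    }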


2009/11/22 Gang Luo lgpub...@yahoo.com.cn

 So, you want to read the sample file in main and add each line to the job by
 job.set, and then read these lines in the mapper by job.get?

 I think it is better to use the data file as the input source to the mapper,
 read the whole sample file in each mapper instance using the HDFS API, and
 then compare them. That is actually how a map-side join works.


 Gang Luo
 -
 Department of Computer Science
 Duke University
 (919)316-0993
 gang@duke.edu



 - Original Message -
 From: Boyu Zhang boyuzhan...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: 2009/11/22 (Sun) 3:21:23 PM
 Subject: Questions About Passing Parameters to Hadoop Job

 Dear All,

 I am implementing an algorithm that reads a data file (a .txt file,
 approximately 90MB) and compares each line of the data file with each line
 of a specific samples file (a .txt file, approximately 20MB). To do this, I
 need to pass each line of the samples file as a parameter to the map-reduce
 job. And they are large, in a sense.

 My current way is to use job.set and job.get to set and retrieve these lines
 as configuration values. But it is not efficient at all!

 Could anyone help me with an alternative solution? Thanks a million!

 Boyu Zhang
 University of Delaware


