Re: consistency of yarn exclude file

2023-01-04 Thread Vinod Kumar Vavilapalli
You can do this by pushing the same file to all Resource Managers at the same 
time.

This is either done (a) by admins / ops via something like scp / rsync, with the 
source file kept in something like git, or (b) by an installer application that 
keeps the source in a DB and pushes it to all the nodes.

Thanks
+Vinod 

> On 04-Jan-2023, at 1:18 PM, Dong Ye  wrote:
> 
> Hi, All:
> 
>For resource manager, can we set 
> yarn.resourcemanager.nodes.exclude-path to an S3 file, so all 3 resource 
> managers can access it. The benefit is that there is no need to sync the 
> exclude.xml file. If not, how to sync the file on different HA resource 
> managers?
> 
> Thanks.
> 
> Ref : 
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html





Re: Communicating between yarn and tasks after delegation token renewal

2022-10-08 Thread Vinod Kumar Vavilapalli
There’s no way to do that.

Once YARN launches containers, it doesn’t communicate with them for anything 
after that. The tasks / containers can obviously always reach out to YARN 
services. But even that is not helpful in this case, because YARN never exposes 
through its APIs what it is doing with the tokens or when it renews them.

What is it that you are doing? What new information are you trying to share 
with the tasks? What framework is this? A custom YARN app or MapReduce / Tez / 
Spark / Flink etc..? 

Thanks
+Vinod

> On Oct 7, 2022, at 10:40 PM, Julien Phalip  wrote:
> 
> Hi,
> 
> IIUC, when a distributed job is started, Yarn first obtains a delegation 
> token from the target resource, then securely pushes the delegation token to 
> the individual tasks. If the job lasts longer than a given period of time, 
> then Yarn renews the delegation token (or more precisely, extends its 
> lifetime), therefore allowing the tasks to continue using the delegation 
> token. This is based on the assumption that the delegation token itself is 
> static and doesn't change (only its lifetime can be extended on the target 
> resource's server).
> 
> I'm building a custom service where I'd like to share new information with 
> the tasks once the delegation token has been renewed. Is there a way to let 
> Yarn push new information to the running tasks right after renewing the token?
> 
> Thanks,
> 
> Julien



Re: How can we access multiple Kerberos-enabled Hadoop with different users in single JVM process

2019-12-23 Thread Vinod Kumar Vavilapalli
You are looking for the proxy-users pattern. See here: 
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Superusers.html
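A minimal sketch of that pattern, assuming the web service logs in once from its 
own keytab and that the hadoop.proxyuser.* settings are configured for it on the 
target cluster; the principal, keytab path, end-user name and HDFS path below are 
made-up placeholders:

  import java.security.PrivilegedExceptionAction;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.security.UserGroupInformation;

  public class ProxyUserExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      UserGroupInformation.setConfiguration(conf);
      // The service authenticates once with its own credentials.
      UserGroupInformation.loginUserFromKeytab(
          "websvc@CLUSTER.COM", "/etc/security/keytabs/websvc.keytab");

      // Impersonate the end user; no keytab is needed for the end user.
      UserGroupInformation proxyUgi =
          UserGroupInformation.createProxyUser("alice", UserGroupInformation.getLoginUser());

      FileStatus[] listing = proxyUgi.doAs(
          (PrivilegedExceptionAction<FileStatus[]>) () -> {
            FileSystem fs = FileSystem.get(conf);
            return fs.listStatus(new Path("/user/alice"));
          });
      for (FileStatus status : listing) {
        System.out.println(status.getPath());
      }
    }
  }

The NameNode then sees the request as coming from 'alice', while authentication 
still happens with the service's single keytab.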

Thanks
+Vinod

> On Dec 24, 2019, at 9:49 AM, tobe  wrote:
> 
> Currently Hadoop relies on Kerberos to do authentication and authorization. 
> For single user, we can initialize  clients with keytab files in command-line 
> or Java program.
> But sometimes we need to access Hadoop as multiple users. For example, we 
> build the web service to view users' HDFS files. We have authorization to get 
> user name and use this user's keytab to login before requesting HDFS. 
> However, this doesn't work for multiple Hadoop clusters and multiple KDC. 
> Currently the only way to do that is enable cross-realm for these KDC. But in 
> some scenarios we can not change the configuration of KDC and want single 
> process to switch the Kerberos user on the fly without much overhead.
> Here is the related discussion in StackOverflow:
> https://stackoverflow.com/questions/15126295/using-java-programmatically-log-in-multiple-kerberos-realms-with-different-keyta#
> https://stackoverflow.com/questions/57008499/data-transfer-between-two-kerberos-secured-cluster
> https://stackoverflow.com/questions/22047145/hadoop-distcp-between-two-securedkerberos-clusters
> https://stackoverflow.com/questions/39648106/access-two-secured-kerberos-hadoop-hbase-clusters-from-the-same-process
> https://stackoverflow.com/questions/1437281/reload-kerberos-config-in-java-without-restarting-jvm
> 
> Regards 



Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Vinod Kumar Vavilapalli
Done: https://twitter.com/hadoop/status/1176787511865008128.

If you have tweetdeck, any of the PMC members can do this.

BTW, it looks like we haven't published any releases since Nov 2018. Let's get back 
to doing this going forward!

Thanks
+Vinod

> On Sep 25, 2019, at 2:44 PM, Rohith Sharma K S  
> wrote:
> 
> Updated twitter message:
> 
> ``
> Apache Hadoop 3.2.1 is released: https://s.apache.org/96r4h
> 
> Announcement: https://s.apache.org/jhnpe
> Overview: https://s.apache.org/tht6a
> Changes: https://s.apache.org/pd6of
> Release notes: https://s.apache.org/ta50b
> 
> Thanks to our community of developers, operators, and users.
> 
> 
> -Rohith Sharma K S
> 
> 
> On Wed, 25 Sep 2019 at 14:15, Sunil Govindan  wrote:
> 
>> Here the link of Overview URL is old.
>> We should ideally use https://hadoop.apache.org/release/3.2.1.html
>> 
>> Thanks
>> Sunil
>> 
>> On Wed, Sep 25, 2019 at 2:10 PM Rohith Sharma K S <
>> rohithsharm...@apache.org> wrote:
>> 
>>> Can someone help to post this in twitter account?
>>> 
>>> Apache Hadoop 3.2.1 is released: https://s.apache.org/mzdb6
>>> Overview: https://s.apache.org/tht6a
>>> Changes: https://s.apache.org/pd6of
>>> Release notes: https://s.apache.org/ta50b
>>> 
>>> Thanks to our community of developers, operators, and users.
>>> 
>>> -Rohith Sharma K S
>>> 
>>> On Wed, 25 Sep 2019 at 13:44, Rohith Sharma K S <
>>> rohithsharm...@apache.org> wrote:
>>> 
 Hi all,
 
It gives us great pleasure to announce that the Apache Hadoop
 community has
 voted to release Apache Hadoop 3.2.1.
 
 Apache Hadoop 3.2.1 is the stable release of Apache Hadoop 3.2 line,
 which
 includes 493 fixes since Hadoop 3.2.0 release:
 
 - For major changes included in Hadoop 3.2 line, please refer Hadoop
 3.2.1 main page[1].
 - For more details about fixes in 3.2.1 release, please read
 CHANGELOG[2] and RELEASENOTES[3].
 
 The release news is posted on the Hadoop website too, you can go to the
 downloads section directly[4].
 
 Thank you all for contributing to the Apache Hadoop!
 
 Cheers,
 Rohith Sharma K S
 
 
 [1] https://hadoop.apache.org/docs/r3.2.1/index.html
 [2]
 https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/CHANGELOG.3.2.1.html
 [3]
 https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/RELEASENOTES.3.2.1.html
 [4] https://hadoop.apache.org
 
>>> 





Re: Finding the average over a set of values that are created and deleted

2019-08-08 Thread Vinod Kumar Vavilapalli
How big are your images? Depending on that, one of the following could be a 
better solution:
 (1) Put both images and the image meta-data in HBase
 (2) Put the images on HDFS and track the image meta-data in HBase.
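For option (2), a rough sketch of recording the per-image meta-data in HBase; the 
table name, column family and row-key layout below are illustrative assumptions, 
not a prescribed schema (HBase 1.x+ client API):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ImageMetaWriter {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("image_meta"))) {
        // Row key: userid + image uuid, so one user's images stay contiguous.
        Put put = new Put(Bytes.toBytes("user42:img-uuid-0001"));
        put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("size"), Bytes.toBytes(123456L));
        put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("hdfs_path"),
            Bytes.toBytes("/images/user42/img-uuid-0001.jpg"));
        table.put(put);
      }
    }
  }

A running total (or the average itself) can then be maintained with per-record 
updates instead of re-scanning an append-only log.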

Thanks
+Vinod

> On Aug 9, 2019, at 7:33 AM, Daniel Santos  wrote:
> 
> Hello,
> 
> I have the following task :
> 
> An application that stores files, enables a user to add and delete files. 
> When such an event occurs I append to a file in a hdfs the following record 
> when there was a file added :
> 
> userid image-uuid size_in_bytes
> 
> and the following when a file was removed
> 
> -userid image-uuid size_in_bytes
> 
> When calculating the average in the reducer, I will have to subtract the size 
> of the removed file and decrease the total to find the average without that 
> file.
> 
> Deletions are infrequent events.
> 
> I thought of, in the reducer keeping a hash map in memory that tracks 
> deletions while I am iterating the value list, so that I can correct the 
> final total and count in the end of the iteration.
> 
> Oh, and this just reminds me that I will have only one reducer for the single 
> ‘avg' key the mapper emits.
> 
> What do you think ?
> 
> Regards
> 





Re: Any thoughts making Submarine a separate Apache project?

2019-07-29 Thread Vinod Kumar Vavilapalli
Looks like there's a meaningful push behind this.

Given the desire is to fork off from Apache Hadoop, you'd want to make sure this 
enthusiasm turns into building a real, independent and, more importantly, 
sustainable community.

Given that there were two official releases off the Apache Hadoop project, I 
doubt you'd need to go through the incubator process. Instead you can 
directly propose a new TLP to the ASF board. The last few times this happened were 
with ORC, and long before that with Hive, HBase etc. Can somebody who has 
cycles and has been on the ASF lists for a while look into the process here?

For the Apache Hadoop community, this will be treated simply as a code change and 
so needs a committer +1? You can be more gentle by formally doing a vote once a 
process doc is written down.

Back to the sustainable-community point: as part of drafting this proposal, 
you'd definitely want to make sure all of the Apache Hadoop PMC/Committers can 
choose to join this new project as PMC/Committers respectively, 
without any additional constraints.

Thanks
+Vinod

> On Jul 25, 2019, at 1:31 PM, Wangda Tan  wrote:
> 
> Thanks everybody for sharing your thoughts. I saw positive feedbacks from
> 20+ contributors!
> 
> So I think we should move it forward, any suggestions about what we should
> do?
> 
> Best,
> Wangda
> 
> On Mon, Jul 22, 2019 at 5:36 PM neo  wrote:
> 
>> +1, This is neo from TiDB & TiKV community.
>> Thanks Xun for bringing this up.
>> 
>> Our CNCF project's open source distributed KV storage system TiKV,
>> Hadoop submarine's machine learning engine helps us to optimize data
>> storage,
>> helping us solve some problems in data hotspots and data shuffers.
>> 
>> We are ready to improve the performance of TiDB in our open source
>> distributed relational database TiDB and also using the hadoop submarine
>> machine learning engine.
>> 
>> I think if submarine can be independent, it will develop faster and better.
>> Thanks to the hadoop community for developing submarine!
>> 
>> Best Regards,
>> neo
>> www.pingcap.com / https://github.com/pingcap/tidb /
>> https://github.com/tikv
>> 
>> Xun Liu wrote on Mon, Jul 22, 2019, at 4:07 PM:
>> 
>>> @adam.antal
>>> 
>>> The submarine development team has completed the following preparations:
>>> 1. Established a temporary test repository on Github.
>>> 2. Change the package name of hadoop submarine from org.hadoop.submarine
>> to
>>> org.submarine
>>> 3. Combine the Linkedin/TonY code into the Hadoop submarine module;
>>> 4. On the Github docked travis-ci system, all test cases have been
>> tested;
>>> 5. Several Hadoop submarine users completed the system test using the
>> code
>>> in this repository.
>>> 
>>> 赵欣 (Zhao Xin) wrote on Mon, Jul 22, 2019, at 9:38 AM:
>>> 
 Hi
 
 I am a teacher at Southeast University (https://www.seu.edu.cn/). We
>> are
 a major in electrical engineering. Our teaching teams and students use
 hadoop submarine for big data analysis and automation control of 
>>> electrical
 equipment.
 
 Many thanks to the hadoop community for providing us with machine
>>> learning
 tools like submarine.
 
 I wish hadoop submarine is getting better and better.
 
 
 ==
 赵欣
 东南大学电气工程学院
 
 -
 
 Zhao XIN
 
 School of Electrical Engineering
 
 ==
 2019-07-18
 
 
 *From:* Xun Liu 
 *Date:* 2019-07-18 09:46
 *To:* xinzhao 
 *Subject:* Fwd: Re: Any thoughts making Submarine a separate Apache
 project?
 
 
 -- Forwarded message -
 From: dashuiguailu...@gmail.com 
 Date: Wed, Jul 17, 2019, 3:17 PM
 Subject: Re: Re: Any thoughts making Submarine a separate Apache
>> project?
 To: Szilard Nemeth , runlin zhang <
 runlin...@gmail.com>
 Cc: Xun Liu , common-dev <
>>> common-...@hadoop.apache.org>,
 yarn-dev , hdfs-dev <
 hdfs-...@hadoop.apache.org>, mapreduce-dev <
 mapreduce-...@hadoop.apache.org>, submarine-dev <
 submarine-...@hadoop.apache.org>
 
 
 +1 ,Good idea, we are very much looking forward to it.
 
 --
 dashuiguailu...@gmail.com
 
 
 *From:* Szilard Nemeth 
 *Date:* 2019-07-17 14:55
 *To:* runlin zhang 
 *CC:* Xun Liu ; Hadoop Common
 ; yarn-dev ;
 Hdfs-dev ; mapreduce-dev
 ; submarine-dev
 
 *Subject:* Re: Any thoughts making Submarine a separate Apache project?
 +1, this is a very great idea.
 As Hadoop repository has already grown huge and contains many
>> projects, I
 think in general it's a good idea to separate projects in the early
>>> phase.
 
 
 On Wed, Jul 17, 2019, 08:50 runlin zhang  wrote:
 
> +1 ,That will be great !
> 
>> On Jul 10, 2019, at 3:34 PM, Xun Liu wrote:
>> 
>> Hi all,
>> 
>> This is Xun Liu contributing to the Submarine 

Re: Right to be forgotten and HDFS

2019-04-15 Thread Vinod Kumar Vavilapalli
If one uses HDFS as raw file storage where a single file intermingles data from 
all users, it's not easy to achieve what you are trying to do.

Instead, using systems (e.g. HBase, Hive) that support updates and deletes to 
individual records is the only way to go.
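If the per-record data lives in HBase, the deletion request then becomes a plain 
client-side operation rather than a file rewrite. A minimal sketch, assuming a 
table keyed by user id so that one user's records share a row-key prefix (the 
table name and key layout are illustrative; recent HBase client API):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Delete;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ForgetUser {
    public static void main(String[] args) throws Exception {
      try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
           Table table = conn.getTable(TableName.valueOf("user_events"))) {
        // All of "john"'s records share the row-key prefix "john:".
        Scan scan = new Scan().setRowPrefixFilter(Bytes.toBytes("john:"));
        try (ResultScanner scanner = table.getScanner(scan)) {
          for (Result r : scanner) {
            table.delete(new Delete(r.getRow()));
          }
        }
      }
    }
  }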

+Vinod

> On Apr 15, 2019, at 1:32 AM, Ivan Panico  wrote:
> 
> Hi,
> 
> Recent GDPR introduced a new right for people : the right to be forgotten. 
> This right means that if an organization is asked by a customer to delete all 
> his data, the organization have to comply most of the time (there are 
> conditions which can suspend this right but that's besides my point).
> 
> Now HDFS being WORM (Write Once Read Multiple Times), I guess you see where 
> I'm going. What would be the best way to implement this line deletion feature 
> (supposing that when a customer asks for a delete of all his data, the 
> organization would have to delete some lines in some HDFS files).
> 
> Right now I'm going for the following :
> Create a key-value base (user, [files])
> On file writing, feed this base with the users and file location (by 
> appending or updating a key).
> When the deletion is requested by the user "john", look in that base and 
> rewrite all the files of the "john" key (read the file in memory, suppress 
> the lines of "john", rewrite the files)
> 
> Would this be the most hadoop way to do that ?
> I discarded some cryptoshredding like solution because the HDFS data has to 
> be readable by some mutliple proprietary softwares and by users at some point 
> and I'm not sur how to incorporate a decyphering step for all those uses 
> cases.
> Also, I came up with this table solution because a violent grep for some key 
> on the whole HDFS tree seemed unlikely to scale but maybe I'm mistaken ?
> 
> Thanks for your help,
> Best regards



Re: Recommendation for Resourcemanager GC configuration

2017-08-23 Thread Vinod Kumar Vavilapalli
What is the ResourceManager JVM’s heap size? What is the value for the 
configuration yarn.resourcemanager.max-completed-applications?

+Vinod

> On Aug 23, 2017, at 9:23 AM, Ravuri, Venkata Puneet  wrote:
> 
> Hello,
>  
> I wanted to know if there is any recommendation for ResourceManager GC 
> settings.
> Full GC (with Parallel GC, 8 threads) is sometimes taking more than 30 sec 
> due to which state store sessions to Zookeeper time out resulting in FATAL 
> errors.
> The YARN cluster is heavily used with 1000’s of applications launched per 
> hour.
>  
> Could you please share any documentation related to best practices for tuning 
> resourcemanager GC?
>  
> Thanks,
> Puneet



Re: 2.7.3 shipped without Snappy support

2016-09-30 Thread Vinod Kumar Vavilapalli
The way we build the bits as part of the release process changed quite a bit 
during that release so there were some hiccups.

This seems like an oversight, though I tried to build them as close as possible 
to the releases before 2.7.3. We can fix this for the next releases.

+Vinod

> On Sep 30, 2016, at 7:46 PM, tsuna  wrote:
> 
> Hi there,
> Why are releases up to 2.7.2 shipped with a libhadoop.so built with Snappy 
> support but 2.7.3 not?  I couldn’t find anything in the release notes or ML 
> archives that would indicate that this was an intentional change.
> 
> hadoop-2.7.2/lib/native/libhadoop.so.1.0.0:
> 00014fb0 
> :
>14fb0:   b8 01 00 00 00  mov$0x1,%eax
>14fb5:   c3  retq
> (this means return true;)
> 
> hadoop-2.7.3/lib/native/libhadoop.so.1.0.0:
> 00014160 
> :
>14160:   31 c0   xor%eax,%eax
>14162:   c3  retq
> (this means return false;)
> 
> Are Snappy users expected to rebuild libhadoop.so from scratch or is this an 
> unintentional change in the release process?
> 
> -- 
> Benoit "tsuna" Sigoure



Re: YARN re-locate container

2016-03-31 Thread Vinod Kumar Vavilapalli
So, IIUC, there are two parts to what you need:
 (1) How do I schedule containers in the first place, asking for locality?
 (2) How do I keep my scheduling requirements the same even if some of my 
dependent services move / migrate?

For (1), YARN supports data-locality - you can give preferences for hosts / 
racks etc. More advanced ‘locality’ features are a work-in-progress.
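To illustrate (1), a sketch of what an ApplicationMaster can already express 
through the AMRMClient API; the host and rack names are placeholders, and this 
assumes it runs inside an AM that has registered with the ResourceManager:

  import org.apache.hadoop.yarn.api.records.Priority;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.AMRMClient;
  import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

  public class LocalityRequests {
    static void askForLocalContainer(AMRMClient<ContainerRequest> amRmClient) {
      Resource capability = Resource.newInstance(2048, 1);   // 2 GB, 1 vcore
      Priority priority = Priority.newInstance(1);
      ContainerRequest request = new ContainerRequest(
          capability,
          new String[] { "host-17.example.com" },  // preferred host (placeholder)
          new String[] { "/rack-3" },              // preferred rack (placeholder)
          priority,
          true);  // relaxLocality: fall back to rack / any node if needed
      amRmClient.addContainerRequest(request);
    }
  }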

As of now, we deem (2) to be too custom, so we leave it up to the applications 
to orchestrate. Once we start putting in more of the advanced locality features 
that I mentioned above, we can consider having the platform move containers 
automatically to keep those locality promises when services move / migrate.

HTH
+Vinod


> On Mar 31, 2016, at 1:44 AM, Zoltán Zvara <zoltan.zv...@gmail.com> wrote:
> 
> The intent to move a container is to improve service-to-service locality, 
> co-locate services to reduce bottlenecks introduced by network. The idea that 
> a certain parallel process should be moved comes from an external service.
> 
> Speculation might not be effective here as I see, since the intent to move a 
> container can be triggered hourly, daily, weekly or monthly - containers 
> contain long running services, data pipelines. Or am I on the wrong line 
> here? Can I simulate something like this with the current speculation?
> 
> Thanks for help!
> 
> On Wed, Mar 30, 2016 at 11:17 PM Eric Payne <eric.payne1...@yahoo.com> wrote:
> I think it would help if I knew what the criteria is for wanting to move the 
> container. In other words, was the container started on an undesirable node 
> in the first place? Or, did the node become undesirable after the container 
> started.
> 
> Speculation could be considered a "move" operation for containers. If a 
> container isn't finishing fast enough, the default speculator will start 
> another container on a different node. Would it be possible to create a 
> specialized speculator that understood your criteria for needing to move 
> containers? If so, it could be done automatically / programatically.
> 
> Thanks,
> -Eric
> 
> 
> From: Zoltán Zvara <zoltan.zv...@gmail.com>
> To: Vinod Kumar Vavilapalli <vino...@apache.org>
> Cc: user@hadoop.apache.org
> Sent: Wednesday, March 30, 2016 9:10 AM
> Subject: Re: YARN re-locate container
> 
> How is this achieved? As far as I see it now, after stopping a container, the 
> AM must reallocate the same container with the same resource vector but with 
> locality preferences pointed to the new, target node. After the new leash has 
> been acquired, then the AM can take it to the new node and initiate a 
> `startContainers` message.
> Our use-case with Ericsson would require a more simple API, where (for 
> example) a `moveContainer` call from the AM would ask the RM or NM to move a 
> container from one node to another (or to any of the specified set of 
> preferred nodes). Move would simply kill the container and restart it on 
> another node at any given time whenever it is possible - I feel questions 
> around scheduling: how container moves should be handled? Probably not like 
> simple allocations.
> 
> Am I understanding the architecture correctly here?
> 
> On Tue, Mar 29, 2016 at 7:31 PM Vinod Kumar Vavilapalli <vino...@apache.org> wrote:
> Containers can be restarted on other machines already today - YARN just 
> leaves it up to the applications to do so.
> 
> Are you looking for anything more specifically?
> 
> +Vinod
> 
> > On Mar 29, 2016, at 9:45 AM, Zoltán Zvara <zoltan.zv...@gmail.com> wrote:
> >
> > Dear Hadoop Community,
> >
> > Is there any feature available, or on the road map to support the 
> > relocation of containers? (Simply restart the container on another machine.)
> >
> > Thanks,
> > Zoltán
> 
> 
> 



Re: YARN re-locate container

2016-03-29 Thread Vinod Kumar Vavilapalli
Containers can be restarted on other machines already today - YARN just leaves 
it up to the applications to do so.

Are you looking for anything more specifically?

+Vinod

> On Mar 29, 2016, at 9:45 AM, Zoltán Zvara  wrote:
> 
> Dear Hadoop Community,
> 
> Is there any feature available, or on the road map to support the relocation 
> of containers? (Simply restart the container on another machine.)
> 
> Thanks,
> Zoltán







Re: yarn memory settings in heterogeneous cluster

2015-08-28 Thread Vinod Kumar Vavilapalli
Hi Matt,

Replies inline.

 I'm using the Capacity Scheduler and deploy mapred-site.xml and yarn-site.xml 
 configuration files with various memory settings that are tailored to the 
 resources for a particular machine. The master node, and the two slave node 
 classes each get a different configuration file since they have different 
 memory profiles.


We are improving this starting 2.8 so as to not require different configuration 
files - see https://issues.apache.org/jira/browse/YARN-160.


 yarn.scheduler.minimum-allocation-mb: This appears to behave as a 
 cluster-wide setting; however, due to my two node classes, a per-node 
 yarn.scheduler.minimum-allocation-mb would be desirable.

Actually, the minimum container size is a cluster-level constant by design. It 
doesn't matter how big or small the nodes in the cluster are; the minimum size 
needs to be a constant for applications to have a notion of deterministic 
sizing. What we instead suggest is to simply run more containers on bigger 
machines using the yarn.nodemanager.resource.memory-mb configuration.

On the other hand, the maximum container size should at best be the size of the 
smallest node in the cluster. Otherwise, again, you may cause non-deterministic 
scheduling behavior for apps.

 More concretely, suppose I have two jobs with differing memory 
 requirements--how would I communicate this to yarn and request that my 
 containers be allocated with additional memory?

This is a more apt ask. The minimum container size doesn't determine the 
container size! Containers can be various multiples of the minimum, driven by 
the application or by frameworks like MapReduce. For example, even if the 
minimum container size in the cluster is 1GB, the MapReduce framework can ask 
for bigger containers if the user sets mapreduce.map.memory.mb to 2GB/4GB etc. 
And this is controllable at the job level!
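For example, a per-job override might look like the following sketch (the 
numbers are illustrative); note that the JVM heap given via java.opts should 
stay somewhat below the container size to leave room for non-heap memory:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class JobMemoryExample {
    public static Job newJob() throws Exception {
      Configuration conf = new Configuration();
      conf.setInt("mapreduce.map.memory.mb", 4096);      // ask for 4 GB map containers
      conf.set("mapreduce.map.java.opts", "-Xmx3276m");  // heap kept below the container size
      conf.setInt("mapreduce.reduce.memory.mb", 2048);   // 2 GB reduce containers
      conf.set("mapreduce.reduce.java.opts", "-Xmx1638m");
      return Job.getInstance(conf, "job-with-bigger-maps");
    }
  }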

HTH
+Vinod

Re: Hadoop YARN Birds of a Feather (BOF) Session at Hadoop Summit San Jose 2015

2015-06-08 Thread Vinod Kumar Vavilapalli
+1

Thanks
+Vinod

On Jun 4, 2015, at 1:05 PM, Subramaniam V K 
subru...@gmail.com wrote:

Hi Vinod,

Thanks for organizing the BoF meetup.

We have put up an initial proposal 
at YARN-2915 (https://issues.apache.org/jira/browse/YARN-2915) on Federating YARN 
to make it elastically scalable. This is critical for us to scale out YARN to 
match the cluster sizes at Microsoft which are already at 10s of 1000s of nodes 
(and growing daily). BoF would be a great venue to discuss our proposal in more 
detail as we are looking for feedback from the larger YARN community.

Cheers,
Subru

On Wed, Jun 3, 2015 at 10:12 AM, Vinod Kumar Vavilapalli 
vino...@apache.org wrote:
Hi all,

We had a blast of a BOF session on Hadoop YARN at last year's Hadoop
Summit. We had lots of fruitful discussions led by many developers about
various features, their contributions, it was a great session overall.

I am coordinating this year's BOF as well and garnering topics of
discussion. A BOF by definition  involves on-the-spot non-planned
discussions, but it doesn't hurt to have a bunch of pre-planned topics for
starters.

YARN developers/committers, if you are attending, please feel free to send
me topics that you want to discuss about at the BOF session.

Hadoop users, you are welcome to attend and join the discussion around
Hadoop YARN. The meetup link is here:
http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/222465938/

Thanks all,
+Vinod




Re: Hadoop YARN Birds of a Feather (BOF) Session at Hadoop Summit San Jose 2015

2015-06-08 Thread Vinod Kumar Vavilapalli
Sorry, missed responding to this. No registration is required, the conference 
will have ended by then.

Thanks
+Vinod

On Jun 3, 2015, at 10:53 AM, Karthik Kambatla 
ka...@cloudera.com wrote:

Also, are Hadoop summit registrations required to attend the BoF?

On Wed, Jun 3, 2015 at 10:52 AM, Karthik Kambatla 
ka...@cloudera.com wrote:
Going through all Yarn umbrella JIRAs 
(https://issues.apache.org/jira/issues/?jql=project%20in%20(Yarn)%20AND%20summary%20~%20umbrella%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC)
could be useful. Maybe this is an opportunity to clean up that list. I 
looked at all New Features 
(https://issues.apache.org/jira/issues/?jql=project%20in%20(Yarn)%20AND%20type%20%3D%20%22New%20Feature%22%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20created%20ASC),
but that list is too long to go through.

On Wed, Jun 3, 2015 at 10:12 AM, Vinod Kumar Vavilapalli 
vino...@apache.org wrote:
Hi all,

We had a blast of a BOF session on Hadoop YARN at last year's Hadoop
Summit. We had lots of fruitful discussions led by many developers about
various features, their contributions, it was a great session overall.

I am coordinating this year's BOF as well and garnering topics of
discussion. A BOF by definition  involves on-the-spot non-planned
discussions, but it doesn't hurt to have a bunch of pre-planned topics for
starters.

YARN developers/committers, if you are attending, please feel free to send
me topics that you want to discuss about at the BOF session.

Hadoop users, you are welcome to attend and join the discussion around
Hadoop YARN. The meetup link is here:
http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/222465938/

Thanks all,
+Vinod



--
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es/




--
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es/




Hadoop YARN Birds of a Feather (BOF) Session at Hadoop Summit San Jose 2015

2015-06-03 Thread Vinod Kumar Vavilapalli
Hi all,

We had a blast of a BOF session on Hadoop YARN at last year's Hadoop
Summit. We had lots of fruitful discussions led by many developers about
various features, their contributions, it was a great session overall.

I am coordinating this year's BOF as well and garnering topics of
discussion. A BOF by definition  involves on-the-spot non-planned
discussions, but it doesn't hurt to have a bunch of pre-planned topics for
starters.

YARN developers/committers, if you are attending, please feel free to send
me topics that you want to discuss about at the BOF session.

Hadoop users, you are welcome to attend and join the discussion around
Hadoop YARN. The meetup link is here:
http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/222465938/

Thanks all,
+Vinod


Re: in YARN/MR2, can I still submit multiple jobs to one MR application master?

2015-04-27 Thread Vinod Kumar Vavilapalli
The MapReduce ApplicationMaster supports only one job. You can say that (YARN 
ResourceManager + a bunch of MR ApplicationMasters (one per job) = JobTracker).

Tez does have a notion of multiple DAGs per YARN app.

For your specific use-case, you can force that user to a queue and limit how 
much he/she can access.

Thanks
+Vinod

On Apr 27, 2015, at 3:30 PM, Yang tedd...@gmail.com wrote:

 conceptually, the MR application master is similar to the old job tracker.
 
 if so, can I submit multiple jobs to the same MR application master?  it 
 looks like an odd use case, the context is that we have users generating lots 
 of MR jobs, and he currently has a little crude scheduler that periodically 
 launches jobs to the RM by just hadoop jar ...
 
 instead I was thinking to carve out a MR2 allocation in RM first, then 
 periodically submit to the job tracker/application master, so that all the 
 jobs are localized to this allocation.
 
 
 I was also thinking about using Tez instead of MR application master. Tez 
 replaces MR2 application master, not on top of it, right?
 
 Thanks
 Yang



Re: Will Hadoop 2.6.1 be released soon?

2015-04-27 Thread Vinod Kumar Vavilapalli
Tx, I am moving this discussion to the dev lists for progress. Will include 
these tickets for discussion. Feel free to pitch in there if you need more.

+Vinod

On Apr 27, 2015, at 6:45 AM, Dmitry Simonov 
dimmobor...@gmail.com wrote:

Sergey Kazakov asked me to reply, that our main issue is HDFS-7443.

2015-04-24 1:34 GMT+05:00 Sean Busbey 
bus...@cloudera.com:
I'd love to see a 2.6.1 release with

* HADOOP-11674
* HADOOP-11710

On Thu, Apr 23, 2015 at 12:00 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:
I was going to start a thread on dev lists for this, will do so today.

Can you list down the specific HDFS issues you want in 2.6.1?

Thanks
+Vinod

On Apr 23, 2015, at 3:21 AM, Sergey Kazakov (Казаков Сергей Сергеевич) 
skaza...@skbkontur.ru wrote:

Hi!

We see some serious issues in HDFS of 2.6.0, which were, according to JIRA, 
fixed in 2.6.1. Any plans to release this patch or if it already was, where can 
we download it?

Kind regards,
Sergey Kazakov




--
Sean



--
--
Best regards,
  Dmitry Simonov.



Re: Will Hadoop 2.6.1 be released soon?

2015-04-23 Thread Vinod Kumar Vavilapalli
I was going to start a thread on dev lists for this, will do so today.

Can you list down the specific HDFS issues you want in 2.6.1?

Thanks
+Vinod

On Apr 23, 2015, at 3:21 AM, Sergey Kazakov (Казаков Сергей Сергеевич) 
skaza...@skbkontur.ru wrote:

Hi!

We see some serious issues in HDFS of 2.6.0, which were, according to JIRA, 
fixed in 2.6.1. Any plans to release this patch or if it already was, where can 
we download it?

Kind regards,
Sergey Kazakov



Re: YARN HA Active ResourceManager failover when machine is stopped

2015-04-23 Thread Vinod Kumar Vavilapalli
I have run into this offline with someone else too but couldn't root-cause it.

Will you be able to share your active/standby ResourceManager logs via pastebin 
or something?

+Vinod

On Apr 23, 2015, at 9:41 AM, Matt Narrell 
matt.narr...@gmail.com wrote:

I’m using Hadoop 2.6.0 from HDP 2.2.4 installed via Ambari 2.0

I’m testing the YARN HA ResourceManager failover. If I STOP the active 
ResourceManager (shut the machine off), the standby ResourceManager is elected 
to active, but the NodeManagers do not register themselves with the newly 
elected active ResourceManager. If I restart the machine (but DO NOT resume the 
YARN services) the NodeManagers register with the newly elected ResourceManager 
and my jobs resume. I assume I have some bad configuration, as this produces a 
SPOF, and is not HA in the sense I’m expecting.

Thanks,
mn



Re: Deadlock in RM

2015-03-12 Thread Vinod Kumar Vavilapalli
Wangda Tan commented on the JIRA saying that this is the same as YARN-3251, 
which is already fixed. But the fix is not part of any release yet.

+Vinod

On Mar 12, 2015, at 5:04 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com 
wrote:

 We are observing a repetitive issue with hadoop 2.6.0/HDP 2.2 with RM
 capacity scheduler threads getting into a deadlock. It appears that the RM
 UI scheduler flow for getResourceUsageReport is triggering the deadlock.
 
 Have raised a jira for the same -
  https://issues.apache.org/jira/browse/YARN-3346. Can someone please take
 a look and get back with your thoughts on this.
 
 Thanks
 Suma



Re: How reduce tasks know which partition they should read?

2015-03-09 Thread Vinod Kumar Vavilapalli

The reducers (Fetcher.java) simply ask the Shuffle Service (ShuffleHandler.java) 
to give them output corresponding to a specific map. The partitioning detail is 
hidden from the reducers.

Thanks,
+Vinod

On Mar 9, 2015, at 7:56 AM, xeonmailinglist-gmail xeonmailingl...@gmail.com 
wrote:

 Hi,
 
 I am looking to the Yarn mapreduce internals to try to understand how reduce 
 tasks know which partition of the map output they should read. Even, when 
 they re-execute after a crash?
 
 I am also looking to the mapreduce source code. Is there any class that I 
 should look to try to understand this question?
 
 Any help?
 
 Thanks
 
 
 -- 
 --
 



Re: 1 job with Input data from 2 HDFS?

2015-02-27 Thread Vinod Kumar Vavilapalli
It is entirely possible. You should treat one of them as the primary input 
through the InputFormat/Mapper and read the other as a side input directly by 
creating a client.
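A minimal sketch of the side-input half of that; the second cluster's NameNode 
address and the lookup file path below are placeholders:

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.io.InputStreamReader;
  import java.net.URI;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class TwoClusterMapper extends Mapper<LongWritable, Text, Text, Text> {
    private BufferedReader sideInput;

    @Override
    protected void setup(Context context) throws IOException {
      // Primary input arrives through the normal InputFormat; the second
      // HDFS is read directly with a client pointed at its NameNode.
      FileSystem otherFs = FileSystem.get(
          URI.create("hdfs://other-namenode:8020"), context.getConfiguration());
      sideInput = new BufferedReader(new InputStreamReader(
          otherFs.open(new Path("/data/lookup.txt"))));
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // ... join 'value' against data read from sideInput here ...
      context.write(new Text(key.toString()), value);
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      sideInput.close();
    }
  }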

+Vinod

On Feb 27, 2015, at 7:22 AM, xeonmailinglist xeonmailingl...@gmail.com wrote:

 Hi,
 
 I would like to have a mapreduce job that reads input data from 2 HDFS. Is 
 this possible?
 
 Thanks,



Re: How to set AM attempt interval?

2015-02-27 Thread Vinod Kumar Vavilapalli
That's an old JIRA. The right solution is not an AM-retry interval but 
launching the AM somewhere else.

Why is your AM failing in the first place? If it is due to full-disk, the 
situation should be better with YARN-1781 - can you use the configuration 
(yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage) 
added at YARN-1781?

+Vinod

On Feb 27, 2015, at 7:31 AM, Ted Yu 
yuzhih...@gmail.com wrote:

Looks like this is related:
https://issues.apache.org/jira/browse/YARN-964

On Fri, Feb 27, 2015 at 4:29 AM, Nur Kholis Majid 
nur.kholis.ma...@gmail.com wrote:
Hi All,

I have many jobs failed because AM trying to rerun job in very short
interval (only in 6 second). How can I add the interval to bigger
value?

https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png

Thank you.




Re: adding node(s) to Hadoop cluster

2014-12-11 Thread Vinod Kumar Vavilapalli

I may be mistaken, but let me try again with an example to see if we are on the 
same page

Principals
 - NameNode: nn/nn-h...@cluster.com
 - DataNode: dn/_h...@cluster.com

Auth to local mappings
 - nn/nn-h...@cluster.com - hdfs
 - dn/.*@cluster.com - hdfs

The combination of the above lets you block any user other than hdfs from 
faking like a datanode.
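A rough way to sanity-check mappings like these offline is Hadoop's KerberosName 
helper (shipped in hadoop-auth); the realm, host names and rules below are 
made-up stand-ins for the example above, and the exact class location can differ 
across versions:

  import org.apache.hadoop.security.authentication.util.KerberosName;

  public class AuthToLocalCheck {
    public static void main(String[] args) throws Exception {
      // Same idea as the mappings above: nn and dn principals both map to 'hdfs'.
      KerberosName.setRules(
          "RULE:[2:$1@$0](nn@CLUSTER\\.COM)s/.*/hdfs/\n"
        + "RULE:[2:$1@$0](dn@CLUSTER\\.COM)s/.*/hdfs/\n"
        + "DEFAULT");

      // Both print "hdfs".
      System.out.println(new KerberosName("nn/nn-host.cluster.com@CLUSTER.COM").getShortName());
      System.out.println(new KerberosName("dn/worker-07.cluster.com@CLUSTER.COM").getShortName());
    }
  }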

Purposes
 - _HOST: Lets you deploy all datanodes with the same principal value in all 
their configs.
 - Auth-to-local mapping: Maps Kerberos principals to Unix login names to close 
the loop on identity.

I don't think your example of somebody on an untrusted client disguising 
themselves as hdfs/nodename@REALM is possible at all with Kerberos. Any 
references to such possibilities? If it were possible, all security would be 
toast anyway, no?

+Vinod


 Thanks, I may be mistaken, but I suspect you missed the point:
 
 for me, auth_to_local's role is to protect the server(s). For example,  
 somebody on an untrusted client can disguise as hdfs/nodename@REALM and 
 hence take over hdfs through a careless principal-id translation. A 
 well-configured auth_to_local will deflect that rogue hdfs to nobody or 
 something, so a malicious client cannot do a hdfs dfs -chown ... for 
 example.
 
 The _HOST construct makes using the same config files throughout the cluster 
 easier indeed, but as far as I see it mainly applies to the client.
 
 On the server, I see no way other than auth_to_local with a list/pattern of 
 trusted node names (on namenode and every datanode in the hdfs case) to 
 prevent the scenario above. Would there be?





Re: Question about container recovery

2014-12-10 Thread Vinod Kumar Vavilapalli

Replies inline

  Here is my question: is there a mechanisms that when one container exit 
 abnormally, yarn will prefer to dispatch the container on other NM?


Acting on container exit is a responsibility left to ApplicationMasters. For 
example, the MapReduce ApplicationMaster explicitly tells YARN to NOT launch a task on 
the same machine where it failed before.
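For a custom ApplicationMaster, the usual building blocks are to re-request a 
replacement container and, if desired, blacklist the node that failed. A hedged 
sketch against the AMRMClient API (the node name and original request come from 
the caller; updateBlacklist is available in recent YARN client versions):

  import java.util.Collections;
  import org.apache.hadoop.yarn.api.records.ContainerStatus;
  import org.apache.hadoop.yarn.client.api.AMRMClient;
  import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

  public class ContainerFailureHandler {
    // Called by a hypothetical AM when it sees a completed container
    // with a non-zero exit status.
    static void onContainerFailed(AMRMClient<ContainerRequest> amRmClient,
                                  ContainerStatus status,
                                  String failedNode,
                                  ContainerRequest originalRequest) {
      if (status.getExitStatus() != 0) {
        // Avoid the node where the container just died ...
        amRmClient.updateBlacklist(Collections.singletonList(failedNode),
                                   Collections.<String>emptyList());
        // ... and ask for a replacement container with the same spec.
        amRmClient.addContainerRequest(originalRequest);
      }
    }
  }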


 We have a cluster with 3 NMs(each NM 135g mem) and 1 RM, and we running a job 
 which start 13 container(= 1 AM + 12 executor containers).
 
 Each NM has 4 executor container and the mem configured for each executor 
 container is 30g. There is a interesting test, when we killed
 
 4 containers in one NM1, only 2 containers restarted on NM1, other 2 
 containers reserved on the NM2 and NM3.


Which application is this?

Was the app stuck waiting for those reservations to be fulfilled?

+Vinod




Re: Question about container recovery

2014-12-10 Thread Vinod Kumar Vavilapalli
Is this MapReduce application?

MR has a concept of blacklisting nodes where a lot of tasks fail. The configs 
that control it are
 - yarn.app.mapreduce.am.job.node-blacklisting.enable: True by default
 - mapreduce.job.maxtaskfailures.per.tracker: Default is 3, meaning a node is 
blacklisted if it fails 3 tasks
 - yarn.app.mapreduce.am.job.node-blacklisting.ignore-threshold-node-percent: 
33% by default, meaning blacklists will be ignored if 33% of cluster is already 
blacklisted 

+Vinod

On Dec 10, 2014, at 12:59 AM, scwf wangf...@huawei.com wrote:

 It seems there is a blacklist in yarn when all containers of one NM lost, it 
 will add this NM to blacklist? Then when will the NM go out of blacklist?
 
 On 2014/12/10 13:39, scwf wrote:
 Hi, all
   Here is my question: is there a mechanisms that when one container exit 
 abnormally, yarn will prefer to dispatch the container on other NM?
 
 We have a cluster with 3 NMs(each NM 135g mem) and 1 RM, and we running a 
 job which start 13 container(= 1 AM + 12 executor containers).
 
 Each NM has 4 executor container and the mem configured for each executor 
 container is 30g. There is a interesting test, when we killed
 
 4 containers in one NM1, only 2 containers restarted on NM1, other 2 
 containers reserved on the NM2 and NM3.
 
   Any idea?
 
 Fei.
 
 
 
 
 




Re: adding node(s) to Hadoop cluster

2014-12-10 Thread Vinod Kumar Vavilapalli

 I am aware that one can add names to dfs.hosts and run dfsadmin 
 -refreshNodes, but with Kerberos I have the additional problem that the new 
 hosts' principals have to be added to hadoop.security.auth_to_local (I do not 
 have the luxury of an easy albeit secure pattern for host names). Alas, I see 
 no way of propagating changes there to running demons.


This is how almost all clusters running security add nodes - add to dfs.hosts 
or the YARN hosts-file configuration and do a refresh.

You don't need patterns for host-names. Did you see the support for _HOST in 
the principal names? You can specify the datanode principal to be, say, 
datanodeUser/_HOST@realm, and the Hadoop libraries interpret and replace _HOST 
on each machine with the real host-name.

HTH
+Vinod


Re: When schedulers consider x% of resources what do they mean?

2014-12-05 Thread Vinod Kumar Vavilapalli

Resources can mean memory-only (by default) or memory + CPU etc across the 
_entire_ cluster.

So 70% of cluster resources for a queue means that 70% of the total memory set 
for Hadoop in the cluster is available for all applications in that queue.

Heap sizes are part of the memory requirements for each container.
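A worked example, with made-up numbers: assuming 10 NodeManagers, each configured 
with yarn.nodemanager.resource.memory-mb = 24576 (24 GB), the cluster offers 
10 x 24 GB = 240 GB to YARN. A queue guaranteed 70% is then entitled to roughly 
0.7 x 240 GB = 168 GB in aggregate, regardless of how many containers that 
translates into or which nodes they land on.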

HTH
+Vinod

On Dec 5, 2014, at 5:41 AM, Chris Mawata chris.maw...@gmail.com wrote:

 Hi all,
  when you divide up resources e.g. on CapacityScheduler or FairScheduler 
 etc., what does x% or resources mean? So, for example, a guranteed 70% meant 
 to indicate you can have up to
 70% of the containers clusterwide irrespective of size of container 
 70% of the containers  on each node 
 or should it not be number of containers but sum of heap sizes?
 
 Cheers
 Chris Mawata





Re: datanodes not connecting

2014-11-23 Thread Vinod Kumar Vavilapalli
Can you see the slave logs to find out what is happening there? For e.g.,
/home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.log and
/home/hadoop/logs/yarn-hadoop-nodemanager-hadoop-hadoop3.log.

+Vinod

On Sun, Nov 23, 2014 at 10:24 AM, Tim Dunphy bluethu...@gmail.com wrote:

 Hey all,

  OK thanks for your advice on setting up a hadoop test environment to get
 started in learning how to use hadoop! I'm very excited to be able to start
 to take this plunge!

 Although rather than using BigTop or Cloudera, I just decided to go for a
 straight apache hadoop install. I setup 3 t2micro instances on EC2 for my
 training purposes. And that seemed to go alright! As far as installing
 hadoop and starting the services goes.

 I went so far as to setup the ssh access that the nodes will need. And the
 services seem to start without issue:

 bash-4.2$ whoami
 hadoop

 bash-4.2$ start-dfs.sh

 Starting namenodes on [hadoop1.mydomain.com]

 hadoop1.mydomain.com: starting namenode, logging to
 /home/hadoop/logs/hadoop-hadoop-namenode-hadoop1.out

 hadoop2.mydomain.com: starting datanode, logging to
 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.out

 hadoop3.mydomain.com: starting datanode, logging to
 /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.out

 Starting secondary namenodes [0.0.0.0]

 0.0.0.0: starting secondarynamenode, logging to
 /home/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop1.out

 bash-4.2$ start-yarn.sh

 starting yarn daemons

 starting resourcemanager, logging to
 /home/hadoop/logs/yarn-hadoop-resourcemanager-hadoop1.out

 hadoop2.mydomain.com: starting nodemanager, logging to
 /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop2.out

 hadoop3.mydomain.com: starting nodemanager, logging to
 /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.out

 And I opened up these ports on the security groups for the two data nodes:

 [root@hadoop2:~] #netstat -tulpn | grep -i listen | grep java

 tcp0  0 0.0.0.0:*50010*   0.0.0.0:*
 LISTEN  21405/java

 tcp0  0 0.0.0.0:*50075*   0.0.0.0:*
 LISTEN  21405/java

 tcp0  0 0.0.0.0:*50020*   0.0.0.0:*
 LISTEN  21405/java
 But when I go to the hadoop web interface at:

 http://hadoop1.mydomain.com:50070 http://hadoop1.jokefire.com:50070/

 And click on the data node tab, I see no nodes are connected!

 I see that the hosts are listening on all interfaces.

 I also put all hosts into the /etc/hosts file on the master node.

 Using the first data node as an example I can telnet into each port on
 both datanodes from the master node:

 bash-4.2$ telnet hadoop2.mydomain.com *50010*

 Trying 172.31.63.42...

 Connected to hadoop2.mydomain.com.

 Escape character is '^]'.

 ^]

 telnet quit

 Connection closed.

 bash-4.2$ telnet hadoop2.mydomain.com *50075*

 Trying 172.31.63.42...

 Connected to hadoop2.mydomain.com.

 Escape character is '^]'.

 ^]

 telnet quit

 Connection closed.

 bash-4.2$ telnet hadoop2.mydomain.com *50020*

 Trying 172.31.63.42...

 Connected to hadoop2.mydomain.com.

 Escape character is '^]'.

 ^]

 telnet quit

 Connection closed.

 So apparently I've hit my first snag in setting up a hadoop cluster. Can
 anyone give me some tips as to how I can get the data nodes to show as
 connected to the master?


 Thanks

 Tim




 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B





Re: Does io.sort.mb count in the records or just the keys?

2014-11-09 Thread Vinod Kumar Vavilapalli
It accounts for both keys and values.
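To make that concrete with the terasort numbers below: a back-of-the-envelope 
estimate is that each record costs its serialized key + value bytes plus roughly 
16 bytes of accounting metadata in the sort buffer. With 100-byte records, an 
io.sort.mb of 100 therefore holds on the order of 100 MB / 116 B, i.e. a bit 
under 1 million records, before a spill is triggered - not 10 million.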

+Vinod
Hortonworks Inc.
http://hortonworks.com/

On Sun, Nov 9, 2014 at 11:54 AM, Muhuan Huang mhhu...@cs.ucla.edu wrote:

 Hello everyone,

 I have a question about the io.sort.mb property. The document says that
 io.sort.mb is the total amount of buffer memory to use while sorting files.
 My question is that does it include both the keys and values of the records
 or just keys (and perhaps some pointers to the values)?

 More specifically in the case of terasort where each record is 100 bytes
 but the key is only 10 bytes, if io.sort.mb is set to 100, does it mean
 that it can support a maximum of 1M records or 10M records?

 Thanks a lot!

 Muhuan




Re: Containers of required size are not being allocated.

2014-11-03 Thread Vinod Kumar Vavilapalli

I bet you are not setting different Resource-Request-priorities on your 
requests.

It is a current limitation (https://issues.apache.org/jira/browse/YARN-314) that 
resources of different sizes cannot be requested against a single 
priority.
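A sketch of the workaround with the async client you mention: give each distinct 
container size its own priority so the requests don't collapse into one (the 
priority values and sizes are arbitrary placeholders):

  import org.apache.hadoop.yarn.api.records.Priority;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
  import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

  public class SizedRequests {
    static void requestContainers(AMRMClientAsync<ContainerRequest> amRmClient) {
      Priority smallPriority = Priority.newInstance(2);
      Priority bigPriority = Priority.newInstance(1);

      // Nine 2 GB containers at one priority ...
      for (int i = 0; i < 9; i++) {
        amRmClient.addContainerRequest(
            new ContainerRequest(Resource.newInstance(2048, 1), null, null, smallPriority));
      }
      // ... and the single 4 GB container at a different priority.
      amRmClient.addContainerRequest(
          new ContainerRequest(Resource.newInstance(4096, 1), null, null, bigPriority));
    }
  }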

+Vinod

On Nov 3, 2014, at 1:23 AM, Smita Deshpande 
smita.deshpa...@cumulus-systems.com wrote:

 Hi All,
 I have a single YARN application running on a cluster of 1 master 
 node(Resourcemanager) and 3 slave nodes (Nodemanagers) - total memory= 24GB, 
 total vcores= 12. I am using Hadoop 2.4.1 and scheduler is Capacity-Scheduler 
 with DominantResourceCalculator. This application submits 10 container 
 requests to ResourceManager through the asynchronous AMRM client ( 
 AMRMClientAsync). The behavior I observed was that the container request 
 asking for the most resources gets allocated first. For eg. 9 requests are 
 for 2GB containers and 1 request is for a 4 GB container, then the 4GB 
 container gets allocated and here's where it runs into a problem - It just 
 stops there - it doesn't allocate any containers for all the 2 GB requests!
 Once a container is allocated my app runs a task on it and once it is 
 finished , app submits a fresh container request for this task (the tasks are 
 of a repeating nature). Then again the RM allocates a 4GB container for this 
 request, completely ignoring all the requests in between. I don't know if it 
 is an issue with the AMRMClientAsync class- I haven't tried using the normal 
 AMRMclient yet. Now I have tasks that require 2 GB and tasks that require 4 
 GB memory. Because of this problem it is only running the 4 GB task again and 
 again.
 If all the container requests are identical in terms of resources, then this 
 problem disappears and RM allocates all requested containers.
 
 Couldn't find any documentation/known bugs relating to this. I've hit a brick 
 wall here, any help would be greatly appreciated. Thanks in advance
 
 -SMita





Re: Map Reduce Job is reported as complete on history server while on console it shows as only half way thru

2014-09-17 Thread Vinod Kumar Vavilapalli
Is it possible that the client JVM is somehow getting killed while the YARN
application finishes as usual on the cluster in the background?

+Vinod

On Wed, Sep 17, 2014 at 9:29 AM, S.L simpleliving...@gmail.com wrote:


Hi All,

 I am running a MRV1 job on Hadoop YARN 2.3.0 cluster , the problem is when
 I submit this job YARN the application that is running in YARN is marked as
 complete even as on console its reported as only 58% complete . I have
 confirmed that its also not printing the log statements that its supposed
 to print when the job is actually complete .

 Please see the output from the job submission console below. It just stops
 at 58% and job history server and YARN cluster UI reports that this job has
 already succeeded. Can someone let me know why this is happening ?

 4/08/28 08:36:19 INFO mapreduce.Job:  map 54% reduce 0%
 14/08/28 08:44:13 INFO mapreduce.Job:  map 55% reduce 0%
 14/08/28 08:52:16 INFO mapreduce.Job:  map 56% reduce 0%
 14/08/28 08:59:22 INFO mapreduce.Job:  map 57% reduce 0%
 14/08/28 09:07:33 INFO mapreduce.Job:  map 58% reduce 0%

 Thanks.




Re: YARN Logs

2014-07-15 Thread Vinod Kumar Vavilapalli
Adam is right.

The 'yarn logs' command only works when log-aggregation is enabled. It's not
easy, but possible, to make it work when aggregation is disabled.

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Tue, Jul 15, 2014 at 10:03 AM, Brian C. Huffman 
bhuff...@etinternational.com wrote:

  Looking at the docs, the yarn.nodemanager.remote-app-log-dir variable
 description says Where to aggregate logs to.  But I don't want to use log
 aggregation.

 Does the 'yarn logs' command look anywhere else if
 yarn.log-aggregation-enable is set to false?

 -b


 On 07/15/2014 12:10 PM, Adam Kawa wrote:

 For example, the remote location is configured
 via yarn.nodemanager.remote-app-log-dir and defaults to /tmp/logs.
 This is why you see:
 Logs not available at /tmp/logs/hadoop/logs/application_1405396841766_0003.

  PS.
 The full path is configured via 
 ${yarn.nodemanager.remote-app-log-dir}/${username}/${yarn.nodemanager.remote-app-log-dir-suffix}/${application-id}



 2014-07-15 18:08 GMT+02:00 Adam Kawa kawa.a...@gmail.com:

  IMHO,
 $ yarn logs looks for aggregated logs at remote location.



 2014-07-15 16:49 GMT+02:00 Brian C. Huffman bhuff...@etinternational.com
 :

 All,

 I am running a small cluster with hadoop-2.2.0 installed on an NFS
 shared directory.  Since all nodes can access, I do not want to enable log
 aggregation.

 My understanding was that if aggregation wasn't enabled, the 'yarn logs'
 command would just look in the $HADOOP_HOME/logs/userlogs dir, but that
 isn't happening:
 [hadoop@host1 ~]$ yarn logs -applicationId
 application_1405396841766_0003
 14/07/15 10:35:58 INFO client.RMProxy: Connecting to ResourceManager at
 host1/172.17.1.1:8032
 Logs not available at
 /tmp/logs/hadoop/logs/application_1405396841766_0003
 Log aggregation has not completed or is not enabled.

 Can anyone suggest what I might be doing wrong?

 Thanks,
 Brian








Re: Muliple map writing into same hdfs file

2014-07-10 Thread Vinod Kumar Vavilapalli
Concurrent writes to a single file in HDFS are not possible today. You may
want to write a per-task file and use that entire directory as your output.
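A minimal sketch of the per-task-file idea with the mapreduce.* API (the output 
directory is a placeholder, and on Hadoop 1.2.1 the old mapred API equivalents 
apply); each task writes only its own file, so there is no contention, and the 
directory as a whole is the logical output:

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class PerTaskFileMapper extends Mapper<LongWritable, Text, Text, Text> {
    private FSDataOutputStream out;

    @Override
    protected void setup(Context context) throws IOException {
      FileSystem fs = FileSystem.get(context.getConfiguration());
      // One file per task attempt, e.g. /results/attempt_..._m_000003_0
      Path taskFile = new Path("/results", context.getTaskAttemptID().toString());
      out = fs.create(taskFile);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException {
      // Write this task's 8 computed values here instead of into a shared file.
      out.writeBytes(value + "\n");
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      out.close();
    }
  }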

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Wed, Jul 9, 2014 at 10:42 PM, rab ra rab...@gmail.com wrote:


 hello



 I have one use-case that spans multiple map tasks in hadoop environment. I
 use hadoop 1.2.1 and with 6 task nodes. Each map task writes their output
 into a file stored in hdfs. This file is shared across all the map tasks.
 Though, they all computes thier output but some of them are missing in the
 output file.



 The output file is an excel file with 8 parameters(headings). Each map
 task is supposed to compute all these 8 values, and save it as soon as it
 is computed. This means, the programming logic of a map task opens the
 file, writes the value and close, 8 times.



 Can someone give me a hint on whats going wrong here?



 Is it possible to make more than one map task to write in a shared file in
 HDFS?




Re: Partitioning and setup errors

2014-06-28 Thread Vinod Kumar Vavilapalli
What is happening is the client is not able to pick up the right jar to push to 
the cluster. It looks in the class-path for the jar that contains the class 
ParallelGeneticAlignment.

How are you packaging your code? How are you running your job - please paste the
command line?

+Vinod 
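For illustration, a minimal sketch of pointing the job at its jar explicitly via Job#setJar(String) when setJarByClass cannot locate it on the classpath; the jar path below is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JarLocationSketch {
  public static Job configureJob(Configuration conf) throws Exception {
    Job job = Job.getInstance(conf);
    // Name the jar directly instead of relying on classpath lookup (path is hypothetical):
    job.setJar("/home/chris/parallel-genetic-alignment.jar");
    return job;
  }
}

When the job is submitted this way, the named jar is what gets shipped to the cluster, so the mapper and reducer classes should be resolvable on the task side.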

On Jun 27, 2014, at 5:15 AM, Chris MacKenzie 
stu...@chrismackenziephotography.co.uk wrote:

 Hi,
 
 I realise my previous question may have been a bit naïve and I also realise I 
 am asking an awful lot here, any advice would be greatly appreciated.
 I have been using Hadoop 2.4 in local mode and am sticking to the mapreduce.* 
 side of the track.
 I am using a Custom Line reader to read each sequence into a Map
 I have a partitioner class which is testing the key from the map class. 
 I've tried debugging in eclipse with a breakpoint in the partitioner class 
 but getPartition(LongWritable mapKey, Text sequenceString, int 
 numReduceTasks) is not being called.
 Could there be any reason for that ?
 
 Because my map and reduce code works in local mode within eclipse, I wondered
 if I might get the partitioner to work if I changed to Pseudo Distributed Mode,
 exporting a runnable jar from Eclipse (Kepler).
 
 I have several faults in Pseudo Distributed Mode, both on my own computer and on the
 university cluster's Pseudo Distributed Mode which I set up. I've googled and
 read extensively but am not seeing a solution to any of these issues.
 
 I have this line:
 14/06/27 11:45:27 WARN mapreduce.JobSubmitter: No job jar file set.  User 
 classes may not be found. See Job or Job#setJar(String).
 My driver code is:
   private void doParallelConcordance() throws Exception {

       Path inDir = new Path("input_sequences/10_sequences.txt");
       Path outDir = new Path("demo_output");

       Job job = Job.getInstance(new Configuration());
       job.setJarByClass(ParallelGeneticAlignment.class);

       job.setOutputKeyClass(Text.class);
       job.setOutputValueClass(IntWritable.class);

       job.setInputFormatClass(CustomFileInputFormat.class);
       job.setMapperClass(ConcordanceMapper.class);
       job.setPartitionerClass(ConcordanceSequencePartitioner.class);
       job.setReducerClass(ConcordanceReducer.class);

       FileInputFormat.addInputPath(job, inDir);
       FileOutputFormat.setOutputPath(job, outDir);

       job.waitForCompletion(true);
   }
 
 On the university server I am getting this error:
 4/06/27 11:45:40 INFO mapreduce.Job: Task Id : 
 attempt_1403860966764_0003_m_00_0, Status : FAILED
 Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 par.gene.align.concordance.ConcordanceMapper not found
 
 On my machine the error is:
 4/06/27 12:58:03 INFO mapreduce.Job: Task Id : 
 attempt_1403864060032_0004_r_00_2, Status : FAILED
 Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 par.gene.align.concordance.ConcordanceReducer not found
 
 On the university server I get total paths to process:
 14/06/27 11:45:27 INFO input.FileInputFormat: Total input paths to process : 1
 14/06/27 11:45:28 INFO mapreduce.JobSubmitter: number of splits:1
 
 On my machine I get total paths to process:
 14/06/27 12:57:09 INFO input.FileInputFormat: Total input paths to process : 0
 14/06/27 12:57:36 INFO mapreduce.JobSubmitter: number of splits:0
 
 Being new to this community, I thought it polite to introduce myself. I’m 
 planning to return to software development via an MSc at Heriot Watt 
 University in Edinburgh. My MSc project is based on Fosters Genetic Sequence 
 Alignment. I have written a sequential version my goal is now to port it to 
 Hadoop.
 
 Thanks in advance, 
 Regards,
 
 Chris MacKenzie




Re: add to example programs

2014-06-26 Thread Vinod Kumar Vavilapalli
You cannot dynamically add jobs. You will have to implement a new example and 
modify ExampleDriver.java to also include the new example and recompile.

+Vinod
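For illustration, a minimal sketch of what the added registration in ExampleDriver.java boils down to; "myexample" and MyExample are placeholders for the program you would add before recompiling:

import org.apache.hadoop.util.ProgramDriver;

public class ExampleDriverSketch {
  // Placeholder example program; a real one would set up and run a MapReduce job.
  public static class MyExample {
    public static void main(String[] args) {
      System.out.println("the new example would run here");
    }
  }

  public static void main(String[] args) {
    int exitCode = -1;
    ProgramDriver pgd = new ProgramDriver();
    try {
      // The one line to add for the new program:
      pgd.addClass("myexample", MyExample.class, "A description of the new example program.");
      pgd.driver(args);
      exitCode = 0;
    } catch (Throwable e) {
      e.printStackTrace();
    }
    System.exit(exitCode);
  }
}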

On Jun 26, 2014, at 3:23 AM, John Hancock jhancock1...@gmail.com wrote:

 I would like to re-use the framework for example programs in 
 hadoop-2.2.0-src/hadoop-mapreduce-project/hadoop-mapreduce-examples.
 
 I can use yarn jar 
 hadoop-mapreduce-examples/target/hadoop-mapreduce-examples-2.2.0.jar and get 
 a list of map/reduce programs I can run - is there a document that describes 
 how to add a job to this jar?
 
 -John




Re: priority in the container request

2014-06-09 Thread Vinod Kumar Vavilapalli
Yes, priorities are assigned to ResourceRequests and you can ask for multiple
containers at the same priority level. You may not get all the containers
together as today's scheduler lacks gang-scheduling functionality.

+Vinod
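For illustration, a minimal sketch of asking for several containers at one priority level through AMRMClient; the memory, vcore and count values are arbitrary:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class SamePriorityRequests {
  public static void requestContainers(AMRMClient<ContainerRequest> amRMClient, int n) {
    Priority priority = Priority.newInstance(0);          // same priority for every request
    Resource capability = Resource.newInstance(1024, 1);  // 1 GB, 1 vcore (arbitrary)
    for (int i = 0; i < n; i++) {
      amRMClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
    }
    // Without gang scheduling, the allocations may still arrive spread over
    // several allocate() responses rather than all at once.
  }
}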

On Jun 9, 2014, at 12:08 AM, Krishna Kishore Bonagiri write2kish...@gmail.com 
wrote:

 Hi,
 
   Can we give the same value for priority when requesting multiple containers 
 from the Application Master? Basically, I need all of those containers at the 
 same time, and I am requesting them at the same time. So, I am wondering if we 
 can do that?
 
 Thanks,
 Kishore




Re: Hadoop usage in uploading downloading big data

2014-06-06 Thread Vinod Kumar Vavilapalli
Can you give more details on the data that you are storing in the release data 
management system? And also on how it is accessed - read and modified?

+vinod

On Jun 2, 2014, at 5:33 AM, rahul.soa rahul@googlemail.com wrote:

 Hello All,
 I'm a newbie to Hadoop and interested to know if hadoop can be useful in order
 to solve the problem I am seeing.
  
 We have big data (sometimes between 200 - 600 GB) to store in the release data
 management system (repository server; currently we are using Synchronicity
 DesignSync), and it takes roughly 3-7 hours to upload/check in this data
 (and download it from the repository server).
  
 I would like to know if applying hadoop can be useful in order to
 reduce this long time. The time taken bothers the design engineers uploading
 and downloading, which further leads to delays in deliveries.
  
 Please note I'm new to hadoop and am checking the possibility of usage in this
 scenario.
  
 Best Regards,
 Rahul




Re: question about NM heapsize

2014-05-22 Thread Vinod Kumar Vavilapalli
Not in addition to that. You should only use the memory-mb configuration. 
Giving 15GB to the NodeManager itself will eat into the total memory available for 
containers.

Vinod

On May 22, 2014, at 8:25 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote:

 hi,
 
 In addition to that, you need to change the property yarn.nodemanager.resource.memory-mb
 in yarn-site.xml to make the NM recognize the memory available for containers.
 
 On May 22, 2014 7:50 PM, ch huang justlo...@gmail.com wrote:
 hi, maillist:
  
 i set YARN_NODEMANAGER_HEAPSIZE=15000, so the NM runs in a 15G JVM, but why
 does the yarn web ui show only 8GB under Active Nodes - Mem Avail?




Re: What codes to chmod 755 to yarn.nodemanager.log-dirs?

2014-04-28 Thread Vinod Kumar Vavilapalli
Not 755, but yeah. See DefaultContainerExecutor.createAppLogDirs(). You may
have to debug more though.

+Vinod
Hortonworks Inc.
http://hortonworks.com/
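For illustration, a minimal sketch of the mkdir-plus-setPermission pattern involved; this is only the shape of the calls, not the actual Hadoop source, but it shows how an existing 775 directory would also end up at 755:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class LogDirPermissionSketch {
  public static void createWith755(FileSystem lfs, Path logDir) throws Exception {
    FsPermission perm = new FsPermission((short) 0755);
    if (!lfs.exists(logDir)) {
      lfs.mkdirs(logDir, perm);
    }
    // Applied unconditionally, so a pre-existing 775 directory gets reset to 755.
    lfs.setPermission(logDir, perm);
  }
}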


On Fri, Apr 25, 2014 at 8:42 AM, sam liu samliuhad...@gmail.com wrote:

 My version is 2.1.0 and the cluster uses DefaultContainerExecutor. Is it
 possible that DefaultContainerExecutor changes the permission of an existing
 nodemanager log-dir to 755?


 2014-04-25 0:54 GMT+08:00 Vinod Kumar Vavilapalli vino...@apache.org:


 Which version of Hadoop are you using? This part of code changed a
 little, so asking.

 Also, is this in secure or non-secure mode (DefaultContainerExecutor vs
 LinuxContainerExecutor)? Either of those two classes do some more
 permission magic and you may be running into those.

 +Vinod
 Hortonworks Inc.
 http://hortonworks.com/

  On Thu, Apr 24, 2014 at 9:05 AM, sam liu samliuhad...@gmail.com wrote:

 Hi Experts,

 When the nodemanager log-dirs do not exist, I think
 LocalDirsHandlerService#serviceInit will invoke
 DirectoryCollection#createDir to create the log dirs and chmod them to 755.

 However, when the nodemanager log-dirs already exist with a non-755
 permission (like 775), I found their permission will still be changed to 755
 after starting the nodemanager. But I do not think
 DirectoryCollection#createDir did that operation.

 What code chmods yarn.nodemanager.log-dirs to 755?









Re: What codes to chmod 755 to yarn.nodemanager.log-dirs?

2014-04-24 Thread Vinod Kumar Vavilapalli
Which version of Hadoop are you using? This part of code changed a little,
so asking.

Also, is this in secure or non-secure mode (DefaultContainerExecutor vs
LinuxContainerExecutor)? Either of those two classes do some more
permission magic and you may be running into those.

+Vinod
Hortonworks Inc.
http://hortonworks.com/

On Thu, Apr 24, 2014 at 9:05 AM, sam liu samliuhad...@gmail.com wrote:

 Hi Experts,

 When the nodemanager log-dirs do not exist, I think
 LocalDirsHandlerService#serviceInit will invoke
 DirectoryCollection#createDir to create the log dirs and chmod them to 755.

 However, when the nodemanager log-dirs already exist with a non-755
 permission (like 775), I found their permission will still be changed to 755
 after starting the nodemanager. But I do not think
 DirectoryCollection#createDir did that operation.

 What code chmods yarn.nodemanager.log-dirs to 755?





Re: map execute twice

2014-04-24 Thread Vinod Kumar Vavilapalli
This can happen when maps are marked as failed *after* they have
successfully completed the map operation. One common reason this can
happen is reducers failing to fetch the map-outputs due to the node that
ran the mapper going down, the machine freezing up, etc.

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Tue, Apr 22, 2014 at 7:28 PM, EdwardKing zhan...@neusoft.com wrote:

  I use Hadoop 2.2.0. I know hadoop will execute map first; when map is
 100%, it then executes reduce; after reduce is 100%, the job will end. I executed
 a job, and the map went from 0% to 100% and then from 0% to 100% again. Why did the map
 execute twice?  Thanks.

 Hadoop job information for Stage-1: number of mappers: 1; number of
 reducers: 1
 2014-04-22 19:08:49,118 Stage-1 map = 0%,  reduce = 0%
 2014-04-22 19:11:46,722 Stage-1 map = 100%,  reduce = 0%
 2014-04-22 19:12:27,633 Stage-1 map = 0%,  reduce = 0%
 2014-04-22 19:14:37,655 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU
 1.5 sec
 2014-04-22 19:15:39,248 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU
 3.34 sec
 2014-04-22 19:15:59,395 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
 3.34 sec
 2014-04-22 19:16:40,988 Stage-1 map = 0%,  reduce = 0%
 2014-04-22 19:18:56,845 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU
 2.57 sec
 2014-04-22 19:19:46,574 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU
 2.73 sec
 2014-04-22 19:20:30,718 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU
 2.82 sec
 2014-04-22 19:20:35,007 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU
 3.57 sec
 2014-04-22 19:20:55,280 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU
 3.76 sec
 2014-04-22 19:21:27,247 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
 4.41 sec
 2014-04-22 19:22:28,362 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
 4.41 sec
 2014-04-22 19:22:49,170 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU
 4.41 sec
 2014-04-22 19:22:52,995 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
 5.69 sec
 MapReduce Total cumulative CPU time: 5 seconds 690 msec
 Ended Job = job_1398218615130_0001







Re: Yarn hangs @Scheduled

2014-04-24 Thread Vinod Kumar Vavilapalli
How much memory do you see as available on the RM web page? And what are
the memory requirements for this app? And is this a MR job?

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Thu, Apr 24, 2014 at 1:23 PM, Jay Vyas jayunit...@gmail.com wrote:

 Hi folks: My yarn jobs seem to be hanging in the SCHEDULED state. I've
 restarted my nodemanager a few times, but no luck.

 What are the possible reasons that YARN job submission hangs? I know one
 is resource availability, but this is a fresh cluster on a VM with only one
 job, one NM, and one RM.

 14/04/24 16:20:32 INFO ipc.Server: Auth successful for 
 yarn@IDH1.LOCAL(auth:SIMPLE)
 14/04/24 16:20:32 INFO authorize.ServiceAuthorizationManager:
 Authorization successful for yarn@IDH1.LOCAL (auth:KERBEROS) for
 protocol=interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB
 14/04/24 16:20:32 INFO resourcemanager.ClientRMService: Allocated new
 applicationId: 4
 14/04/24 16:20:33 INFO resourcemanager.ClientRMService: Application with
 id 4 submitted by user yarn
 14/04/24 16:20:33 INFO resourcemanager.RMAuditLogger: USER=yarn
 IP=192.168.122.100  OPERATION=Submit Application Request
 TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1398370674313_0004
 14/04/24 16:20:33 INFO rmapp.RMAppImpl: Storing application with id
 application_1398370674313_0004
 14/04/24 16:20:33 INFO rmapp.RMAppImpl: application_1398370674313_0004
 State change from NEW to NEW_SAVING
 14/04/24 16:20:33 INFO recovery.RMStateStore: Storing info for app:
 application_1398370674313_0004
 14/04/24 16:20:33 INFO rmapp.RMAppImpl: application_1398370674313_0004
 State change from NEW_SAVING to SUBMITTED
 14/04/24 16:20:33 INFO fair.FairScheduler: Accepted application
 application_1398370674313_0004 from user: yarn, in queue: default,
 currently num of applications: 4
 14/04/24 16:20:33 INFO rmapp.RMAppImpl: application_1398370674313_0004
 State change from SUBMITTED to ACCEPTED
 14/04/24 16:20:33 INFO resourcemanager.ApplicationMasterService:
 Registering app attempt : appattempt_1398370674313_0004_01
 14/04/24 16:20:33 INFO attempt.RMAppAttemptImpl:
 appattempt_1398370674313_0004_01 State change from NEW to SUBMITTED
 14/04/24 16:20:33 INFO fair.FairScheduler: Added Application Attempt
 appattempt_1398370674313_0004_01 to scheduler from user: yarn
 14/04/24 16:20:33 INFO attempt.RMAppAttemptImpl:
 appattempt_1398370674313_0004_01 State change from SUBMITTED to
 SCHEDULED




 --
 Jay Vyas
 http://jayunit100.blogspot.com




Re: Submit a Hadoop 1.1.1 job remotely to a Hadoop 2 cluster

2014-04-16 Thread Vinod Kumar Vavilapalli
You cannot run JobTracker/TaskTracker in Hadoop 2. It's neither supported nor 
even possible.

+Vinod

On Apr 16, 2014, at 2:27 PM, Kim Chew kchew...@gmail.com wrote:

 I have a cluster running Hadoop 2 but it is not running YARN, i.e. 
 mapreduce.framework.name is set to classic therefore the ResourceManager 
 is not running.
 
 On the Client side, I want to submit a job compiled with Hadoop-1.1.1 to the 
 above cluster. Here how my Hadoop-1.1.1 mapred-site.xml looks like,
 
 <property>
   <!-- Pointed to the remote JobTracker -->
   <name>mapred.job.tracker</name>
   <value>172.31.3.150:8021</value>
 </property>
 
 Not surprisingly I got a version mismatch when I submitted my job using the 
 Hadoop-1.1.1 jars,
 
 org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot 
 communicate with client version 4
 at org.apache.hadoop.ipc.Client.call(Client.java:1107)
 
 So I recompiled my job with Hadoop 2 and submitted it using the Hadoop 2 
 jars. Here is how my Hadoop 2 mapred-site.xml looks like,
 
 <property>
   <!-- Pointed to the remote JobTracker -->
   <name>mapreduce.job.tracker.address</name>
   <value>172.31.3.150:8021</value>
 </property>
 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
 
 Note that I have to set mapreduce.framework.name to yarn otherwise the 
 job will be run locally instead of on the targeted cluster. But my targeted 
 cluster is not running YARN as stated above,
 
 14/04/16 13:35:47 INFO client.RMProxy: Connecting to ResourceManager at 
 /172.31.3.150:8032
 14/04/16 13:35:49 INFO ipc.Client: Retrying connect to server: 
 hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s); retry 
 policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
 SECONDS)
 
 (Yes I have set my yarn.resourcemanager.hostname to 172.31.3.150 in 
 yarn-site.xml on my client.)
 
 Therefore it seems to me that it does not matter whether I recompile my job 
 with Hadoop 2 or not. The question is what should I do to enable submitting 
 my job remotely to the Hadoop 2 cluster? What are the configurations I need 
 to set on the client side?
 
 The only solution I can think of is to enable YARN on the Hadoop 2 cluster 
 but is it necessary?
 
 I am running out of pointers and stuck 8-(
 
 TIA 
 
 Kim
 






Re: yarn application still running but disappears from UI

2014-03-26 Thread Vinod Kumar Vavilapalli
Sounds like https://issues.apache.org/jira/browse/YARN-1810.

+Vinod

On Mar 26, 2014, at 7:44 PM, Henry Hung ythu...@winbond.com wrote:

 Hi Hadoop Users,
  
 I’m using hadoop-2.2.0 with YARN.
 Today I stumbled upon a problem with the YARN management UI: when I look into 
 cluster/apps, there is one app running but it is not showing in the entries.
 I made sure there is an application running in the cluster with the command “yarn 
 application –list”. Could somebody tell me what is going on with the UI? Why 
 is it not showing any application?
  
  
  
 [hadoop@fchdnn3 hadoop-2.2.0]$ bin/yarn application -list
 14/03/27 10:39:25 INFO client.RMProxy: Connecting to ResourceManager at 
 fchdnn3.ctfab.com/10.16.10.173:8032
 Total number of applications (application-types: [] and states: [SUBMITTED, 
 ACCEPTED, RUNNING]):1
 Application-Id  Application-NameApplication-Type  
 User   Queue   State Final-State  
ProgressTracking-URL
 application_1395886841648_0001  MES Performance Delete 20140309
 MAPREDUCEhadoop default RUNNING   
 UNDEFINED   80.05%http://fchddn7:39536
 




Re: Data Locality Importance

2014-03-22 Thread Vinod Kumar Vavilapalli
Like you said, it depends both on the kind of network you have and the type of 
your workload.

Given your point about S3, I'd guess your input files/blocks are not large 
enough that moving code to data trumps moving data itself to the code. When 
that balance tilts a lot, especially when moving large input data files/blocks, 
data-locality will help improve performance significantly. That, or when the 
read throughput from a remote disk is much lower than reading it from a local disk.

HTH
+Vinod

On Mar 21, 2014, at 7:06 PM, Mike Sam mikesam...@gmail.com wrote:

 How important is Data Locality to Hadoop? I mean, if we prefer to separate
 the HDFS cluster from the MR cluster, we will lose data locality but my
 question is how bad is this assuming we provide a reasonable network
 connection between the two clusters? EMR kills data locality when using S3
 as storage but we do not see a significant job time difference running same
 job from the HDFS cluster of the same setup. So, I am wondering
 how important is Data Locality to Hadoop in practice?
 
 Thanks,
 Mike






Re: Yarn MapReduce Job Issue - AM Container launch error in Hadoop 2.3.0

2014-03-22 Thread Vinod Kumar Vavilapalli
What is 614 here?

The other relevant thing to check is the MapReduce specific config 
mapreduce.application.classpath.

+Vinod

On Mar 22, 2014, at 9:03 AM, Tony Mullins tonymullins...@gmail.com wrote:

 Hi,
 
 I have setup a 2 node cluster of Hadoop 2.3.0. Its working fine and I can 
 successfully run distributedshell-2.2.0.jar example. But when I try to run 
 any mapreduce job I get error. I have setup MapRed.xml and other configs for 
 running MapReduce job according to 
 (http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide)
  but I am getting following error :
 
 14/03/22 20:31:17 INFO mapreduce.Job: Job job_1395502230567_0001 failed with 
 state FAILED due to: Application application_1395502230567_0001 failed 2 
 times due to AM Container for appattempt_1395502230567_0001_02 exited 
 with exitCode: 1 due to: Exception from container-launch: 
 org.apache.hadoop.util.Shell$ExitCodeException: 
 org.apache.hadoop.util.Shell$ExitCodeException: at 
 org.apache.hadoop.util.Shell.runCommand(Shell.java:505) at 
 org.apache.hadoop.util.Shell.run(Shell.java:418) at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650) at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
  at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
  at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262) at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
 
 Container exited with a non-zero exit code 1
 .Failing this attempt.. Failing the application.
 14/03/22 20:31:17 INFO mapreduce.Job: Counters: 0
 Job ended: Sat Mar 22 20:31:17 PKT 2014
 The job took 6 seconds.
 And if I look at stderr (the job's log) there is only one line 
 
 Could not find or load main class 614
 
 Now I have googled it and usually this issue comes when you have different 
 JAVA versions or the classpath in yarn-site.xml is not properly set; my 
 yarn-site.xml has this
 
 
 <property>
   <name>yarn.application.classpath</name>
   <value>/opt/yarn/hadoop-2.3.0/etc/hadoop,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*</value>
 </property>
 So any other ideas what could be the issue here ?
 
 I am running my mapreduce job like this:
 
 $HADOOP_PREFIX/bin/hadoop jar 
 $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar 
 randomwriter out
 Thanks, Tony
 
 






Re: Yarn MapReduce Job Issue - AM Container launch error in Hadoop 2.3.0

2014-03-22 Thread Vinod Kumar Vavilapalli
Given your earlier mail about the paths in /opt, shouldn't mapreduce classpath 
also point to /opt/yarn/hadoop-2.3.0 etc?

+Vinod

On Mar 22, 2014, at 11:33 AM, Tony Mullins tonymullins...@gmail.com wrote:

 I also don't know what 614 is... It's the exact and single line in the stderr of 
 the job's logs.
 And regarding the MapRed classpath, the defaults are good as there are only two vars: 
 $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*, 
 $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*.
 
 Is there any other place to look for detailed, meaningful error info? Or 
 any hunch as to how to fix it?
 
 Thanks,
 Tony
 
 
 On Sat, Mar 22, 2014 at 11:11 PM, Vinod Kumar Vavilapalli 
 vino...@apache.org wrote:
 What is 614 here?
 
 The other relevant thing to check is the MapReduce specific config 
 mapreduce.application.classpath.
 
 +Vinod
 
 On Mar 22, 2014, at 9:03 AM, Tony Mullins tonymullins...@gmail.com wrote:
 
 Hi,
 
 I have setup a 2 node cluster of Hadoop 2.3.0. Its working fine and I can 
 successfully run distributedshell-2.2.0.jar example. But when I try to run 
 any mapreduce job I get error. I have setup MapRed.xml and other configs for 
 running MapReduce job according to 
 (http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide)
  but I am getting following error :
 
 14/03/22 20:31:17 INFO mapreduce.Job: Job job_1395502230567_0001 failed with 
 state FAILED due to: Application application_1395502230567_0001 failed 2 
 times due to AM Container for appattempt_1395502230567_0001_02 exited 
 with exitCode: 1 due to: Exception from container-launch: 
 org.apache.hadoop.util.Shell$ExitCodeException: 
 org.apache.hadoop.util.Shell$ExitCodeException: at 
 org.apache.hadoop.util.Shell.runCommand(Shell.java:505) at 
 org.apache.hadoop.util.Shell.run(Shell.java:418) at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650) at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
  at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
  at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262) at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
 
 Container exited with a non-zero exit code 1
 .Failing this attempt.. Failing the application.
 14/03/22 20:31:17 INFO mapreduce.Job: Counters: 0
 Job ended: Sat Mar 22 20:31:17 PKT 2014
 The job took 6 seconds.
 And if I look at stderr (the job's log) there is only one line 
 
 Could not find or load main class 614
 
 Now I have googled it and usually this issue comes when you have different 
 JAVA versions or the classpath in yarn-site.xml is not properly set; my 
 yarn-site.xml has this
 
 
 <property>
   <name>yarn.application.classpath</name>
   <value>/opt/yarn/hadoop-2.3.0/etc/hadoop,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*</value>
 </property>
 So any other ideas what could be the issue here ?
 
 I am running my mapreduce job like this:
 
 $HADOOP_PREFIX/bin/hadoop jar 
 $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar 
 randomwriter out
 Thanks, Tony
 
 
 
 
 






Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0

2014-03-06 Thread Vinod Kumar Vavilapalli

Yes. JobTracker and TaskTracker are gone from all the 2.x release lines.

MapReduce is an application on top of YARN. That is per job - it launches, runs 
and finishes after it is done with its work. Once it is done, you can go look 
at it in the MapReduce-specific JobHistoryServer.

+Vinod

On Mar 6, 2014, at 1:11 PM, Jane Wayne jane.wayne2...@gmail.com wrote:

 i recently made the switch from hadoop 0.20.x to hadoop 2.3.0 (yes, big
 leap). i was wondering if there is a way to view my jobs now via a web UI?
 i used to be able to do this by accessing the following URL
 
 http://hadoop-cluster:50030/jobtracker.jsp
 
 however, there is no more job tracker monitoring page here.
 
 furthermore, i am confused about MapReduce as an application running on top
 of YARN. so the documentation says MapReduce is just an application running
 on YARN. if that is true, how come i do not see MapReduce as an application
 on the ResourceManager web UI?
 
 http://hadoop-cluster:8088/cluster/apps
 
 is this because MapReduce is NOT a long-running app? meaning, a MapReduce
 job will only show up as an app in YARN when it is running? (please bear
 with me, i'm still adjusting to this new design).
 
 any help/pointer is appreciated.






Re: Node manager or Resource Manager crash

2014-03-04 Thread Vinod Kumar Vavilapalli
I remember you asking this question before. Check if your OS' OOM killer is 
killing it.

+Vinod

On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri write2kish...@gmail.com 
wrote:

 Hi,
   I am running an application on a 2-node cluster, which tries to acquire all 
 the containers that are available on one of those nodes and remaining 
 containers from the other node in the cluster. When I run this application 
 continuously in a loop, one of the NM or RM is getting killed at a random 
 point. There is no corresponding message in the log files.
 
  One of the times that the NM got killed today, the tail of its log is 
  like this:
 
 2014-03-04 02:42:44,386 DEBUG 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
 isredeng:52867 sending out status for 16 containers
 2014-03-04 02:42:44,386 DEBUG 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's 
 health-status : true,
 
 
 And at the time of NM's crash, the RM's log has the following entries:
 
 2014-03-04 02:42:40,371 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing 
 isredeng:52867 of type STATUS_UPDATE
 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
 Dispatching the event 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
  NODE_UPDATE
 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server 
 Responder: responding to 
 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 
 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
 2014-03-04 02:42:40,371 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  nodeUpdate: isredeng:52867 clusterResources: 
 memory:16384, vCores:16
 2014-03-04 02:42:40,371 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Node being looked for scheduling isredeng:52867 
 availableResource: memory:0, vCores:-8
 2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151
 
 
  Note: the name of the node on which the NM got killed is isredeng; does the 
  above message indicate anything as to why it got killed?
 
 Thanks,
 Kishore
 
 
 






Re: Hiveserver2 + OpenLdap Authentication issue

2014-02-24 Thread Vinod Kumar Vavilapalli
This is on the wrong mailing list, hence the lack of activity.

+user@hive
bcc:user@hadoop

Thanks
+Vinod


On Feb 23, 2014, at 10:16 PM, orahad bigdata oracle...@gmail.com wrote:

 Can somebody help me please?
  
 Thanks
 
 
 On Sun, Feb 23, 2014 at 3:27 AM, orahad bigdata oracle...@gmail.com wrote:
 Hi Experts,
  
 I'm facing an issue with Hiveserver2 and Ldap integration. I have followed 
 all the mentioned steps for the integration. In addition, the 
 'getent passwd someuser' command works fine and I can also log in on the 
 client machine through LDAP authentication, but it's not working with 
 Hiveserver2. I'm getting the below error.
  
 beeline !connect jdbc:hive2://server1:1/default; hivetest hivetest 
 org.apache.hive.jdbc.HiveDriver
 Connecting to jdbc:hive2://server1:1/default;
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/opt/mapr/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/opt/mapr/hive/hive-0.12/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 Error: Could not open connection to jdbc:hive2://server1:1/default;: Peer 
 indicated failure: Error validating the login (state=08S01,code=0)
  
  Please help me.
  
 Thanks
 






Re: history server for 2 clusters

2014-02-20 Thread Vinod Kumar Vavilapalli
Interesting use-case and setup. We never had this use-case in mind - so far we 
have assumed a history-server per YARN cluster. You may be running into some 
issues where this assumption is not valid.

Why do you need two separate YARN clusters for the same underlying data on 
HDFS? And if that can't change, why can't you have two history-servers?

+Vinod

On Feb 20, 2014, at 6:08 PM, Anfernee Xu anfernee...@gmail.com wrote:

 Hi,
 
 I'm on the 2.2.0 release and I have an HDFS cluster which is shared by 2 YARN (MR) 
 clusters, and a single shared history server. What I'm seeing is that I can 
 see the job summary for all jobs from the history server UI, and I can also see the task 
 logs for jobs running in one cluster, but if I want to see logs for jobs 
 running in the other cluster, it shows me the below error
 
 Logs not available for attempt_1392933787561_0024_m_00_0. Aggregation may 
 not be complete, Check back later or try the nodemanager at 
 slc03jvt.mydomain.com:31303 
 
 Here's my configuration:
 
 Note: my history server is running on RM node of the MR cluster where I can 
 see the log.
 
 
 mapred-site.xml
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>slc00dgd:10020</value>
    <description>MapReduce JobHistory Server IPC host:port</description>
  </property>
  
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>slc00dgd:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port</description>
  </property>
 
 --yarn-site.xml
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>dc</value>
  </property>
 
 The above configurations are almost the same for both clusters; the only difference is 
 yarn.nodemanager.remote-app-log-dir-suffix - they have different suffixes.
 
 
 
 -- 
 --Anfernee






Re: Capacity Scheduler capacity vs. maximum-capacity

2014-02-20 Thread Vinod Kumar Vavilapalli

Yes, it does take those extra resources away back to queue B. How quickly it 
takes them away depends on whether preemption is enabled or not. If preemption 
is not enabled, it 'takes away' as and when containers from queue A start 
finishing.

+Vinod

On Feb 19, 2014, at 5:35 PM, Alex Nastetsky anastet...@spryinc.com wrote:

 Will the scheduler take away the 10% from queue B and give it back to queue A 
 even if queue B needs it? If not, it would seem that the scheduler is 
 reneging on its guarantee.






Re: what happens to a client attempting to get a new app when the resource manager is already down

2014-02-05 Thread Vinod Kumar Vavilapalli
Is this on trunk or a released version?

I think the default behavior (when RM HA is not enabled) shouldn't have the client 
loop forever. Let me know and we can see if this needs fixing.

Thanks,
+vinod
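For illustration, a minimal sketch of bounding the retries on the client side, assuming the release in use supports the yarn.resourcemanager.connect.max-wait.ms and yarn.resourcemanager.connect.retry-interval.ms properties:

import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class BoundedRmRetryClient {
  public static YarnClient create() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setLong("yarn.resourcemanager.connect.max-wait.ms", 60 * 1000L);       // give up after ~1 minute
    conf.setLong("yarn.resourcemanager.connect.retry-interval.ms", 10 * 1000L); // retry every 10 seconds
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();
    // With a bounded wait, createApplication() should eventually throw instead
    // of retrying indefinitely while the ResourceManager is down.
    return yarnClient;
  }
}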


On Jan 31, 2014, at 7:52 AM, REYANE OUKPEDJO r.oukpe...@yahoo.com wrote:

 Hi there,
 
 I am trying to solve a problem. My client runs as a server, and I was trying to 
 make my client aware of the fact that the resource manager is down, but I could 
 not figure out how. The reason is that the call 
 yarnClient.createApplication(); never returns when the resource manager is 
 down. It just stays in a loop, sleeps after 10 iterations and 
 continues the same loop. Below you can find the logs. Any idea how to leave 
 this loop? Is there any parameter that controls the number of seconds before 
 giving up?
 
 Thanks
 
 Reyane OUKPEDJO
 
 
 
 
 
 
 
 logs
 14/01/31 10:48:05 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 8 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:06 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:37 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 0 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:38 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 1 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:39 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 2 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:40 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 3 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:41 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 4 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:42 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 5 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:43 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 6 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:44 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 7 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:45 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 8 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:48:46 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:49:17 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 0 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:49:18 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 1 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:49:19 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 2 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:49:20 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 3 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:49:21 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 4 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 14/01/31 10:49:22 INFO ipc.Client: Retrying connect to server: 
 isblade2/9.32.160.125:8032. Already tried 5 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
 



Re: kerberos principals per node necessary?

2014-02-05 Thread Vinod Kumar Vavilapalli
To help manage this, Hadoop lets you specify principals of the format 
hdfs/_HOST@SOME-REALM. Here _HOST is a special string that Hadoop interprets 
and replaces with the local hostname. You need to create principals per host 
though.

+Vinod
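For illustration, a minimal sketch of the _HOST substitution using SecurityUtil.getServerPrincipal; the realm and hostname below are just examples:

import org.apache.hadoop.security.SecurityUtil;

public class HostPrincipalSketch {
  public static void main(String[] args) throws Exception {
    // _HOST in the configured principal is replaced with the node's hostname.
    String configured = "hdfs/_HOST@SOME-REALM";
    String resolved = SecurityUtil.getServerPrincipal(configured, "datanode17.example.com");
    System.out.println(resolved);  // expected: hdfs/datanode17.example.com@SOME-REALM
  }
}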

On Feb 2, 2014, at 3:14 PM, Koert Kuipers ko...@tresata.com wrote:

 is it necessary to create a kerberos principal for hdfs on every node, as in 
 hdfs/some-host@SOME-REALM?
 why not use one principal hdfs@SOME-REALM? that way i could distribute the 
 same keytab file to all nodes which makes things a lot easier.
 thanks! koert






Re: Does all reducer take input from all NodeManager/Tasktrackers of Map tasks

2014-01-27 Thread Vinod Kumar Vavilapalli


On Jan 27, 2014, at 4:17 AM, Amit Mittal amitmitt...@gmail.com wrote:

 Question 1: I believe the TaskTracker and then JobTracker/AppMaster will 
 receive the updates through a call to Task.statusUpdate(TaskUmbilicalProtocol 
 obj), by which the JobTracker/AM will know the location of the map's output file 
 and host details etc. However, how will it know what partitions or keys 
 this output has? In other words, from the heartbeat, how will the JobTracker know 
 about data partitions/keys? That would be required to decide from which Mapper 
 the output needs to be pulled or not.


Reducers pull map outputs from all the maps. So JobTracker/AppMaster simply 
give the completion events of *all* the maps to every reducer. There is no need 
for JT/AM to track the distribution of keys.


 Question 2: In short, not every reducer takes output from all Mappers; each 
 only connects and takes the output related to the keys partitioned for that 
 particular reducer.


That is in a sense correct. More precisely, every Reducer gets a small chunk of 
output from all Mappers.

+Vinod
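For illustration, a minimal sketch of how that split is decided: the default hash partitioning maps each key to exactly one reducer, so every reducer pulls only its own slice of each map's output:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class HashLikePartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numReduceTasks) {
    // Same key always lands in the same partition, i.e. the same reducer.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}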





Re: Invalid URI in job start

2014-01-27 Thread Vinod Kumar Vavilapalli
Need your help to debug this. Seems like the scheme is getting lost somewhere 
along the way. Clearly as you say if job.jar is on the file-system, then 
JobClient is properly uploading it. There are multilple things that you'll need 
to check
 - Check the NodeManager logs for the URL. It does print what URL it is trying 
to download from. Check if the scheme is getting there or not.
 - If that doesn't tell you something, change JobClient to print the URL before 
it constructs the ContainerLaunchContext for the ApplicationMaster. You'll need 
to do this in YarnRunner.java. Specifically the method 
createApplicationResource.

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
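For illustration, a minimal sketch of a quick client-side check along those lines, printing the scheme of the submitted job jar as recorded in the configuration (mapreduce.job.jar):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class JobJarSchemeCheck {
  public static void dump(Configuration conf) {
    String jar = conf.get("mapreduce.job.jar");
    if (jar == null) {
      System.out.println("mapreduce.job.jar is not set");
      return;
    }
    Path p = new Path(jar);
    System.out.println("job jar = " + p.toUri() + ", scheme = " + p.toUri().getScheme());
    // A null scheme here would match the 'Expected scheme name at index 0' error above.
  }
}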

On Jan 27, 2014, at 2:05 AM, Lukas Kairies lukas.xtree...@googlemail.com 
wrote:

 Hello,
 
 I try to use XtreemFS as an alternative file system for Hadoop 2.x. There is 
 an existing FileSystem implementation for Hadoop 1.x that works fine. The first 
 thing I did was to implement a DelegateToFileSystem subclass to provide an 
 AbstractFileSystem implementation for XtreemFS (just constructors that use 
 the FileSystem implementation). When I start the wordcount example 
 application I get the following Exception on the NodeManager:
 
 2014-01-20 14:18:19,349 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Failed to parse resource-request
 java.net.URISyntaxException: Expected scheme name at index 0: 
 :///tmp/hadoop-yarn/staging/lkairies/.staging/job_1390223418764_0004/job.jar
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.failExpecting(URI.java:2835)
at java.net.URI$Parser.parse(URI.java:3027)
at java.net.URI.init(URI.java:753)
at 
 org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:80)
at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.init(LocalResourceRequest.java:46)
at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:529)
at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:497)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:864)
at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73)
at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:815)
at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:808)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
at java.lang.Thread.run(Thread.java:724)
 
 Additionally the following is printed on the console:
 
 14/01/27 11:02:14 INFO input.FileInputFormat: Total input paths to process : 1
 14/01/27 11:02:14 INFO mapreduce.JobSubmitter: number of splits:1
 14/01/27 11:02:15 INFO Configuration.deprecation: user.name is deprecated. 
 Instead, use mapreduce.job.user.name
 14/01/27 11:02:15 INFO Configuration.deprecation: mapred.jar is deprecated. 
 Instead, use mapreduce.job.jar
 14/01/27 11:02:15 INFO Configuration.deprecation: mapred.output.value.class 
 is deprecated. Instead, use mapreduce.job.output.value.class
 14/01/27 11:02:15 INFO Configuration.deprecation: mapreduce.combine.class is 
 deprecated. Instead, use mapreduce.job.combine.class
 14/01/27 11:02:15 INFO Configuration.deprecation: mapreduce.map.class is 
 deprecated. Instead, use mapreduce.job.map.class
 14/01/27 11:02:15 INFO Configuration.deprecation: mapred.job.name is 
 deprecated. Instead, use mapreduce.job.name
 14/01/27 11:02:15 INFO Configuration.deprecation: mapreduce.reduce.class is 
 deprecated. Instead, use mapreduce.job.reduce.class
 14/01/27 11:02:15 INFO Configuration.deprecation: mapred.input.dir is 
 deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
 14/01/27 11:02:15 INFO Configuration.deprecation: mapred.output.dir is 
 deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
 14/01/27 11:02:15 INFO Configuration.deprecation: mapred.map.tasks is 
 deprecated. Instead

Re: Ambari upgrade 1.4.1 to 1.4.2

2014-01-24 Thread Vinod Kumar Vavilapalli
+user@ambari -user@hadoop

Please post ambari related questions to the ambari user mailing list.

Thanks
+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Fri, Jan 24, 2014 at 9:15 AM, Kokkula, Sada 
sadanandam.kokk...@bnymellon.com wrote:



 Ambari-Server upgrade from 1.4.1 to 1.4.2 wipes out Ambari database during
 upgrade. After that, not able to open the Ambari Server GUI.

 Reviewed the Horton works web site for help, but the steps in doc plan not
 help out to fix the issue.



 Appreciated for any updates.



 Thanks,





Re: HDFS data transfer is faster than SCP based transfer?

2014-01-24 Thread Vinod Kumar Vavilapalli
Is it a single file? Lots of files? How big are the files? Is the copy on a
single node or are you running some kind of a MapReduce program?

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Fri, Jan 24, 2014 at 7:21 AM, rab ra rab...@gmail.com wrote:

 Hi

 Can anyone please answer my query?

 -Rab
 -- Forwarded message --
 From: rab ra rab...@gmail.com
 Date: 24 Jan 2014 10:55
 Subject: HDFS data transfer is faster than SCP based transfer?
 To: user@hadoop.apache.org

 Hello

 I have a use case that requires transfer of input files from remote
 storage using SCP protocol (using jSCH jar).  To optimize this use case, I
 have pre-loaded all my input files into HDFS and modified my use case so
 that it copies required files from HDFS. So, when tasktrackers works, it
 copies required number of input files to its local directory from HDFS. All
 my tasktrackers are also datanodes. I could see my use case has run faster.
 The only modification in my application is that file copy from HDFS instead
 of transfer using SCP. Also, my use case involves parallel operations (run
 in tasktrackers) and they do lot of file transfer. Now all these transfers
 are replaced with HDFS copy.

 Can anyone tell me HDFS transfer is faster as I witnessed? Is it because,
 it uses TCP/IP? Can anyone give me reasonable reasons to support the
 decrease of time?


 with thanks and regards
 rab




Re: Memory problems with BytesWritable and huge binary files

2014-01-24 Thread Vinod Kumar Vavilapalli
Is your data in any given file a bunch of key-value pairs? If that isn't
the case, I'm wondering how writing a single large key-value into a
sequence file helps. It won't. Maybe you can give an example of your input
data?

If indeed they are a bunch of smaller sized key-value pairs, you can write
your own custom InputFormat that reads the data from your input files one
k-v pair after another, and feed it to your MR job. There isn't any need
for converting them to sequence-files at that point.

Thanks
+Vinod
Hortonworks Inc.
http://hortonworks.com/



Re: No space left on device during merge.

2014-01-24 Thread Vinod Kumar Vavilapalli
That's a lot of data to process for a single reducer. You should try
increasing the number of reducers to achieve more parallelism and also try
modifying your logic to avoid significant skew in the reducers.

Unfortunately this means rethinking about your app, but that's the only way
about it. It will also help you scale smoothly into the future if you have
adjustable parallelism and more balanced data processing.
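As a minimal sketch (the value 100 below is only an example, not a recommendation), the reducer count can be raised in the job driver; the same setting is available on the command line as mapreduce.job.reduces:

    // New-API driver snippet; pick a reducer count that matches your data
    // volume and cluster size.
    org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
    org.apache.hadoop.mapreduce.Job job = org.apache.hadoop.mapreduce.Job.getInstance(conf, "big-sort");
    job.setNumReduceTasks(100);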

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Fri, Jan 24, 2014 at 6:47 AM, Tim Potter t...@yahoo-inc.com wrote:

  Hi,
   I'm getting the below error while trying to sort a lot of data with Hadoop.

 I strongly suspect the node the merge is on is running out of local disk 
 space. Assuming this is the case, is there any way
 to get around this limitation considering I can't increase the local disk 
 space available on the nodes?  Like specify sort/merge parameters or similar.

 Thanks,
   Tim.

 2014-01-24 10:02:36,267 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
 Got brand-new decompressor [.lzo_deflate]
 2014-01-24 10:02:36,280 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
 the last merge-pass, with 100 segments left of total size: 642610678884 bytes
 2014-01-24 10:02:36,281 ERROR [main] 
 org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
 as:XX (auth:XX) 
 cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
 shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
 2014-01-24 10:02:36,282 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : 
 org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
 shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:167)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
 Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left 
 on device
   at 
 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:213)
   at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
   at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
   at java.io.DataOutputStream.write(DataOutputStream.java:107)
   at 
 org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
   at 
 org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:150)
   at 
 org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:140)
   at 
 org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:99)
   at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
   at java.io.DataOutputStream.write(DataOutputStream.java:107)
   at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:249)
   at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:200)
   at 
 org.apache.hadoop.mapreduce.task.reduce.MergeManager$OnDiskMerger.merge(MergeManager.java:572)
   at 
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
 Caused by: java.io.IOException: No space left on device
   at java.io.FileOutputStream.writeBytes(Native Method)
   at java.io.FileOutputStream.write(FileOutputStream.java:318)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:211)
   ... 14 more

 2014-01-24 10:02:36,284 INFO [main] org.apache.hadoop.mapred.Task: Runnning 
 cleanup for the task





Re: Memory problems with BytesWritable and huge binary files

2014-01-24 Thread Vinod Kumar Vavilapalli
Okay. Assuming you don't need a whole file (video) in memory for your 
processing, you can simply write an InputFormat/RecordReader implementation that 
streams through any given file and processes it.
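A minimal sketch of what such a non-splittable, streaming InputFormat could look like (this assumes the new org.apache.hadoop.mapreduce API; the class names and the chunk size are arbitrary, and a real reader would probably also carry the byte offset in the key):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Each input file goes to exactly one mapper, which sees it as a sequence
    // of (filename, chunk) pairs instead of one huge in-memory value.
    public class StreamingFileInputFormat extends FileInputFormat<Text, BytesWritable> {

      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false;   // never split a file across mappers
      }

      @Override
      public RecordReader<Text, BytesWritable> createRecordReader(InputSplit split,
          TaskAttemptContext context) {
        return new ChunkRecordReader();
      }

      static class ChunkRecordReader extends RecordReader<Text, BytesWritable> {
        private static final int CHUNK_SIZE = 64 * 1024;   // arbitrary
        private FSDataInputStream in;
        private long fileLength;
        private long bytesRead;
        private final Text key = new Text();
        private final BytesWritable value = new BytesWritable();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) throws IOException {
          FileSplit fileSplit = (FileSplit) split;
          Path path = fileSplit.getPath();
          fileLength = fileSplit.getLength();
          key.set(path.getName());   // file name as the key
          in = path.getFileSystem(context.getConfiguration()).open(path);
        }

        @Override
        public boolean nextKeyValue() throws IOException {
          if (bytesRead >= fileLength) {
            return false;
          }
          byte[] buf = new byte[(int) Math.min(CHUNK_SIZE, fileLength - bytesRead)];
          in.readFully(buf);   // next chunk of the file
          value.set(buf, 0, buf.length);
          bytesRead += buf.length;
          return true;
        }

        @Override public Text getCurrentKey() { return key; }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() { return fileLength == 0 ? 1.0f : (float) bytesRead / fileLength; }
        @Override public void close() throws IOException { if (in != null) in.close(); }
      }
    }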

+Vinod

On Jan 24, 2014, at 12:44 PM, Adam Retter adam.ret...@googlemail.com wrote:

 Is your data in any given file a bunch of key-value pairs?
 
 No. The content of each file itself is the value we are interested in,
 and I guess that it's filename is the key.
 
 If that isn't the
 case, I'm wondering how writing a single large key-value into a sequence
 file helps. It won't. May be you can give an example of your input data?
 
 Well from the Hadoop O'Reilly book, I rather got the impression that
 HDFS does not like small files due to it's 64MB block size, and it is
 instead recommended to place small files into a Sequence file. Is that
 not the case?
 
 Our input data really varies between 130 different file types, it
 could be Microsoft Office documents, Video Recordings, Audio, CAD
 diagrams etc.
 
 If indeed they are a bunch of smaller sized key-value pairs, you can write
 your own custom InputFormat that reads the data from your input files one
 k-v pair after another, and feed it to your MR job. There isn't any need for
 converting them to sequence-files at that point.
 
 As I mentioned in my initial email, each file cannot be split up!
 
 Thanks
 +Vinod
 Hortonworks Inc.
 http://hortonworks.com/
 
 
 
 
 
 -- 
 Adam Retter
 
 skype: adam.retter
 tweet: adamretter
 http://www.adamretter.org.uk




Re: Container's completion issue

2014-01-21 Thread Vinod Kumar Vavilapalli
It means that the first process in the container is either crashing for some
reason or being explicitly killed by an external entity. You can look at the
logs for the container on the web-UI. Also look at ResourceManager logs to
trace what is happening with this container.

Which application is this? MapReduce? Distributed Shell? Or your own custom
application?

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Mon, Jan 20, 2014 at 6:38 AM, REYANE OUKPEDJO r.oukpe...@yahoo.comwrote:

 Hi there,
 I was using hadoop-2.2.0 to run an application that only lunch 3
 containers the application master container and 2 more containers that run
 the job. one of the 2 containers is returning as completed from the
 Resource Manager's logs as soon as it is launched. The problem is the
 process that is running with that container continue to run while the
 container's local directory is already cleaned up. I investigated my code
 to understand the reason this happens and could not figure out. Anyone ever
 experience this ? If yes please share. Also please give the details about
 how the containers completion event is triggered and what make it possible
 for a process to continue running after it is marked as completed.


 Thanks


 Reyane OUKPEDJO




Re: DistributedCache is empty

2014-01-17 Thread Vinod Kumar Vavilapalli
What is the version of Hadoop that you are using?

+Vinod

On Jan 16, 2014, at 2:41 PM, Keith Wiley kwi...@keithwiley.com wrote:

 My driver is implemented around Tool and so should be wrapping 
 GenericOptionsParser internally.  Nevertheless, neither -files nor 
 DistributedCache methods seem to work.  Usage on the command line is straight 
 forward, I simply add -files foo.py,bar.py right after the class name 
 (where those files are in the current directory I'm running hadoop from, 
 i.e., the local nonHDFS filesystem).  The mapper then inspects the file list 
 via DistributedCache.getLocalCacheFiles(context.getConfiguration()) and 
 doesn't see the files, there's nothing there.  Likewise, if I attempt to run 
 those python scripts from the mapper using hadoop.util.Shell, the files 
 obviously can't be found.
 
 That should have worked, so I shouldn't have to rely on the DC methods, but 
 nevertheless, I tried anyway, so in the driver I create a new Configuration, 
 then call DistributedCache.addCacheFile(new URI(./foo.py), conf), thus 
 referencing the local nonHDFS file in the current working directory.  I then 
 add conf to the job ctor, seems straight forward.  Still no dice, the mapper 
 can't see the files, they simply aren't there.
 
 What on Earth am I doing wrong here?
 
 
 Keith Wiley kwi...@keithwiley.com keithwiley.com
 music.keithwiley.com
 
 Luminous beings are we, not this crude matter.
   --  Yoda
 
 




Re: How to make AM terminate if client crashes?

2014-01-13 Thread Vinod Kumar Vavilapalli
The architecture is built around detachable clients. So, no, it doesn't happen 
automatically. Even if we were to add that feature, it'd be fraught with edge 
cases - network issues causing app-termination even though client is still 
alive etc.

Any more details on why this is desired?

+Vinod

On Jan 11, 2014, at 11:37 AM, John Lilley john.lil...@redpoint.net wrote:

 We have a YARN application that we want to automatically terminate if the 
 YARN client disconnects or crashes.  Is it possible to configure the 
 YarnClient-RM connection so that if the client terminates the RM 
 automatically terminates the AM?  Or do we need to build our own logic (e.g. 
 a direct client-AM connection) for that?
 Thanks
 John




Re: A question about Hadoop 1 job user id used for group mapping, which could lead to performance degradation

2014-01-08 Thread Vinod Kumar Vavilapalli
It just seems like lazy code. You can see that, later, there is this:

{code}

for (Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {
  childUGI.addToken(token);
}

{code}

So eventually the JobToken is getting added to the UGI which runs task-code.

  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 
 on 9000): No groups available for user job_201401071758_0002

This seems to be a problem. When the task tries to reach the NameNode, it 
should do so as the user, not the job-id. This is not just a logging nuisance; I'd be 
surprised if jobs pass. Do you have permissions enabled on HDFS?

Oh, or is this in non-secure mode (i.e. without kerberos)?

+Vinod


On Jan 7, 2014, at 5:14 PM, Jian Fang jian.fang.subscr...@gmail.com wrote:

 Hi,
 
 I looked at Hadoop 1.X source code and found some logic that I could not 
 understand. 
 
 In the org.apache.hadoop.mapred.Child class, there were two UGIs defined as 
 follows.
 
 UserGroupInformation current = UserGroupInformation.getCurrentUser();
 current.addToken(jt);
 
 UserGroupInformation taskOwner 
  = 
 UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
 taskOwner.addToken(jt);
 
 But it is the taskOwner that is actually passed as a UGI to task tracker and 
 then to HDFS. The first one was not referenced any where.
 
 final TaskUmbilicalProtocol umbilical = 
    taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
 @Override
 public TaskUmbilicalProtocol run() throws Exception {
   return 
 (TaskUmbilicalProtocol)RPC.getProxy(TaskUmbilicalProtocol.class,
   TaskUmbilicalProtocol.versionID,
   address,
   defaultConf);
 }
 });
 
 What puzzled me is that the job id is actually passed in as the user name to 
 task tracker. On the Name node side, when it tries to map the non-existing 
 user name, i.e., task id, to a group, it always returns empty array. As a 
 result, we always see annoying warning messages such as
 
  WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 63 
 on 9000): No groups available for user job_201401071758_0002
 
 Sometimes, the warning messages were thrown so fast, hundreds or even 
 thousands per second for a big cluster, the system performance was degraded 
 dramatically. 
 
 Could someone please explain why this logic was designed in this way? Any 
 benefit to use non-existing user for the group mapping? Or is this a bug?
 
 Thanks in advance,
 
 John




Re: Ways to manage user accounts on hadoop cluster when using kerberos security

2014-01-08 Thread Vinod Kumar Vavilapalli

On Jan 7, 2014, at 2:55 PM, Manoj Samel manoj.sa...@gmail.com wrote:

 I am assuming that if the users are in a LDAP, can using the PAM for LDAP 
 solve the issue.


That's how I've seen this issue addressed. 

+Vinod


Re: Why none AppMaster node seeks IPCServer on itself.

2014-01-08 Thread Vinod Kumar Vavilapalli
Checked the firewall rules?

+Vinod

On Jan 8, 2014, at 3:22 AM, Saeed Adel Mehraban s.ade...@gmail.com wrote:

 Hi all.
 I have an installation on Hadoop on 3 nodes, namely master, slave1 and 
 slave2. When I try to run a job, assuming appmaster be on slave1, every map 
 and reduce tasks which take place on slave2 will fail due to ConnectException.
 I checked the port which slave2 wants to connect to. It differs randomly each 
 time, but when I look for it in slave1 logs, I can see this line:
 2014-01-08 02:14:25,206 INFO [Socket Reader #1 for port 38226] 
 org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 38226
 So there is a process on slave1 listening to this port, but slave2 tasks want 
 to connect to this port on slave2.
 
 Do you know why is this happening?




Re: Understanding MapReduce source code : Flush operations

2014-01-06 Thread Vinod Kumar Vavilapalli

What OutputFormat are you using?

Once it reaches OutputFormat (specifically RecordWriter) it all depends on what 
the RecordWriter does. Are you using some OutputFormat with a RecordWriter that 
buffers like this?

Thanks,
+Vinod

On Jan 6, 2014, at 7:11 PM, nagarjuna kanamarlapudi 
nagarjuna.kanamarlap...@gmail.com wrote:

 This is not in DFSClient.
 
 Before the output is written on to HDFS, lot of operations take place.
 
 Like reducer output in mem reaching 90% of HDFS block size, then starting to 
 flush  the data etc..,
 
 So, my requirement is to have a look at that code where in I want to change 
 the logic a bit which suits my convenience.
 
 
 On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 Assuming your output is going to HDFS, you want to look at DFSClient.
 
 Reducer uses FileSystem to write the output. You need to start looking at how 
 DFSClient chunks the output and sends them across to the remote data-nodes.
 
 Thanks
 +Vinod
 
 On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi 
 nagarjuna.kanamarlap...@gmail.com wrote:
 
 I want to have a look at the code where of flush operations that happens 
 after the reduce phase.
 
 Reducer writes the output to OutputFormat which inturn pushes that to memory 
 and once it reaches 90% of chunk size it starts to flush the reducer output. 
 
 I essentially want to look at the code of that flushing operation.
 
 
 What is the class(es) I need to look into 
 
 
 On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya smarty.ju...@gmail.com 
 wrote:
 Please do not tell me since last 2.5 years you have not used virtual Hadoop 
 environment to debug your Map Reduce application before deploying to 
 Production environment
 
 No one can stop you looking at the code , Hadoop and its ecosystem is 
 open-source
 
 
 On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi 
 nagarjuna.kanamarlap...@gmail.com wrote:
 
 
 -- Forwarded message --
 From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com
 Date: Mon, Jan 6, 2014 at 6:39 PM
 Subject: Understanding MapReduce source code : Flush operations
 To: mapreduce-u...@hadoop.apache.org
 
 
 Hi,
 
I have been using hadoop / map reduce for about 2.5 years. I want to understand the 
 internals of the hadoop source code. 
 
 Let me put my requirement very clear.
 
 I want to have a look at the code where of flush operations that happens 
 after the reduce phase.
 
 Reducer writes the output to OutputFormat which inturn pushes that to memory 
 and once it reaches 90% of chunk size it starts to flush the reducer output. 
 
 I essentially want to look at the code of that flushing operation.
 
 
 
 
 Regards,
 Nagarjuna K
 
 
 
 
 
 




Re: Unable to change the virtual memory to be more than the default 2.1 GB

2014-01-02 Thread Vinod Kumar Vavilapalli
You need to change the application configuration itself to tell YARN that each 
task needs more than the default. I see that this is a mapreduce app, so you 
have to change the per-application configuration: mapreduce.map.memory.mb and 
mapreduce.reduce.memory.mb in either mapred-site.xml or via the command line.
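For example, something along these lines in mapred-site.xml (the 4096 values are placeholders only; pick sizes that fit your nodes, and keep the JVM heap you set via mapreduce.map.java.opts / mapreduce.reduce.java.opts below the container size):

    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4096</value>
    </property>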

Side notes: Seems like you are spawning lots of shells under your mapper and 
YARN's NodeManager is detecting that the total virtual memory usage is 14.5GB. 
You may want to reduce that number of shells, lest the OS itself might kill 
your tasks depend on the system configuration.

Thanks,
+Vinod

On Jan 1, 2014, at 7:50 PM, S.L simpleliving...@gmail.com wrote:

 Hello Folks,
 
 I am running hadoop 2.2 in a pseudo-distributed mode on a laptop with 8GB 
 RAM. 
 
 Whenever I submit a job I get an error that says that the that the virtual 
 memory usage exceeded , like below.
 
 I have changed the ratio yarn.nodenamager.vmem-pmem-ratio in yarn-site.xml to 
 10 , however the virtual memory is not getting increased more than 2.1 GB , 
 as can been seen in the error message below and the container is being killed.
 
 Can some one please let me know if there is any other setting that needs to 
 be changed ? Thanks in advance!
 
 Error Message :
 
 INFO mapreduce.Job: Task Id : attempt_1388632710048_0009_m_00_2, Status : 
 FAILED
 Container [pid=12013,containerID=container_1388632710048_0009_01_04] is 
 running beyond virtual memory limits. Current usage: 544.9 MB of 1 GB 
 physical memory used; 14.5 GB of 2.1 GB virtual memory used. Killing 
 container.
 Dump of the process-tree for container_1388632710048_0009_01_04 :
 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
 SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
 |- 12077 12018 12013 12013 (phantomjs) 16 2 1641000960 6728 
 /usr/local/bin/phantomjs --webdriver=15358 
 --webdriver-logfile=/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1388632710048_0009/container_1388632710048_0009_01_04/phantomjsdriver.log
  
 |- 12013 882 12013 12013 (bash) 1 0 108650496 305 /bin/bash -c 
 /usr/java/jdk1.7.0_25/bin/java -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN  -Xmx200m 
 -Djava.io.tmpdir=/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1388632710048_0009/container_1388632710048_0009_01_04/tmp
  -Dlog4j.configuration=container-log4j.properties 
 -Dyarn.app.container.log.dir=/home/general/hadoop-2.2.0/logs/userlogs/application_1388632710048_0009/container_1388632710048_0009_01_04
  -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
 org.apache.hadoop.mapred.YarnChild 127.0.0.1 56498 
 attempt_1388632710048_0009_m_00_2 4 
 1/home/general/hadoop-2.2.0/logs/userlogs/application_1388632710048_0009/container_1388632710048_0009_01_04/stdout
  
 2/home/general/hadoop-2.2.0/logs/userlogs/application_1388632710048_0009/container_1388632710048_0009_01_04/stderr
   
 |- 12075 12018 12013 12013 (phantomjs) 17 1 1615687680 6539 
 /usr/local/bin/phantomjs --webdriver=29062 
 --webdriver-logfile=/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1388632710048_0009/container_1388632710048_0009_01_04/phantomjsdriver.log
  
 |- 12074 12018 12013 12013 (phantomjs) 16 2 1641000960 6727 
 /usr/local/bin/phantomjs --webdriver=5958 
 --webdriver-logfile=/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1388632710048_0009/container_1388632710048_0009_01_04/phantomjsdriver.log
  
 |- 12073 12018 12013 12013 (phantomjs) 17 2 1641000960 6732 
 /usr/local/bin/phantomjs --webdriver=31836 
 --webdriver-logfile=/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1388632710048_0009/container_1388632710048_0009_01_04/phantomjsdriver.log
  
 |- 12090 12018 12013 12013 (phantomjs) 16 2 1615687680 6538 
 /usr/local/bin/phantomjs --webdriver=24519 
 --webdriver-logfile=/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1388632710048_0009/container_1388632710048_0009_01_04/phantomjsdriver.log
  
 |- 12072 12018 12013 12013 (phantomjs) 16 1 1641000960 6216 
 /usr/local/bin/phantomjs --webdriver=10175 
 --webdriver-logfile=/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1388632710048_0009/container_1388632710048_0009_01_04/phantomjsdriver.log
  
 |- 12091 12018 12013 12013 (phantomjs) 17 1 1615687680 6036 
 /usr/local/bin/phantomjs --webdriver=5043 
 --webdriver-logfile=/tmp/hadoop-general/nm-local-dir/usercache/general/appcache/application_1388632710048_0009/container_1388632710048_0009_01_04/phantomjsdriver.log
  
 |- 12018 12013 12013 12013 (java) 996 41 820924416 79595 
 /usr/java/jdk1.7.0_25/bin/java -Djava.net.preferIPv4Stack=true 
 -Dhadoop.metrics.log.level=WARN -Xmx200m 
 

Re: What are the methods to share dynamic data among mappers/reducers?

2014-01-02 Thread Vinod Kumar Vavilapalli

There isn't anything natively supported for that in the framework, but you can 
do it yourself by using a shared service (e.g., via HDFS files or ZooKeeper 
nodes) that mappers/reducers all have access to.
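As a minimal sketch of the HDFS variant (the path and class names are illustrative), a mapper can re-read a shared file in setup(), so each task attempt picks up whatever an external writer most recently published there:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SharedStateMapper extends Mapper<LongWritable, Text, Text, Text> {
      private String sharedState;   // whatever was last written to the shared file

      @Override
      protected void setup(Context context) throws IOException {
        Path shared = new Path("/shared/state.txt");   // maintained by some external process
        FileSystem fs = shared.getFileSystem(context.getConfiguration());
        try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(fs.open(shared), StandardCharsets.UTF_8))) {
          sharedState = reader.readLine();
        }
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // combine the shared state with the input record in whatever way the job needs
        context.write(new Text(sharedState), value);
      }
    }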

More details on your use case? In any case, once you start making mappers and 
reducers depend on externally changing state or on one another, you may be 
breaking fundamental assumptions of MapReduce - embarrassingly parallel 
computation (limiting scalability) and/or idempotency (affecting retries during 
failures).

Thanks,
+Vinod

On Jan 2, 2014, at 1:42 AM, sam liu samliuhad...@gmail.com wrote:

 Hi,
 
 As I know, the Distributed Cache will copy the shared data to the slaves 
 before starting job, and won't change the shared data after that. 
 
 So are there any solutions to share dynamic data among mappers/reducers?
 
 Thanks!




Re: Map succeeds but reduce hangs

2014-01-02 Thread Vinod Kumar Vavilapalli
Check the TaskTracker configuration in mapred-site.xml: 
mapred.task.tracker.report.address. You may be setting it to 127.0.0.1:0 or 
localhost:0. Change it to 0.0.0.0:0 and restart the daemons.
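That is, something like this in mapred-site.xml on each TaskTracker:

    <property>
      <name>mapred.task.tracker.report.address</name>
      <value>0.0.0.0:0</value>
    </property>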

Thanks,
+Vinod

On Jan 1, 2014, at 2:14 PM, navaz navaz@gmail.com wrote:

 I dont know y it is running on localhost. I have commented it.
 ==
 slave1:
 Hostname: pc321
 
 hduser@pc321:/etc$ vi hosts
 #127.0.0.1  localhost loghost localhost.myslice.ch-geni-net.emulab.net
 155.98.39.28pc228
 155.98.39.121   pc321
 155.98.39.27dn3.myslice.ch-geni-net.emulab.net
 
 slave2:
 hostname: dn3.myslice.ch-geni-net.emulab.net
 hduser@dn3:/etc$ vi hosts
 #127.0.0.1  localhost loghost localhost.myslice.ch-geni-net.emulab.net
 155.98.39.28pc228
 155.98.39.121   pc321
 155.98.39.27dn3.myslice.ch-geni-net.emulab.net
 
 Master:
 Hostame: pc228
 hduser@pc228:/etc$ vi hosts
 #127.0.0.1  localhost loghost localhost.myslice.ch-geni-net.emulab.net
 155.98.39.28   pc228
 155.98.39.121  pc321
 #155.98.39.19   slave2
 155.98.39.27   dn3.myslice.ch-geni-net.emulab.net
 
 I have replaced localhost with pc228 in coresite.xml and mapreduce-site.xml 
 and replication factor as 3.
 
 I can able to ssh pc321 and dn3.myslice.ch-geni-net.emulab.net from master.
 
 
 hduser@pc228:/usr/local/hadoop/conf$ more slaves
 pc228
 pc321
 dn3.myslice.ch-geni-net.emulab.net
 
 hduser@pc228:/usr/local/hadoop/conf$ more masters
 pc228
 hduser@pc228:/usr/local/hadoop/conf$
 
 
 
 Am i am doing anything wrong here ?
 
 
 On Wed, Jan 1, 2014 at 4:54 PM, Hardik Pandya smarty.ju...@gmail.com wrote:
 do you have your hosnames properly configured in etc/hosts? have you tried 
 192.168.?.? instead of localhost 127.0.0.1
 
 
 
 On Wed, Jan 1, 2014 at 11:33 AM, navaz navaz@gmail.com wrote:
 Thanks. But I wonder Why map succeeds 100% , How it resolve hostname ?
 
 Now reduce becomes 100% but bailing out slave2 and slave 3 . ( But Mappig is 
 succeded for these nodes).
 
 Does it looks for hostname only for reduce ?
 
 
 14/01/01 09:09:38 INFO mapred.JobClient: Running job: job_201401010908_0001
 14/01/01 09:09:39 INFO mapred.JobClient:  map 0% reduce 0%
 14/01/01 09:10:00 INFO mapred.JobClient:  map 33% reduce 0%
 14/01/01 09:10:01 INFO mapred.JobClient:  map 66% reduce 0%
 14/01/01 09:10:05 INFO mapred.JobClient:  map 100% reduce 0%
 14/01/01 09:10:14 INFO mapred.JobClient:  map 100% reduce 22%
 14/01/01 09:17:32 INFO mapred.JobClient:  map 100% reduce 0%
 14/01/01 09:17:35 INFO mapred.JobClient: Task Id : 
 attempt_201401010908_0001_r_00_0, Status : FAILED
 Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 14/01/01 09:17:46 INFO mapred.JobClient:  map 100% reduce 11%
 14/01/01 09:17:50 INFO mapred.JobClient:  map 100% reduce 22%
 14/01/01 09:25:06 INFO mapred.JobClient:  map 100% reduce 0%
 14/01/01 09:25:10 INFO mapred.JobClient: Task Id : 
 attempt_201401010908_0001_r_00_1, Status : FAILED
 Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 14/01/01 09:25:34 INFO mapred.JobClient:  map 100% reduce 100%
 14/01/01 09:25:42 INFO mapred.JobClient: Job complete: job_201401010908_0001
 14/01/01 09:25:42 INFO mapred.JobClient: Counters: 29
 
 
 
 Job Tracker logs:
 2014-01-01 09:09:59,874 INFO org.apache.hadoop.mapred.JobInProgress: Task 
 'attempt_201401010908_0001_m_02_0' has completed task_20140
 1010908_0001_m_02 successfully.
 2014-01-01 09:10:04,231 INFO org.apache.hadoop.mapred.JobInProgress: Task 
 'attempt_201401010908_0001_m_01_0' has completed task_20140
 1010908_0001_m_01 successfully.
 2014-01-01 09:17:30,527 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
 from attempt_201401010908_0001_r_00_0: Shuffle Error: Exc
 eeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 2014-01-01 09:17:30,528 INFO org.apache.hadoop.mapred.JobTracker: Removing 
 task 'attempt_201401010908_0001_r_00_0'
 2014-01-01 09:17:30,529 INFO org.apache.hadoop.mapred.JobTracker: Adding task 
 (TASK_CLEANUP) 'attempt_201401010908_0001_r_00_0' to ti
 p task_201401010908_0001_r_00, for tracker 
 'tracker_slave3:localhost/127.0.0.1:44663'
 2014-01-01 09:17:35,130 INFO org.apache.hadoop.mapred.JobTracker: Removing 
 task 'attempt_201401010908_0001_r_00_0'
 2014-01-01 09:17:35,213 INFO org.apache.hadoop.mapred.JobTracker: Adding task 
 (REDUCE) 'attempt_201401010908_0001_r_00_1' to tip task
 _201401010908_0001_r_00, for tracker 
 'tracker_slave2:localhost/127.0.0.1:51438'
 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
 from attempt_201401010908_0001_r_00_1: Shuffle Error: Exc
 eeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 2014-01-01 09:25:05,493 INFO 

Re: Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2013-12-23 Thread Vinod Kumar Vavilapalli
Seems like the hadoop common jar is missing. Can you check whether one of the 
directories listed in the CLASSPATH contains the hadoop-common jar?

Thanks,
+Vinod

On Dec 22, 2013, at 10:27 PM, Hadoop Dev hadoopeco@gmail.com wrote:

 Hi All,
 I am trying to execute first ever program (Word Count) in hadoop2.2.0 on 
 Windows 7 (64bit). But do not know what's wrong with the classpath which is 
 making program fail at the runtime. (Failing @ container launch)
 Hadoop classpath set on my machine is
 c:\hadoop\etc\hadoop;c:\hadoop\share\hadoop\common\lib\*;c:\hadoop\share\hadoop\
 common\*;c:\hadoop\share\hadoop\hdfs;c:\hadoop\share\hadoop\hdfs\lib\*;c:\hadoop
 \share\hadoop\hdfs\*;c:\hadoop\share\hadoop\yarn\lib\*;c:\hadoop\share\hadoop\ya
 rn\*;c:\hadoop\share\hadoop\mapreduce\lib\*;c:\hadoop\share\hadoop\mapreduce\*
 
 stderr:
 java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
   at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.service.CompositeService
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   ... 12 more
 Could not find the main class: 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.  Program will exit.
 Exception in thread main 




Re: Yarn -- one of the daemons getting killed

2013-12-17 Thread Vinod Kumar Vavilapalli
That's good info. It is more than likely that it is the OOM killer. See 
http://stackoverflow.com/questions/726690/who-killed-my-process-and-why for 
example.

Thanks,
+Vinod

On Dec 17, 2013, at 1:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com 
wrote:

 Hi Jeff,
 
   I have run the resource manager in the foreground without nohup and here 
 are the messages when it was killed, it says it is Killed but doesn't say 
 why!
 
 13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application 
 appattempt_1387266015651_0258_01 released container 
 container_1387266015651_0258_01_03 on node: host: isredeng:36576 
 #containers=2 available=7936 used=256 with event: FINISHED
 13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl: 
 container_1387266015651_0258_01_05 Container Transitioned from ACQUIRED 
 to RUNNING
 Killed
 
 
 Thanks,
 Kishore
 
 
 On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman stuck...@umd.edu wrote:
 What if you open the daemons in a screen session rather than running them 
 in the background -- for example, run yarn resourcemanager. Then you can 
 see exactly when they terminate, and hopefully why.
 
 From: Krishna Kishore Bonagiri
 Sent: Monday, December 16, 2013 6:20 AM
 To: user@hadoop.apache.org
 Reply To: user@hadoop.apache.org
 Subject: Re: Yarn -- one of the daemons getting killed
 
 Hi Vinod,
 
  Yes, I am running on Linux.
 
  I was actually searching for a corresponding message in /var/log/messages to 
 confirm that OOM killed my daemons, but could not find any corresponding 
 messages there! According to the following link, it looks like if it is a 
 memory issue, I should see a messages even if OOM is disabled, but I don't 
 see it.
 
 http://www.redhat.com/archives/taroon-list/2007-August/msg6.html
 
   And, is memory consumption more in case of two node cluster than a single 
 node one? Also, I see this problem only when I give * as the node name. 
 
   One other thing I suspected was the allowed number of user processes, I 
 increased that to 31000 from 1024 but that also didn't help.
 
 Thanks,
 Kishore
 
 
 On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 Yes, that is what I suspect. That is why I asked if everything is on a single 
 node. If you are running linux, linux OOM killer may be shooting things down. 
 When it happens, you will see something like 'killed process in system's 
 syslog.
 
 Thanks,
 +Vinod
 
 On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:
 
 Vinod,
 
   One more thing I observed is that, my Client which submits Application 
 Master one after another continuously also gets killed sometimes. So, it is 
 always any of the Java Processes that is getting killed. Does it indicate 
 some excessive memory usage by them or something like that, that is causing 
 them die? If so, how can we resolve this kind of issue?
 
 Thanks,
 Kishore
 
 
 On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:
 No, I am running on 2 node cluster.
 
 
 On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 Is all of this on a single node?
 
 Thanks,
 +Vinod
 
 On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:
 
 Hi,
   I am running a small application on YARN (2.2.0) in a loop of 500 times, 
 and while doing so one of the daemons, node manager, resource manager, or 
 data node is getting killed (I mean disappearing) at a random point. I see 
 no information in the corresponding log files. How can I know why is it 
 happening so?
 
  And, one more observation is that, this is happening only when I am using 
 * for node name in the container requests, otherwise when I used a 
 specific node name, everything is fine.
 
 Thanks,
 Kishore
 
 
 
 
 
 

Re: Pluggable distribute cache impl

2013-12-16 Thread Vinod Kumar Vavilapalli

If the files are already on an NFS mount, do you even need to spread files around 
via the distributed cache?

BTW, running jobs on NFS mounts isn't going to scale after a while.

Thanks,
+Vinod

On Dec 15, 2013, at 1:15 PM, Jay Vyas jayunit...@gmail.com wrote:

 are there any ways to plug in an alternate distributed cache implementation 
 (i.e. when nodes of a cluster already have an NFS mount or other local data 
 service...)?




Re: pipes on hadoop 2.2.0 crashes

2013-12-16 Thread Vinod Kumar Vavilapalli
You should navigate to the ResourceManager UI following the link and see what 
is happening on the ResourceManager as well as the application-master. Check if 
any nodes are active first. Then look at ResourceManager and NodeManager logs.

+Vinod

On Dec 16, 2013, at 10:29 AM, Mauro Del Rio mdrio1...@gmail.com wrote:

 I installed hadoop 2.2.0 on a small cluster, just two nodes. I run a simple 
 wordcount in c++ with pipes, this time there was no exception, but the job 
 didn't finish. This is the output on the shell where I launched pipes:
 
 mauro@mauro-VirtualBox:~/hadoop-2.2.0$ bin/mapred pipes -program wc -input 
 test.sh -output out
 13/12/16 18:51:41 INFO client.RMProxy: Connecting to ResourceManager at 
 /0.0.0.0:9052
 13/12/16 18:51:41 INFO client.RMProxy: Connecting to ResourceManager at 
 /0.0.0.0:9052
 13/12/16 18:51:41 WARN mapreduce.JobSubmitter: No job jar file set.  User 
 classes may not be found. See Job or Job#setJar(String).
 13/12/16 18:51:41 INFO mapred.FileInputFormat: Total input paths to process : 
 1
 13/12/16 18:51:41 INFO mapreduce.JobSubmitter: number of splits:2
 13/12/16 18:51:41 INFO Configuration.deprecation: user.name is deprecated. 
 Instead, use mapreduce.job.user.name
 13/12/16 18:51:41 INFO Configuration.deprecation: 
 mapred.cache.files.filesizes is deprecated. Instead, use 
 mapreduce.job.cache.files.filesizes
 13/12/16 18:51:41 INFO Configuration.deprecation: mapred.cache.files is 
 deprecated. Instead, use mapreduce.job.cache.files
 13/12/16 18:51:41 INFO Configuration.deprecation: 
 mapred.pipes.user.inputformat is deprecated. Instead, use 
 mapreduce.pipes.inputformat
 13/12/16 18:51:41 INFO Configuration.deprecation: mapred.output.value.class 
 is deprecated. Instead, use mapreduce.job.output.value.class
 13/12/16 18:51:41 INFO Configuration.deprecation: 
 mapred.mapoutput.value.class is deprecated. Instead, use 
 mapreduce.map.output.value.class
 13/12/16 18:51:41 INFO Configuration.deprecation: mapred.input.dir is 
 deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
 13/12/16 18:51:41 INFO Configuration.deprecation: mapred.output.dir is 
 deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
 13/12/16 18:51:41 INFO Configuration.deprecation: mapred.map.tasks is 
 deprecated. Instead, use mapreduce.job.maps
 13/12/16 18:51:41 INFO Configuration.deprecation: hadoop.pipes.partitioner is 
 deprecated. Instead, use mapreduce.pipes.partitioner
 13/12/16 18:51:41 INFO Configuration.deprecation: hadoop.pipes.executable is 
 deprecated. Instead, use mapreduce.pipes.executable
 13/12/16 18:51:41 INFO Configuration.deprecation: 
 mapred.cache.files.timestamps is deprecated. Instead, use 
 mapreduce.job.cache.files.timestamps
 13/12/16 18:51:41 INFO Configuration.deprecation: mapred.output.key.class is 
 deprecated. Instead, use mapreduce.job.output.key.class
 13/12/16 18:51:41 INFO Configuration.deprecation: mapred.mapoutput.key.class 
 is deprecated. Instead, use mapreduce.map.output.key.class
 13/12/16 18:51:41 INFO Configuration.deprecation: mapred.working.dir is 
 deprecated. Instead, use mapreduce.job.working.dir
 13/12/16 18:51:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
 job_1387213974967_0003
 13/12/16 18:51:42 INFO mapred.YARNRunner: Job jar is not present. Not adding 
 any jar to the list of resources.
 13/12/16 18:51:42 INFO impl.YarnClientImpl: Submitted application 
 application_1387213974967_0003 to ResourceManager at /0.0.0.0:9052
 13/12/16 18:51:42 INFO mapreduce.Job: The url to track the job: 
 http://mauro-VirtualBox:8088/proxy/application_1387213974967_0003/
 13/12/16 18:51:42 INFO mapreduce.Job: Running job: job_1387213974967_0003
 
 
 The job status from bin/mapred job -list  is PREP. I didn't find any 
 interesting information in logs file. 
 
 




Re: Yarn -- one of the daemons getting killed

2013-12-13 Thread Vinod Kumar Vavilapalli
Yes, that is what I suspect. That is why I asked if everything is on a single 
node. If you are running Linux, the Linux OOM killer may be shooting things down. 
When it happens, you will see something like 'Killed process' in the system's 
syslog.

Thanks,
+Vinod

On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri write2kish...@gmail.com 
wrote:

 Vinod,
 
    One more thing I observed is that my Client, which submits Application 
 Masters one after another continuously, also gets killed sometimes. So it is 
 always one of the Java processes that is getting killed. Does this indicate 
 excessive memory usage, or something like that, that is causing them to die? 
 If so, how can we resolve this kind of issue?
 
 Thanks,
 Kishore
 
 
 On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:
 No, I am running on 2 node cluster.
 
 
 On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 Is all of this on a single node?
 
 Thanks,
 +Vinod
 
 On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:
 
 Hi,
   I am running a small application on YARN (2.2.0) in a loop 500 times, 
 and while doing so one of the daemons (node manager, resource manager, or 
 data node) gets killed (I mean it disappears) at a random point. I see 
 no information in the corresponding log files. How can I find out why this 
 is happening?
 
  And one more observation: this happens only when I use * for the node name 
 in the container requests; when I use a specific node name, everything is 
 fine.
 
 Thanks,
 Kishore
 
 
 
 




Re: pipes on hadoop 2.2.0 crashes

2013-12-13 Thread Vinod Kumar Vavilapalli

Could it just be LocalJobRunner? Can you try it on a cluster? We've tested 
pipes on clusters, so I will be surprised if it doesn't work there.

Thanks,
+Vinod

On Dec 13, 2013, at 7:44 AM, Mauro Del Rio mdrio1...@gmail.com wrote:

 Hi, I tried to run a simple test with pipes, but it crashes.
 
 java.lang.Exception: java.lang.NullPointerException
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.mapred.pipes.Application.init(Application.java:104)
   at 
 org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:69)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 13/12/13 16:38:03 INFO mapreduce.Job: Job job_local1570213319_0001 running in 
 uber mode : false
 13/12/13 16:38:03 INFO mapreduce.Job:  map 0% reduce 0%
 13/12/13 16:38:03 INFO mapreduce.Job: Job job_local1570213319_0001 failed 
 with state FAILED due to: NA
 13/12/13 16:38:03 INFO mapreduce.Job: Counters: 0
 Exception in thread main java.io.IOException: Job failed!
   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
   at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:264)
   at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:503)
   at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:518)
 
 The line Application.java:104 is: byte[]  password = jobToken.getPassword();
 so jobToken seems to be null.
 
 It does not depend on the C++ code, since the error occurs before it is 
 launched.
 I ran it on Ubuntu 12.04 32-bit using the binary tarball.
 Any idea why it doesn't work?
 
 
 -- 
 Mauro




Re: how to create symbolic link in hdfs with c++ code or webhdfs interface?

2013-12-13 Thread Vinod Kumar Vavilapalli
What version of Hadoop?

Thanks,
+Vinod

On Dec 13, 2013, at 1:57 AM, Xiaobin She xiaobin...@gmail.com wrote:

 I'm writing a C++ program, and I need to deal with HDFS.
 
 What I need is to create some files in HDFS and read the status of these files.
 
 I also need to be able to create symlinks in HDFS and to know whether a file 
 in HDFS is a symlink or not.
 
 As far as I know, the C client library for HDFS (libhdfs) does not 
 support symlinks.
 
 I have also noticed that there is a WebHDFS interface for HDFS.
 
 According to this page, it does support symlinks:
 
 
 http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_a_Symbolic_Link
 
 
 But when I tried this, I got the following message:
 
 {RemoteException:{exception:UnsupportedOperationException,javaClassName:java.lang.UnsupportedOperationException,message:Symlinks
  not supported}}
 
 
 Does this mean WebHDFS does not support symlinks either?
 
 If this is true, how can I create a symlink and get the status of such a file 
 from C++ code?
 
 Thank you very much for your help.
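 
 For reference, the symlink call on that page looks roughly like this (a 
 sketch; host, port and paths are placeholders), and it is this call that 
 returns the UnsupportedOperationException above:
 
 $ curl -i -X PUT "http://<NAMENODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATESYMLINK&destination=<TARGET>&createParent=true"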




Re: issue about no class find in running MR job

2013-12-13 Thread Vinod Kumar Vavilapalli
That is not the correct usage. You should do: hadoop jar <your-jar-name> 
<main-class-name>. Or, if you are adventurous, directly invoke your class using 
java and set the appropriate classpath.
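
For example (the jar name, class name and paths here are only placeholders):

$ hadoop jar wordcount.jar WordCount /user/foo/input /user/foo/output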

Thanks,
+Vinod

On Dec 12, 2013, at 6:11 PM, ch huang justlo...@gmail.com wrote:

 hadoop ../test/WordCount




Re: Unsubscribe Please

2013-12-12 Thread Vinod Kumar Vavilapalli
You should send an email to user-unsubscr...@hadoop.apache.org.

Thanks,
+Vinod

On Dec 12, 2013, at 8:36 AM, K. M. Rakibul Islam rakib1...@gmail.com wrote:

 Unsubscribe Please!
 
 Thanks.




Re: Yarn -- one of the daemons getting killed

2013-12-12 Thread Vinod Kumar Vavilapalli
Is all of this on a single node?

Thanks,
+Vinod

On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com 
wrote:

 Hi,
   I am running a small application on YARN (2.2.0) in a loop 500 times, 
 and while doing so one of the daemons (node manager, resource manager, or 
 data node) gets killed (I mean it disappears) at a random point. I see no 
 information in the corresponding log files. How can I find out why this is 
 happening?
 
  And one more observation: this happens only when I use * for the node name 
 in the container requests; when I use a specific node name, everything is 
 fine.
 
 Thanks,
 Kishore




Re: Writing to remote HDFS using C# on Windows

2013-12-05 Thread Vinod Kumar Vavilapalli
You can try using WebHDFS.
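
A write over WebHDFS is just two HTTP PUTs, so it can be driven from C# (shown 
here with curl as a sketch; host, port and paths are placeholders):

$ curl -i -X PUT "http://<NAMENODE>:50070/webhdfs/v1/tmp/test.txt?op=CREATE"
# the NameNode answers with a 307 redirect; PUT the data to that Location URL
$ curl -i -X PUT -T localfile.txt "<LOCATION_URL_FROM_STEP_1>"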

Thanks,
+Vinod


On Thu, Dec 5, 2013 at 6:04 PM, Fengyun RAO raofeng...@gmail.com wrote:

 Hi, All

 Is there a way to write files into remote HDFS on Linux using C# on
 Windows? We want to use HDFS as data storage.

  We know there is an HDFS Java API, but not a C# one. We tried SAMBA for file
  sharing and FUSE for mounting HDFS. It worked if we simply copied files to
  HDFS, but if we open a filestream and write to it, it always throws
  exceptions.

 Best regards!




Re: Container [pid=22885,containerID=container_1386156666044_0001_01_000013] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 332.5 GB of 8 GB virtual memo

2013-12-05 Thread Vinod Kumar Vavilapalli
Something looks really bad on your cluster. The JVM's heap size is 200MB
but its virtual memory has ballooned to a monstrous 332GB. Does that ring
any bells? Can you run regular Java applications on this node? This doesn't
seem related to YARN per se.
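
To see where that virtual memory is going, something like this on the node can
help (a sketch; 22885 is the pid from the process-tree dump in the quoted mail):

$ grep -i '^Vm' /proc/22885/status
$ pmap -x 22885 | sort -n -k2 | tail   # largest mappings by size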

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Wed, Dec 4, 2013 at 5:16 AM, panfei cnwe...@gmail.com wrote:



 -- Forwarded message --
 From: panfei cnwe...@gmail.com
 Date: 2013/12/4
  Subject: Container
  [pid=22885,containerID=container_1386156666044_0001_01_000013] is running
  beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical
  memory used; 332.5 GB of 8 GB virtual memory used. Killing container.
 To: CDH Users cdh-u...@cloudera.org


 Hi All:

  We are using CDH 4.5 Hadoop in production. When we submit some (not all)
  jobs from Hive, we get the following exception info; it seems the physical
  memory and virtual memory are both not enough for the job to run:


 Task with the most failures(4):
 -
 Task ID:
    task_1386156666044_0001_m_000000

  URL:

  http://namenode-1:8088/taskdetails.jsp?jobid=job_1386156666044_0001&tipid=task_1386156666044_0001_m_000000
 -
 Diagnostic Messages for this Task:
  Container [pid=22885,containerID=container_1386156666044_0001_01_000013]
  is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB
  physical memory used; 332.5 GB of 8 GB virtual memory used. Killing
  container.
  Dump of the process-tree for container_1386156666044_0001_01_000013 :
  |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
  SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
  |- 22885 22036 22885 22885 (java) 5414 108 356993519616 271953
  /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true
  -Dhadoop.metrics.log.level=WARN -Xmx200m
  -Djava.io.tmpdir=/data/yarn/local/usercache/hive/appcache/application_1386156666044_0001/container_1386156666044_0001_01_000013/tmp
  -Dlog4j.configuration=container-log4j.properties
  -Dyarn.app.mapreduce.container.log.dir=/var/log/hadoop-yarn/containers/application_1386156666044_0001/container_1386156666044_0001_01_000013
  -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
  org.apache.hadoop.mapred.YarnChild 192.168.101.55 60841
  attempt_1386156666044_0001_m_000000_3 13

 following is some of our configuration:

   property
 nameyarn.nodemanager.resource.memory-mb/name
 value12288/value
   /property

   property
 nameyarn.nodemanager.vmem-pmem-ratio/name
 value8/value
   /property

   property
 nameyarn.nodemanager.vmem-check-enabled/name
 valuefalse/value
   /property

   property
 nameyarn.nodemanager.resource.cpu-vcores/name
 value6/value
   /property

 can you give me some advice? thanks a lot.
 --
 不学习,不知道



 --
 不学习,不知道




Re: Client mapred tries to renew a token with renewer specified as nobody

2013-12-04 Thread Vinod Kumar Vavilapalli

The error clearly indicates that the renewer is wrong (the renewer is marked as 
'nobody', but mapred is trying to renew the token); you may want to check this.

Thanks,
+Vinod

On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:

 2013-12-02 15:57:08,541 ERROR 
 org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
 as:mapred/xxx.cern...@cern.ch (auth:KERBEROS) 
 cause:org.apache.hadoop.security.AccessControlException: Client mapred tries 
 to renew a token with renewer specified as nobody




Re: issue about the MR JOB local dir

2013-12-04 Thread Vinod Kumar Vavilapalli
These are the directories where the NodeManager (as configured) will store its 
local files. Local files include scripts, jars, and libraries - all files sent 
to the nodes via the DistributedCache.

Thanks,
+Vinod

On Dec 3, 2013, at 5:26 PM, ch huang wrote:

 Hi, maillist:
 I see three dirs in my local MR job dir, and I do not know what these 
 dirs are used for. Does anyone know? 
  
 # ls /data/1/mrlocal/yarn/local/
 filecache/ nmPrivate/ usercache/




Re: issue about capacity scheduler

2013-12-04 Thread Vinod Kumar Vavilapalli

If both of the jobs in the MR queue are from the same user, the 
CapacityScheduler will only try to run them one after another. If possible, 
run them as different users; at that point you will see sharing across the 
jobs because they are from different users.

Thanks,
+Vinod

On Dec 4, 2013, at 1:33 AM, ch huang wrote:

 Hi, maillist:
  I use the YARN framework and the capacity scheduler, and I have two 
 queues, one for Hive and the other for big MR jobs.
 The Hive queue works fine because Hive tasks are very fast. But say user A 
 submits two big MR jobs: the first big job eats
 all the resources belonging to the queue, and the other big MR job has to wait 
 until the first job finishes. How can I let the same user's MR jobs run in parallel?




Re: Time taken for starting AMRMClientAsync

2013-11-17 Thread Vinod Kumar Vavilapalli
It is just creating a connection to RM and shouldn't take that long. Can you 
please file a ticket so that we can look at it?

JVM class loading overhead is one possibility but 1 sec is a bit too much.

Thanks,
+Vinod

On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote:

 Hi,
   I am seeing the following call to start() on AMRMClientAsync take from 
 0.9 to 1 second. Why does it take that long? Is there a way to reduce it? I 
 mean, does it depend on any of the interval parameters in the configuration 
 files? I have also tried reducing the value of the first argument below from 
 1000 to 100, but that doesn't help.
 
 AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
 amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
 amRMClient.init(conf);
 amRMClient.start();
 
 
 Thanks,
 Kishore
 




Re: Hadoop 2.2.0: Cannot run PI in under YARN

2013-11-08 Thread Vinod Kumar Vavilapalli
This is just a symptom, not the root cause. Please check the YARN web UI at 
port 8088 on the ResourceManager machine and browse to the application's page. 
It should give you more details.
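
If log aggregation is enabled, the container logs can also be pulled from the
command line (a sketch; the application id is the one from the failed run):

$ yarn logs -applicationId application_<cluster-timestamp>_<id>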

Thanks,
+Vinod

On Nov 8, 2013, at 8:57 AM, Ping Luo wrote:

 java.io.FileNotFoundException: File does not exist




Re: Error while running Hadoop Source Code

2013-11-06 Thread Vinod Kumar Vavilapalli
 36_0001_m_02_0
  
 Regards,
 
 Indrashish
 
  
 On Tue, 5 Nov 2013 10:09:36 -0800, Vinod Kumar Vavilapalli wrote:
 
 It seems like your pipes mapper is exiting before consuming all the input. 
 Did you check the task-logs on the web UI?
 
 Thanks,
 +Vinod
 
 On Nov 5, 2013, at 7:25 AM, Basu,Indrashish wrote:
 
 
 Hi,
 
 Can anyone kindly assist on this ?
 
 Regards,
 Indrashish
 
 
 On Mon, 04 Nov 2013 10:23:23 -0500, Basu,Indrashish wrote:
 Hi All,
 Any update on the below post ?
  I came across some old post regarding the same issue. It explains the
  solution as: The nopipe example needs more documentation. It
  assumes that it is run with the InputFormat from
  src/test/org/apache/hadoop/mapred/pipes/
  WordCountInputFormat.java, which has a very specific input split
  format. By running with a TextInputFormat, it will send binary bytes
  as the input split and won't work right. The nopipe example should
  probably be recoded to use libhdfs too, but that is more
  complicated to get running as a unit test. Also note that since the
  C++ example is using local file reads, it will only work on a cluster
  if you have nfs or something working across the cluster.
 I would need some more light on the above explanation , so if anyone
 could elaborate a bit about the same as what needs to be done exactly.
 To mention, I am trying to run a sample KMeans algorithm on a GPU
 using Hadoop.
 Thanks in advance.
 Regards,
 Indrashish.
 On Thu, 31 Oct 2013 20:00:10 -0400, Basu,Indrashish wrote:
 Hi,
 I am trying to run a sample Hadoop GPU source code (kmeans algorithm)
 on an ARM processor and getting the below error. Can anyone please
 throw some light on this ?
 rmr: cannot remove output: No such file or directory.
 13/10/31 13:43:12 WARN mapred.JobClient: No job jar file set.  User
 classes may not be found. See JobConf(Class) or
 JobConf#setJar(String).
 13/10/31 13:43:12 INFO mapred.FileInputFormat: Total input paths to
 process : 1
 13/10/31 13:43:13 INFO mapred.JobClient: Running job: job_201310311320_0001
 13/10/31 13:43:14 INFO mapred.JobClient:  map 0% reduce 0%
 13/10/31 13:43:39 INFO mapred.JobClient: Task Id :
 attempt_201310311320_0001_m_00_0, Status : FAILED
 java.io.IOException: pipe child exception
at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:191)
at
 org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:103)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:363)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at
 org.apache.hadoop.mapred.pipes.BinaryProtocol.writeObject(BinaryProtocol.java:333)
at
 org.apache.hadoop.mapred.pipes.BinaryProtocol.mapItem(BinaryProtocol.java:286)
at
 org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:92)
... 3 more
 attempt_201310311320_0001_m_00_0: cmd: [bash, -c, exec
 '/app/hadoop/tmp/mapred/local/taskTracker/archive/10.227.56.195bin/cpu-kmeans2D/cpu-kmeans2D'
 '0'   /dev/null  1
 /usr/local/hadoop/hadoop-gpu-0.20.1/bin/../logs/userlogs/attempt_201310311320_0001_m_00_0/stdout
 2 /usr/local/hadoop/hadoop-gpu-0.20.1/bin/../logs/userlogs/
 Regards,
 
 -- 
 Indrashish Basu
 Graduate Student
 Department of Electrical and Computer Engineering
 University of Florida
 
  
 -- 
 Indrashish Basu 
 Graduate Student 
 Department of Electrical and Computer Engineering 
 University of Florida
 
  
 -- 
 Indrashish Basu 
 Graduate Student 
 Department of Electrical and Computer Engineering 
 University of Florida
 



Re: only one map or reduce job per time on one node

2013-11-05 Thread Vinod Kumar Vavilapalli
Why do you want to do this?

+Vinod

On Nov 5, 2013, at 9:17 AM, John wrote:

 Is it possible to force the jobtracker to execute only 2 map tasks or 1 reduce 
 task at a time?




Re: Error while running Hadoop Source Code

2013-11-05 Thread Vinod Kumar Vavilapalli
It seems like your pipes mapper is exiting before consuming all the input. Did 
you check the task-logs on the web UI?

Thanks,
+Vinod

On Nov 5, 2013, at 7:25 AM, Basu,Indrashish wrote:

 
 Hi,
 
 Can anyone kindly assist on this ?
 
 Regards,
 Indrashish
 
 
 On Mon, 04 Nov 2013 10:23:23 -0500, Basu,Indrashish wrote:
 Hi All,
 
 Any update on the below post ?
 
 I came across some old post regarding the same issue. It explains the
 solution as: The nopipe example needs more documentation. It
 assumes that it is run with the InputFormat from
 src/test/org/apache/hadoop/mapred/pipes/
 WordCountInputFormat.java, which has a very specific input split
 format. By running with a TextInputFormat, it will send binary bytes
 as the input split and won't work right. The nopipe example should
 probably be recoded to use libhdfs too, but that is more
 complicated to get running as a unit test. Also note that since the
 C++ example is using local file reads, it will only work on a cluster
 if you have nfs or something working across the cluster.
 
 I would need some more light on the above explanation , so if anyone
 could elaborate a bit about the same as what needs to be done exactly.
 To mention, I am trying to run a sample KMeans algorithm on a GPU
 using Hadoop.
 
 Thanks in advance.
 
 Regards,
 Indrashish.
 
 On Thu, 31 Oct 2013 20:00:10 -0400, Basu,Indrashish wrote:
 Hi,
 
 I am trying to run a sample Hadoop GPU source code (kmeans algorithm)
 on an ARM processor and getting the below error. Can anyone please
 throw some light on this ?
 
 rmr: cannot remove output: No such file or directory.
 13/10/31 13:43:12 WARN mapred.JobClient: No job jar file set.  User
 classes may not be found. See JobConf(Class) or
 JobConf#setJar(String).
 13/10/31 13:43:12 INFO mapred.FileInputFormat: Total input paths to
 process : 1
 13/10/31 13:43:13 INFO mapred.JobClient: Running job: job_201310311320_0001
 13/10/31 13:43:14 INFO mapred.JobClient:  map 0% reduce 0%
 13/10/31 13:43:39 INFO mapred.JobClient: Task Id :
 attempt_201310311320_0001_m_00_0, Status : FAILED
 java.io.IOException: pipe child exception
at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:191)
at
 org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:103)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:363)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at
 org.apache.hadoop.mapred.pipes.BinaryProtocol.writeObject(BinaryProtocol.java:333)
at
 org.apache.hadoop.mapred.pipes.BinaryProtocol.mapItem(BinaryProtocol.java:286)
at
 org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:92)
... 3 more
 
 attempt_201310311320_0001_m_00_0: cmd: [bash, -c, exec
 '/app/hadoop/tmp/mapred/local/taskTracker/archive/10.227.56.195bin/cpu-kmeans2D/cpu-kmeans2D'
 '0'   /dev/null  1
 /usr/local/hadoop/hadoop-gpu-0.20.1/bin/../logs/userlogs/attempt_201310311320_0001_m_00_0/stdout
 2 /usr/local/hadoop/hadoop-gpu-0.20.1/bin/../logs/userlogs/
 
 Regards,
 
 -- 
 Indrashish Basu
 Graduate Student
 Department of Electrical and Computer Engineering
 University of Florida




Re: Tasktracker Permission Issue?

2013-09-18 Thread Vinod Kumar Vavilapalli
What is your config set to for the mapred local dirs? And what are the 
permissions on those directories?

All users need execute permissions on all the paths up to the local dir so 
that they can create their own directories in there. For example, if one of 
the mapred local dirs is /a/b/c/mapred, then all of /a, /a/b, /a/b/c, etc. need 
to be executable by everyone - execute permission is needed on a Linux 
directory for someone to be able to create files/dirs in its sub-directories.
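
With that example layout, the check and fix look something like this (a sketch):

$ ls -ld /a /a/b /a/b/c /a/b/c/mapred
$ chmod o+x /a /a/b /a/b/c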

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Sep 18, 2013, at 7:26 AM, Christopher Penney wrote:

 I have a test environment with hadoop 1.1.1 setup with Kerberos and yesterday 
 I zapped my mapred.local.dir on the job and task trackers as part of some 
 cleanup.  When I started the task trackers back up I was unable to run MR 
 jobs.  This seems like a permission issue, but I can't figure out what it 
 would be since it auto creates everything.  I didn't make any changes to 
 taskcontroller.cfg or mapred-site.xml.  Below is a log from the task tracker.
 
Chris
 
 2013-09-18 10:21:27,040 INFO org.apache.hadoop.mapred.TaskTracker: 
 LaunchTaskAction (registerTask): attempt_201309180916_0024_m_02_0 task's 
 state:UNASSIGNED
 2013-09-18 10:21:27,040 INFO org.apache.hadoop.mapred.TaskTracker: Trying to 
 launch : attempt_201309180916_0024_m_02_0 which needs 1 slots
 2013-09-18 10:21:27,040 INFO org.apache.hadoop.mapred.TaskTracker: In 
 TaskLauncher, current free slots : 16 and trying to launch 
 attempt_201309180916_0024_m_02_0 which needs 1 slots
 2013-09-18 10:21:28,524 WARN org.apache.hadoop.mapred.TaskTracker: Error 
 initializing attempt_201309180916_0024_m_02_0:
 org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
 taskTracker/cpenney/jobcache/job_201309180916_0024/job.xml in any of the 
 configured local directories
  at 
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
  at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
  at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1341)
  at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
  at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2568)
  at java.lang.Thread.run(Thread.java:662)
 
 2013-09-18 10:21:28,525 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to 
 set finish time for task attempt_201309180916_0024_m_02_0 when no start 
 time is set, stackTrace is : java.lang.Exception
  at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:145)
  at 
 org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3285)
  at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2578)
  at java.lang.Thread.run(Thread.java:662)
 
 2013-09-18 10:21:28,525 INFO org.apache.hadoop.mapred.TaskTracker: 
 addFreeSlot : current free slots : 16
 2013-09-18 10:21:28,554 INFO org.apache.hadoop.mapred.TaskTracker: 
 LaunchTaskAction (registerTask): attempt_201309180916_0024_m_02_1 task's 
 state:UNASSIGNED
 2013-09-18 10:21:28,554 INFO org.apache.hadoop.mapred.TaskTracker: Trying to 
 launch : attempt_201309180916_0024_m_02_1 which needs 1 slots
 2013-09-18 10:21:28,554 INFO org.apache.hadoop.mapred.TaskTracker: In 
 TaskLauncher, current free slots : 16 and trying to launch 
 attempt_201309180916_0024_m_02_1 which needs 1 slots
 2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: Reading 
 task controller config from /etc/hadoop/taskcontroller.cfg
 2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: main : 
 command provided 0
 2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: main : 
 user is cpenney
 2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: Good 
 mapred-local-dirs are /tmp/hadoop/mapred
 2013-09-18 10:21:28,595 INFO org.apache.hadoop.mapred.TaskController: Can't 
 open 
 /tmp/hadoop/mapred/taskTracker/cpenney/jobcache/job_201309180916_0024/jobToken
  for output - File exists
 2013-09-18 10:21:28,596 WARN org.apache.hadoop.mapred.TaskTracker: Exception 
 while localization java.io.IOException: Job initialization failed (255) with 
 output: Reading task controller config from /etc/hadoop/taskcontroller.cfg
 main : command provided 0
 main : user is cpenney
 Good mapred-local-dirs are /tmp/hadoop/mapred
 Can't open 
 /tmp/hadoop/mapred/taskTracker/cpenney/jobcache/job_201309180916_0024/jobToken
  for output - File exists
 
  at 
 org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:193)
  at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1323)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136

Re: Resource limits with Hadoop and JVM

2013-09-16 Thread Vinod Kumar Vavilapalli
I assume you are on Linux, and that your tasks are so resource intensive that 
they are taking down nodes. You should enable per-task limits, see 
http://hadoop.apache.org/docs/stable/cluster_setup.html#Memory+monitoring

With that enabled, jobs are forced to provide their resource requirements up 
front, and the TaskTrackers enforce those limits.

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Sep 16, 2013, at 1:35 PM, Forrest Aldrich wrote:

 We recently experienced a couple of situations that brought one or more 
 Hadoop nodes down (unresponsive).   One was related to a bug in a utility we 
 use (ffmpeg) that was resolved by compiling a new version. The next, today, 
 occurred after attempting to join a new node to the cluster.   
 
 A basic start of the (local) tasktracker and datanode did not work -- so 
 based on reference, I issued:  hadoop mradmin -refreshNodes, which was to be 
 followed by hadoop dfsadmin -refreshNodes.The load average literally 
 jumped to 60 and the master (which also runs a slave) became unresponsive.
 
 Seems to me that this should never happen.   But, looking around, I saw an 
 article from Spotify which mentioned the need to set certain resource limits 
 on the JVM as well as in the system itself (limits.conf, we run RHEL).I 
 (and we) are fairly new to Hadoop, so some of these issues are very new.
 
 I wonder if some of the experts here might be able to comment on this issue - 
 perhaps point out settings and other measures we can take to prevent this 
 sort of incident in the future.
 
 Our setup is not complicated. We have 3 Hadoop nodes; the first is also a 
 master and a slave (it has more resources, too). What our system does is 
 split up tasks for ffmpeg (which is another issue, as it tends to eat 
 resources, but so far with a recompile we are good). We have two more 
 hardware nodes to add shortly.
 
 
 Thanks!




Re: chaining (the output of) jobs/ reducers

2013-09-13 Thread Vinod Kumar Vavilapalli

Other than the short-term solutions that others have proposed, Apache Tez 
solves this exact problem. It can run M-M-R-R-R chains, multi-way mappers and 
reducers, and your own custom processors - all without persisting the 
intermediate outputs to HDFS.

It works on top of YARN, though the first release of Tez is yet to happen.

You can learn about it more here: http://tez.incubator.apache.org/

HTH,
+Vinod

On Sep 12, 2013, at 6:36 AM, Adrian CAPDEFIER wrote:

 Howdy,
 
 My application requires 2 distinct processing steps (reducers) to be 
 performed on the input data. The first operation changes the key values, and 
 records that had different keys in step 1 can end up having the same key in 
 step 2.
 
 The heavy lifting of the operation is in step1 and step2 only combines 
 records where keys were changed.
 
 In short the overview is:
 Sequential file -> Step 1 -> Step 2 -> Output.
 
 
 To implement this in hadoop, it seems that I need to create a separate job 
 for each step. 
 
 Now I assumed, there would some sort of job management under hadoop to link 
 Job 1 and 2, but the only thing I could find was related to job scheduling 
 and nothing on how to synchronize the input/output of the linked jobs.
 
 
 
 The only crude solution that I can think of is to use a temporary file under 
 HDFS, but even so I'm not sure if this will work.
 
 The overview of the process would be:
 Sequential Input (lines) = Job A[Mapper (key1, value1) = ChainReducer 
 (key2, value2)] = Temporary file = Job B[Mapper (key2, value2) = Reducer 
 (key2, value 3)] = output.
 
 Is there a better way to pass the output from Job A as input to Job B (e.g. 
 using network streams or some built in java classes that don't do disk i/o)? 
 
 
 
 The temporary file solution will work in a single node configuration, but I'm 
 not sure about an MPP config.
 
 Let's say Job A runs on nodes 0 and 1 and Job B runs on nodes 2 and 3, or both 
 jobs run on all 4 nodes - will HDFS be able to automagically redistribute the 
 records between nodes, or does this need to be coded somehow? 
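 
 For the short-term, temporary-file approach described above, the driver side 
 is usually just two job submissions sharing an HDFS path (a sketch; the jar 
 and class names are placeholders):
 
 $ hadoop jar myapp.jar StepOneDriver /data/input /tmp/step1-out
 $ hadoop jar myapp.jar StepTwoDriver /tmp/step1-out /data/output
 $ hadoop fs -rm -r /tmp/step1-out   # clean up the intermediate data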




Re: Setting user in yarn in 2.1.0

2013-09-11 Thread Vinod Kumar Vavilapalli

Depends on what 'running' means. For all purposes other than launching the 
user's container process, yes, the user who submits the application is 
considered - e.g. for ACLs, NodeManager security, etc. In the non-secure case, 
when using the DefaultContainerExecutor, the container still runs as the user 
running YARN. In the secure case, it will run as the app submitter.

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Sep 11, 2013, at 10:17 AM, Albert Shau wrote:

 In 2.1.0, the method to set user in the ApplicationSubmissionContext and 
 ContainerLaunchContext has been removed and is now in the container token.  
 Does this mean the application will now always run as the user who submits 
 the application, or is there some other way to set the user now?
 
 Thanks,
 Albert




Re: assign tasks to specific nodes

2013-09-11 Thread Vinod Kumar Vavilapalli

I assume you are talking about MapReduce. And is this a 1.x release or 2.x?

In either of the releases, this cannot be done directly.

In 1.x, the framework doesn't expose a feature like this as it is a shared 
service, and if enough jobs flock to a node, it will lead to utilization and 
failure handling issues.

In Hadoop 2 YARN, the platform does expose this functionality, but the 
MapReduce framework doesn't yet expose it to end users.

What exactly is your use case? Why are some nodes of higher priority than 
others?

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Sep 11, 2013, at 10:09 AM, Mark Olimpiati wrote:

 Thanks for replying Ravi, but the link is talking about reducers, which seems 
 like a similar case. But what if I assigned priorities to the data 
 partitions (e.g. partition B=1, partition C=2, partition A=3, ...) such that 
 the first map task is assigned partition B to run first, then the second map 
 is given partition C, etc.? This is instead of assigning based on partition 
 size. Is that possible?
 
 Thanks,
 Mark
 
 
 On Mon, Sep 9, 2013 at 11:17 AM, Ravi Prakash ravi...@ymail.com wrote:
 http://lucene.472066.n3.nabble.com/Assigning-reduce-tasks-to-specific-nodes-td4022832.html
 
 From: Mark Olimpiati markq2...@gmail.com
 To: user@hadoop.apache.org 
 Sent: Friday, September 6, 2013 1:47 PM
 Subject: assign tasks to specific nodes
 
 Hi guys, 
 
I'm wondering if there is a way for me to assign tasks to specific 
 machines or at least assign priorities to the tasks to be executed in that 
 order. Any suggestions?
 
 Thanks,
 Mark
 
 
 




Re: Job status shows 0's for counters

2013-09-03 Thread Vinod Kumar Vavilapalli
We've observed this internally too.

Shinichi, tx for the patch. Will follow up on JIRA to get it committed.

Thanks,
+Vinod

On Sep 3, 2013, at 11:35 AM, Shinichi Yamashita wrote:

 Hi,
 I reported this issue in MAPREDUCE-5376 
 (https://issues.apache.org/jira/browse/MAPREDUCE-5376) and attached a patch.
 But it is not fixed by the current release.
 
 Thanks,
 Shinichi
 
 (2013/09/03 11:20), Robert Dyer wrote:
 I just noticed the job status for MR jobs tends to show 0's in the Map
 and Reduce columns but actually shows the totals correctly.
 
 I am not sure exactly when this started happening, but this cluster was
 upgraded from Hadoop 1.0.4 to 1.1.2 and now to 1.2.1.  It definitely
 worked fine on 1.0.4, but I don't recall testing on 1.1.2.
 
 Anyone else running into this issue?
 
 Map-Reduce Framework Reduce input groups 0   0   0
 Map output materialized bytes0   0   5,910
 Combine output records   0   0   250
 Map input records0   0   48,556
 Reduce shuffle bytes 0   0   4,960
 Physical memory (bytes) snapshot 0   0   56,364,822,528
 
 
 Note that not *all* counters are doing this!  Some still show properly:
 
 FileSystemCounters   HDFS_BYTES_READ 844,499,067,747 0   
 844,499,067,747
 FILE_BYTES_WRITTEN   4,879,509   60,403  4,939,912
 
 
 




Re: [yarn] job is not getting assigned

2013-08-29 Thread Vinod Kumar Vavilapalli

This usually means there are no available resources as seen by the 
ResourceManager. Do you see Active Nodes on the RM web UI first page? If not, 
you'll have to check the NodeManager logs to see if they crashed for some 
reason.
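
From the command line the same check looks like this (a sketch):

$ yarn node -list
# if a node is missing, look at that node's NodeManager log, e.g.
# $HADOOP_PREFIX/logs/yarn-<user>-nodemanager-<host>.log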

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Aug 29, 2013, at 7:52 AM, Andre Kelpe wrote:

 Hi,
 
 I am in the middle of setting up a hadoop 2 cluster. I am using the hadoop 
 2.1-beta tarball. 
 
 My cluster has 1 master node running the HDFS namenode, the resourcemanager 
 and the job history server. Next to that I have 3 nodes acting as datanodes 
 and nodemanagers.
 
 In order to test if everything is working, I submitted the teragen job from 
 the hadoop-examples jar like this:
 
 $ hadoop jar 
 $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar
  teragen 1000 /user/vagrant/teragen
 
 The job starts up and I  get the following output:
 
 13/08/29 14:42:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 13/08/29 14:42:47 INFO client.RMProxy: Connecting to ResourceManager at 
 master.local/192.168.7.10:8032
 13/08/29 14:42:48 INFO terasort.TeraSort: Generating 1000 using 2
 13/08/29 14:42:48 INFO mapreduce.JobSubmitter: number of splits:2
 13/08/29 14:42:48 WARN conf.Configuration: user.name is deprecated. Instead, 
 use mapreduce.job.user.name
 13/08/29 14:42:48 WARN conf.Configuration: mapred.jar is deprecated. Instead, 
 use mapreduce.job.jar
 13/08/29 14:42:48 WARN conf.Configuration: mapred.reduce.tasks is deprecated. 
 Instead, use mapreduce.job.reduces
 13/08/29 14:42:48 WARN conf.Configuration: mapred.output.value.class is 
 deprecated. Instead, use mapreduce.job.output.value.class
 13/08/29 14:42:48 WARN conf.Configuration: mapreduce.map.class is deprecated. 
 Instead, use mapreduce.job.map.class
 13/08/29 14:42:48 WARN conf.Configuration: mapred.job.name is deprecated. 
 Instead, use mapreduce.job.name
 13/08/29 14:42:48 WARN conf.Configuration: mapreduce.inputformat.class is 
 deprecated. Instead, use mapreduce.job.inputformat.class
 13/08/29 14:42:48 WARN conf.Configuration: mapred.output.dir is deprecated. 
 Instead, use mapreduce.output.fileoutputformat.outputdir
 13/08/29 14:42:48 WARN conf.Configuration: mapreduce.outputformat.class is 
 deprecated. Instead, use mapreduce.job.outputformat.class
 13/08/29 14:42:48 WARN conf.Configuration: mapred.map.tasks is deprecated. 
 Instead, use mapreduce.job.maps
 13/08/29 14:42:48 WARN conf.Configuration: mapred.output.key.class is 
 deprecated. Instead, use mapreduce.job.output.key.class
 13/08/29 14:42:48 WARN conf.Configuration: mapred.working.dir is deprecated. 
 Instead, use mapreduce.job.working.dir
 13/08/29 14:42:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
 job_1377787324271_0001
 13/08/29 14:42:50 INFO impl.YarnClientImpl: Submitted application 
 application_1377787324271_0001 to ResourceManager at 
 master.local/192.168.7.10:8032
 13/08/29 14:42:50 INFO mapreduce.Job: The url to track the job: 
 http://master.local:8088/proxy/application_1377787324271_0001/
 13/08/29 14:42:50 INFO mapreduce.Job: Running job: job_1377787324271_0001
 
 and then it stops. If I check the UI, I see this:
 
 application_1377787324271_0001vagrant TeraGen MAPREDUCE   default 
 Thu, 29 Aug 2013 14:42:49 GMT   N/A ACCEPTEDUNDEFINED   
 
 UNASSIGNED
 
  I have no idea why it is not starting, nor what to look for. Any pointers 
  are more than welcome!
 
 Thanks!
 
 - André
 
 -- 
 André Kelpe
 an...@concurrentinc.com
 http://concurrentinc.com



