Re: Capacity scheduler properties

2015-01-15 Thread Wangda Tan
You can check HDP 2.2's document:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/capacity_scheduler/index.html

HTH,
Wangda

On Thu, Jan 15, 2015 at 4:22 AM, Jakub Stransky stransky...@gmail.com
wrote:

 Hello,

 I am configuring the Capacity Scheduler and everything seems OK, but I cannot
 find the meaning of the following property:

 yarn.scheduler.capacity.root.unfunded.capacity

 I just found that it is set to 50 everywhere, and its description is "No
 description".

 Can anybody clarify or point to where to find relevant documentation?

 Thx
 Jakub




Re: Capacity scheduler properties

2015-01-15 Thread Jakub Stransky
Wow, pretty awesome documentation!


Thx

On 15 January 2015 at 19:53, Wangda Tan wheele...@gmail.com wrote:

 You can check HDP 2.2's document:

 http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/capacity_scheduler/index.html

 HTH,
 Wangda






-- 
Jakub Stransky
cz.linkedin.com/in/jakubstransky


Re: Capacity scheduler properties

2015-01-15 Thread Fabio
Actually this made me curious, but I don't see any reference to that 
specific config entry in the doc, at least on a first text search.
Since the Hortonworks page appears to be the only real documentation, I 
would suggest downloading the source code and finding where and how 
that particular parameter is used (unfortunately, that is often 
the only way).
It must be a new entry in the CS config, so hopefully it will 
have a better description in future versions of Hadoop.


Regards

Fabio

On 01/15/2015 10:52 PM, Jakub Stransky wrote:

Wow, pretty awesome documentation!


Thx

On 15 January 2015 at 19:53, Wangda Tan wheele...@gmail.com wrote:


You can check HDP 2.2's document:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/capacity_scheduler/index.html

HTH,
Wangda






--
Jakub Stransky
cz.linkedin.com/in/jakubstransky





Re: How to partition a file to smaller size for performing KNN in hadoop mapreduce

2015-01-15 Thread unmesha sreeveni
Is there any way..
Waiting for a reply.I have posted the question every where..but none is
responding back.
I feel like this is the right place to ask doubts. As some of u may came
across the same issue and get stuck.

On Thu, Jan 15, 2015 at 12:34 PM, unmesha sreeveni unmeshab...@gmail.com
wrote:

 Yes, one of my friends is implementing the same. I know global sharing of
 data is not possible across Hadoop MapReduce, but I need to check whether it
 can somehow be done in Hadoop MapReduce as well, because I found some papers
 on KNN in Hadoop too.
 And I am trying to compare the performance as well.

 Hope some pointers can help me.


 On Thu, Jan 15, 2015 at 12:17 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:


 Have you considered implementing it using something like Spark? That could
 be much easier than raw MapReduce.

 On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni unmeshab...@gmail.com
  wrote:

 In a KNN-like algorithm we need to load the model data into a cache for
 predicting the records.

 Here is the example for KNN.


 [image: Inline image 1]

 So if the model is a large file, say 1 or 2 GB, we will not be able to
 load it into the Distributed Cache.

 One way is to split/partition the model result into several files, perform
 the distance calculation for all records in each file, and then find the
 minimum distance and the most frequent class label to predict the outcome.

 How can we partition the file and perform the operation on these
 partitions?

 i.e.  1st record: distance against partition1, partition2, ...
       2nd record: distance against partition1, partition2, ...

 This is what came to my mind.

 Is there any other way?

 Any pointers would help me.
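 The per-partition idea sketched above can be reasoned about outside
 MapReduce first: if each partition independently yields the k nearest
 (distance, label) candidates for a test record, the partial results can be
 merged and the final label taken by majority vote. A minimal pure-Python
 sketch (plain Euclidean distance; the function names and toy data are
 illustrative, not the actual MapReduce job):

```python
import heapq
from collections import Counter


def k_nearest(test_point, partition, k):
    """k nearest (distance, label) pairs from one model partition."""
    dists = [
        (sum((a - b) ** 2 for a, b in zip(test_point, feats)) ** 0.5, label)
        for feats, label in partition
    ]
    return heapq.nsmallest(k, dists)


def knn_predict(test_point, partitions, k):
    """Merge per-partition candidates, then majority-vote the label."""
    candidates = []
    for part in partitions:
        candidates.extend(k_nearest(test_point, part, k))
    top_k = heapq.nsmallest(k, candidates)  # global k nearest across partitions
    return Counter(label for _, label in top_k).most_common(1)[0][0]


if __name__ == "__main__":
    # Two partitions of a toy model: (features, class label).
    parts = [
        [((0.0, 0.0), "A"), ((0.1, 0.1), "A")],
        [((5.0, 5.0), "B"), ((0.2, 0.0), "A")],
    ]
    print(knn_predict((0.0, 0.1), parts, k=3))  # nearest three are all "A"
```

 In MapReduce terms, `k_nearest` would correspond to what each mapper emits
 for its model split, and the merge in `knn_predict` to the reducer; the
 merge is correct because the global k nearest records are always contained
 in the union of the per-partition k nearest.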

 --
 *Thanks & Regards*


 *Unmesha Sreeveni U.B*
 *Hadoop, Bigdata Developer*
 *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
 http://www.unmeshasreeveni.blogspot.in/






 --
 *Thanks & Regards*


 *Unmesha Sreeveni U.B*
 *Hadoop, Bigdata Developer*
 *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
 http://www.unmeshasreeveni.blogspot.in/





-- 
*Thanks & Regards*


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/


Capacity scheduler properties

2015-01-15 Thread Jakub Stransky
Hello,

I am configuring the Capacity Scheduler and everything seems OK, but I cannot
find the meaning of the following property:

yarn.scheduler.capacity.root.unfunded.capacity

I just found that it is set to 50 everywhere, and its description is "No
description".

Can anybody clarify or point to where to find relevant documentation?

Thx
Jakub