RE: ANSWER PLEASE

2015-10-28 Thread #TANG SHANJIANG#
The correct answer should be A and D.

From: Sajid Mohammed [mailto:sajid.had...@gmail.com]
Sent: 28 October 2015, 7:33 PM
To: user@hadoop.apache.org
Subject: ANSWER PLEASE

You have a cluster running with the Fair Scheduler enabled. There are currently 
no jobs running on the cluster, and you submit Job A, so that only Job A is 
running on the cluster. A while later, you submit Job B. Now Job A and Job B 
are running on the cluster at the same time. How will the Fair Scheduler handle 
these two jobs? (Choose 2)

A. When Job B gets submitted, it will get assigned tasks, while Job A continues 
to run with fewer tasks.
B. When Job B gets submitted, Job A has to finish first, before Job B can get 
scheduled.
C. When Job A gets submitted, it doesn't consume all the task slots.
D. When Job A gets submitted, it consumes all the task slots.
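
For context, enabling the Fair Scheduler on an MRv1 cluster (the question's
"task slots" wording implies MRv1) is typically a one-property change in
mapred-site.xml; the snippet below is a minimal sketch, not a complete
configuration:

    <!-- mapred-site.xml: minimal sketch of enabling the MRv1 Fair Scheduler -->
    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>

Under that scheduler, Job A alone consumes all task slots (option D); when
Job B arrives, it is assigned slots as Job A's tasks complete, so both jobs
run concurrently with Job A using fewer slots (option A).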


Re: ANSWER PLEASE

2015-10-28 Thread Daniel Jankovic
not to mention using CAPS

On Wed, Oct 28, 2015 at 12:50 PM, Kai Voigt wrote:

> [quoted message from Kai Voigt trimmed; it appears in full in the thread below]


Re: ANSWER PLEASE

2015-10-28 Thread Kai Voigt
No, the correct answer is "Don't cheat on a Cloudera exam" :-) This has been 
reported to certificat...@cloudera.com

Looks like you won't get that certificate...

> On 28.10.2015 at 11:46, t...@bentzn.com wrote:
> 
> The correct answer would be:
> 
> do your own homework :-D
> 
> -----Original Message-----
> From: "Sajid Mohammed"
> To: user@hadoop.apache.org
> Date: 28-10-2015 11:32
> Subject: ANSWER PLEASE
> 
> [quoted question trimmed; see the original post below]

Kai Voigt
Am Germaniahafen 1, 24143 Kiel, Germany
k...@123.org | +49 160 96683050 | @KaiVoigt



Re: ANSWER PLEASE

2015-10-28 Thread Tenghuan He
AD

On Wed, Oct 28, 2015 at 7:32 PM, Sajid Mohammed wrote:

> [quoted question trimmed; see the original post below]


ANSWER PLEASE

2015-10-28 Thread Sajid Mohammed
*You have a cluster running with the Fair Scheduler enabled. There are
currently no jobs running on the cluster, and you submit Job A, so that
only Job A is running on the cluster. A while later, you submit Job B. Now
Job A and Job B are running on the cluster at the same time. How will the
Fair Scheduler handle these two jobs? (Choose 2)*


A. When Job B gets submitted, it will get assigned tasks, while Job A
continues to run with fewer tasks.

B. When Job B gets submitted, Job A has to finish first, before Job B can
get scheduled.

C. When Job A gets submitted, it doesn't consume all the task slots.

D. When Job A gets submitted, it consumes all the task slots.


Re: ANSWER PLEASE

2015-10-28 Thread Alexander Alten-Lorenz
LOL - that's gorgeous! Well spoken, Kai :) 

> On Oct 28, 2015, at 12:50 PM, Kai Voigt wrote:
> 
> [quoted message from Kai Voigt trimmed; it appears in full in the thread above]



AM is unable to launch

2015-10-28 Thread Sanjeev Verma
My job is failing, and the only log I can see is:

java.lang.Exception: Unknown container. Container either has not started or
has already completed or doesn't belong to this node at all

There is no error or exception in the NM and RM logs. After reviewing the
logs, I can see my ApplicationMaster container getting killed by the RM after
it is localized. Any clue what's going wrong here?
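
One hedged first step for digging further, assuming log aggregation is enabled
on the cluster, is to pull the aggregated container logs for the application
and look at the AM container's stderr (the application ID below is a
placeholder):

    # Fetch all aggregated container logs for the failed application.
    yarn logs -applicationId application_1446000000000_0001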


Hive showing SemanticException [Error 10002]: Line 3:21 Invalid column reference 'mbdate'

2015-10-28 Thread Kumar Jayapal
Hello,


Can someone please help? When I execute a Hive query with a CASE statement, I
get this error: "Error while compiling statement: FAILED: SemanticException
[Error 10002]: Line 3:21 Invalid column reference 'mbdate'"

Here is the query:
select  a.mbcmpy, a.mbwhse, a.mbdept, a.mbitem,

(CASE WHEN to_date(a.mbdate) =  d.today_ly  THEN (a.mbdsun) END) as TODAY_LY
FROM items a
JOIN ivsdays d
ON a.mbdate = d.cldatei
Join ivsref r
ON r.company = a.mbcmpy
AND r.warehouse = a.mbwhse
AND r.itemnumber = a.mbitem

WHERE
a.mbcmpy = 1
AND a.mbdept = 20

group by
   a.mbcmpy, a.mbwhse, a.mbdept, a.mbitem, Today_ly

ORDER by
1,2,3,4,5

The same query works in Impala. I have checked that the mbdate column is
present in the table.



Here is the hue log :

[27/Oct/2015 14:53:21 -0700] dbms ERROR Bad status for request
TExecuteStatementReq(confOverlay={},
sessionHandle=TSessionHandle(sessionId=THandleIdentifier(secret='L1:\x9c3KB\x94\xaf\x8c\xfa\x8d\x98\x97\xe1Q',
guid='+o\x00\xe8\xc5\x12C\xab\xbb\xb5KV\xe0\xf5\x93\xc9')), runAsync=True,
statement='select  a.mbcmpy, a.mbwhse, a.mbdept, a.mbitem, \n\n(CASE WHEN
to_date(a.mbdate) =  d.today_ly  THEN (a.mbdsun) END) as TODAY_LY\n\nFROM
tlog.item_detail a\nJOIN Adv_analytics.ivsdays d\nON a.mbdate =
d.cldatei\nJoin adv_analytics.ivsref r\nON r.company = a.mbcmpy\nAND
r.warehouse = a.mbwhse \nAND r.itemnumber = a.mbitem\n\n\nWHERE\na.mbcmpy =
1\nAND a.mbdept = 20\n\n\ngroup by \n   a.mbcmpy, a.mbwhse, a.mbdept,
a.mbitem, Today_ly\n\nORDER by\n1,2,3,4,5'):
TExecuteStatementResp(status=TStatus(errorCode=10002, errorMessage="Error
while compiling statement: FAILED: SemanticException [Error 10002]: Line
3:21 Invalid column reference 'mbdate'", sqlState='42000',
infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Error while
compiling statement: FAILED: SemanticException [Error 10002]: Line 3:21
Invalid column reference 'mbdate':17:16",
'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:315',
'org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:102',
'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:171',
'org.apache.hive.service.cli.operation.Operation:run:Operation.java:257',
'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:398',
'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:385',
'org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:258',
'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:490',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
'java.lang.Thread:run:Thread.java:745',
"*org.apache.hadoop.hive.ql.parse.SemanticException:Line 3:21 Invalid
column reference 'mbdate':32:16",
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genAllExprNodeDesc:SemanticAnalyzer.java:10299',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genExprNodeDesc:SemanticAnalyzer.java:10247',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genSelectPlan:SemanticAnalyzer.java:3720',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genSelectPlan:SemanticAnalyzer.java:3499',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genPostGroupByBodyPlan:SemanticAnalyzer.java:8761',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genBodyPlan:SemanticAnalyzer.java:8716',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genPlan:SemanticAnalyzer.java:9573',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genPlan:SemanticAnalyzer.java:9466',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:genOPTree:SemanticAnalyzer.java:9902',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:analyzeInternal:SemanticAnalyzer.java:9913',
'org.apache.hadoop.hive.ql.parse.SemanticAnalyzer:analyzeInternal:SemanticAnalyzer.java:9830',
'org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer:analyze:BaseSemanticAnalyzer.java:222',
'org.apache.hadoop.hive.ql.Driver:compile:Driver.java:422',
'org.apache.hadoop.hive.ql.Driver:compile:Driver.java:306',
'org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:',
'org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1105',
'org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:100'],
statusCode=3), operationHandle=None)
Traceback (most recent call 

Re: Hive showing SemanticException [Error 10002]: Line 3:21 Invalid column reference 'mbdate'

2015-10-28 Thread sreebalineni .
Check if the query works without the join and the alias reference. If yes,
then the problem is with the alias name. I recently faced the same problem;
I think adding AS just before the alias name worked.
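
Another workaround sometimes suggested for this class of error (hedged, since
the thread does not confirm it) is to repeat the full expression in GROUP BY
instead of referencing the SELECT alias, which Hive of this era did not
resolve there; a sketch against the query above:

    SELECT a.mbcmpy, a.mbwhse, a.mbdept, a.mbitem,
           (CASE WHEN to_date(a.mbdate) = d.today_ly THEN a.mbdsun END) AS today_ly
    FROM items a
    JOIN ivsdays d ON a.mbdate = d.cldatei
    JOIN ivsref r  ON r.company = a.mbcmpy
                  AND r.warehouse = a.mbwhse
                  AND r.itemnumber = a.mbitem
    WHERE a.mbcmpy = 1
      AND a.mbdept = 20
    -- Repeat the CASE expression here rather than writing "today_ly".
    GROUP BY a.mbcmpy, a.mbwhse, a.mbdept, a.mbitem,
             (CASE WHEN to_date(a.mbdate) = d.today_ly THEN a.mbdsun END)
    ORDER BY 1, 2, 3, 4, 5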
On 28 Oct 2015 20:22, "Kumar Jayapal" wrote:

> [quoted message trimmed; see the original post above]

check decommission status

2015-10-28 Thread ram kumar
Hi,

Is there a Java API to get the decommission status of a particular DataNode?

Thanks.
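
One possible approach, sketched here as an assumption rather than a confirmed
answer: the HDFS client API exposes DatanodeInfo via
DistributedFileSystem.getDataNodeStats(), and DatanodeInfo carries
isDecommissioned() and isDecommissionInProgress() flags. The hostname below is
a placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class DecommissionStatus {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // One report per live DataNode, as seen by the NameNode.
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                if (dn.getHostName().equals("dn1.example.com")) { // placeholder host
                    System.out.println("decommissioned: " + dn.isDecommissioned());
                    System.out.println("in progress: " + dn.isDecommissionInProgress());
                }
            }
        }
    }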


Re: lzo error while running mr job

2015-10-28 Thread Kiru Pakkirisamy
Harsh,

Thank you very much for your valuable/assertive suggestion :-) I was able to
identify the problem and fix it. Elsewhere in the code, we were setting a
different mapred-site.xml in the configuration. I still do not know why it is
using the DefaultCodec for compression (instead of the one I set, SnappyCodec),
but I am hopeful I will get there. Thanks again.

Regards,
- kiru

From: Harsh J
To: "user@hadoop.apache.org"
Sent: Tuesday, October 27, 2015 8:34 AM
Subject: Re: lzo error while running mr job

The stack trace is pretty certain you do, as it clearly tries to load a class
not belonging within Apache Hadoop. Try looking at the XML files the
application uses? Perhaps you've missed one of the spots.

If I had to guess, given the JobSubmitter entry in the trace, it'd be in the
submitting host's /etc/hadoop/conf/* files, or in the dir pointed to by
$HADOOP_CONF_DIR (if that's specifically set). Alternatively, it'd be in the
code.

If you have control over the code, you can also make it dump the XML before
submit via: job.getConfiguration().writeXml(System.out);. The XML dump will
carry the source of all properties along with their values.
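
A minimal sketch of that suggestion (the class and job name here are
illustrative, and real job setup is elided); Configuration.getPropertySources()
additionally reports where one specific key was set:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class DumpConf {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "probe");
            // Dump the full effective configuration as XML, with sources.
            job.getConfiguration().writeXml(System.out);
            // Trace which file (or code path) set the codec list.
            String[] sources =
                job.getConfiguration().getPropertySources("io.compression.codecs");
            System.out.println(java.util.Arrays.toString(sources));
        }
    }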


On Tue, Oct 27, 2015 at 8:52 PM Kiru Pakkirisamy wrote:


> Harsh,
> We don't have lzo in the io.compression.codecs list. That is what is
> puzzling me.
> Regards, Kiru
>
> From: "Harsh J"
> Date: Mon, Oct 26, 2015 at 11:39 PM
> Subject: Re: lzo error while running mr job
>
>> Every codec in the io.compression.codecs list of classes will be
>> initialised, regardless of actual further use. Since the Lzo*Codec
>> classes require the native library to initialise, the failure is
>> therefore expected.
>>
>> On Tue, Oct 27, 2015 at 11:42 AM Kiru Pakkirisamy wrote:
>>
>>> I am seeing a weird error after we moved to the new hadoop mapreduce
>>> java packages in 2.4. We are not using lzo (as in io.compression.codecs)
>>> but we still get this error. Does it mean we have to have lzo installed
>>> even though we are not using it? Thanks.
>>> Regards, - kiru
>>>
>>> 2015-10-27 00:18:57,994 ERROR com.hadoop.compression.lzo.GPLNativeCodeLoader | Could not load native gpl library
>>> java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
>>>   at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886) ~[?:1.7.0_85]
>>>   at java.lang.Runtime.loadLibrary0(Runtime.java:849) ~[?:1.7.0_85]
>>>   at java.lang.System.loadLibrary(System.java:1088) ~[?:1.7.0_85]
>>>   at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:31) [flow-trunk.242-470787.jar:?]
>>>   at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:60) [flow-trunk.242-470787.jar:?]
>>>   at java.lang.Class.forName0(Native Method) [?:1.7.0_85]
>>>   at java.lang.Class.forName(Class.java:278) [?:1.7.0_85]
>>>   at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1834) [flow-trunk.242-470787.jar:?]
>>>   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1799) [flow-trunk.242-470787.jar:?]
>>>   at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128) [flow-trunk.242-470787.jar:?]
>>>   at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175) [flow-trunk.242-470787.jar:?]
>>>   at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.isSplitable(CombineFileInputFormat.java:159) [flow-trunk.242-470787.jar:?]
>>>   at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:283) [flow-trunk.242-470787.jar:?]
>>>   at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:243) [flow-trunk.242-470787.jar:?]
>>>   at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493) [flow-trunk.242-470787.jar:?]

RE: How do I customize data placement on DataNodes (DN) of Hadoop cluster?

2015-10-28 Thread Naganarasimha G R (Naga)
Hi Praveen and Salil,

If the data is being written from one of the cluster nodes, then preference is
given to the local node, irrespective of the rack configuration.
If it is written remotely (not from one of the cluster nodes), then the blocks
may get distributed.
Further, you can consider a custom BlockPlacementPolicy by extending
BlockPlacementPolicyDefault and configuring "dfs.block.replicator.classname" if
required.
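
A minimal hdfs-site.xml sketch of that last option (the class name below is a
hypothetical placeholder for your own policy, which must extend
BlockPlacementPolicyDefault and be on the NameNode's classpath):

    <!-- hdfs-site.xml on the NameNode; the value is a placeholder class. -->
    <property>
      <name>dfs.block.replicator.classname</name>
      <value>com.example.MyBlockPlacementPolicy</value>
    </property>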

+ Naga


From: praveen S [mylogi...@gmail.com]
Sent: Tuesday, October 27, 2015 17:44
To: user@hadoop.apache.org
Subject: Re: How do I customize data placement on DataNodes (DN) of Hadoop 
cluster?


Maybe using the rack concept might work.

On 27 Oct 2015 17:32, "Norah Jones" wrote:
Hi,

Suppose we change the default block size to 32 MB and the replication factor
to 1, the Hadoop cluster consists of 4 DNs, and the input data size is 192 MB.
Now I want to place the data on the DNs as follows: DN1 and DN2 contain 2
blocks (32+32 = 64 MB) each, and DN3 and DN4 contain 1 block (32 MB) each.

Is this possible? How can I accomplish it?

Thanks,
Salil