Re: how to use Yarn API to find task/attempt status

2016-03-09 Thread Jeff Zhang
If it is for M/R, then maybe this is what you want
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/JobStatus.html
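To make that concrete, a sketch of counting task states per job with the client-side MapReduce API (class name is mine; this assumes a Hadoop 2.x client with the cluster configuration on the classpath, and a reachable job history / RM — untested outside such a setup):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.mapreduce.TaskReport;
import org.apache.hadoop.mapreduce.TaskType;

public class TaskCounts {
  public static void main(String[] args) throws IOException, InterruptedException {
    Cluster cluster = new Cluster(new Configuration());
    try {
      for (JobStatus status : cluster.getAllJobStatuses()) {
        Job job = cluster.getJob(status.getJobID());
        if (job == null) continue;
        int pending = 0, running = 0, complete = 0;
        // Task reports cover both map and reduce tasks of the job.
        for (TaskType type : new TaskType[] {TaskType.MAP, TaskType.REDUCE}) {
          for (TaskReport report : job.getTaskReports(type)) {
            switch (report.getCurrentStatus()) {
              case PENDING:  pending++;  break;
              case RUNNING:  running++;  break;
              case COMPLETE: complete++; break;
              default: break; // KILLED / FAILED not tallied here
            }
          }
        }
        System.out.printf("%s pending=%d running=%d complete=%d%n",
            status.getJobID(), pending, running, complete);
      }
    } finally {
      cluster.close();
    }
  }
}
```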



On Thu, Mar 10, 2016 at 1:58 PM, Frank Luo  wrote:


> Let’s say there are 10 standard M/R jobs running. How do I find how many
> tasks are done/running/pending?
>
>
>
> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
> *Sent:* Wednesday, March 09, 2016 9:33 PM
> *To:* Frank Luo
> *Cc:* user@hadoop.apache.org
> *Subject:* Re: how to use Yarn API to find task/attempt status
>
>
>
> I don't think this is related to YARN. YARN doesn't know about tasks/task
> attempts; it only knows about containers. So it is up to your application to
> provide that information.
>
>
>
> On Thu, Mar 10, 2016 at 11:29 AM, Frank Luo  wrote:
>
> Has anyone had a similar issue and found the answer?
>
>
>
> *From:* Frank Luo
> *Sent:* Wednesday, March 09, 2016 1:59 PM
> *To:* 'user@hadoop.apache.org'
> *Subject:* how to use Yarn API to find task/attempt status
>
>
>
> I have a need to programmatically find out how many tasks are pending in
> Yarn. Is there a way to do it through a Java API?
>
>
>
> I looked at YarnClient, but was not able to find what I need.
>
>
>
> Thx in advance.
>
>
>
> Frank Luo
>
> This email and any attachments transmitted with it are intended for use by
> the intended recipient(s) only. If you have received this email in error,
> please notify the sender immediately and then delete it. If you are not the
> intended recipient, you must not keep, use, disclose, copy or distribute
> this email without the author’s prior permission. We take precautions to
> minimize the risk of transmitting software viruses, but we advise you to
> perform your own virus checks on any attachment to this message. We cannot
> accept liability for any loss or damage caused by software viruses. The
> information contained in this communication may be confidential and may be
> subject to the attorney-client privilege.
>
>
>
>
>
> --
>
> Best Regards
>
> Jeff Zhang
>
>



-- 
Best Regards

Jeff Zhang


Re: how to use Yarn API to find task/attempt status

2016-03-09 Thread Sultan Alamro

You can still see the task status through the web interfaces.

Look at the end of this page
https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/ClusterSetup.html
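The same information that the web pages show is also exposed over HTTP as JSON. A sketch, assuming the ResourceManager web UI on its default port 8088 (hostnames, application and job IDs are placeholders):

```shell
# Running MapReduce applications, via the ResourceManager REST API:
curl -s "http://<resourcemanager>:8088/ws/v1/cluster/apps?states=RUNNING&applicationTypes=MAPREDUCE"

# Per-task state for one running job, via the MapReduce Application Master
# REST API (reachable through the RM web proxy while the job runs):
curl -s "http://<resourcemanager>:8088/proxy/<application_id>/ws/v1/mapreduce/jobs/<job_id>/tasks"
```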

> On Mar 10, 2016, at 12:58 AM, Frank Luo  wrote:
> 
> Let’s say there are 10 standard M/R jobs running. How do I find how many
> tasks are done/running/pending?
>  
> From: Jeff Zhang [mailto:zjf...@gmail.com] 
> Sent: Wednesday, March 09, 2016 9:33 PM
> To: Frank Luo
> Cc: user@hadoop.apache.org
> Subject: Re: how to use Yarn API to find task/attempt status
>  
> I don't think this is related to YARN. YARN doesn't know about tasks/task
> attempts; it only knows about containers. So it is up to your application to
> provide that information.
>  
> On Thu, Mar 10, 2016 at 11:29 AM, Frank Luo  wrote:
> Has anyone had a similar issue and found the answer?
>  
> From: Frank Luo 
> Sent: Wednesday, March 09, 2016 1:59 PM
> To: 'user@hadoop.apache.org'
> Subject: how to use Yarn API to find task/attempt status
>  
> I have a need to programmatically find out how many tasks are pending in 
> Yarn. Is there a way to do it through a Java API?
>  
> I looked at YarnClient, but was not able to find what I need.
>  
> Thx in advance.
>  
> Frank Luo
> 
> 
> 
>  
> --
> Best Regards
> 
> Jeff Zhang


RE: how to use Yarn API to find task/attempt status

2016-03-09 Thread Frank Luo
Let’s say there are 10 standard M/R jobs running. How do I find how many tasks
are done/running/pending?
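(For reference, the stock `mapred` CLI exposes roughly this information; the commands below assume a Hadoop 2.x client on the path and placeholders for the job ID:)

```shell
# List all jobs known to the cluster (IDs, states, progress):
mapred job -list all

# Per-job progress summary (map/reduce completion percentages):
mapred job -status <job-id>

# Attempt IDs of one job, filtered by task type and state
# (task-type: map|reduce, task-state: running|completed):
mapred job -list-attempt-ids <job-id> map running
```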

From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Wednesday, March 09, 2016 9:33 PM
To: Frank Luo
Cc: user@hadoop.apache.org
Subject: Re: how to use Yarn API to find task/attempt status

I don't think this is related to YARN. YARN doesn't know about tasks/task
attempts; it only knows about containers. So it is up to your application to
provide that information.

On Thu, Mar 10, 2016 at 11:29 AM, Frank Luo 
> wrote:
Has anyone had a similar issue and found the answer?

From: Frank Luo
Sent: Wednesday, March 09, 2016 1:59 PM
To: 'user@hadoop.apache.org'
Subject: how to use Yarn API to find task/attempt status

I have a need to programmatically find out how many tasks are pending in Yarn. 
Is there a way to do it through a Java API?

I looked at YarnClient, but was not able to find what I need.

Thx in advance.

Frank Luo




--
Best Regards

Jeff Zhang



how to use Yarn API to find task/attempt status

2016-03-09 Thread Frank Luo
I have a need to programmatically find out how many tasks are pending in Yarn. 
Is there a way to do it through a Java API?

I looked at YarnClient, but was not able to find what I need.
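For the record, the closest YarnClient itself gets is application/container level, not tasks — a minimal sketch (class name is mine; assumes a Hadoop 2.x client with the cluster configuration on the classpath):

```java
import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class RunningApps {
  public static void main(String[] args) throws IOException, YarnException {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();
    try {
      for (ApplicationReport app :
          yarnClient.getApplications(EnumSet.of(YarnApplicationState.RUNNING))) {
        // YARN tracks containers, not tasks; this is as fine-grained as it gets here.
        System.out.printf("%s used containers=%d progress=%.0f%%%n",
            app.getApplicationId(),
            app.getApplicationResourceUsageReport().getNumUsedContainers(),
            app.getProgress() * 100);
      }
    } finally {
      yarnClient.stop();
    }
  }
}
```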

Thx in advance.

Frank Luo



Re: Impala

2016-03-09 Thread Juri Yanase Triantaphyllou
Thanks. I will do it!


Juri



-Original Message-
From: Sean Busbey 
To: Nagalingam, Karthikeyan 
Cc: Kumar Jayapal ; user ; 
cdh-user 
Sent: Wed, Mar 9, 2016 12:53 pm
Subject: Re: Impala



You should join the mailing list for Apache Impala (incubating) and ask your 
question over there:


http://mail-archives.apache.org/mod_mbox/incubator-impala-dev/




On Wed, Mar 9, 2016 at 8:12 AM, Nagalingam, Karthikeyan 
 wrote:


Hello,
 
I am new to Impala. My goal is to test joins and aggregations against 2 million
and 10 million records. Can you please provide some documentation or a website
to start with?
 
Regards,
Karthikeyan Nagalingam,
Technical Marketing Engineer ( Big Data Analytics)
Mobile: 919-376-6422







-- 


busbey






A Mapreduce job failed. Need Help!

2016-03-09 Thread Juri Yanase Triantaphyllou

Dear Hadoop users:


I followed the instructions of “Build and Install Hadoop 2.x or newer on
Windows” and was able to build and install Hadoop 2.7.2 on a PC running
Windows 10.
Then, I tried to run a wordcount job in a single-node environment, but this
mapreduce job failed. Could someone please give me any feedback about my case?
Do you have any ideas of why I am failing?
 
If you need more information, I would be glad to send it!
Thank you for your help in advance!
 
--Juri
 
 
Here is some relevant information:
The YARN nodemanager shows:
16/03/08 15:24:25 INFO localizer.ResourceLocalizationService: Localizer failed
java.lang.NullPointerException
 
The YARN resourcemanager shows:
16/03/08 15:24:27 WARN resourcemanager.RMAuditLogger: USER=Van
OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE
DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application
application_1457472180766_0001 failed 2 times due to AM Container for
appattempt_1457472180766_0001_02 exited with exitCode: -1000
For more detailed output, check application tracking page:
http://Juri:8088/cluster/app/application_1457472180766_0001
Then, click on links to logs of each attempt.
 
16/03/08 15:24:27 INFO 
resourcemanager.RMAppManager$ApplicationSummary:appId=application_1457472180766_0001,name=wordcount,user=Van,queue=default,state=FAILED,trackingUrl=http://Juri:8088/cluster/app/application_1457472180766_0001,appMasterHost=N/A,startTime=1457472264161,finishTime=1457472267648,finalStatus=FAILED,memorySeconds=3991,vcoreSeconds=1,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=,applicationType=MAPREDUCE
16/03/08 15:24:29 INFO ipc.Server: Socket Reader #1 for port 8032:
readAndProcess from client 192.168.1.72 threw exception [java.io.IOException: An
existing connection was forcibly closed by the remote host]
java.io.IOException: An existing connection was forcibly closed by the remote
host
 
The Hadoop namenode shows:
16/03/08 15:24:29 INFO ipc.Server: Socket Reader #1 for port 9000:
readAndProcess from client 127.0.0.1 threw exception [java.io.IOException: An
existing connection was forcibly closed by the remote host]
java.io.IOException: An existing connection was forcibly closed by the remote
host
 
The log message shows:
Failed redirect for container_1457472180766_0001_02_01
Failed while trying to construct the redirect url to the log server. Log Server
url may not be configured
java.lang.Exception: Unknown container. Container either has not started or has
already completed or doesn't belong to this node at all.


Re: Impala

2016-03-09 Thread Sean Busbey
You should join the mailing list for Apache Impala (incubating) and ask
your question over there:

http://mail-archives.apache.org/mod_mbox/incubator-impala-dev/

On Wed, Mar 9, 2016 at 8:12 AM, Nagalingam, Karthikeyan <
karthikeyan.nagalin...@netapp.com> wrote:

> Hello,
>
>
>
> I am new to Impala. My goal is to test joins and aggregations against 2
> million and 10 million records. Can you please provide some documentation or
> a website to start with?
>
>
>
> Regards,
>
> Karthikeyan Nagalingam,
>
> Technical Marketing Engineer ( Big Data Analytics)
>
> Mobile: 919-376-6422
>



-- 
busbey


Impala

2016-03-09 Thread Nagalingam, Karthikeyan
Hello,

I am new to Impala. My goal is to test joins and aggregations against 2 million
and 10 million records. Can you please provide some documentation or a website
to start with?

Regards,
Karthikeyan Nagalingam,
Technical Marketing Engineer ( Big Data Analytics)
Mobile: 919-376-6422


Re: Showing negative numbers for Hadoop resource manager web interface

2016-03-09 Thread Chathuri Wimalasena
Thank you for the quick response.

Regards,
Chathuri

On Wed, Mar 9, 2016 at 10:40 AM, Dmytro Kabakchei <
dmitry.kabakc...@gmail.com> wrote:

> Hi,
> Check out https://issues.apache.org/jira/browse/YARN-3933
> It isn't resolved yet, but it gives an idea of what is going on. A patch is
> also available.
>
> Kind regards,
> Dmytro Kabakchei
>
>
> On 09.03.2016 17:27, Chathuri Wimalasena wrote:
>
> Hi All,
>
> We have a Hadoop cluster running Hadoop 2.5.1. In the resource manager
> web interface, under Cluster Metrics, we can see some negative numbers for
> fields like "Containers Running", "Memory Used", etc. Please see the
> attached image.
>
> Could you please tell me what could be the reason for these negative
> numbers?
>
> Thanks,
> Chathuri
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>
>
>


Re: Showing negative numbers for Hadoop resource manager web interface

2016-03-09 Thread Dmytro Kabakchei

Hi,
Check out https://issues.apache.org/jira/browse/YARN-3933
It isn't resolved yet, but it gives an idea of what is going on. A patch 
is also available.


Kind regards,
Dmytro Kabakchei


On 09.03.2016 17:27, Chathuri Wimalasena wrote:

Hi All,

We have a Hadoop cluster running Hadoop 2.5.1. In the resource 
manager web interface, under Cluster Metrics, we can see some negative 
numbers for fields like "Containers Running", "Memory Used", etc. 
Please see the attached image.


Could you please tell me what could be the reason for these negative 
numbers?


Thanks,
Chathuri


-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org




oozie java action issue

2016-03-09 Thread Immanuel Fredrick
2016-03-09 03:54:32,070 ERROR [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error
writing History Event:
org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinishedEvent@7594e5ec
java.nio.channels.ClosedChannelException
at
org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1765)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:108)
at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at
org.codehaus.jackson.impl.Utf8Generator._flushBuffer(Utf8Generator.java:1754)
at org.codehaus.jackson.impl.Utf8Generator.flush(Utf8Generator.java:1088)
at org.apache.avro.io.JsonEncoder.flush(JsonEncoder.java:73)
at
org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:84)
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:1281)
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:570)
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:326)
at java.lang.Thread.run(Thread.java:745)
2016-03-09 03:54:32,070 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with
attempt attempt_1456313053210_0589_m_00_0
2016-03-09 03:54:32,071 ERROR [eventHandlingThread]
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread
Thread[eventHandlingThread,5,main] threw an Exception.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
java.nio.channels.ClosedChannelException
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:585)
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:326)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
at
org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1765)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:108)
at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at
org.codehaus.jackson.impl.Utf8Generator._flushBuffer(Utf8Generator.java:1754)
at org.codehaus.jackson.impl.Utf8Generator.flush(Utf8Generator.java:1088)
at org.apache.avro.io.JsonEncoder.flush(JsonEncoder.java:73)
at
org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:84)
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:1281)
at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:570)
... 2 more
2016-03-09 03:54:32,072 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
task_1456313053210_0589_m_00 Task Transitioned from RUNNING to SUCCEEDED
2016-03-09 03:54:32,074 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2016-03-09 03:54:32,074 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1456313053210_0589Job Transitioned from RUNNING to COMMITTING
2016-03-09 03:54:32,075 INFO [CommitterEvent Processor #1]
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing
the event EventType: JOB_COMMIT
2016-03-09 03:54:32,075 ERROR [CommitterEvent Processor #1]
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: could not
create failure file.
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:837)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1720)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1662)
at
org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:404)
at
org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:400)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:400)
at
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:343)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:917)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:898)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:795)
at
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.touchz(CommitterEventHandler.java:265)
at
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:280)
at
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at

[Error]Run Spark job as hdfs user from oozie workflow

2016-03-09 Thread Divya Gehlot
Hi,
I have a non-secure Hadoop 2.7.2 cluster on EC2 with Spark 1.5.2.
I am submitting my Spark Scala script through a shell script using an Oozie
workflow.
I submit the job as the hdfs user, but it runs as user "yarn", so all the
output gets stored under the /user/yarn directory.

When I googled, I found YARN-2424
 for non-secure clusters.
I changed the settings as per those docs, and when I ran my Oozie workflow as
the hdfs user I got the error below:

Application application_1457494230162_0004 failed 2 times due to AM
Container for appattempt_1457494230162_0004_02 exited with exitCode:
-1000
For more detailed output, check application tracking page:
http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8088/cluster/app/application_1457494230162_0004
Then, click on links to logs of each attempt.
Diagnostics: Application application_1457494230162_0004 initialization
failed (exitCode=255) with output: main : command provided 0
main : run as user is hdfs
main : requested yarn user is hdfs
Can't create directory
/hadoop/yarn/local/usercache/hdfs/appcache/application_1457494230162_0004 -
Permission denied
Did not create any app directories
Failing this attempt. Failing the application.

After changing the setting, when I start spark-shell I get an error saying
"Error starting SQLContext - Yarn application has ended".

Has anybody run into this kind of issue?
I would really appreciate it if you could guide me to the steps/docs to
resolve it.
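(For reference, the YARN-2424 change usually boils down to this yarn-site.xml fragment on every NodeManager, restarting the NodeManagers afterwards; the property name is from that JIRA, so verify it against your Hadoop version:)

```xml
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- From YARN-2424: with this set to false, containers in non-secure mode
       run as the submitting user (here hdfs) instead of the configured
       nonsecure-mode local user. -->
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>false</value>
</property>
```

The "Can't create directory .../usercache/hdfs/... Permission denied" error is often caused by usercache directories created earlier under a different user; clearing the stale usercache directories under the NodeManager local dirs on each node (with the NodeManager stopped) is a commonly suggested cleanup, but that is an assumption about this particular cluster.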


Thanks,
Divya