NodeManager High CPU due to high GC

2016-01-22 Thread Randy Fox
Hi,

We just upgraded to YARN on Hadoop 2.6.0 (CDH 5.4.5).
We are running a large job (200K mappers, 100K reducers) and cannot get
through the shuffle phase. The NodeManagers are at 800% CPU with heavy GC. The
reducers hit socket timeouts after 1.5 hours of running, having fetched only a few
percent of the data from the mappers. This job took about 30 hours total (12 in
the mappers) on MRv1 with no issues.

I have looked for configs that might help, for filed issues, and for anyone
who has seen this, and I have come up with nothing.
Does anyone have ideas on things to try, or an explanation for why the
NodeManagers are in GC hell and the data is just not flowing from mappers to
reducers?

Thanks in advance,

Randy


YARN queues become unusable and jobs are stuck in ACCEPTED state

2016-01-22 Thread Matt Cheah
Hi,

I've sporadically been seeing an issue when using Hadoop YARN. I'm using
Hadoop 2.5.0, CDH5.3.3.

When I've configured the stack to use the fair scheduler protocol, after
some period of time of the cluster being alive and running jobs, I'm
noticing that when I submit a job, the job will be stuck in the ACCEPTED
state even though the cluster has sufficient resources to spawn an
application master container, and the queue I'm submitting to has
sufficient resources available. Furthermore, all jobs submitted to that
queue will be stuck in the ACCEPTED state. I can unblock job submission by
going into the allocation XML file, renaming the queue, and submitting jobs
to that renamed queue instead. However, the queue has only changed name, and
all of its other settings have been preserved.
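For context, this is the kind of allocation-file entry being renamed, a minimal sketch of the fair scheduler allocation format with a hypothetical queue name and example values:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: renaming "analytics" (a hypothetical name)
     is the workaround described above; weight and minResources are
     example values -->
<allocations>
  <queue name="analytics">
    <weight>1.0</weight>
    <minResources>4096 mb,4 vcores</minResources>
  </queue>
</allocations>
```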

It is clearly untenable for me to have to change the queues I'm using
from time to time. This appears to happen irrespective of the settings of the
queue, e.g. its weight or its minimum resource share. The events leading up to
this are unpredictable, and I have no concrete way to
reproduce the issue. The logs don't show anything interesting either; the
resource manager just states that it schedules an attempt for the
application submitted to the bad queue, but the attempt's application master
is never allocated to a container anywhere.

I have looked around the YARN bug base and couldn't find any similar issues.
I've also used jstack to inspect the ResourceManager process, but nothing
is obviously wrong there. I was wondering if anyone has encountered a
similar issue before. I apologize that the description is vague, but it's
the best way I can describe it.

Thanks,

-Matt Cheah






Re: Yarn app: Cannot run "java -jar" container

2016-01-22 Thread Kristoffer Sjögren
Thanks for the tip Hitesh - that's really helpful.

On Fri, Jan 22, 2016 at 7:47 PM, Hitesh Shah  wrote:
> Ideally, the “yarn logs -applicationId <application ID>” command should give you the logs for
> the container in question and the stdout/stderr there usually gives you a 
> good indication on what is going wrong.
>
> Second more complex option:
>- Set yarn.nodemanager.delete.debug-delay-sec to say 1200 or a large 
> enough value. Restart all NMs.
>- Run your application.
>- Find the node on which your container failed.
>- Search through the yarn nodemanager local-dirs to find the 
> launch_container.sh for your container.
>- Look at its contents to see if things are being setup correctly. Run it 
> manually to debug.
>
> — Hitesh
>

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Re: Yarn app: Cannot run "java -jar" container

2016-01-22 Thread Hitesh Shah
Ideally, the “yarn logs -applicationId <application ID>” command should give you the logs for the
container in question, and the stdout/stderr there usually gives you a good
indication of what is going wrong.

A second, more complex option:
   - Set yarn.nodemanager.delete.debug-delay-sec to say 1200 or a large enough 
value. Restart all NMs.
   - Run your application. 
   - Find the node on which your container failed. 
   - Search through the yarn nodemanager local-dirs to find the 
launch_container.sh for your container. 
   - Look at its contents to see if things are being setup correctly. Run it 
manually to debug. 
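The NodeManager setting from the second option, as a yarn-site.xml fragment (1200 is just the example value from above):

```xml
<!-- yarn-site.xml: keep finished containers' local dirs around for
     ~20 minutes so launch_container.sh can be inspected -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>1200</value>
</property>
```

Remember to revert it afterwards; delaying deletion on a busy cluster fills up the NodeManager local dirs.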

— Hitesh






What's the best way to do Outer join and Inner join of two SequentialTextFiles using Hadoop streaming and Python ?

2016-01-22 Thread Rex X
The two SequentialTextFiles correspond to two Hive tables, say tableA and
tableB below on

hdfs://hive/tableA/YYYY/MM/DD/*/part-0
and
hdfs://hive/tableB/YYYY/MM/DD/*/part-0

Both of them are partitioned by date, for example,

hdfs://hive/tableA/2016/01/01/*/part-0

Now we want to do a left outer join on tableA.id=tableB.id, for a date
range, for example, from 2015/12/01 to 2016/01/09.

Within Hive it is pretty easy

select * from tableA a left outer join tableB b
on a.id = b.id
where a.dt between '20151201' and '20160109'
and b.dt between '20151201' and '20160109';
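For the streaming side of the question, a minimal reduce-side sketch of the same left outer join in Python. It assumes the mappers have emitted tab-separated lines of the form id&lt;TAB&gt;tag&lt;TAB&gt;payload, where the 'A'/'B' tags marking the source table are hypothetical labels chosen here, and that the shuffle has grouped lines by id (which streaming does for the key):

```python
import itertools


def left_outer_join(lines):
    """Reduce-side left outer join on lines of 'id<TAB>tag<TAB>payload'.

    Assumes the streaming shuffle has already sorted/grouped the lines
    by id; the 'A'/'B' tags are hypothetical labels for the two tables,
    not anything Hadoop defines.
    """
    joined = []
    for key, group in itertools.groupby(lines, key=lambda l: l.split('\t', 1)[0]):
        a_rows, b_rows = [], []
        for line in group:
            _, tag, payload = line.rstrip('\n').split('\t', 2)
            (a_rows if tag == 'A' else b_rows).append(payload)
        for a in a_rows:  # every tableA row is kept: left outer join
            if b_rows:
                joined.extend(f"{key}\t{a}\t{b}" for b in b_rows)
            else:
                joined.append(f"{key}\t{a}\t")  # unmatched: empty right side
    return joined
```

A map-only streaming pass over each table prepends the id and a table tag; the union of both outputs is then fed through this function as the reducer (reading sys.stdin, writing sys.stdout). Dropping the unmatched-row branch turns it into an inner join.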


What's the best way to do an outer join and an inner join of these two
SequentialTextFiles using Hadoop streaming and Python?

Any comments will be appreciated!


Re: What is the best way to locate the offset and length of all fields in a Hadoop sequential text file?

2016-01-22 Thread Rex X
Hi LLoyd,

The metadata of this table are STRINGs, BIGINTs, and TINYINTs, 2000
attributes in total.

I need to transform the data in ways that cannot be done with Hive's built-in
functions.

Thank you.

Best,
Rex


On Fri, Jan 22, 2016 at 1:58 AM, Namikaze Minato 
wrote:

> Hello. We don't have any information about your data.
>
> I don't think we can help you with this. Also, I cannot understand what
> you are trying to achieve. Please also tell us why you are using Hadoop
> streaming instead of Hive to do your operations.
>
> Regards,
> LLoyd
>


Yarn app: Cannot run "java -jar" container

2016-01-22 Thread Kristoffer Sjögren
Hi

I'm trying to run a yarn 2.7.1 application using a basic boilerplate
[1]. But I have trouble running the container with an executable jar
file using the following args list.

List<String> arg = Collections.singletonList(
    "/usr/jdk64/jdk1.8.0_40/bin/java -jar app-1.0.0-SNAPSHOT.jar" +
    " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
    " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"
);

I'm not really sure why it doesn't work, because the diagnostics
message only tells me exit code 1 (see below). Neither the stdout nor the
stderr file appears, and I have tried to pipe them to /tmp/stdout etc.

How do I debug this error? Is the diagnostics message the only way? I
have tried a gazillion different combinations of running the
container, and the process is very time-consuming and frustrating when
there isn't any information to debug with.

Any tips or pointers on how to trace this error down?

Cheers,
-Kristoffer

[1] 
https://github.com/hortonworks/simple-yarn-app/tree/master/src/main/java/com/hortonworks/simpleyarnapp


Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1




Re: Unsubscribe

2016-01-22 Thread Hitansu
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org




Sent from my iPhone

> On Jan 22, 2016, at 4:43 PM, chandu banavaram  
> wrote:
> 
> unsubscribe me
> 
>> On Fri, Jan 22, 2016 at 4:01 PM, Doris Donley 
>>  wrote:
> 


Re: Unsubscribe

2016-01-22 Thread chandu banavaram
unsubscribe me

On Fri, Jan 22, 2016 at 4:01 PM, Doris Donley <
do...@dorisdonley1953.onmicrosoft.com> wrote:

>
>


Unsubscribe

2016-01-22 Thread Doris Donley



Re: What is the best way to locate the offset and length of all fields in a Hadoop sequential text file?

2016-01-22 Thread Namikaze Minato
Hello. We don't have any information about your data.

I don't think we can help you with this. Also, I cannot understand what you
are trying to achieve. Please also tell us why you are using Hadoop
streaming instead of Hive to do your operations.

Regards,
LLoyd

On 22 January 2016 at 06:30, Rex X  wrote:

> The given sequential files correspond to an external Hive table.
>
> They are stored in
> /tableName/part-0
> /tableName/part-1
> ...
>
> There are about 2000 attributes in the table. Now I want to process the
> data using Hadoop streaming and mapReduce. The first step is to find the
> offset and length for each attribute.
>
> What is the best way to get this information?
>
>
>
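For the offset-and-length question quoted above, here is a minimal sketch of computing a per-field (offset, length) map for one record, assuming the 2000 attributes are laid out as single-character-delimited text (a hypothetical layout; binary SequenceFile records would instead need to be read through a SequenceFile reader):

```python
def field_spans(record, delimiter='\t'):
    """Return (offset, length) for each field of one delimited record.

    Offsets are character positions within the record; with ASCII-only
    data and a single-character delimiter these equal byte offsets.
    The tab delimiter is an assumption, not something the thread states.
    """
    spans, offset = [], 0
    for field in record.rstrip('\n').split(delimiter):
        spans.append((offset, len(field)))
        offset += len(field) + len(delimiter)
    return spans
```

A streaming mapper could call this per input line and emit whichever attribute spans it needs, instead of precomputing them for the whole file.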