Re: Drawbacks of Hadoop Pipes

2014-03-04 Thread Silvina Caíno Lores
Hi there,

I've been working with Pipes for some months and I've finally managed to
get it working as I wanted with some legacy code I had. However, I ran into
many issues, not only with my implementation (it had to be adapted in
several ways to fit Pipes, which is very restrictive) but with Pipes itself
(bugs, obscure errors, and a lack of proper logging, with the maddening
debugging that follows).

I also tried Streaming, but I found it even more complex to debug, and I
hit some deal-breaker errors that I couldn't overcome regarding buffering
and such. I also tried a SWIG interface to wrap my code into a Java
library; I'd never recommend that, because you might end up introducing a
lot of memory issues and potential bugs into your already-working code, and
you basically don't gain anything useful from it.

I've never worked with CUDA, though, but it shouldn't be any different from
my Hadoop Pipes deployment apart from the specific libraries you need.
Nevertheless, be prepared to deal with configuration issues and plenty of
esoteric logs.
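For reference, a Pipes job is typically submitted along these lines (a rough
sketch; the HDFS paths and the binary name are placeholders, not my actual setup):

hadoop pipes \
  -D hadoop.pipes.java.recordreader=true \
  -D hadoop.pipes.java.recordwriter=true \
  -input /user/me/input \
  -output /user/me/output \
  -program /user/me/bin/my_pipes_binary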

My advice, based on my experience, is to be 99% sure that your original
code is solid before migrating to Hadoop Pipes; you will have enough
problems there anyway.

Good luck on your work :)
Regards,
Silvina


On 3 March 2014 16:11, Basu,Indrashish indrash...@ufl.edu wrote:


 Hello,

 Can anyone help with the query below?

 Regards,
 Indrashish


 On Sat, 01 Mar 2014 13:52:11 -0500, Basu,Indrashish wrote:

 Hello,

 I am trying to execute a CUDA benchmark in a Hadoop framework, using
 Hadoop Pipes to invoke the CUDA code, which sits behind a C++ interface,
 from Hadoop. I am interested in knowing what the drawbacks of using
 Hadoop Pipes for this might be, and whether Hadoop Streaming or a JNI
 interface would be a better choice. I am a bit unclear on this, so I would
 appreciate it if anyone could shed some light on it.

 Regards,
 Indrashish


 --
 Indrashish Basu
 Graduate Student
 Department of Electrical and Computer Engineering
 University of Florida



Re: Unable to export hadoop trunk into eclipse

2014-03-04 Thread nagarjuna kanamarlapudi
Yes, I installed it.

mvn clean install -DskipTests  was successful. Only the import into Eclipse is
failing.


On Tue, Mar 4, 2014 at 12:51 PM, Azuryy Yu azury...@gmail.com wrote:

 Have you installed protobuf on your computer?

 https://code.google.com/p/protobuf/downloads/list
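 A quick sanity check that the native protoc the build plugins invoke is
 present and matches the version Hadoop expects (a sketch):

 protoc --version    # should print: libprotoc 2.5.0
 which protoc        # confirm it is the copy you installed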



 On Tue, Mar 4, 2014 at 3:08 PM, nagarjuna kanamarlapudi 
 nagarjuna.kanamarlap...@gmail.com wrote:

 Hi Ted,

 I didn't do that earlier.

 Now I have done it:
 mvn eclipse:eclipse
  and tried importing the same projects into Eclipse. Now it is
 throwing the following errors:


 1. No marketplace entries found to handle Execution compile-protoc, in
 hadoop-common/pom.xml in Eclipse.  Please see Help for more information.
 2. No marketplace entries found to handle Execution compile-protoc, in
 hadoop-hdfs/src/contrib/bkjournal/pom.xml in Eclipse.  Please see Help for
 more information.


 Any ideas?


 On Tue, Mar 4, 2014 at 10:59 AM, Ted Yu yuzhih...@gmail.com wrote:

 Have you run the following command under the root of your workspace ?

 mvn eclipse:eclipse

 On Mar 3, 2014, at 9:18 PM, nagarjuna kanamarlapudi 
 nagarjuna.kanamarlap...@gmail.com wrote:

 Hi,
 I checked out the Hadoop trunk from
 http://svn.apache.org/repos/asf/hadoop/common/trunk.

 I set up protobuf-2.5.0 and then ran the Maven build.
 mvn clean install -DskipTests worked well. The Maven build was
 successful.

 So I tried importing the project into Eclipse.

 It is showing errors in the pom.xml of the hadoop-common project. Below are the
 errors. Can someone help me here?

 Plugin execution not covered by lifecycle configuration:
 org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:version-info
 (execution: version-info, phase: generate-resources)


 The error is at line 299 of pom.xml in the hadoop-common project.


  <execution>
    <id>version-info</id>
    <phase>generate-resources</phase>
    <goals>
      <goal>version-info</goal>
    </goals>
    <configuration>
      <source>
        <directory>${basedir}/src/main</directory>
        <includes>
          <include>java/**/*.java</include>
          <include>proto/**/*.proto</include>
        </includes>
      </source>
    </configuration>
  </execution>
  <execution>

 There are multiple projects that failed with that error; hadoop-common is
 one such project.

 Regards,
 Nagarjuna K






decommissioning a node

2014-03-04 Thread John Lilley
Our cluster has a node that reboots randomly.  So I've gone to Ambari, 
decommissioned its HDFS service, stopped all services, and deleted the node 
from the cluster.  I expected an fsck to immediately show under-replicated 
blocks, but everything comes up fine.  How do I tell the cluster that this node 
is really gone, and that it should start replicating the missing blocks?
Thanks
John
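For reference, outside Ambari the usual way to tell HDFS a node is gone is the
excludes file plus a refresh; a rough sketch (the excludes path below is a
guess, use whatever dfs.hosts.exclude points to):

echo "deadnode.example.com" >> /etc/hadoop/conf/dfs.exclude   # hostname of the removed node
hdfs dfsadmin -refreshNodes        # make the NameNode re-read its include/exclude lists
hdfs fsck / -blocks -locations     # should now report the under-replicated blocks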




RE: decommissioning a node

2014-03-04 Thread John Lilley
OK, after restarting all services, fsck now shows under-replication.  Was it the 
NameNode restart?
John

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 5:47 AM
To: user@hadoop.apache.org
Subject: decommissioning a node

Our cluster has a node that reboots randomly.  So I've gone to Ambari, 
decommissioned its HDFS service, stopped all services, and deleted the node 
from the cluster.  I expected an fsck to immediately show under-replicated 
blocks, but everything comes up fine.  How do I tell the cluster that this node 
is really gone, and that it should start replicating the missing blocks?
Thanks
John




Re: class org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto overrides final method getUnknownFields

2014-03-04 Thread Margusja

Thank you for the reply, I got it to work.

[hduser@vm38 ~]$ /usr/lib/hadoop-yarn/bin/yarn version
Hadoop 2.2.0.2.0.6.0-101
Subversion g...@github.com:hortonworks/hadoop.git -r 
b07b2906c36defd389c8b5bd22bebc1bead8115b

Compiled by jenkins on 2014-01-09T05:18Z
Compiled with protoc 2.5.0
From source with checksum 704f1e463ebc4fb89353011407e965
This command was run using 
/usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-101.jar

[hduser@vm38 ~]$

The main problem, I think, was that I had the yarn binary in two places and I used 
the wrong one, which didn't use my yarn-site.xml.
Every time I looked into .staging/job.../job.xml there were values from 
<source>yarn-default.xml</source> even though I had set them in yarn-site.xml.
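A quick way to spot duplicate launchers and see which configuration directory
the scripts will pick up (a sketch; the install paths are guesses):

type -a yarn                                        # every yarn executable found on the PATH
ls -l /usr/bin/yarn /usr/lib/hadoop-yarn/bin/yarn 2>/dev/null
echo $HADOOP_CONF_DIR                               # the directory yarn-site.xml is read from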


Typical mess up :)

Tervitades, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)
-BEGIN PUBLIC KEY-
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCvbeg7LwEC2SCpAEewwpC3ajxE
5ZsRMCB77L8bae9G7TslgLkoIzo9yOjPdx2NN6DllKbV65UjTay43uUDyql9g3tl
RhiJIcoAExkSTykWqAIPR88LfilLy1JlQ+0RD8OXiWOVVQfhOHpQ0R/jcAkM2lZa
BjM8j36yJvoBVsfOHQIDAQAB
-END PUBLIC KEY-

On 04/03/14 05:14, Rohith Sharma K S wrote:

Hi

   The reason for "org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto 
overrides final method getUnknownFields()Lcom/google/protobuf/UnknownFieldSet" is that 
Hadoop is compiled with protoc 2.5.0, but a lower version of 
protobuf is present on the classpath.

1. Check the MRAppMaster classpath to see which version of protobuf is on it. 
It is expected to be 2.5.0.
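A rough way to check which protobuf jar the containers will see (the install
locations below are guesses and vary by distribution):

hadoop classpath | tr ':' '\n' | grep -i protobuf   # expect protobuf-java-2.5.0.jar
find /usr/lib/hadoop* -name 'protobuf-java-*.jar' 2>/dev/null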



Thanks & Regards
Rohith Sharma K S



-Original Message-
From: Margusja [mailto:mar...@roo.ee]
Sent: 03 March 2014 22:45
To: user@hadoop.apache.org
Subject: Re: class org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto 
overrides final method getUnknownFields

Hi

2.2.0 and 2.3.0 gave me the same container log.

A little more detail.
I am using an external Java client that submits the job.
Some lines from the Maven pom.xml file:
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
  </dependency>

lines from external client:
...
2014-03-03 17:36:01 INFO  FileInputFormat:287 - Total input paths to process : 1
2014-03-03 17:36:02 INFO  JobSubmitter:396 - number of splits:1
2014-03-03 17:36:03 INFO  JobSubmitter:479 - Submitting tokens for job:
job_1393848686226_0018
2014-03-03 17:36:04 INFO  YarnClientImpl:166 - Submitted application
application_1393848686226_0018
2014-03-03 17:36:04 INFO  Job:1289 - The url to track the job:
http://vm38.dbweb.ee:8088/proxy/application_1393848686226_0018/
2014-03-03 17:36:04 INFO  Job:1334 - Running job: job_1393848686226_0018
2014-03-03 17:36:10 INFO  Job:1355 - Job job_1393848686226_0018 running in uber 
mode : false
2014-03-03 17:36:10 INFO  Job:1362 -  map 0% reduce 0%
2014-03-03 17:36:10 INFO  Job:1375 - Job job_1393848686226_0018 failed with 
state FAILED due to: Application application_1393848686226_0018 failed 2 times 
due to AM Container for
appattempt_1393848686226_0018_02 exited with  exitCode: 1 due to:
Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
  at org.apache.hadoop.util.Shell.run(Shell.java:379)
  at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
  at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
  at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
  at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
...

Lines from namenode:
...
14/03/03 19:12:42 INFO namenode.FSEditLog: Number of transactions: 900 Total 
time for transactions(ms): 69 Number of transactions batched in
Syncs: 0 Number of syncs: 542 SyncTimes(ms): 9783
14/03/03 19:12:42 INFO BlockStateChange: BLOCK* addToInvalidates:
blk_1073742050_1226 90.190.106.33:50010
14/03/03 19:12:42 INFO hdfs.StateChange: BLOCK* allocateBlock:
/user/hduser/input/data666.noheader.data.
BP-802201089-90.190.106.33-1393506052071
blk_1073742056_1232{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[90.190.106.33:50010|RBW]]}
14/03/03 19:12:44 INFO hdfs.StateChange: BLOCK* InvalidateBlocks: ask
90.190.106.33:50010 to delete 

Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
I have a file system with some missing/corrupt blocks.  However, running hdfs 
fsck -delete also fails with errors.  How do I get around this?
Thanks
John

[hdfs@metallica yarn]$ hdfs fsck -delete 
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld
Connecting to namenode via http://anthrax.office.datalever.com:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.57.110 for path 
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld at Tue Mar 04 
06:05:40 MST 2014
.
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld: CORRUPT 
blockpool BP-1827033441-192.168.57.112-1384284857542 block blk_1074200714

/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld: CORRUPT 
blockpool BP-1827033441-192.168.57.112-1384284857542 block blk_1074200741

/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld: CORRUPT 
blockpool BP-1827033441-192.168.57.112-1384284857542 block blk_1074200778

/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld: MISSING 3 
blocks of total size 299116266 B.
Status: CORRUPT
Total size:299116266 B
Total dirs:0
Total files:   1
Total symlinks:0
Total blocks (validated):  3 (avg. block size 99705422 B)
  
  CORRUPT FILES:1
  MISSING BLOCKS:   3
  MISSING SIZE: 299116266 B
  CORRUPT BLOCKS:   3
  
Minimally replicated blocks:   0 (0.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 0.0
Corrupt blocks:3
Missing replicas:  0
Number of data-nodes:  8
Number of racks:   1
FSCK ended at Tue Mar 04 06:05:40 MST 2014 in 1 milliseconds
FSCK ended at Tue Mar 04 06:05:40 MST 2014 in 1 milliseconds
fsck encountered internal errors!


Fsck on path '/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld' 
FAILED


Question on DFS Balancing

2014-03-04 Thread divye sheth
Hi,

I am new to the mailing list.

I am using Hadoop 0.20.2 (the append branch, r1056497). The question I
have is related to balancing. I have a 5-datanode cluster, and each node has
2 disks attached to it. The second disk was added when the first disk was
reaching its capacity.

Now the scenario I am facing is this: when the new disk was added, Hadoop
automatically moved some data over to the new disk. But over time I have
noticed that data is no longer being written to the second disk. I have also
faced an issue on the datanode where the first disk had 100% utilization.

How can I overcome this scenario? Is it not Hadoop's job to balance the
disk utilization between multiple disks on a single datanode?

Thanks
Divye Sheth


Node manager or Resource Manager crash

2014-03-04 Thread Krishna Kishore Bonagiri
Hi,
  I am running an application on a 2-node cluster, which tries to acquire
all the containers that are available on one of those nodes and remaining
containers from the other node in the cluster. When I run this application
continuously in a loop, one of the NM or RM is getting killed at a random
point. There is no corresponding message in the log files.

One of the times the NM got killed today, the tail of its log is
like this:

2014-03-04 02:42:44,386 DEBUG
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
isredeng:52867 sending out status for 16 containers
2014-03-04 02:42:44,386 DEBUG
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's
health-status : true,


And at the time of NM's crash, the RM's log has the following entries:

2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
isredeng:52867 of type STATUS_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
NODE_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server
Responder: responding to
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
nodeUpdate: isredeng:52867 clusterResources:
memory:16384, vCores:16
2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Node being looked for scheduling isredeng:52867
availableResource: memory:0, vCores:-8
2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151


Note: the name of the node on which the NM got killed is isredeng. Does anything
in the messages above indicate why it got killed?

Thanks,
Kishore


Meaning of messages in log and debugging

2014-03-04 Thread Yves Weissig
Hello list,

I'm currently debugging my Hadoop MR application and I have some general
questions about the messages in the log and the debugging process.

- What does "Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143" mean? What does 143 stand
for?

- I also see the following exception in the log: Exception from
container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744). What does this mean? It
originates from a diagnostics report for a container, and the log4j
message level is set to INFO.

- Are there any related links which describe the life cycle of a container?

- Is there a golden rule to debug a Hadoop MR application?

- My application is very memory-intensive... is there any way to profile
the memory consumption of a single container?

Thanks!
Best regards
Yves





Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-04 Thread John Pauley
Outside hadoop: avro-1.7.6
Inside hadoop:  avro-mapred-1.7.6-hadoop2

From: Stanley Shi s...@gopivotal.com
Reply-To: user@hadoop.apache.org
Date: Monday, March 3, 2014 at 8:30 PM
To: user@hadoop.apache.org
Subject: Re: [hadoop] AvroMultipleOutputs 
org.apache.avro.file.DataFileWriter$AppendWriteException

which avro version are you using when running outside of hadoop?

Regards,
Stanley Shi,


On Mon, Mar 3, 2014 at 11:49 PM, John Pauley 
john.pau...@threattrack.com wrote:
This is cross posted to avro-user list 
(http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau...@threattrack.com%3e).

Hello all,

I’m having an issue using AvroMultipleOutputs in a map/reduce job.  The issue 
occurs when using a schema that has a union of null and a fixed type (among other 
complex types), with a default of null, where the value being written is not null.  
Please find the full stack trace below and a sample map/reduce job that generates 
an Avro container file and uses that for the m/r input.  Note that I can 
serialize/deserialize without issue using GenericDatumWriter/GenericDatumReader 
outside of hadoop…  Any insight would be helpful.

Stack trace:
java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of 
union in field baz of com.foo.bar.simple_schema
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of 
union in field baz of com.foo.bar.simple_schema
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
at 
org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:77)
at 
org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:39)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:400)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:378)
at 
com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:78)
at 
com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:62)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.NullPointerException: in com.foo.bar.simple_schema in 
union null of union in field baz of com.foo.bar.simple_schema
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
... 16 more
Caused by: java.lang.NullPointerException
at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:457)
at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
at org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:167)
at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:608)
at org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:597)
at 
org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
at 
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at 
org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
at 
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)

Sample m/r job:
mr_job
package com.tts.ox.mapreduce.example.avro;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import 

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
More information from the NameNode log.  I don't understand... it is saying 
that I cannot delete the corrupted file until the NameNode leaves safe mode, 
but it won't leave safe mode until the file system is no longer corrupt.  How 
do I get there from here?
Thanks
john

2014-03-04 06:02:51,584 ERROR namenode.NameNode 
(NamenodeFsck.java:deleteCorruptedFile(446)) - Fsck: error deleting corrupted 
file /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete 
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld. Name node is in 
safe mode.
The reported blocks 169302 needs additional 36 blocks to reach the threshold 
1. of total blocks 169337.
Safe mode will be turned off automatically
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1063)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3141)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3101)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3085)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:697)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.deleteCorruptedFile(NamenodeFsck.java:443)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:426)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.fsck(NamenodeFsck.java:206)
at 
org.apache.hadoop.hdfs.server.namenode.FsckServlet$1.run(FsckServlet.java:67)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at 
org.apache.hadoop.hdfs.server.namenode.FsckServlet.doGet(FsckServlet.java:58)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 6:08 AM
To: user@hadoop.apache.org
Subject: Need help: fsck FAILs, refuses to clean up corrupt fs

I have a file system with some missing/corrupt blocks.  However, running hdfs 
fsck -delete also fails with errors.  How do I get around this?
Thanks
John

[hdfs@metallica yarn]$ hdfs fsck -delete 

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
Ah... found the answer.  I had to manually leave safe mode to delete the 
corrupt files.
john

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 9:33 AM
To: user@hadoop.apache.org
Subject: RE: Need help: fsck FAILs, refuses to clean up corrupt fs

More information from the NameNode log.  I don't understand... it is saying 
that I cannot delete the corrupted file until the NameNode leaves safe mode, 
but it won't leave safe mode until the file system is no longer corrupt.  How 
do I get there from here?
Thanks
john

2014-03-04 06:02:51,584 ERROR namenode.NameNode 
(NamenodeFsck.java:deleteCorruptedFile(446)) - Fsck: error deleting corrupted 
file /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete 
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld. Name node is in 
safe mode.
The reported blocks 169302 needs additional 36 blocks to reach the threshold 
1. of total blocks 169337.
Safe mode will be turned off automatically
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1063)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3141)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3101)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3085)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:697)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.deleteCorruptedFile(NamenodeFsck.java:443)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:426)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)
at 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.fsck(NamenodeFsck.java:206)
at 
org.apache.hadoop.hdfs.server.namenode.FsckServlet$1.run(FsckServlet.java:67)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at 
org.apache.hadoop.hdfs.server.namenode.FsckServlet.doGet(FsckServlet.java:58)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 6:08 AM
To: 

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread divye sheth
You can force the NameNode to leave safe mode:

hadoop dfsadmin -safemode leave

Then run hadoop fsck again.

Thanks
Divye Sheth
On Mar 4, 2014 10:03 PM, John Lilley john.lil...@redpoint.net wrote:

  More information from the NameNode log.  I don't understand... it is
 saying that I cannot delete the corrupted file until the NameNode leaves
 safe mode, but it won't leave safe mode until the file system is no longer
 corrupt.  How do I get there from here?

 Thanks

 john



 2014-03-04 06:02:51,584 ERROR namenode.NameNode
 (NamenodeFsck.java:deleteCorruptedFile(446)) - Fsck: error deleting
 corrupted file
 /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld

 org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
 /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld. Name node
 is in safe mode.

 The reported blocks 169302 needs additional 36 blocks to reach the
 threshold 1. of total blocks 169337.

 Safe mode will be turned off automatically

 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1063)

 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3141)

 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3101)

 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3085)

 at
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:697)

 at
 org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.deleteCorruptedFile(NamenodeFsck.java:443)

 at
 org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:426)

 at
 org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)

 at
 org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)

 at
 org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)

 at
 org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)

 at
 org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.check(NamenodeFsck.java:289)

 at
 org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.fsck(NamenodeFsck.java:206)

 at
 org.apache.hadoop.hdfs.server.namenode.FsckServlet$1.run(FsckServlet.java:67)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:396)

 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

 at
 org.apache.hadoop.hdfs.server.namenode.FsckServlet.doGet(FsckServlet.java:58)

 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)

 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)

 at
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)

 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)

 at
 org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)

 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)

 at
 org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)

 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)

 at
 org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)

 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)

 at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)

 at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)

 at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)

 at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)

 at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)

 at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)

 at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

 at org.mortbay.jetty.Server.handle(Server.java:326)

 at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)

 at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)

 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)

 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)

 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)

 at
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)

 at
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



 *From:* John Lilley [mailto:john.lil...@redpoint.net]
 *Sent:* Tuesday, March 04, 2014 6:08 

Re: Hadoop Jobtracker cluster summary of heap size and OOME

2014-03-04 Thread Pabale Vikas
join the group


On Fri, Oct 11, 2013 at 10:28 PM, Viswanathan J
jayamviswanat...@gmail.comwrote:

 Hi,

 I'm running a 14 nodes Hadoop cluster with tasktrackers running in all
 nodes.

 Have set the jobtracker default memory size in hadoop-env.sh

 HADOOP_HEAPSIZE=1024

 Have set the mapred.child.java.opts value in mapred-site.xml as,

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>


 --
 Regards,
 Viswa.J

 --

 ---
 You received this message because you are subscribed to the Google Groups
 CDH Users group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to cdh-user+unsubscr...@cloudera.org.
 For more options, visit
 https://groups.google.com/a/cloudera.org/groups/opt_out.




-- 


 Regards.
Vikas S Pabale.
+919730198004


Re: Node manager or Resource Manager crash

2014-03-04 Thread Vinod Kumar Vavilapalli
I remember you asking this question before. Check if your OS' OOM killer is 
killing it.

+Vinod

On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri write2kish...@gmail.com 
wrote:

 Hi,
   I am running an application on a 2-node cluster, which tries to acquire all 
 the containers that are available on one of those nodes and remaining 
 containers from the other node in the cluster. When I run this application 
 continuously in a loop, one of the NM or RM is getting killed at a random 
 point. There is no corresponding message in the log files.
 
 One of the times that NM had got killed today, the tail of the it's log is 
 like this:
 
 2014-03-04 02:42:44,386 DEBUG 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
 isredeng:52867 sending out status for 16 containers
 2014-03-04 02:42:44,386 DEBUG 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's 
 health-status : true,
 
 
 And at the time of NM's crash, the RM's log has the following entries:
 
 2014-03-04 02:42:40,371 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing 
 isredeng:52867 of type STATUS_UPDATE
 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
 Dispatching the event 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
  NODE_UPDATE
 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server 
 Responder: responding to 
 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 
 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
 2014-03-04 02:42:40,371 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  nodeUpdate: isredeng:52867 clusterResources: 
 memory:16384, vCores:16
 2014-03-04 02:42:40,371 DEBUG 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Node being looked for scheduling isredeng:52867 
 availableResource: memory:0, vCores:-8
 2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151
 
 
 Note: the name of the node on which NM has got killed is isredeng, does it 
 indicate anything from the above message as to why it got killed?
 
 Thanks,
 Kishore
 
 
 






Re: Not information in Job History UI

2014-03-04 Thread SF Hadoop
That explains a lot.  Thanks for the information.  I appreciate your help.


On Mon, Mar 3, 2014 at 7:47 PM, Jian He j...@hortonworks.com wrote:

  "You said, there are no job logs generated on the server that is
 running the job.."
 That was quoting your previous sentence and answering your question.

  If I were to run a job and I wanted to tail the job log as it was
 running, where would I find that log?
 1) Set yarn.nodemanager.delete.debug-delay-sec to a larger value, and
 look for the logs in the local dirs specified by yarn.nodemanager.log-dirs.
 Or
 2) Enable log aggregation via yarn.log-aggregation-enable. Log aggregation
 aggregates those NM local logs and uploads them to HDFS once the application
 is finished. Then you can use the yarn logs command or simply go to the history UI
 to see the logs.
 You can find good explanation from
 http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
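 Once aggregation is on and the application has finished, the logs can also be
 pulled from the command line (the application id below is a placeholder):

 yarn logs -applicationId application_XXXXXXXXXXXXX_NNNN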

 Thanks.


 On Mon, Mar 3, 2014 at 4:29 PM, SF Hadoop sfhad...@gmail.com wrote:

 Thanks for that info Jian.

 You said, there are no job logs generated on the server that is running
 the job..  So am I correct in assuming the logs will be in the dir
 specified by yarn.nodemanager.log-dirs on the datanodes?

 I am quite confused as to where the logs for each specific part of the
 ecosystem reside.

 If I were to run a job and I wanted to tail the job log as it was
 running, where would I find that log?

 Thanks for your help.


  On Mon, Mar 3, 2014 at 11:46 AM, Jian He j...@hortonworks.com wrote:

  Note that the NodeManager will not keep finished applications and
 only shows running apps, so its UI won't show the finished apps.
  Conversely, the JobHistory Server UI will only show the finished apps,
 not the running apps.

 bq. there are no job logs generated on the server that is running the
 job.
  By default, the local logs are deleted after the job finishes. You can
 configure yarn.nodemanager.delete.debug-delay-sec to delay the deletion
 of the logs.

 Jian


 On Mon, Mar 3, 2014 at 10:45 AM, SF Hadoop sfhad...@gmail.com wrote:

 Hadoop 2.2.0
 CentOS 6.4
 Viewing UI in various browsers.

 I am having a problem where no information is visible in my Job History
 UI.  I run test jobs, they complete without error, but no information ever
 populates the nodemanager or jobhistory server UI.

 Also, there are no job logs generated on the server that is running the
 job.

 I have the following settings configured:
 yarn.nodemanager.local-dirs
 yarn.nodemanager.log-dirs
 yarn.log.server.url

  ...plus the basic yarn log dir.  I get output regarding the daemons
 but very little regarding the job.  All I get that refers to the
 jobhistory server is the following (so it appears to be functioning
 properly):

 2014-02-18 11:43:06,824 INFO org.apache.hadoop.http.HttpServer: Jetty
 bound to port 19888
 2014-02-18 11:43:06,824 INFO org.mortbay.log: jetty-6.1.26
 2014-02-18 11:43:06,847 INFO org.mortbay.log: Extract
 jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.1.0.2.0.5.0-67.jar!/webapps/jobhistory
 to /tmp/Jetty_server_19888_jobhistoryv7gnnv/webapp
 2014-02-18 11:43:07,085 INFO org.mortbay.log: Started
 SelectChannelConnector@server:19888
 2014-02-18 11:43:07,085 INFO org.apache.hadoop.yarn.webapp.WebApps: Web
 app /jobhistory started at 19888
 2014-02-18 11:43:07,477 INFO org.apache.hadoop.yarn.webapp.WebApps:
 Registered webapp guice modules

 I have a feeling this is a misconfiguration but I cannot figure out
 what setting is missing or wrong.

 Other than not being able to see any of the jobs in the UIs, everything
 appears to be working correctly so this is quite confusing.

 Any help is appreciated.










Re: Meaning of messages in log and debugging

2014-03-04 Thread Zhijie Shen
bq. Container killed by the ApplicationMaster. Container killed on request.
Exit code is 143 mean? What does 143 stand for?

It's a diagnostic message generated by YARN, which indicates that the
container was killed by MR's ApplicationMaster. 143 is the exit code of a
YARN container that was terminated with SIGTERM (128 + 15).

bq. Are there any related links which describe the life cycle of a
container?

This is what I found online:
http://diggerk.wordpress.com/2013/09/19/lifecycle-of-yarn-resource-manager-containers/.
Otherwise, you can have a look at ContainerImpl.java if you want to know
the detail.

bq. My application is very memory intense... is there any way to profile the
memory consumption of a single container?

You can find the metrics info in the RM and NM web UIs, or you
can access the RESTful APIs programmatically.
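For per-container and per-node numbers, the YARN REST endpoints can be scraped
directly; a sketch with placeholder hosts and the default web ports:

curl http://<nodemanager-host>:8042/ws/v1/node/containers      # running containers and their resource usage
curl http://<resourcemanager-host>:8088/ws/v1/cluster/metrics  # cluster-wide memory/vcore metrics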

- Zhijie


On Tue, Mar 4, 2014 at 7:24 AM, Yves Weissig weis...@uni-mainz.de wrote:

 Hello list,

 I'm currently debugging my Hadoop MR application and I have some general
 questions to the messages in the log and the debugging process.

 - What does Container killed by the ApplicationMaster.
 Container killed on request. Exit code is 143 mean? What does 143 stand
 for?

 - I also see the following exception in the log: Exception from
 container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at

 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at

 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
 at

 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744). What does this mean? It
 originates from a Diagnostics report from a container and the log4j
 message level is set to INFO.

 - Are there any related links which describe the life cycle of a container?

 - Is there a golden rule to debug a Hadoop MR application?

 - My application is very memory intense... is there any way to profile
 the memory consumption of a single container?

 Thanks!
 Best regards
 Yves




-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Benchmarking Hive Changes

2014-03-04 Thread Anthony Mattas
I’ve been trying to benchmark some of the Hive enhancements in Hadoop 2.0 using 
the HDP Sandbox. 

I took one of their example queries and executed it with the tables stored as 
TEXTFILE, RCFILE, and ORC. I also tried enabling vectorized execution 
and predicate pushdown.

SELECT s07.description, s07.salary, s08.salary,
  s08.salary - s07.salary
FROM
  sample_07 s07 JOIN sample_08 s08
ON ( s07.code = s08.code)
WHERE
 s07.salary < s08.salary
SORT BY s08.salary-s07.salary DESC

Ultimately there was not much difference in performance between any of the executions. 
Can someone clarify whether I need an actual full cluster to see performance 
improvements, or whether I’m missing something else? I thought at minimum I would 
have seen an improvement moving from TEXTFILE to ORC.
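For what it's worth, a minimal sketch of how such a comparison is usually set up,
assuming the sample_07/sample_08 tables already exist as TEXTFILE (the *_orc
table names are made up, and the settings are the usual switches, not
HDP-specific advice):

hive -e "
CREATE TABLE sample_07_orc STORED AS ORC AS SELECT * FROM sample_07;
CREATE TABLE sample_08_orc STORED AS ORC AS SELECT * FROM sample_08;
SET hive.vectorized.execution.enabled=true;   -- vectorization only applies to ORC input
SET hive.optimize.ppd=true;                   -- predicate pushdown
-- then rerun the join above against the *_orc tables and compare timings
"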

Re: Node manager or Resource Manager crash

2014-03-04 Thread Krishna Kishore Bonagiri
Yes Vinod, I asked this question some time back, and I have come back to
resolving the issue again.

I tried to see whether the OOM killer is killing them, but it is not. I checked the free
swap space on my box while my test was going on, and that doesn't seem to be
the issue. I have also verified whether the OOM score goes high for any of
these processes, because that is when the OOM killer kills them, but they are not
going high either.
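For reference, the OOM killer leaves a trace in the kernel log when it does
fire; a quick check (log file locations vary by distribution):

dmesg | grep -i 'out of memory'
grep -i 'killed process' /var/log/messages    # /var/log/syslog on Debian/Ubuntu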

Thanks,
Kishore


On Tue, Mar 4, 2014 at 10:51 PM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

 I remember you asking this question before. Check if your OS' OOM killer
 is killing it.

 +Vinod

 On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am running an application on a 2-node cluster, which tries to acquire
 all the containers that are available on one of those nodes and remaining
 containers from the other node in the cluster. When I run this application
 continuously in a loop, one of the NM or RM is getting killed at a random
 point. There is no corresponding message in the log files.

 One of the times that NM had got killed today, the tail of the it's log is
 like this:

 2014-03-04 02:42:44,386 DEBUG
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
 isredeng:52867 sending out status for 16 containers
 2014-03-04 02:42:44,386 DEBUG
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's
 health-status : true,


 And at the time of NM's crash, the RM's log has the following entries:

 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
 isredeng:52867 of type STATUS_UPDATE
 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
 NODE_UPDATE
 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server
 Responder: responding to
 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 nodeUpdate: isredeng:52867 clusterResources:
 memory:16384, vCores:16
 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Node being looked for scheduling isredeng:52867
 availableResource: memory:0, vCores:-8
 2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151


 Note: the name of the node on which NM has got killed is isredeng, does it
 indicate anything from the above message as to why it got killed?

 Thanks,
 Kishore







Re: Question on DFS Balancing

2014-03-04 Thread Harsh J
You're probably looking for https://issues.apache.org/jira/browse/HDFS-1804

On Tue, Mar 4, 2014 at 5:54 AM, divye sheth divs.sh...@gmail.com wrote:
 Hi,

 I am new to the mailing list.

 I am using Hadoop 0.20.2 with an append r1056497 version. The question I
 have is related to balancing. I have a 5 datanode cluster and each node has
 2 disks attached to it. The second disk was added when the first disk was
 reaching its capacity.

 Now the scenario that I am facing is, when the new disk was added hadoop
 automatically moved over some data to the new disk. But over the time I
 notice that data is no longer being written to the second disk. I have also
 faced an issue on the datanode where the first disk had 100% utilization.

 How can I overcome such scenario, is it not hadoop's job to balance the disk
 utilization between multiple disks on single datanode?

 Thanks
 Divye Sheth



-- 
Harsh J


Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-04 Thread Stanley Shi
Which version of hadoop are you using?
There's a possibility that the hadoop environment already has an avro*.jar
in place, which caused the jar conflict.
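A quick way to see whether the cluster already ships an older Avro that shadows
yours (the paths below are guesses and vary by distribution):

hadoop classpath | tr ':' '\n' | grep -i avro
find /usr/lib/hadoop* -name 'avro*.jar' 2>/dev/null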

Regards,
*Stanley Shi,*



On Tue, Mar 4, 2014 at 11:25 PM, John Pauley john.pau...@threattrack.comwrote:

  Outside hadoop: avro-1.7.6
 Inside hadoop:  avro-mapred-1.7.6-hadoop2


   From: Stanley Shi s...@gopivotal.com
 Reply-To: user@hadoop.apache.org user@hadoop.apache.org
 Date: Monday, March 3, 2014 at 8:30 PM
 To: user@hadoop.apache.org user@hadoop.apache.org
 Subject: Re: [hadoop] AvroMultipleOutputs
 org.apache.avro.file.DataFileWriter$AppendWriteException

   which avro version are you using when running outside of hadoop?

  Regards,
 *Stanley Shi,*



 On Mon, Mar 3, 2014 at 11:49 PM, John Pauley 
 john.pau...@threattrack.comwrote:

   This is cross posted to avro-user list (
 http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau...@threattrack.com%3e
 ).

   Hello all,

  I’m having an issue using AvroMultipleOutputs in a map/reduce job.  The
 issue occurs when using a schema that has a union of null and a fixed
 (among other complex types), default to null, and it is not null.
  Please find the full stack trace below and a sample map/reduce job that
 generates an Avro container file and uses that for the m/r input.  Note
 that I can serialize/deserialize without issue using
 GenericDatumWriter/GenericDatumReader outside of hadoop…  Any insight would
 be helpful.

  Stack trace:
  java.lang.Exception:
 org.apache.avro.file.DataFileWriter$AppendWriteException:
 java.lang.NullPointerException: in com.foo.bar.simple_schema in union null
 of union in field baz of com.foo.bar.simple_schema
 at
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
 Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException:
 java.lang.NullPointerException: in com.foo.bar.simple_schema in union null
 of union in field baz of com.foo.bar.simple_schema
 at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
 at
 org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:77)
 at
 org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:39)
 at
 org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:400)
 at
 org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:378)
 at
 com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:78)
 at
 com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:62)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
 at
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:695)
 Caused by: java.lang.NullPointerException: in com.foo.bar.simple_schema
 in union null of union in field baz of com.foo.bar.simple_schema
 at
 org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
 at
 org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
 at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
 ... 16 more
 Caused by: java.lang.NullPointerException
 at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:457)
 at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
 at org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:167)
 at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:608)
 at
 org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
 at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:597)
 at
 org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
 at
 org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
 at
 org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
 at
 org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
 at
 org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
 at
 org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
 at
 org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
 at
 org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)

  Sample m/r job:
 mr_job
  package com.tts.ox.mapreduce.example.avro;

  import 

Re: Question on DFS Balancing

2014-03-04 Thread divye sheth
Thanks Harsh. The JIRA is fixed in version 2.1.0, whereas I am using Hadoop
0.20.2 (we are in the process of upgrading). Is there a short-term workaround
to balance the disk utilization? If the patch in the JIRA is applied to the
version that I am using, will it break anything?

Thanks
Divye Sheth


On Wed, Mar 5, 2014 at 11:28 AM, Harsh J ha...@cloudera.com wrote:

 You're probably looking for
 https://issues.apache.org/jira/browse/HDFS-1804

 On Tue, Mar 4, 2014 at 5:54 AM, divye sheth divs.sh...@gmail.com wrote:
  Hi,
 
  I am new to the mailing list.
 
  I am using Hadoop 0.20.2 with an append r1056497 version. The question I
  have is related to balancing. I have a 5 datanode cluster and each node
 has
  2 disks attached to it. The second disk was added when the first disk was
  reaching its capacity.
 
  Now the scenario that I am facing is, when the new disk was added hadoop
  automatically moved over some data to the new disk. But over the time I
  notice that data is no longer being written to the second disk. I have
 also
  faced an issue on the datanode where the first disk had 100% utilization.
 
  How can I overcome such scenario, is it not hadoop's job to balance the
 disk
  utilization between multiple disks on single datanode?
 
  Thanks
  Divye Sheth



 --
 Harsh J



Re: Question on DFS Balancing

2014-03-04 Thread Azuryy Yu
Hi,
That will probably break something if you apply the patch from 2.x to 0.20.x,
but it depends.

AFAIK, the Balancer had a major refactoring in HDFS v2, so you'd better fix it
yourself based on HDFS-1804.



On Wed, Mar 5, 2014 at 3:47 PM, divye sheth divs.sh...@gmail.com wrote:

 Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using Hadoop
 0.20.2 (we are in a process of upgrading) is there a workaround for the
 short term to balance the disk utilization? The patch in the Jira, if
 applied to the version that I am using, will it break anything?

 Thanks
 Divye Sheth


 On Wed, Mar 5, 2014 at 11:28 AM, Harsh J ha...@cloudera.com wrote:

 You're probably looking for
 https://issues.apache.org/jira/browse/HDFS-1804

 On Tue, Mar 4, 2014 at 5:54 AM, divye sheth divs.sh...@gmail.com wrote:
  Hi,
 
  I am new to the mailing list.
 
  I am using Hadoop 0.20.2 with an append r1056497 version. The question I
  have is related to balancing. I have a 5 datanode cluster and each node
 has
  2 disks attached to it. The second disk was added when the first disk
 was
  reaching its capacity.
 
  Now the scenario that I am facing is, when the new disk was added hadoop
  automatically moved over some data to the new disk. But over the time I
  notice that data is no longer being written to the second disk. I have
 also
  faced an issue on the datanode where the first disk had 100%
 utilization.
 
  How can I overcome such scenario, is it not hadoop's job to balance the
 disk
  utilization between multiple disks on single datanode?
 
  Thanks
  Divye Sheth



 --
 Harsh J