openjdk warning, the vm will try to fix the stack guard
hi, i have hadoop v2.3.0 installed on CentOS 6.5 64-bit, with OpenJDK 64-bit v1.7 as my java version. when i attempt to start hadoop, i keep seeing the message below.

OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

i followed the instructions. here were my steps.

1. sudo yum install -y prelink
2. execstack -c /usr/local/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0

however, the message still keeps popping up. i did some more searching on the internet, and one user says that libhadoop.so.1.0.0 is 32-bit, and that to get rid of this message i will need to recompile it as 64-bit. is that correct? is there not a 64-bit version of libhadoop.so.1.0.0 available for download? thanks,
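(For what it's worth, one way to check whether the shipped native library really is 32-bit is the `file` utility; the path below is the one from the message above.)

```shell
# Inspect the native library's architecture (path taken from the message above).
# A 32-bit build reports something like "ELF 32-bit LSB shared object, Intel 80386";
# a 64-bit build reports "ELF 64-bit LSB shared object, x86-64". If it is 32-bit,
# rebuilding the native code on a 64-bit box is the usual fix.
file /usr/local/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0
```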
are the job and task tracker monitor webpages gone now in hadoop v2.3.0
i recently made the switch from hadoop 0.20.x to hadoop 2.3.0 (yes, big leap). i was wondering if there is a way to view my jobs now via a web UI? i used to be able to do this by accessing the following URL:

http://hadoop-cluster:50030/jobtracker.jsp

however, there is no job tracker monitoring page there anymore.

furthermore, i am confused about MapReduce as an application running on top of YARN. the documentation says MapReduce is just an application running on YARN. if that is true, how come i do not see MapReduce as an application on the ResourceManager web UI?

http://hadoop-cluster:8088/cluster/apps

is this because MapReduce is NOT a long-running app? meaning, a MapReduce job will only show up as an app in YARN while it is running? (please bear with me, i'm still adjusting to this new design). any help/pointer is appreciated.
Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0
Yes. JobTracker and TaskTracker are gone from all the 2.x release lines.

MapReduce is an application on top of YARN. It is per job - each job launches, runs, and finishes once it is done with its work. Once it is done, you can go look at it in the MapReduce-specific JobHistoryServer.

+Vinod

On Mar 6, 2014, at 1:11 PM, Jane Wayne jane.wayne2...@gmail.com wrote: i recently made the switch from hadoop 0.20.x to hadoop 2.3.0 (yes, big leap). i was wondering if there is a way to view my jobs now via a web UI? [...]

-- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0
when i go to the job history server at http://hadoop-cluster:19888/jobhistory i see no mapreduce jobs there. i ran 3 simple mr jobs successfully; i verified via the console output and the hdfs output directory. all i see on the UI is: No data available in table. any ideas? unless there is a JobHistoryServer just for MapReduce.. where is that?

On Thu, Mar 6, 2014 at 7:14 PM, Vinod Kumar Vavilapalli vino...@apache.org wrote: Yes. JobTracker and TaskTracker are gone from all the 2.x release lines. MapReduce is an application on top of YARN. [...]
Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0
ok, the reason why my hadoop jobs were not showing up was that i had not enabled mapreduce to run as a yarn application.

On Thu, Mar 6, 2014 at 11:45 PM, Jane Wayne jane.wayne2...@gmail.com wrote: when i go to the job history server at http://hadoop-cluster:19888/jobhistory i see no mapreduce jobs there. [...]
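(For readers hitting the same symptom: the setting involved is the MapReduce framework name in mapred-site.xml; a minimal sketch, assuming a stock 2.3.0 install:)

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN so they appear in the
     ResourceManager UI while running and, after completion, in the
     JobHistoryServer -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```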
hdfs permission is still being checked after being disabled
i am using hadoop v2.3.0. in my hdfs-site.xml, i have the following property set:

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

however, when i try to run a hadoop job, i see the following AccessControlException:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hadoopuser, access=EXECUTE, inode="/tmp":root:supergroup:drwxrwx---

to me, it seems that i have already disabled permission checking, so i shouldn't get that AccessControlException. any ideas?
MapReduce: How to output multiple Avro files?
our input is a line of text which may be parsed into, e.g., an A or a B object. We want all A objects written to A.avro files, and all B objects written to B.avro. I looked into the AvroMultipleOutputs class: http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapreduce/AvroMultipleOutputs.html

There is an example, however, it's not quite clear. For job submission, it uses AvroMultipleOutputs.addNamedOutput to add schemas for A and B. My program looks like:

AvroMultipleOutputs.addNamedOutput(job, "A", AvroKeyOutputFormat.class, aSchema, null);
AvroMultipleOutputs.addNamedOutput(job, "B", AvroKeyOutputFormat.class, bSchema, null);

I believe this is for the Reducer output files. *My question is* what the Mapper output should be, specifically what job.setMapOutputValueClass should be, since the Mapper output could be an A or a B object, with schema aSchema or bSchema. In my program, I simply set it to GenericData, but get the error below:

14/03/06 15:55:34 INFO mapreduce.Job: Task Id : attempt_1393817780522_0012_m_10_2, Status : FAILED
Error: java.lang.NullPointerException
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:989)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:79)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)

I have no idea what this means.
Re: Impact of Tez/Spark to MapReduce
I think it is necessary to look at the question from multiple angles: First there is MapReduce as a computing paradigm. Second there is the MapReduce API. And third you have an implementation.

My belief is that the computing paradigm is not going away anytime soon. It's a fundamental approach for distributed computing - not the only one, though. The API should also be quite stable, so our applications will continue to work. I think it's also a safe bet that there will be more high-level APIs making developers more productive but calling MapReduce internally. And then there is the implementation, which will indeed call/use Tez to execute the map and reduce tasks sooner rather than later. This is transparent to the developer; his app just executes faster.

Functional programming will certainly play an important role, but I doubt it will be the only style; e.g. Scala is big, but it has not eliminated Java, JavaEE or Spring over the last 10 years. And that's great, isn't it? Java/JVM has always been about developer freedom: platform, language, APIs, frameworks, implementations. You pick what makes you most productive. Just my few cents...

Emil

Am Mar 6, 2014 um 2:57 AM schrieb Anthony Mattas anth...@mattas.net: Unfortunately I'm not super familiar with Spark - I guess my curiosity stems from a deep-seated belief that big-iron EDW-type appliances are slowly going to fade out, so I'm trying to really get my head around what that's going to look like in the next few years. Hive(Stinger)+Tez+YARN seems very promising; Impala does as well, but I'm not sure if the more open Hive solution will be preferred longer term. Does Map-Reduce still exist at that time, or does it slowly fade away? (I would assume it's still around because there are a lot of unique things you can do with MR today that aren't easily accomplished in other frameworks.)

On Mar 5, 2014, at 8:48 PM, Jeff Zhang jezh...@gopivotal.com wrote: I believe in the future the spark functional-style api will dominate the big data world. Very few people will use the native mapreduce API. Even now users usually use a third-party mapreduce library such as cascading, scalding or scoobi, or a script language such as hive or pig, rather than the native mapreduce api. And this functional style of api is compatible both with hadoop's mapreduce and spark's RDD. The underlying execution engine will be transparent to users. So I guess, or I hope, that in the future the api will be unified while the underlying execution engine will be chosen intelligently according to the resources you have and the metadata of the data you operate on.

On Thu, Mar 6, 2014 at 9:02 AM, Edward Capriolo edlinuxg...@gmail.com wrote: The thing about yarn is you choose what is right for the workload. For example: Spark may not be the right choice if, for example, join tables do not fit in memory.

On Wednesday, March 5, 2014, Anthony Mattas anth...@mattas.net wrote: With Tez and Spark becoming mainstream, what does Map Reduce look like longer term? Will it become a component that sits on top of Tez, or will they continue to live side by side utilizing YARN? I'm struggling a little bit to understand what the roadmap looks like for the technologies that sit on top of YARN. Anthony Mattas anth...@mattas.net

-- Sorry this was sent from mobile. Will do less grammar and spell check than usual.

Emil Andreas Siemes Sr. Solution Engineer Hortonworks Inc. esie...@hortonworks.com +49 176 72590764
Re: MapReduce: How to output multiple Avro files?
adding the avro user mailing list.

2014-03-06 16:09 GMT+08:00 Fengyun RAO raofeng...@gmail.com: our input is a line of text which may be parsed into, e.g., an A or a B object. We want all A objects written to A.avro files, and all B objects written to B.avro. [...]
Re:
Maybe your console and browser are using different settings. Would you please try wget http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom ?

Regards, *Stanley Shi,*

On Wed, Mar 5, 2014 at 6:59 PM, Avinash Kujur avin...@gmail.com wrote: yes ming.

On Wed, Mar 5, 2014 at 2:56 AM, Mingjiang Shi m...@gopivotal.com wrote: Can you access this link? http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom

On Wed, Mar 5, 2014 at 6:54 PM, Avinash Kujur avin...@gmail.com wrote: if i follow the repo.maven.apache.org link in my browser, it shows this message: Browsing for this directory has been disabled. View this directory's contents on http://search.maven.org instead. so how can i change the link from repo.maven.apache.org to http://search.maven.org ?

On Wed, Mar 5, 2014 at 2:49 AM, Avinash Kujur avin...@gmail.com wrote: yes. it has internet access.

On Wed, Mar 5, 2014 at 2:47 AM, Mingjiang Shi m...@gopivotal.com wrote: see the error message: Unknown host repo.maven.apache.org - [Help 2] Does your machine have internet access?

On Wed, Mar 5, 2014 at 6:42 PM, Avinash Kujur avin...@gmail.com wrote: home/cloudera/ contains hadoop files.

On Wed, Mar 5, 2014 at 2:40 AM, Avinash Kujur avin...@gmail.com wrote:

[cloudera@localhost hadoop-common-trunk]$ mvn clean install -DskipTests -Pdist
[INFO] Scanning for projects...
Downloading: http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom
[ERROR] The build could not read 1 project - [Help 1]
[ERROR]
[ERROR] The project org.apache.hadoop:hadoop-main:3.0.0-SNAPSHOT (/home/cloudera/hadoop-common-trunk/pom.xml) has 1 error
[ERROR] Unresolveable build extension: Plugin org.apache.felix:maven-bundle-plugin:2.4.0 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.felix:maven-bundle-plugin:jar:2.4.0: Could not transfer artifact org.apache.felix:maven-bundle-plugin:pom:2.4.0 from/to central (http://repo.maven.apache.org/maven2): repo.maven.apache.org: Unknown host repo.maven.apache.org - [Help 2]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException

when i execute this command from the hadoop directory, this is the error i am getting.

On Wed, Mar 5, 2014 at 2:33 AM, Mingjiang Shi m...@gopivotal.com wrote: Did you execute the command from /home/cloudera? Does it contain the hadoop source code? You need to execute the command from the source code directory.

On Wed, Mar 5, 2014 at 6:28 PM, Avinash Kujur avin...@gmail.com wrote: when i am using this command mvn clean install -DskipTests -Pdist it's giving this error:

[cloudera@localhost ~]$ mvn clean install -DskipTests -Pdist
[INFO] Scanning for projects...
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 0.170 s
[INFO] Finished at: 2014-03-05T02:25:52-08:00
[INFO] Final Memory: 2M/43M
[INFO]
[WARNING] The requested profile dist could not be activated because it does not exist.
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera). Please verify you invoked Maven from the correct directory. - [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MissingProjectException

help me out. Thanks in advance. :)

-- Cheers -MJ
[no subject]
while importing jar files using mvn clean install -DskipTests -Pdist, i am getting this error:

[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera). Please verify you invoked Maven from the correct directory. - [Help 1]

help me out
Re:
please start writing subject lines for your emails. also, look at the error message:

[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera)

do ls -l pom.xml inside the /home/cloudera directory. change directory to where your codebase is, and then run the command again after making sure there is a pom.xml present in that directory.

On Thu, Mar 6, 2014 at 3:17 PM, Avinash Kujur avin...@gmail.com wrote: while importing jar files using mvn clean install -DskipTests -Pdist, i am getting this error: [ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera). [...]

-- Nitin Pawar
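(The steps above can be sketched as a quick check; the checkout path is the one from the error output earlier in the thread:)

```shell
# Maven needs a pom.xml in the working directory; run it from the source tree.
cd /home/cloudera/hadoop-common-trunk   # checkout path from the thread
ls -l pom.xml                           # should list the file, not error out
mvn clean install -DskipTests
```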
MR2 Job over LZO data
Running on Hadoop 2.2.0. The Java MR2 job works as expected on an uncompressed data source using TextInputFormat.class, but when using the LZO format the job fails:

import com.hadoop.mapreduce.LzoTextInputFormat;
job.setInputFormatClass(LzoTextInputFormat.class);

Dependencies are from the maven repository: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/ I also tried with elephant-bird-core 4.4. The same data can be queried fine from within Hive (0.12) on the same cluster. The exception:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:62)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
at com.hadoop.mapreduce.LzoTextInputFormat.getSplits(LzoTextInputFormat.java:101)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at com.cloudreach.DataQuality.Main.main(Main.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

I believe the issue is related to the changes in Hadoop 2, but where can I find a Hadoop 2-compatible version? Thanks
Warning in secondary namenode log
Hi, I am setting up a 2-node hadoop cluster (1.2.1). After formatting the FS and starting the namenode, datanode and secondarynamenode, i am getting the below warning in the SecondaryNameNode logs:

*WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint Period :3600 secs (60 min)*

Please help me debug this. -- Thanks and Regards, Vimal Jain
Re: Warning in secondary namenode log
you can ignore this on a 2-node cluster. This value is the time it waits between two periodic checkpoints on the secondary namenode.

On Thu, Mar 6, 2014 at 4:10 PM, Vimal Jain vkj...@gmail.com wrote: Hi, I am setting up a 2-node hadoop cluster (1.2.1). After formatting the FS and starting the namenode, datanode and secondarynamenode, i am getting the below warning in the SecondaryNameNode logs: *WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint Period :3600 secs (60 min)* [...]

-- Nitin Pawar
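(If anyone wants to change the interval rather than ignore the message: a hedged sketch for the 1.x line, assuming the 1.x property name fs.checkpoint.period in core-site.xml:)

```xml
<!-- core-site.xml (Hadoop 1.x): seconds between secondary namenode
     checkpoints; 3600 is the default the warning above reports -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>
```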
Assertion error while building hadoop 2.3.0
Hi, I have downloaded hadoop-2.3.0-src and followed the guide at http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html

The first command, mvn clean install -DskipTests, was successful. However, when I run

cd hadoop-mapreduce-project
mvn clean install assembly:assembly -Pnative

at some point I get an error. Searching for the error message on the web shows some Q&A in the mailing list archive, but it seems to be related to developers. Please see the full messages below (sorry for the long post).

Running org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.59 sec - in org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
Running org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler
Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 4.003 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler
testFailure(org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler) Time elapsed: 2.978 sec FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.junit.Assert.assertNotNull(Assert.java:537)
at org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler.testFailure(TestCommitterEventHandler.java:314)
testBasic(org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler) Time elapsed: 0.255 sec FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.junit.Assert.assertNotNull(Assert.java:537)
at org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler.testBasic(TestCommitterEventHandler.java:263)
Running org.apache.hadoop.mapreduce.v2.app.TestRecovery
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.458 sec - in org.apache.hadoop.mapreduce.v2.app.TestRecovery
Running org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.306 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster
Running org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies
Running org.apache.hadoop.mapreduce.v2.app.metrics.TestMRAppMetrics
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.273 sec - in org.apache.hadoop.mapreduce.v2.app.metrics.TestMRAppMetrics
Running org.apache.hadoop.mapreduce.v2.app.TestFetchFailure
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.859 sec - in org.apache.hadoop.mapreduce.v2.app.TestFetchFailure
Running org.apache.hadoop.mapreduce.v2.app.TestFail
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.483 sec - in org.apache.hadoop.mapreduce.v2.app.TestFail
Running org.apache.hadoop.mapreduce.v2.app.TestMRApp
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.554 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRApp
Running org.apache.hadoop.mapreduce.v2.app.TestKill
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.046 sec - in org.apache.hadoop.mapreduce.v2.app.TestKill
Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.314 sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
Tests run: 15, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.272 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
testKilledDuringKillAbort(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl) Time elapsed: 5.28 sec FAILURE!
java.lang.AssertionError: expected:<SETUP> but was:<RUNNING>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:147)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:816)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testKilledDuringKillAbort(TestJobImpl.java:499)
Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttemptContainerRequest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.051 sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttemptContainerRequest
Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestShuffleProvider
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.112 sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestShuffleProvider
Running
Fetching configuration values from cluster
How would I go about fetching configuration values (e.g. from yarn-site.xml) from the cluster via the API, from an application that is not running on a cluster node? Thanks, John
HDFS java client vs the Command Line
All, I'm running the 2.3.0 distribution as a single node on OSX 10.7. I want to create a directory. From the command line it works; from java it doesn't. I have Googled and read bits and pieces suggesting this is an issue with the case insensitivity of the OSX file system. Can anyone confirm this? If so, can anyone advise a workaround? Such a simple thing to get hung up on, go figure. Thanks -- There are ways and there are ways, Geoffry Roberts
Re: Fw: Hadoop at ApacheCon Denver
Wow... blast from the past ;)!! How the hell are you? Cheers Oleg

On Wed, Mar 5, 2014 at 10:18 AM, Melissa Warnkin missywarn...@yahoo.com wrote: Hello Hadoop enthusiasts, As you are no doubt aware, ApacheCon North America will be held in Denver, Colorado starting on April 7th. Hadoop has 25 talks and two tutorials!! Check it out here: http://apacheconnorthamerica2014.sched.org/?s=hadoop. We would love to see you in Denver next month. Register soon, as prices go up on March 14th: http://na.apachecon.com/

Best regards, Melissa
ApacheCon Planning Team
Partitions in Hive
Hi, I have a table with 3 columns in hive. I want that table to be partitioned based on the first letter of column 1. How do we define such a partition condition in hive? Regards, Nagarjuna K
Re: HDFS java client vs the Command Line
I've never faced an issue trying to run hadoop and related programs on my OSX. What is your error, exactly? Have you ensured your Java classpath carries the configuration directory as well, if you aren't running the program via hadoop jar ... but via java -cp ... instead?

On Thu, Mar 6, 2014 at 9:50 AM, Geoffry Roberts threadedb...@gmail.com wrote: All, I'm running the 2.3.0 distribution as a single node on OSX 10.7. I want to create a directory. From the command line it works; from java it doesn't. [...]

-- Harsh J
Re: Partitions in Hive
Partitioning in Hive is done on the column value, not on a sub-portion of the column value. If you want to separate data based on the first character, then create another column to store that value. -- Nitin Pawar
Re: Assertion error while building hadoop 2.3.0
Stuck at this step. Hope to receive any idea... Regards, Mahmood On Thursday, March 6, 2014 6:48 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi I have downloaded hadoop-2.3.0-src and followed the guide from http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html The first command mvn clean install -DskipTests was successful. However wen I run cd hadoop-mapreduce-project mvn clean install assembly:assembly -Pnative At some point I get an error. Searching the error message on the web shows some QA on the mailing list archive but seems that they are related to developers. Please see the full messages below (sorry for long post). Running org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.59 sec - in org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup Running org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 4.003 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler testFailure(org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler) Time elapsed: 2.978 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.junit.Assert.assertNotNull(Assert.java:537) at org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler.testFailure(TestCommitterEventHandler.java:314) testBasic(org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler) Time elapsed: 0.255 sec FAILURE! 
java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.junit.Assert.assertNotNull(Assert.java:537) at org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler.testBasic(TestCommitterEventHandler.java:263) Running org.apache.hadoop.mapreduce.v2.app.TestRecovery Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.458 sec - in org.apache.hadoop.mapreduce.v2.app.TestRecovery Running org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.306 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster Running org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies Running org.apache.hadoop.mapreduce.v2.app.metrics.TestMRAppMetrics Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.273 sec - in org.apache.hadoop.mapreduce.v2.app.metrics.TestMRAppMetrics Running org.apache.hadoop.mapreduce.v2.app.TestFetchFailure Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.859 sec - in org.apache.hadoop.mapreduce.v2.app.TestFetchFailure Running org.apache.hadoop.mapreduce.v2.app.TestFail Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.483 sec - in org.apache.hadoop.mapreduce.v2.app.TestFail Running org.apache.hadoop.mapreduce.v2.app.TestMRApp Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.554 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRApp Running org.apache.hadoop.mapreduce.v2.app.TestKill Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.046 sec - in org.apache.hadoop.mapreduce.v2.app.TestKill Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.314 
sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl Tests run: 15, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.272 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl testKilledDuringKillAbort(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl) Time elapsed: 5.28 sec FAILURE! java.lang.AssertionError: expected:SETUP but was:RUNNING at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:816) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testKilledDuringKillAbort(TestJobImpl.java:499) Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttemptContainerRequest Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.051 sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttemptContainerRequest Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestShuffleProvider Tests run: 1, Failures: 0, Errors: 0,
Running a Job in a Local Job Runner:Windows 7 64-bit
Hi All, I'm trying to get some hands-on on the Map Reduce programming. I downloaded the code examples from Hadoop: The Definitive Guide, 3rd edition, and built them using Maven: mvn package -DskipTests -Dhadoop.distro=apache-2 Next I imported the maven projects into Eclipse. Using Eclipse now I can develop my own Map Reduce jobs. But how do I test/run the job locally using the Local Job Runner? The book excerpt says: Now we can run this application against some local files. Hadoop comes with a local job runner, a cut-down version of the MapReduce execution engine for running MapReduce jobs in a single JVM. It's designed for testing, and is very convenient for use in an IDE, since you can run it in a debugger to step through the code in your mapper and reducer. Do I also need to install Hadoop locally on Windows for that? Thanks, -RR
Re: HDFS java client vs the Command Line
Thanks for the response. I figured out what was wrong. I was doing this: Configuration conf = new Configuration(); conf.addResource(new Path(F.CFG_PATH + "/core-site.xml")); conf.addResource(new Path(F.CFG_PATH + "/hdfs-site.xml")); conf.addResource(new Path(F.CFG_PATH + "/mapred-site.xml")); F.CFG_PATH was close but not correct. Fixed it and all is well. Thanks -- There are ways and there are ways, Geoffry Roberts
Re: HDFS java client vs the Command Line
You could avoid all that code by simply placing the configuration directory on the classpath - it will auto-load the necessary properties. -- Harsh J
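Harsh's suggestion works because Hadoop's Configuration resolves core-site.xml and friends as classpath resources. A quick way to check whether your config directory is actually visible, using only the standard library (no Hadoop dependency; the file names below are the usual ones, adjust if yours differ):

```java
public class ClasspathCheck {
    public static void main(String[] args) {
        // Configuration looks these names up via the context class loader,
        // so if getResource() returns null here, the auto-load (and an
        // addResource() by name) will not find them either.
        String[] names = {"core-site.xml", "hdfs-site.xml", "mapred-site.xml"};
        for (String name : names) {
            java.net.URL url = Thread.currentThread()
                    .getContextClassLoader().getResource(name);
            System.out.println(name + " -> "
                    + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

Run it with the same classpath as your application, e.g. `java -cp /path/to/hadoop/conf:myapp.jar ClasspathCheck` (the conf path here is an example); if a file reports NOT on classpath, Configuration will silently fall back to defaults.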
Re: MapReduce: How to output multiple Avro files?
If you have a reducer involved, you'll likely need a common map output data type that both A and B can fit into. On Thu, Mar 6, 2014 at 12:09 AM, Fengyun RAO raofeng...@gmail.com wrote: our input is a line of text which may be parsed to e.g. an A or B object. We want all A objects written to A.avro files, while all B objects written to B.avro. I looked into the AvroMultipleOutputs class: http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapreduce/AvroMultipleOutputs.html There is an example, however, it's not quite clear. For job submission, it uses AvroMultipleOutputs.addNamedOutput to add schemas for A and B. In my program it looks like: AvroMultipleOutputs.addNamedOutput(job, "A", AvroKeyOutputFormat.class, aSchema, null); AvroMultipleOutputs.addNamedOutput(job, "B", AvroKeyOutputFormat.class, bSchema, null); I believe this is for the Reducer output files. My question is what the Mapper output should be, specifically what job.setMapOutputValueClass should be, since the Mapper output could be an A or B object, with schema aSchema or bSchema. 
In my program, I simply set it to GenericData, but get the error below: 14/03/06 15:55:34 INFO mapreduce.Job: Task Id : attempt_1393817780522_0012_m_10_2, Status : FAILED Error: java.lang.NullPointerException at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:989) at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390) at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:79) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:674) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160) I have no idea what this means. -- Harsh J
Re: MapReduce: How to output multiple Avro files?
thanks, Harsh. any idea on how to build a common map output data type? The only way I can think of is toString(), which would be very inefficient, since A and B are big objects and may change with time, which is also the reason we want to use Avro serialization. 2014-03-07 9:55 GMT+08:00 Harsh J ha...@cloudera.com: If you have a reducer involved, you'll likely need a common map output data type that both A and B can fit into.
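The "common map output data type" Harsh mentions does not have to go through toString(); a plain tagged wrapper is enough. Below is a minimal, Avro-free sketch of the idea (the class names A-or-B and the Kind tag are hypothetical, not from the thread):

```java
// A tagged wrapper that can carry either an A or a B record, so the
// mapper has a single output value type and the reducer can route
// each datum to the right named output (A.avro or B.avro).
public class AorB {
    public enum Kind { A, B }

    public final Kind kind;
    public final Object datum; // would be your A or B record object

    private AorB(Kind kind, Object datum) {
        this.kind = kind;
        this.datum = datum;
    }

    public static AorB ofA(Object a) { return new AorB(Kind.A, a); }
    public static AorB ofB(Object b) { return new AorB(Kind.B, b); }

    public static void main(String[] args) {
        AorB v = AorB.ofA("some A record");
        // In the reducer you would dispatch on v.kind and write v.datum
        // to the matching AvroMultipleOutputs named output.
        System.out.println(v.kind + ": " + v.datum);
    }
}
```

In Avro terms this is essentially what a union of [aSchema, bSchema] gives you: each datum carries enough information for the reducer to tell the two record types apart, without any lossy string round-trip.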
Re: Assertion error while building hadoop 2.3.0
Hi Mahmood, I have downloaded hadoop-2.3.0-src and followed the guide from http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html The documentation is still old, and you don't need to compile the source code to build a cluster. I built the latest document and uploaded it to github.io. Please follow this page: http://aajisaka.github.io/hadoop-project/hadoop-project-dist/hadoop-common/SingleCluster.html Thanks, Akira
Re: MR2 Job over LZO data
Maybe you can try downloading the LZO class and rebuilding it against Hadoop 2.2.0; if the build succeeds, you should be good to go; if it fails, then maybe you need to wait for the LZO maintainers to update their code. Regards, Stanley Shi On Thu, Mar 6, 2014 at 6:29 PM, KingDavies kingdav...@gmail.com wrote: Running on Hadoop 2.2.0 The Java MR2 job works as expected on an uncompressed data source using the TextInputFormat.class. But when using the LZO format the job fails: import com.hadoop.mapreduce.LzoTextInputFormat; job.setInputFormatClass(LzoTextInputFormat.class); Dependencies from the maven repository: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/ Also tried with elephant-bird-core 4.4 The same data can be queried fine from within Hive (0.12) on the same cluster. The exception: Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:62) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340) at com.hadoop.mapreduce.LzoTextInputFormat.getSplits(LzoTextInputFormat.java:101) at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at com.cloudreach.DataQuality.Main.main(Main.java:42) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) I believe the issue is related to the changes in Hadoop 2, but where can I find a H2 compatible version? Thanks
Re: Fetching configuration values from cluster
You can read from http://resource-manager.host.ip:8088/conf This is an XML-format file you can use directly. Regards, Stanley Shi On Fri, Mar 7, 2014 at 1:46 AM, John Lilley john.lil...@redpoint.net wrote: How would I go about fetching configuration values (e.g. yarn-site.xml) from the cluster via the API from an application not running on a cluster node? Thanks John
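The body served at the ResourceManager's /conf endpoint is the familiar Hadoop configuration XML (property elements with name/value children). A stdlib-only sketch of turning that document into a map — the HTTP fetch itself is left out and replaced by a stand-in string, and the property name/value in the sample are illustrative, not real cluster values:

```java
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfParser {
    // Parse Hadoop-style <configuration><property><name/><value/></property>... XML
    // into a simple map of property name to value.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            props.put(p.getElementsByTagName("name").item(0).getTextContent(),
                      p.getElementsByTagName("value").item(0).getTextContent());
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the response body of http://<rm-host>:8088/conf; in a
        // real client you would fetch it with java.net.URL / HttpURLConnection.
        String sample = "<configuration><property>"
                + "<name>yarn.resourcemanager.address</name>"
                + "<value>rm-host:8032</value></property></configuration>";
        System.out.println(parse(sample).get("yarn.resourcemanager.address"));
    }
}
```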
Re: Running a Job in a Local Job Runner:Windows 7 64-bit
Hi RR, You don't need to have the actual Hadoop daemons running on a Windows machine. Just install Cygwin and ensure that you have all the required Hadoop jars on the classpath of your program. You can test/debug directly from the IDE itself just by saying Run As - Java Application on the driver class. This will run the program in Local Job Runner mode. You can use this to verify the basic logic of your MR code. When you start using advanced features of MR, the behavior/output on the Local Job Runner may be different from when running the program on a real distributed cluster. Regards, Rakesh
RE: MapReduce: How to output multiple Avro files?
Hi Fengyun, Here's what I've done in the past when facing a similar issue: 1) Set the map output schema to a UNION of both of your target schemas, A and B. 2) Serialize the data in the mappers, using the avro datum as the value. 3) Figure out what the avro schema is for each datum and write out the data in the reducer. Thanks, Alan From: Fengyun RAO [mailto:raofeng...@gmail.com] Sent: Thursday, March 06, 2014 2:14 AM To: user@hadoop.apache.org; u...@avro.apache.org Subject: Re: MapReduce: How to output multiple Avro files? add avro user mail-list
Re: Running a Job in a Local Job Runner:Windows 7 64-bit
Running your Driver class from Eclipse should automatically run it in the local runner mode (as that's the default mode). You shouldn't need a local Hadoop install for this. -- Harsh J
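For completeness, local mode is just configuration: when mapreduce.framework.name is unset or set to local, the job runs in a single JVM. If you want to force it explicitly rather than rely on the default, a sketch of the relevant 2.x settings (put them in mapred-site.xml on your classpath, or set the same keys on the job's Configuration object):

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value> <!-- run in the LocalJobRunner, single JVM -->
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value> <!-- read/write the local filesystem, no HDFS -->
  </property>
</configuration>
```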
Re: Assertion error while building hadoop 2.3.0
Thanks for the update. Let me ask a question before continuing the installation. It has been stated: To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors. Do you mean the source package or the other? hadoop-2.3.0-src.tar.gz (14MB) hadoop-2.3.0.tar.gz (127MB) Regards, Mahmood
how to import the hadoop code into eclipse.
hi, i have downloaded the hadoop code and executed the maven command successfully. how do i import the hadoop source code cleanly? it's showing a red exclamation mark on some of the modules while i am importing them. help me out. thanks in advance.
Re: how to import the hadoop code into eclipse.
mvn eclipse:eclipse, and then import the existing projects in eclipse. - Zhijie -- Zhijie Shen Hortonworks Inc. http://hortonworks.com/
Re: how to import the hadoop code into eclipse.
i did that, but i have some doubts while importing the code, because it shows some warnings and errors on the imported modules. i was wondering if u could give me a link to the proper procedure. On Thu, Mar 6, 2014 at 9:21 PM, Zhijie Shen zs...@hortonworks.com wrote: mvn eclipse:eclipse, and then import the existing projects in eclipse. - Zhijie On Thu, Mar 6, 2014 at 9:00 PM, Avinash Kujur avin...@gmail.com wrote: hi, i have downloaded the hadoop code. And executed maven command successfully. how to import hadoop source code cleanly. because its showing red exclamation mark on some of the modules while i am importing it. help me out. thanks in advance. -- Zhijie Shen Hortonworks Inc. http://hortonworks.com/
why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?
Hi~ First, i use FileSystem to open a file in hdfs: FSDataInputStream m_dis = fs.open(...); Second, i read the data in m_dis into a byte array: byte[] inputdata = new byte[m_dis.available()]; // m_dis.available() == 47185920 m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3); The value returned by m_dis.read() is 131072 (2^17), so the data after 131072 is missing. It seems that FSDataInputStream uses a short to manage its data, which confuses me a lot. The same code runs well in hadoop 1.2.1. thank you~
Re: Assertion error while building hadoop 2.3.0
If you just want to install a cluster to play with, download the hadoop-2.3.0.tar.gz (127MB). On Fri, Mar 7, 2014 at 12:32 PM, Mahmood Naderan nt_mahm...@yahoo.comwrote: hadoop-2.3.0.tar.gz (127MB) -- Cheers -MJ
Re: why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?
the semantics of read() do not guarantee that it reads as much as possible. you need to call read() multiple times or use readFully(). On Fri, Mar 7, 2014 at 1:32 PM, hequn cheng chenghe...@gmail.com wrote: Hi~ First, i use FileSystem to open a file in hdfs. FSDataInputStream m_dis = fs.open(...); Second, read the data in m_dis to a byte array. byte[] inputdata = new byte[m_dis.available()]; //m_dis.available = 47185920 m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3); the value returned by m_dis.read() is 131072(2^17), so the data after 131072 is missing. It seems that FSDataInputStream use short to manage it's data which confused me a lot. The same code run well in hadoop1.2.1. thank you~
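FSDataInputStream inherits this partial-read contract from java.io.InputStream, so the fix generalizes beyond HDFS. Here is a minimal stdlib-only sketch of the loop the answer describes (the class name and helper are illustrative, not Hadoop code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    // read() may return fewer bytes than requested, so loop until
    // the requested length is filled or the stream ends.
    public static int readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) {
                break; // end of stream before len bytes were available
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of zeros
        byte[] buf = new byte[data.length];
        int got = readFully(new ByteArrayInputStream(data), buf, 0, buf.length);
        System.out.println(got); // prints 1048576
    }
}
```

A single read() returning 131072 is legal under this contract; only the loop (or DataInputStream.readFully / FSDataInputStream.readFully) guarantees the full count.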
Re: MR2 Job over LZO data
You can get the source code from https://github.com/twitter/hadoop-lzo and then compile it against hadoop 2.2.0. As I remember, as long as you rebuild it, lzo should work with hadoop 2.2.0. On Thu, Mar 6, 2014 at 6:29 PM, KingDavies kingdav...@gmail.com wrote: Running on Hadoop 2.2.0. The Java MR2 job works as expected on an uncompressed data source using the TextInputFormat.class. But when using the LZO format the job fails: import com.hadoop.mapreduce.LzoTextInputFormat; job.setInputFormatClass(LzoTextInputFormat.class); Dependencies from the maven repository: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/ Also tried with elephant-bird-core 4.4. The same data can be queried fine from within Hive(0.12) on the same cluster. The exception:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:62)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
    at com.hadoop.mapreduce.LzoTextInputFormat.getSplits(LzoTextInputFormat.java:101)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
    at com.cloudreach.DataQuality.Main.main(Main.java:42)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I believe the issue is related to the changes in Hadoop 2, but where can I find a H2 compatible version? Thanks -- Regards Gordon Wang
Re: why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?
yep, that did the job :) i used readFully instead and it works well~~ thank you~ 2014-03-07 13:48 GMT+08:00 Binglin Chang decst...@gmail.com: the semantic of read does not guarantee read as much as possible. you need to call read() many times or use readFully On Fri, Mar 7, 2014 at 1:32 PM, hequn cheng chenghe...@gmail.com wrote: Hi~ First, i use FileSystem to open a file in hdfs. FSDataInputStream m_dis = fs.open(...); Second, read the data in m_dis to a byte array. byte[] inputdata = new byte[m_dis.available()]; //m_dis.available = 47185920 m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3); the value returned by m_dis.read() is 131072(2^17), so the data after 131072 is missing. It seems that FSDataInputStream use short to manage it's data which confused me a lot. The same code run well in hadoop1.2.1. thank you~
Re: how to import the hadoop code into eclipse.
ah, yes, I was experiencing some errors on the imported modules, but I fixed them myself manually. Not sure whether other people have encountered the same problem. Here's a link: http://wiki.apache.org/hadoop/EclipseEnvironment On Thu, Mar 6, 2014 at 9:30 PM, Avinash Kujur avin...@gmail.com wrote: i did that. but i have some doubt while importing code. because its showing some warning and error on imported modules. i was wondering if u could give me any proper procedure link. On Thu, Mar 6, 2014 at 9:21 PM, Zhijie Shen zs...@hortonworks.com wrote: mvn eclipse:eclipse, and then import the existing projects in eclipse. - Zhijie On Thu, Mar 6, 2014 at 9:00 PM, Avinash Kujur avin...@gmail.com wrote: hi, i have downloaded the hadoop code. And executed maven command successfully. how to import hadoop source code cleanly. because its showing red exclamation mark on some of the modules while i am importing it. help me out. thanks in advance. -- Zhijie Shen Hortonworks Inc. http://hortonworks.com/
Re: App Master issue.
Hi MJ, Extremely sorry for the late response... had some infrastructure issues here... I am using Hadoop 2.3.0. Actually, while trying to solve this AppMaster issue, i came up with a strange observation: the app-master gets a STICKY SLOT on the data node at the master node if i set the following parameters, along with yarn.resourcemanager.hostname, across all the slave nodes: yarn.resourcemanager.address to master:8034, yarn.resourcemanager.scheduler.address to master:8030, yarn.resourcemanager.resource-tracker.address to master:8025. The default values can be found here: https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml If i don't set these 3 values described above, then whenever the appmaster is launched on a slave node it tries connecting to the resource manager at the default 0.0.0.0 and not the specified one. But with these values set, the app-master is always launched on the master and everything seems fine... So i brought the datanode on the master down and checked what was happening... Strangely, the jobs are not even assigned to any appmaster... but i think the appmaster doesn't have any property of getting sticky... the resource manager looks for a free container and goes ahead launching it... So now these things need to be resolved: 1) Why does it work fine only if the 3 mentioned values are set on the slave nodes, with the app master launched only on the master node? 2) If no data node is running on the master, then the application doesn't get assigned at all to any app master. I have attached my config files for your reference... [renamed for better reading and understanding] Thanks for your response !! On Thu, Mar 6, 2014 at 7:34 AM, Mingjiang Shi m...@gopivotal.com wrote: Sorry, it should be accessing http://node_manager_ip:8042/conf to check the value of yarn.resourcemanager.scheduler.address on the node manager.
On Thu, Mar 6, 2014 at 9:36 AM, Mingjiang Shi m...@gopivotal.com wrote: Hi Sai, A few questions: 1. which version of hadoop are you using? yarn.resourcemanager.hostname is a new configuration which is not available in old versions. 2. Does your yarn-site.xml contain yarn.resourcemanager.scheduler.address? If yes, what's the value? 3. or you could access http://resource_mgr:8088/conf to check the value of yarn.resourcemanager.scheduler.address. On Thu, Mar 6, 2014 at 3:29 AM, Sai Prasanna ansaiprasa...@gmail.com wrote: Hi, I have a five node cluster: one master and 4 slaves. In fact, the master also has a data node running. Whenever the app master is launched on the master node, the simple wordcount program runs fine. But if it is launched on some slave node, the progress of the application gets hung. The problem is, though i have set yarn.resourcemanager.hostname to the ip-address of the master, the slave connects only to the default, 0.0.0.0:8030. What could be the reason ??? I get the following message in the logs of the app master in the web-UI:
...Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-03-05 20:15:50,597 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-03-05 20:15:50,603 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2014-03-05 20:15:56,632 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
<?xml version="1.0"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->
<configuration>
  <property>
    <name>mapreduce.job.reduces</name>
    <value>3</value>
    <description>I changed it so that multiple reduce tasks can be launched</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
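For reference, the three resource-manager addresses discussed in this thread would look like this in yarn-site.xml on every node. This is a sketch assembled from the hostname and port values the poster mentions (host `master`, ports 8034/8030/8025), not a verified fix:

```xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8034</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8025</value>
</property>
```

When these are absent on a slave, the AM falls back to the yarn-default.xml value 0.0.0.0:8030, which matches the "Connecting to ResourceManager at /0.0.0.0:8030" retries in the logs above.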
GC overhead limit exceeded
Hi: i have a problem when run Hibench with hadoop-2.2.0, the wrong message list as below 14/03/07 13:54:53 INFO mapreduce.Job: map 19% reduce 0% 14/03/07 13:54:54 INFO mapreduce.Job: map 21% reduce 0% 14/03/07 14:00:26 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_20_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:00:27 INFO mapreduce.Job: map 20% reduce 0% 14/03/07 14:00:40 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_08_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:00:41 INFO mapreduce.Job: map 19% reduce 0% 14/03/07 14:00:59 INFO mapreduce.Job: map 20% reduce 0% 14/03/07 14:00:59 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_15_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:01:00 INFO mapreduce.Job: map 19% reduce 0% 14/03/07 14:01:03 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_23_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:01:11 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_26_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:01:35 INFO mapreduce.Job: map 20% reduce 0% 14/03/07 14:01:35 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_19_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:01:36 INFO mapreduce.Job: map 19% reduce 0% 14/03/07 14:01:43 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_07_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:00 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_00_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:01 INFO mapreduce.Job: map 18% reduce 0% 14/03/07 14:02:23 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_21_0, Status : FAILED Error: Java heap space 14/03/07 14:02:24 INFO mapreduce.Job: map 17% reduce 0% 14/03/07 14:02:31 INFO mapreduce.Job: map 18% reduce 0% 14/03/07 14:02:33 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_29_0, Status : FAILED Error: GC 
overhead limit exceeded 14/03/07 14:02:34 INFO mapreduce.Job: map 17% reduce 0% 14/03/07 14:02:38 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_10_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:41 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_18_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:43 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_14_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:47 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_28_0, Status : FAILED Error: Java heap space 14/03/07 14:02:50 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_02_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:51 INFO mapreduce.Job: map 16% reduce 0% 14/03/07 14:02:51 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_05_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:52 INFO mapreduce.Job: map 15% reduce 0% 14/03/07 14:02:55 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_06_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:57 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_27_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:58 INFO mapreduce.Job: map 14% reduce 0% 14/03/07 14:03:04 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_09_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:03:05 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_17_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:03:05 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_22_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:03:06 INFO mapreduce.Job: map 12% reduce 0% 14/03/07 14:03:10 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_01_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:03:11 INFO mapreduce.Job: map 13% reduce 0% 14/03/07 14:03:11 INFO mapreduce.Job: Task Id : 
attempt_1394160253524_0010_m_24_0, Status : FAILED then i added the parameter mapred.child.java.opts to the file mapred-site.xml: <property> <name>mapred.child.java.opts</name> <value>-Xmx1024m</value> </property> then another error occurs as below 14/03/07 11:21:51 INFO mapreduce.Job: map 0% reduce 0% 14/03/07 11:21:59 INFO mapreduce.Job: Task Id : attempt_1394160253524_0003_m_02_0, Status : FAILED Container [pid=5592,containerID=container_1394160253524_0003_01_04] is running beyond virtual memory limits. Current usage: 112.6 MB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1394160253524_0003_01_04 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 5598 5592 5592 5592 (java) 563 14
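In Hadoop 2.x the heap and the container size are set separately, and the "running beyond virtual memory limits" kill comes from the NodeManager's vmem check. A hedged sketch of the relevant settings (the values 2048/-Xmx1536m/4 are illustrative, not tuned for this cluster): make the -Xmx smaller than the container, and raise the vmem-to-pmem ratio if the virtual-memory check still fires.

```xml
<!-- mapred-site.xml: container size for map tasks, with a heap that fits inside it -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1536m</value>
</property>
<!-- yarn-site.xml: relax the virtual-memory check that killed the container -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```

In the log above the container was 1 GB with a 2.1 GB vmem limit (the default ratio), so a 1024m heap plus JVM overhead could trip the check even while physical usage stayed low.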
Re: MapReduce: How to output multiple Avro files?
thanks, Alan, it works! 2014-03-07 11:21 GMT+08:00 Alan Paulsen phe...@gmail.com: Hi Fengyun, Here's what I've done in the past when facing a similar issue: 1) Set the map output schema to a UNION of both of your target schemas, A and B. 2) Serialize the data in the mappers, using the avro datum as the value. 3) Figure out what the avro schema is for each datum and write out the data in the reducer. Thanks, Alan *From:* Fengyun RAO [mailto:raofeng...@gmail.com] *Sent:* Thursday, March 06, 2014 2:14 AM *To:* user@hadoop.apache.org; u...@avro.apache.org *Subject:* Re: MapReduce: How to output multiple Avro files? add avro user mail-list 2014-03-06 16:09 GMT+08:00 Fengyun RAO raofeng...@gmail.com: our input is a line of text which may be parsed to e.g. an A or B object. We want all A objects written to A.avro files, and all B objects written to B.avro. I looked into the AvroMultipleOutputs class: http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapreduce/AvroMultipleOutputs.html There is an example, however, it's not quite clear. For job submission, it uses AvroMultipleOutputs.addNamedOutput to add schemas for A and B. In my program it looks like: AvroMultipleOutputs.addNamedOutput(job, "A", AvroKeyOutputFormat.class, aSchema, null); AvroMultipleOutputs.addNamedOutput(job, "B", AvroKeyOutputFormat.class, bSchema, null); I believe this is for the Reducer output files. *My question is* what the Mapper output should be, specifically what job.setMapOutputValueClass should be, since the Mapper output could be an A or B object, with schema aSchema or bSchema.
In my program, I simply set it to GenericData, but get the error below:
14/03/06 15:55:34 INFO mapreduce.Job: Task Id : attempt_1393817780522_0012_m_10_2, Status : FAILED
Error: java.lang.NullPointerException
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:989)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:79)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
I have no idea what this means.