Re: Unsubscribe

2018-06-12 Thread Chris MacKenzie - Studio
Unsubscribe


Re: New to this group.

2015-01-03 Thread Chris MacKenzie
Hi Krish,

I completed an MSc project using Hadoop this summer, from installation
through to programming with the Java API and then tuning. In all I did about
14 weeks solid, starting with limited Unix and server experience and an
academic knowledge of Java from my Masters course. I got an A ;O)

Along the way I installed Eclipse, got Hadoop to work with it and built a
genetic sequence alignment tool. It was hard work but I had a blast. I ran
it on a 32 node cluster and got some good speedups.

I'm also interested in developing my skills further and this BigPetStore
application seems like a good way to go. Following my course I'm a trainee
db admin for a global investment manager using Sybase.

If you want to work on a collaborative project, I am sure I could share my
Java skills and knowledge this far if you were happy to share your knowledge
too.

Why not connect on LinkedIn ;O)

Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/

From:  Krish Donald gotomyp...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Friday, 2 January 2015 19:43
To:  user@hadoop.apache.org
Subject:  Re: New to this group.

I would like to go towards the administration side rather than the development
side, as I don't know Java at all...

On Fri, Jan 2, 2015 at 11:37 AM, Jay Vyas jayunit100.apa...@gmail.com
wrote:
 Many demos out there are for the business community...
 
 For a demonstration of hadoop at a finer grained level, how it's deployed,
 packaged, installed and used, for a developer who wants to learn hadoop the
 hard way,  
 
 I'd suggest :
 
 1 - Getting Apache bigtop stood up on VMs, and
 2 - running the BigPetStore application, which is meant to demonstrate end-to-end
 building, testing and deployment of a hadoop batch analytics system with
 mapreduce, pig, and mahout.
 
 This will also expose you to puppet, gradle, vagrant, all in a big data app
 which solves Real world problems like jar dependencies and multiple ecosystem
 components.
 
 Since BPS generates its own data, you don't  waste time worrying about
 external data sets, Twitter credentials, etc, and can test both on your laptop
 and on a 100 node cluster (similar to teragen but for the whole ecosystem).
 
 Since it features integration tests and is tested on Bigtop's hadoop distribution
 (which is 100% pure Apache based), it's IMO the purest learning source, not
 blurred with company-specific downloads or branding.
 
 Disclaimer : Of course I'm biased as I work on it... :)  but we've been
 working hard to make bigtop easily consumable as a gateway drug to big data
 processing, and if you have a solid linux and Java background, I'm sure others
 would agree it's a great place to get immersed in the hadoop ecosystem.
 
 On Jan 2, 2015, at 1:05 PM, Krish Donald gotomyp...@gmail.com wrote:
 
 I would like to work on some kind of case studies like the ones I have seen on
 Hortonworks, such as Twitter sentiment analysis, web log analysis, etc.
 
 It would help if somebody could suggest other case studies which I could work
 on and put on my resume later,
 as I don't have real project experience.
 
 On Fri, Jan 2, 2015 at 10:33 AM, Ted Yu yuzhih...@gmail.com wrote:
 You can search for Open JIRAs which are related to admin. Here is an example
 query:
 
 https://issues.apache.org/jira/browse/HADOOP-9642?jql=project%20%3D%20HADOOP
 %20AND%20status%20%3D%20Open%20AND%20text%20~%20%22admin%22
 
 FYI
 
 On Fri, Jan 2, 2015 at 10:24 AM, Krish Donald gotomyp...@gmail.com wrote:
 I have a fair understanding of the Hadoop ecosystem...
 I have set up a multinode cluster using VMs on my personal laptop for Hadoop
 2.0.
 But beyond that I would like to work on some project to get a good hold on
 the subject.
 
 I basically would like to go into the Hadoop administration side, as my
 background is as an RDBMS database administrator.
 
 On Fri, Jan 2, 2015 at 10:11 AM, Wilm Schumacher
 wilm.schumac...@gmail.com wrote:
 Hi,
 
 the standard books may be a good start:
 
 I liked the following
 
 Hadoop: The Definitive Guide:
 http://www.amazon.de/Hadoop-Definitive-Guide-Tom-White/dp/1449311520
 
 Hadoop in Action:
 http://www.manning.com/lam2/
 
 Hadoop in Practice:
 http://www.manning.com/holmes2/
 
 A list is here:
 http://wiki.apache.org/hadoop/Books
 
 Hope this helps.
 
 Best wishes,
 
 Wilm
 
 Am 02.01.2015 um 19:02 schrieb Krish Donald:
  Hi,
 
  I am new to this group and hadoop.
  Please help me to learn hadoop and suggest some self study project .
 
  Thanks
  Krish Donald
 
 
 
 





Re: Error when executing a WordCount Program

2014-09-10 Thread Chris MacKenzie
Hi, have you set the job jar (via a class) in your code?

 WARN mapred.JobClient: No job jar file set.  User classes may not be found. 
 See JobConf(Class) or JobConf#setJar(String).
 


Also you need to check the path for your input file

 Input path does not exist: hdfs://latdevweb02:9000/home/hadoop/hadoop/input
 

These are pretty straightforward errors; resolve them and you should be good to
go.
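
For reference, a minimal driver sketch that addresses both points (class, jar and
path names here are placeholders, not taken from your code):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    // Passing the driver class tells Hadoop which jar to ship to the cluster,
    // which is what the "No job jar file set" warning is about.
    JobConf conf = new JobConf(WordCountDriver.class);
    // Alternatively: conf.setJar("WordCount.jar");
    conf.setJobName("wordcount");

    // Mapper/reducer/output types omitted for brevity:
    // conf.setMapperClass(...); conf.setReducerClass(...); etc.

    // The input path must already exist in HDFS; check with: hadoop fs -ls <path>
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}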

Sent from my iPhone

 On 10 Sep 2014, at 14:19, Shahab Yunus shahab.yu...@gmail.com wrote:
 
 hdfs://latdevweb02:9000/home/hadoop/hadoop/input
 
 Is this a valid path on HDFS? Can you access this path outside of the 
 program, for example using the hadoop fs -ls command? Also, were this path and 
 the files in it created by a different user?
 
 The exception seems to say that it either does not exist or the running user does not 
 have permission to read it.
 
 Regards,
 Shahab
 
 
 
 On Wed, Sep 10, 2014 at 9:09 AM, YIMEN YIMGA Gael 
 gael.yimen-yi...@sgcib.com wrote:
 Hello Hadoopers,
 
  
 
 Here is the error I'm facing when running a WordCount example program written 
 by myself.
 
 Kindly find attached the file of my WordCount program.
 
 Below the error.
 
  
 
 ===
 
 -bash-4.1$ bin/hadoop jar WordCount.jar
 
 Entrée dans le programme MAIN !!!
 
 14/09/10 15:00:24 WARN mapred.JobClient: Use GenericOptionsParser for 
 parsing the arguments. Applications should implement Tool for the same.
 
 14/09/10 15:00:24 WARN mapred.JobClient: No job jar file set.  User classes 
 may not be found. See JobConf(Class) or JobConf#setJar(String).
 
 14/09/10 15:00:24 INFO util.NativeCodeLoader: Loaded the native-hadoop 
 library
 
 14/09/10 15:00:24 WARN snappy.LoadSnappy: Snappy native library not loaded
 
 14/09/10 15:00:24 INFO mapred.JobClient: Cleaning up the staging area 
 hdfs://latdevweb02:9000/user/hadoop/.staging/job_201409101141_0001
 
 14/09/10 15:00:24 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:hadoop 
 cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not 
 exist: hdfs://latdevweb02:9000/home/hadoop/hadoop/input
 
 org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
 hdfs://latdevweb02:9000/home/hadoop/hadoop/input
 
 at 
 org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
 
 at 
 org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
 
 at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
 
 at 
 org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
 
 at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
 
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
 
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
 
 at java.security.AccessController.doPrivileged(Native Method)
 
 at javax.security.auth.Subject.doAs(Subject.java:415)
 
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
 
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
 
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
 
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
 
 at 
 fr.societegenerale.bigdata.lactool.WordCountDriver.main(WordCountDriver.java:50)
 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
 at java.lang.reflect.Method.invoke(Method.java:601)
 
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
 
 -bash-4.1$
 
 ===
 
  
 
 Thanks in advance for your help.
 
  
 
 Warm regards
 
 GYY
 

Re: total number of map tasks

2014-09-01 Thread Chris MacKenzie
Thanks for the update ;O)


Regards,

Chris MacKenzie
http://www.chrismackenziephotography.co.uk/
Expert in all aspects of photography
telephone: 0131 332 6967 tel:0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
weddings: www.wedding.chrismackenziephotography.co.uk
http://www.wedding.chrismackenziephotography.co.uk/
 http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://twitter.com/#!/MacKenzieStudio
http://www.facebook.com/pages/Chris-MacKenzie-Photography/145946284250
http://www.linkedin.com/in/chrismackenziephotography/
http://pinterest.com/ChrisMacKenzieP/




On 27/08/2014 17:36, Stijn De Weirdt stijn.dewei...@ugent.be wrote:

Hi all,

someone PM'ed me suggesting I take a look at the input split setting,
and indeed, the split size determines the number of tasks.



stijn

On 08/27/2014 06:23 PM, Chris MacKenzie wrote:
 It's my understanding that you don't get map tasks as such but
containers.

 My experience is with version 2 +

 And if that's true containers are based on memory tuning in
mapred-site.xml

 Otherwise I'd love to learn more.

 Sent from my iPhone

 On 27 Aug 2014, at 12:14, Stijn De Weirdt stijn.dewei...@ugent.be
wrote:

 hi all,

 we are tuning yarn (or trying to) on our environment (shared
filesystem, no hdfs) using terasort and one of the main issues we are
seeing is that an avg map task takes < 15 sec. some tuning guides and
websites suggest that ideally map tasks run between 40 sec and 1 or 2
minutes.

 (however, it's also not very clear if the recommendations are still
valid for yarn)

 in particular, we see way more map tasks than expected, and we are
wondering how the number of map tasks per job run is determined.

 teragen created 64 output files, we are only expecting 64 map tasks,
each processing one input file. however, we see something like 3000
tasks


 hints are much appreciated

 stijn






Re: total number of map tasks

2014-08-27 Thread Chris MacKenzie
It's my understanding that you don't get map tasks as such but containers. 

My experience is with version 2 +

And if that's true containers are based on memory tuning in mapred-site.xml

Otherwise I'd love to learn more. 

Sent from my iPhone

 On 27 Aug 2014, at 12:14, Stijn De Weirdt stijn.dewei...@ugent.be wrote:
 
 hi all,
 
 we are tuning yarn (or trying to) on our environment (shared filesystem, no 
 hdfs) using terasort and one of the main issues we are seeing is that an avg 
 map task takes < 15 sec. some tuning guides and websites suggest that ideally 
 map tasks run between 40 sec and 1 or 2 minutes.
 
 (however, it's also not very clear if the recommendations are still valid for 
 yarn)
 
 in particular, we see way more map tasks than expected, and we are wondering 
 how the number of map tasks per job run is determined.
 
 teragen created 64 output files, we are only expecting 64 map tasks, each 
 processing one input file. however, we see something like 3000 tasks
 
 
 hints are much appreciated
 
 stijn


Re: Hadoop YARN Cluster Setup Questions

2014-08-23 Thread Chris MacKenzie
Hi,

The requirement is simply to have the slaves and masters files on the resource 
manager; they are used by the shell scripts that start the daemons :-)

Sent from my iPhone

 On 23 Aug 2014, at 16:02, S.L simpleliving...@gmail.com wrote:
 
 OK, I'll copy the slaves file to the other slave nodes as well.
 
 What about the masters file though?
 
 Sent from my HTC
 
 - Reply message -
 From: rab ra rab...@gmail.com
 To: user@hadoop.apache.org user@hadoop.apache.org
 Subject: Hadoop YARN Cluster Setup Questions
 Date: Sat, Aug 23, 2014 5:03 AM
 
 Hi,
 
 1. Typically, we copy the slaves file to all the participating nodes,
 though I do not have a concrete theory to back this up. At least, this is what
 I was doing in Hadoop 1.2 and I am doing the same in Hadoop 2.x.
 
 2. I think you should investigate the YARN GUI and see how many maps it
 has spawned. There is a high possibility that both maps are running on
 the same node in parallel. Since there are two splits, there would be two
 map processes, and one node is capable of handling more than one map.
 
 3. The input file may be stored without replication, and if it is small it
 is held in a single block on one node.
 
 These could be few hints which might help you
 
 regards
 rab
 
 
 
 On Sat, Aug 23, 2014 at 12:26 PM, S.L simpleliving...@gmail.com wrote:
 
  Hi Folks,
 
  I was not able to find a clear answer to this. I know that on the master
  node we need to have a slaves file listing all the slaves, but do the slave
  nodes need to have a masters file listing the single name node (I am not
  using a secondary name node)? I only have the slaves file on the master
  node.
 
  The reason I ask this is because when I submit a hadoop job, even though
  the input is being split into 2 parts, only one data node is assigned
  applications; the other two (I have three) are not being assigned any
  applications.
 
  Thanks in advance!
 


Re: Hadoop 2.2 Built-in Counters

2014-08-14 Thread Chris MacKenzie
Hi,

This is the content of my shell script for running the job history server:

cd $HADOOP_PREFIX
hadoop fs -mkdir -p /mr-history/tmp
hadoop fs -chmod -R 1777 /mr-history/tmp
hadoop fs -mkdir -p /mr-history/done
hadoop fs -chmod -R 1777 /mr-history/done

sbin/mr-jobhistory-daemon.sh start historyserver

These configurable variables are in mapred-site.xml:

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>137.195.143.129:10020</value>
  <description>Default port is 10020.</description>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>137.195.143.129:19888</value>
  <description>Default port is 19888.</description>
</property>

I start the history server on the same node as my resource manager


While the job is running, the counters are available from:

http://your-server:8088/proxy/application_1408007466921_0002/mapreduce/job/
job_1408007466921_0002

Drill down through the application master to the job.

If you don't have the history server running, the job data is not
persistent.

Hope this helps.


Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






From:  ou senshaw sens...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Thursday, 14 August 2014 07:14
To:  user@hadoop.apache.org
Subject:  Hadoop 2.2 Built-in Counters


Hi all,
I'm trying to analyze my MapReduce job performance via built-in counters
such as physical memory usage, heap memory usage, etc.
While the job is running, I can watch these counters via the Resource Manager
web UI (namenode:8088). However, when the job is done, the counter information
is no longer available there. I know I can get
them from the client output. I was wondering if there is another place on the
name node or data nodes to get the final counter values for a given job id?
Thanks,
Shaw




Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-14 Thread Chris MacKenzie
Hi,

I have been using Hadoop loosely since Christmas, and since May for a
Software Engineering MSc at Heriot-Watt University in Edinburgh, Scotland.
I have written a genetic sequence alignment algorithm.

I have installed Hadoop in various places including a 32-node cluster and
am using Eclipse Kepler SR2 as an IDE.

My current Hadoop version is 2.4.1, which I downloaded as a tar from the
Apache mirror servers.

It's been a tough learning curve, but that has made the learning all the
more valuable.

I believe using the straight Hadoop version has given me insights that
proprietary builds wouldn't have. So many confusing issues crop up that
it's easy to attach importance to fixing an error which merely masks
another. With the proprietary versions it would be easy to attach blame
where it's not that particular build's fault.

Go with your heart but be prepared to work to solve the problems you
encounter.

Buy Tom White's book; it isn't perfect and is a couple of years out of date,
but it gives you enough detail and structure to build an impression you
can work from. The downloadable source code is a great help when trying to
get started.

Good luck.


Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






From:  Adaryl "Bob" Wakefield, MBA adaryl.wakefi...@hotmail.com
Reply-To:  user@hadoop.apache.org
Date:  Thursday, 14 August 2014 01:13
To:  user@hadoop.apache.org
Subject:  Re: Started learning Hadoop. Which distribution is best for
native install in pseudo distributed mode?


He didn't ask for the best, and nobody framed their answer like that. He
asked what people were using. Out of the 10 responses only four of them
actually answered his question.

I've been studying Hadoop for two months straight. Quite frankly, I wish
more people would ask for community input on what does what and how.
 
Adaryl 
Bob Wakefield, MBA
Principal
Mass Street 
Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: 
@BobLovesData
 
From: Kilaru, Sambaiah mailto:sambaiah_kil...@intuit.com
Sent: Wednesday, August 13, 2014 1:10 PM
To: user@hadoop.apache.org
Subject: Re: Started learning Hadoop. Which distribution is best for
native install in pseudo distributed mode?


 

Enough wars are going on about which is best. Choose one of them and try to
learn it; there is nothing that makes X better or Y better.
It is up to your choice.
 
Thanks,
Sam
 
From: Sebastiano Di Paola sebastiano.dipa...@gmail.com
Reply-To: user@hadoop.apache.org user@hadoop.apache.org
Date: Wednesday, August 13, 2014 at 6:28
PM
To: user@hadoop.apache.org user@hadoop.apache.org
Subject: Re: Started learning Hadoop. Which
distribution is best for native install in pseudo distributed mode?

 
Hi,
I'm a newbie too and I'm not using any particular distribution. I just
download the components I need / want to try for my deployment and use
them.

It's a slow process but it allows me to better understand what I'm
doing under the hood.

Regards,
Seba



On Tue, Aug 12, 2014 at 10:12 PM, mani kandan mankand...@gmail.com wrote:

  Which distribution are you people using? Cloudera vs Hortonworks vs
  BigInsights? 



 




Re: Can anyone help me resolve this Error: unable to create new native thread

2014-08-14 Thread Chris MacKenzie
Hi Ravi,

I resolved this. Many thanks.


Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






From:  Ravi Prakash ravi...@ymail.com
Reply-To:  user@hadoop.apache.org
Date:  Friday, 15 August 2014 01:31
To:  user@hadoop.apache.org user@hadoop.apache.org
Subject:  Re: Can anyone help me resolve this Error: unable to create new
native thread


Hi Chris!


When is this error caused? Which logs do you see this in? Are you sure you
are setting the ulimit for the correct user? What application are you
trying to run which is causing you to run up against this limit?


HTH
Ravi
 


 On Saturday, August 9, 2014 6:07 AM, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:
  
  

 Hi,

I've scrabbled around looking for a fix for a while and have set the soft
ulimit size to 13172.

I'm using Hadoop 2.4.1

Thanks in advance,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/




  
 
  
 




Re: ulimit for Hive

2014-08-12 Thread Chris MacKenzie
Hi Zhijie,

ulimit covers both hard and soft limits.

The hard limit can only be set by a sysadmin; it is there to guard against
things like a fork-bomb DoS attack.
The sysadmin can set the hard ulimit per user, e.g. for hadoop_user.

A user can add a line to their .profile file setting a soft ulimit up to
the hard limit. You can google how to do that.

You can check the ulimits like so:

ulimit -H -a // hard limit
ulimit -S -a // soft limit

The max value for the hard limit is unlimited. I currently have mine set
to this, as I was running out of processes (nproc).
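
For example, a user's soft limits can be raised in .profile up to whatever the
hard limit allows (the numbers below are only illustrative):

# in ~/.profile (or ~/.bash_profile) for the user that runs the Hadoop daemons
ulimit -S -n 32768    # open file descriptors
ulimit -S -u 13172    # max user processes (nproc)

# then re-login and verify
ulimit -S -a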

I don't know about restarting; I think so.
I don't know about Hive.



Warm regards.

Chris

telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://twitter.com/#!/MacKenzieStudio
http://www.linkedin.com/in/chrismackenziephotography/




From:  Zhijie Shen zs...@hortonworks.com
Reply-To:  user@hadoop.apache.org
Date:  Tuesday, 12 August 2014 18:33
To:  user@hadoop.apache.org, u...@hive.apache.org
Subject:  Re: ulimit for Hive


+ Hive user mailing list
It should be a better place for your questions.



On Mon, Aug 11, 2014 at 3:17 PM, Ana Gillan ana.gil...@gmail.com wrote:

Hi,

I've been reading a lot of posts about needing to set a high ulimit for
file descriptors in Hadoop and I think it's probably the cause of a lot of
the errors I've been having when trying to run queries on larger data sets
in Hive. However, I'm really confused about how and where to set the
limit, so I have a number of questions:

1. How high is it recommended to set the ulimit?
2. What is the difference between soft and hard limits? Which one needs to
be set to the value from question 1?
3. For which user(s) do I set the ulimit? If I am running the Hive query
with my login, do I set my own ulimit to the high value?
4. Do I need to set this limit for these users on all the machines in the
cluster? (we have one master node and 6 slave nodes)
5. Do I need to restart anything after configuring the ulimit?

Thanks in advance,
Ana







-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/







Can anyone help me resolve this Error: unable to create new native thread

2014-08-09 Thread Chris MacKenzie
Hi,

I've scrabbled around looking for a fix for a while and have set the soft
ulimit size to 13172.

I'm using Hadoop 2.4.1

Thanks in advance,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/




Re: Can anyone tell me the current typical memory specification, switch size and disk space

2014-08-04 Thread Chris MacKenzie
Thanks Adaryl,

I'm currently looking at Tom White p298, published May 2012, which
references a 2010 spec. Both Tom's and Eric's books were published in 2012,
so the information in both will no doubt be a tad dated.

What I need to know is the current:

Processor average spec
Memory spec
Disk storage spec
Network speed.

Can you help me out with that ?

Thanks in advance,

Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






On 01/08/2014 17:28, Adaryl Bob Wakefield, MBA
adaryl.wakefi...@hotmail.com wrote:

The book Hadoop Operations by Eric Sammer helped answer a lot of these
questions for me.


Adaryl Bob Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
-Original Message-
From: Chris MacKenzie
Sent: Friday, August 01, 2014 4:35 AM
To: user@hadoop.apache.org
Subject: Can anyone tell me the current typical memory specification,
switch 
size and disk space

Hi,

I'd really appreciate it if someone could let me know the current
preferred specification for a cluster setup.

On average how many nodes
Disk space
Memory
Switch size

A link to a paper or discussion would be much appreciated.

Thanks in advance


Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






Can anyone tell me the current typical memory specification, switch size and disk space

2014-08-01 Thread Chris MacKenzie
Hi,

I'd really appreciate it if someone could let me know the current
preferred specification for a cluster setup.

On average how many nodes
Disk space
Memory
Switch size

A link to a paper or discussion would be much appreciated.

Thanks in advance


Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/




Re: Cannot compile a basic PutMerge.java program

2014-07-28 Thread Chris MacKenzie
Hi,

I can probably help you out with that. I don't want to sound patronising,
though. What is your IDE, and have you included the Hadoop libraries in
your jar?

Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






From:  R J rj201...@yahoo.com
Reply-To:  user@hadoop.apache.org
Date:  Monday, 28 July 2014 01:46
To:  user@hadoop.apache.org user@hadoop.apache.org
Subject:  Cannot compile a basic PutMerge.java program


Hi All,

I am new to programming on Hadoop. I tried to compile the following
program (an example program from a Hadoop book) on my Linux server where I
have Hadoop installed.
I get these errors:
$javac PutMerge.java
PutMerge.java:2: package org.apache.hadoop.conf does not exist
import org.apache.hadoop.conf.Configuration;
 ^
PutMerge.java:3: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FSDataInputStream;
   ^
PutMerge.java:4: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FSDataOutputStream;
   ^
PutMerge.java:5: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FileStatus;
   ^
PutMerge.java:6: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FileSystem;
   ^
PutMerge.java:7: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.Path;


I have $HADOOP_HOME set up:
$echo $HADOOP_HOME
/usr/lib/hadoop

Could you please suggest how to compile this program? Thanks a lot.

Shu



PutMerge.java=

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class PutMerge {

public static void main(String[] args) throws IOException {
Configuration conf = new Configuration();
FileSystem hdfs = FileSystem.get(conf);
FileSystem local = FileSystem.getLocal(conf);

Path inputDir = new Path(args[0]);
Path hdfsFile = new Path(args[1]);

try {
FileStatus[] inputFiles =
 local.listStatus(inputDir);
FSDataOutputStream out = hdfs.create(hdfsFile);

for (int i = 0; i < inputFiles.length; i++) {
System.out.println(inputFiles[i].getPath().getName());
FSDataInputStream in = local.open(inputFiles[i].getPath());
byte buffer[] = new byte[256];
int bytesRead = 0;
while ((bytesRead = in.read(buffer)) > 0) {
out.write(buffer, 0, bytesRead);
}
in.close();
}
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}

=
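
For reference, one way to compile and run it against whatever jars your Hadoop
install provides is to lean on the hadoop command itself (a sketch, assuming the
hadoop script is on your PATH and supports the classpath subcommand; paths are
placeholders):

# compile against the same classpath the hadoop command uses
javac -cp "$(hadoop classpath)" PutMerge.java

# run it directly ...
java -cp "$(hadoop classpath):." PutMerge /local/input/dir /user/hadoop/merged.txt

# ... or package it and let hadoop set up the classpath for you
jar cf putmerge.jar PutMerge*.class
hadoop jar putmerge.jar PutMerge /local/input/dir /user/hadoop/merged.txt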




Re: How to set up the conf folder

2014-07-28 Thread Chris MacKenzie
Hi Ravindra,

Thanks for replying, it’s much appreciated.

That’s always been the case with my setup:

export HADOOP_PREFIX=/scratch/extra/cm469/hadoop-2.4.1
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

I think my issue is that I have not set yarn-env.sh up correctly. TBH I
didn’t know it existed. I plan to get round to trying that at some point
in the near future.
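
For completeness, a minimal sketch of one way the conf symlink could be wired up
(using the paths above; untested, adjust to your own layout):

# in .bash_profile
export HADOOP_PREFIX=/scratch/extra/cm469/hadoop-2.4.1
export HADOOP_CONF_DIR=$HADOOP_PREFIX/conf

# keep the real files in etc/hadoop and point a conf symlink at them
ln -s $HADOOP_PREFIX/etc/hadoop $HADOOP_PREFIX/conf

# in etc/hadoop/hadoop-env.sh, fall back to the same place if the variable is unset
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"$HADOOP_PREFIX/conf"}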


Many thanks,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






From:  Ravindra ravin.i...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Monday, 28 July 2014 13:09
To:  user@hadoop.apache.org
Subject:  Re: How to set up the conf folder


Hi,


Could you try putting this in .bash_profile
export HADOOP_CONF_DIR=/scratch/extra/cm469/hadoop-2.4.1/etc/hadoop/


Regards,
Ravindra





On Wed, Jul 23, 2014 at 3:17 PM, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:

Hi,

Can anyone shed some light on this for me. Every time I attempt to set up
the conf directory, I run into a whole load of errors and ssh issues which
I don't see when my config files are in etc/hadoop

I want to understand how to use the conf directory. My ultimate goal is to
use symbolic links

I have a running cluster based version of hadoop-2.4.1. I start and stop
the cluster from the RM with:

./hadoop-2.4.1/sbin/start-dfs.sh
./hadoop-2.4.1/sbin/start-yarn.sh

My understanding is that to use the conf directory my settings should be
was follows:


Settings:
.bash_profile has $HADOOP_CONF_DIR =
/scratch/extra/cm469/hadoop-2.4.1/etc/hadoop
hadoop-env.sh has export
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"}

conf/core-site.xml
conf/hdfs-site.xml

conf/mapred-site.xml

conf/yarn-site.xml

conf/capacity-scheduler.xml


etc/hadoop/hadoop-env.sh


etc/hadoop/slaves

etc/hadoop/masters



Thanks in advance,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/










Re: Is it a good idea to delete / move the default configuration xml file ?

2014-07-23 Thread Chris MacKenzie
Hi, thanks for that, much appreciated.

I guess they are in the jar files then ;O) I was really surprised to see
the default configs pulled in, especially considering I thought I was in
full control. I did a file search on an installation, saw the files and
jumped to the wrong conclusion.

I feel like a real idiot sometimes, but there is so much conflicting
information out there that you only later realise that the questions you
asked were nonsensical; at the time they felt valid ;O)


Thanks for your tolerance,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






On 21/07/2014 09:46, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:

Hi All,

I have just realised that my implementation of hadoop-2.4.1 is pulling in
all the default.xml files.

I have three copies of each in different directories, obviously at least
one of those is on the class path.

Anyway, with all the effort that goes into setting up a site, it seems
strange to me that I would be using settings I had no idea existed and that
may not be how I would choose to set them up.


Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






How to set up the conf folder

2014-07-23 Thread Chris MacKenzie
Hi,

Can anyone shed some light on this for me. Every time I attempt to set up
the conf directory, I run into a whole load of errors and ssh issues which
I don't see when my config files are in etc/hadoop

I want to understand how to use the conf directory. My ultimate goal is to
use symbolic links

I have a running cluster based version of hadoop-2.4.1. I start and stop
the cluster from the RM with:

./hadoop-2.4.1/sbin/start-dfs.sh
./hadoop-2.4.1/sbin/start-yarn.sh

My understanding is that to use the conf directory my settings should be
was follows:


Settings:
.bash_profile has $HADOOP_CONF_DIR =
/scratch/extra/cm469/hadoop-2.4.1/etc/hadoop
hadoop-env.sh has export
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"}

conf/core-site.xml
conf/hdfs-site.xml

conf/mapred-site.xml

conf/yarn-site.xml

conf/capacity-scheduler.xml


etc/hadoop/hadoop-env.sh


etc/hadoop/slaves

etc/hadoop/masters



Thanks in advance,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/




Amended question - How to set up the conf folder

2014-07-23 Thread Chris MacKenzie
My $HADOOP_CONF_DIR = /scratch/extra/cm469/hadoop-2.4.1/etc/hadoop/conf
Hi,

Can anyone shed some light on this for me. Every time I attempt to set up
the conf directory, I run into a whole load of errors and ssh issues which
I don't see when my config files are in etc/hadoop

I want to understand how to use the conf directory. My ultimate goal is to
use symbolic links

I have a running cluster based version of hadoop-2.4.1. I start and stop
the cluster from the RM with:

./hadoop-2.4.1/sbin/start-dfs.sh
./hadoop-2.4.1/sbin/start-yarn.sh

My understanding is that to use the conf directory my settings should be
was follows:


Settings:
.bash_profile has $HADOOP_CONF_DIR =
/scratch/extra/cm469/hadoop-2.4.1/etc/hadoop/conf
hadoop-env.sh has export
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"}

conf/core-site.xml
conf/hdfs-site.xml

conf/mapred-site.xml

conf/yarn-site.xml

conf/capacity-scheduler.xml


etc/hadoop/hadoop-env.sh


etc/hadoop/slaves

etc/hadoop/masters



Thanks in advance,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






Re: Configuration set up questions - Container killed on request. Exit code is 143

2014-07-21 Thread Chris MacKenzie
Thanks Ozawa


Regards,

Chris MacKenzie
http://www.chrismackenziephotography.co.uk/
Expert in all aspects of photography
telephone: 0131 332 6967 tel:0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
weddings: www.wedding.chrismackenziephotography.co.uk
http://www.wedding.chrismackenziephotography.co.uk/
 http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://twitter.com/#!/MacKenzieStudio
http://www.facebook.com/pages/Chris-MacKenzie-Photography/145946284250
http://www.linkedin.com/in/chrismackenziephotography/
http://pinterest.com/ChrisMacKenzieP/




On 18/07/2014 18:07, Tsuyoshi OZAWA wrote:

Hi Chris MacKenzie,

How about trying as follows to identify the reason of your problem?

1. Making both yarn.nodemanager.pmem-check-enabled and
yarn.nodemanager.vmem-check-enabled false
2. Making yarn.nodemanager.pmem-check-enabled true
3. Making yarn.nodemanager.pmem-check-enabled true and
yarn.nodemanager.vmem-pmem-ratio large value(e.g. 100)
4. Making yarn.nodemanager.pmem-check-enabled true and
yarn.nodemanager.vmem-pmem-ratio expected value(e.g. 2.1 or something)

If there is problem on 1, the reason may be JVM configuration problem
or another issue. If there is problem on 2, the reason is shortage of
physical memory.
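
For example, one way to read steps 1-4 in yarn-site.xml terms (my interpretation;
values are illustrative, and NodeManagers need a restart to pick changes up):

<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>   <!-- false for step 1 -->
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>   <!-- false for steps 1 and 2 -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>    <!-- e.g. 100 for step 3, 2.1 for step 4 -->
</property>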

Thanks,
- Tsuyoshi


On Fri, Jul 18, 2014 at 6:52 PM, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:
 Hi Guys,

 Thanks very much for getting back to me.


 Thanks Chris - the idea of splitting the data is a great suggestion.
 Yes Wangda, I was restarting after changing the configs

 I’ve been checking the relationship between what I thought was in my
 config files and what Hadoop thought was in them.

 With:

 // Print out Config file settings for testing.
 for (Entry<String, String> entry : conf) {
 System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
 }



 There were anomalies ;0(

 Now that my hadoop reflects the values that are in my config files - I
 just get the message “Killed” without any explanation.


 Unfortunately, whereas before I was applying changes incrementally and
 testing, this time I've applied all the changes at once.

 I’m now backing out the changes I made slowly to see where it starts to
 reflect what I expect.

 Regards,

 Chris MacKenzie
 telephone: 0131 332 6967
 email: stu...@chrismackenziephotography.co.uk
 corporate: www.chrismackenziephotography.co.uk
 http://www.chrismackenziephotography.co.uk/
 http://plus.google.com/+ChrismackenziephotographyCoUk/posts
 http://www.linkedin.com/in/chrismackenziephotography/






 From:  Chris Mawata chris.maw...@gmail.com
 Reply-To:  user@hadoop.apache.org
 Date:  Thursday, 17 July 2014 16:15
 To:  user@hadoop.apache.org
 Subject:  Re: Configuration set up questions - Container killed on
 request. Exit code is 143


 Another thing to try is smaller input splits, if your data can be broken
 up into smaller files that can be independently processed. That way
 you get more but smaller map tasks. You could also use more but smaller
 reducers. The many files will tax your NameNode more, but you might get to
 use all your cores.
 On Jul 17, 2014 9:07 AM, Chris MacKenzie
 stu...@chrismackenziephotography.co.uk wrote:

 Hi Chris,

 Thanks for getting back to me. I will set that value to 10

 I have just tried this.
 
 https://support.gopivotal.com/hc/en-us/articles/201462036-Mapreduce-YARN-Memory-Parameters

 Setting both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb. Though
 after setting them I didn't get the expected change.

 As the output was still 2.1 GB of 2.1 GB virtual memory used. Killing
 container


 Regards,

 Chris MacKenzie
 telephone: 0131 332 6967
 email: stu...@chrismackenziephotography.co.uk
 corporate: www.chrismackenziephotography.co.uk
 http://www.chrismackenziephotography.co.uk
 http://www.chrismackenziephotography.co.uk/
 http://plus.google.com/+ChrismackenziephotographyCoUk/posts
 http://www.linkedin.com/in/chrismackenziephotography/






 From:  Chris Mawata chris.maw...@gmail.com
 Reply-To:  user@hadoop.apache.org
 Date:  Thursday, 17 July 2014 13:36
 To:  Chris MacKenzie stu...@chrismackenziephotography.co.uk
 Cc:  user@hadoop.apache.org
 Subject:  Re: Configuration set up questions - Container killed on
 request. Exit code is 143


 Hi Chris MacKenzie,  I have a feeling (I am not familiar with the
kind
 of work you are doing) that your application is memory intensive.  8
cores
 per node and only 12GB is tight. Try bumping up the
 yarn.nodemanager.vmem-pmem-ratio
 Chris Mawata




 On Wed, Jul 16, 2014 at 11:37 PM, Chris MacKenzie
 stu...@chrismackenziephotography.co.uk wrote:

 Hi,

 Thanks Chris Mawata
 I’m working through this myself, but wondered if anyone could point me
in
 the right direction.

 I have attached my configs.


 I'm using Hadoop 2.4.1

 My system is:
 A 32-node cluster
 8 processors per machine
 12 gb ram
 Available disk

Is it a good idea to delete / move the default configuration xml file ?

2014-07-21 Thread Chris MacKenzie
Hi All,

I have just realised that my implementation of hadoop-2.4.1 is pulling in
all the default.xml files.

I have three copies of each in different directories, obviously at least
one of those is on the class path.

Anyway, with all the effort that goes into setting up a site, it seems
strange to me that I would be using settings I had no idea existed and that
may not be how I would choose to set them up.


Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/




Re: Configuration set up questions - Container killed on request. Exit code is 143

2014-07-18 Thread Chris MacKenzie
Hi Guys,

Thanks very much for getting back to me.
 

Thanks Chris - the idea of splitting the data is a great suggestion.
Yes Wangda, I was restarting after changing the configs

I’ve been checking the relationship between what I thought was in my
config files and what Hadoop thought was in them.

With:

// Print out Config file settings for testing.
for (Entry<String, String> entry : conf) {
System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
}



There were anomalies ;0(

Now that my hadoop reflects the values that are in my config files - I
just get the message “Killed” without any explanation.


Unfortunately, whereas before I was applying changes incrementally and
testing, this time I've applied all the changes at once.

I’m now backing out the changes I made slowly to see where it starts to
reflect what I expect.

Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






From:  Chris Mawata chris.maw...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Thursday, 17 July 2014 16:15
To:  user@hadoop.apache.org
Subject:  Re: Configuration set up questions - Container killed on
request. Exit code is 143


Another thing to try is smaller input splits, if your data can be broken up
into smaller files that can be independently processed. That way
you get more but smaller map tasks. You could also use more but smaller
reducers. The many files will tax your NameNode more, but you might get to
use all your cores.
On Jul 17, 2014 9:07 AM, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:

Hi Chris,

Thanks for getting back to me. I will set that value to 10

I have just tried this.
https://support.gopivotal.com/hc/en-us/articles/201462036-Mapreduce-YARN-Memory-Parameters

Setting both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb. Though
after setting them I didn't get the expected change.

As the output was still 2.1 GB of 2.1 GB virtual memory used. Killing
container
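
For what it's worth, the "2.1 GB of 2.1 GB virtual memory" in that message is just
the 1 GB container size times the default yarn.nodemanager.vmem-pmem-ratio of 2.1,
which suggests the new sizes never reached the job. A sketch of the mapred-site.xml
entries for those two parameters, with the heap kept below the container size
(numbers are illustrative):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1536m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1536m</value>
</property>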


Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






From:  Chris Mawata chris.maw...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Thursday, 17 July 2014 13:36
To:  Chris MacKenzie stu...@chrismackenziephotography.co.uk
Cc:  user@hadoop.apache.org
Subject:  Re: Configuration set up questions - Container killed on
request. Exit code is 143


Hi Chris MacKenzie,  I have a feeling (I am not familiar with the kind
of work you are doing) that your application is memory intensive.  8 cores
per node and only 12GB is tight. Try bumping up the
yarn.nodemanager.vmem-pmem-ratio
Chris Mawata




On Wed, Jul 16, 2014 at 11:37 PM, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:

Hi,

Thanks Chris Mawata
I’m working through this myself, but wondered if anyone could point me in
the right direction.

I have attached my configs.


I'm using Hadoop 2.4.1

My system is:
A 32-node cluster
8 processors per machine
12 gb ram
Available disk space per node 890 gb

This is my current error:

mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id :
attempt_1405538067846_0006_r_00_1, Status : FAILED
Container [pid=25848,containerID=container_1405538067846_0006_01_04]
is running beyond virtual memory limits. Current usage: 439.0 MB of 1 GB
physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1405538067846_0006_01_04 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 25853 25848 25848 25848 (java) 2262 193 2268090368 112050
/usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Xmx768m
-Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/ap
plication_1405538067846_0006/container_1405538067846_0006_01_04/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/scratch/extra/cm469/hadoop-2.4.1/logs/userlog
s/application_1405538067846_0006/container_1405538067846_0006_01_04
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 137.195.143.103 59056
attempt_1405538067846_0006_r_00_1 4
|- 25848 25423 25848 25848 (bash) 0 0 108613632 333 /bin/bash -c
/usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN  -Xmx768m
-Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/ap
plication_1405538067846_0006

Configuration set up questions - Container killed on request. Exit code is 143

2014-07-16 Thread Chris MacKenzie
Hi,

Thanks Chris Mawata
I’m working through this myself, but wondered if anyone could point me in
the right direction.

I have attached my configs.


I'm using Hadoop 2.4.1

My system is:
A 32-node cluster
8 processors per machine
12 gb ram
Available disk space per node 890 gb

This is my current error:

mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id :
attempt_1405538067846_0006_r_00_1, Status : FAILED
Container [pid=25848,containerID=container_1405538067846_0006_01_04]
is running beyond virtual memory limits. Current usage: 439.0 MB of 1 GB
physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1405538067846_0006_01_04 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 25853 25848 25848 25848 (java) 2262 193 2268090368 112050
/usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Xmx768m
-Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/ap
plication_1405538067846_0006/container_1405538067846_0006_01_04/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/scratch/extra/cm469/hadoop-2.4.1/logs/userlog
s/application_1405538067846_0006/container_1405538067846_0006_01_04
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 137.195.143.103 59056
attempt_1405538067846_0006_r_00_1 4
|- 25848 25423 25848 25848 (bash) 0 0 108613632 333 /bin/bash -c
/usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN  -Xmx768m
-Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/ap
plication_1405538067846_0006/container_1405538067846_0006_01_04/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/scratch/extra/cm469/hadoop-2.4.1/logs/userlog
s/application_1405538067846_0006/container_1405538067846_0006_01_04
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 137.195.143.103 59056
attempt_1405538067846_0006_r_00_1 4
1/scratch/extra/cm469/hadoop-2.4.1/logs/userlogs/application_1405538067846
_0006/container_1405538067846_0006_01_04/stdout
2/scratch/extra/cm469/hadoop-2.4.1/logs/userlogs/application_1405538067846
_0006/container_1405538067846_0006_01_04/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143






Regards,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/






From:  Chris Mawata chris.maw...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Thursday, 17 July 2014 02:10
To:  user@hadoop.apache.org
Subject:  Re: Can someone shed some light on this ? - java.io.IOException:
Spill failed


I would post the configuration files -- easier for someone to spot
something wrong than to imagine what configuration would get you to that
stacktrace. The part
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could
not find any valid local directory for
attempt_1405523201400_0006_m_00_0_spill_8.out

would suggest you might not have hadoop.tmp.dir set (?)
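
If that is the issue, a minimal core-site.xml entry would be along these lines
(the path is only an example; it should point at a large local disk available on
every node):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/scratch/extra/hadoop-tmp</value>
</property>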



On Wed, Jul 16, 2014 at 1:02 PM, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:

Hi,

Is this a coding or a setup issue?

I'm using Hadoop 2.4.1.
My program is doing a concordance on 500,000 sequences of 400 chars.
My cluster setup is 32 data nodes and two masters.

The exact error is:
Error: java.io.IOException: Spill failed
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTas
k.java:1535)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1062)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
at
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInput
OutputContextImpl.java:89)
at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapp
er.java:112)
at
par.gene.align.v3.concordance.ConcordanceMapper.map(ConcordanceMapper.java:
96)
at
par.gene.align.v3.concordance.ConcordanceMapper.map(ConcordanceMapper.java:
1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.j
ava:1556

ControlledJob.java:submit Job in state RUNNING instead of DEFINE - can someone shed some light on this error for me ;O)

2014-07-09 Thread Chris MacKenzie
Hi,

I'm using ControlledJob and my code is:

ControlledJob doConcordance = new ControlledJob(
this.doParallelConcordance(), null);
...
control.addJob(doConcordance);
control.addJob(viableSubequenceMaxLength);
control.addJob(viableSubSequences);
control.addJob(actualCriticalSubsequence);
control.addJob(generatePins);
control.addJob(deriveLeftPinMaxLength);
control.addJob(doAlignment);
control.addJob(doConcatenation);


When it comes to an end I have:

jobcontrol.ControlledJob (ControlledJob.java:submit(338)) - Concordance
Phase got an error while submitting
java.lang.IllegalStateException: Job in state RUNNING instead of DEFINE
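
That exception usually means the underlying Job handed to a ControlledJob was
already submitted (for example via submit() or waitForCompletion()) before
JobControl got to it; JobControl expects every job to still be in the DEFINE
state. A minimal sketch of the usual driver pattern, assuming each helper such
as doParallelConcordance() returns a freshly configured, unsubmitted Job:

JobControl control = new JobControl("alignment-pipeline");   // group name is arbitrary

ControlledJob doConcordance =
    new ControlledJob(this.doParallelConcordance(), null);   // job must not be submitted yet
control.addJob(doConcordance);
// ... addJob(...) the remaining stages and declare their dependencies ...

Thread runner = new Thread(control);   // JobControl implements Runnable
runner.start();
while (!control.allFinished()) {       // poll until every controlled job completes or fails
    Thread.sleep(1000);
}
control.stop();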



Thanks in advance,

Chris MacKenzie
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/




Thank you And What advice would you give me on running my first Hadoop cluster based Job

2014-07-04 Thread Chris MacKenzie
Hi,

Over the past two weeks, from a standing start, I've worked on a
Hadoop-based parallel genetic sequence alignment algorithm as part of my
university masters project.

Thankfully that's now up and running. Along the way I got some great help
from members of this group, and I deeply appreciate that strangers would
take time out of their busy lives to shed a bit of light on what seemed at
times an insurmountable task.

On Monday I get to play with a 32-node system, and the only advice I have
so far is to benchmark my algorithm with 5 GB per node.

I wonder, if you were starting out again on your first big Hadoop MapReduce
job, what would you do differently? What advice would you give me starting
out?

Thanks again, I really appreciate your support.

Best Chris


Regards,

Chris MacKenzie
 http://www.chrismackenziephotography.co.uk/
 http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://www.linkedin.com/in/chrismackenziephotography/




What is the correct way to get a string back from a mapper or reducer

2014-07-02 Thread Chris MacKenzie
Hi,

I have the following code and am using hadoop 2.4:

In my driver:
Configuration conf = new Configuration();
conf.set("sub", "help");
...
String s = conf.get("sub");

In my reducer:
Configuration conf = context.getConfiguration();
conf.set("sub", "Test");

When I test the value in the driver, it isn't updated following the reduce.
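
As an aside, the Configuration a task sees is a serialized copy made at submission
time, so conf.set() inside a reducer only changes that task's local copy and never
travels back to the driver. A hedged sketch of two common alternatives (names and
paths are illustrative; imports: org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, java.io.*):

// Numeric values can come back via counters.
// In the reducer:
context.getCounter("align", "matches").increment(1);

// In the driver, after job.waitForCompletion(true):
long matches = job.getCounters().findCounter("align", "matches").getValue();

// Strings are usually written to HDFS by the reducer (normal output or MultipleOutputs)
// and read back in the driver:
FileSystem fs = FileSystem.get(conf);
BufferedReader in = new BufferedReader(
    new InputStreamReader(fs.open(new Path("/output/summary/part-r-00000"))));
String s = in.readLine();
in.close();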


Best,

Chris MacKenzie
 http://www.chrismackenziephotography.co.uk/




job.setOutputFormatClass(NullOutputFormat.class);

2014-07-01 Thread Chris MacKenzie
Hi,

What is the anticipated usage of the above with the new API? Is there
another way to remove the empty part-r files?

When using it with MultipleOutputs to remove empty part-r files, I have no
output ;O)



Regards,

Chris MacKenzie
http://www.chrismackenziephotography.co.uk/




Re: job.setOutputFormatClass(NullOutputFormat.class);

2014-07-01 Thread Chris MacKenzie
Hi Markus And Shahab,

Thanks for getting back to me, I really appreciate it. LazyOutputFormat did
the trick. I tried NullOutputFormat
(job.setOutputFormatClass(NullOutputFormat.class);) before writing to the
group but was getting an empty folder.

I looked at LazyOutputFormat, in fact, my mos is written from:
http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/mapreduce/lib/out
put/MultipleOutputs.html

Just couldn't see the wood for the trees ;O)
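
For anyone searching the archive later, the combination looks roughly like this
(a sketch; names and types are placeholders, not my actual job):

// Driver: only create output files when something is actually written,
// so no empty part-r-* files appear.
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
MultipleOutputs.addNamedOutput(job, "aligned", TextOutputFormat.class,
    Text.class, Text.class);

// Reducer:
private MultipleOutputs<Text, Text> mos;

protected void setup(Context context) {
  mos = new MultipleOutputs<Text, Text>(context);
}

protected void reduce(Text key, Iterable<Text> values, Context context)
    throws IOException, InterruptedException {
  for (Text value : values) {
    mos.write("aligned", key, value);   // written to aligned-r-* instead of part-r-*
  }
}

protected void cleanup(Context context) throws IOException, InterruptedException {
  mos.close();   // without this the named output files can end up empty or missing
}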


Best,

Chris




Re: Partitioning and setup errors

2014-06-29 Thread Chris MacKenzie
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
par.gene.align.concordance.ConcordanceReducer not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getReducerClass(JobContextImpl.java:210)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:611)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.ClassNotFoundException: Class
par.gene.align.concordance.ConcordanceReducer not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 8 more


14/06/29 13:17:37 INFO mapreduce.Job:  map 0% reduce 100%
14/06/29 13:17:37 INFO mapreduce.Job: Job job_1403953980618_0009 failed with
state FAILED due to: Task failed task_1403953980618_0009_r_00
Job failed as tasks failed. failedMaps:0 failedReduces:1


14/06/29 13:18:04 INFO mapreduce.Job: Counters: 7
Job Counters
Failed reduce tasks=4
Launched reduce tasks=4
Total time spent by all maps in occupied slots (ms)=0
Total time spent by all reduces in occupied slots (ms)=7786
Total time spent by all reduce tasks (ms)=7786
Total vcore-seconds taken by all reduce tasks=7786
Total megabyte-seconds taken by all reduce tasks=7972864
14/06/29 13:18:05 INFO ipc.Client: Retrying connect to server:
admins-MacBook-Pro.local/192.168.0.5:53193. Already tried 0 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000
MILLISECONDS)
14/06/29 13:18:06 INFO ipc.Client: Retrying connect to server:
admins-MacBook-Pro.local/192.168.0.5:53193. Already tried 1 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000
MILLISECONDS)
14/06/29 13:18:07 INFO ipc.Client: Retrying connect to server:
admins-MacBook-Pro.local/192.168.0.5:53193. Already tried 2 time(s); retry
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000
MILLISECONDS)
14/06/29 13:18:07 INFO mapred.ClientServiceDelegate: Application state is
completed. FinalApplicationStatus=FAILED. Redirecting to job history server
End time = 1404044288031
Elapsed time = 161318 ms
Finished
/usr/local/hadoop-2.4.0/jar_files $
Warm regards.
Chris

From:  Vinod Kumar Vavilapalli vino...@hortonworks.com
Reply-To:  user@hadoop.apache.org
Date:  Sunday, 29 June 2014 00:20
To:  user@hadoop.apache.org
Subject:  Re: Partitioning and setup errors

What is happening is the client is not able to pick up the right jar to push
to the cluster. It looks in the class-path for the jar that contains the
class ParallelGeneticAlignment.

How are you packaging your code? How are you running your job? Please paste
the command line.
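
For what it is worth, the usual remedy when setJarByClass cannot find a jar
(and the client logs "No job jar file set") is to make sure the driver class
is actually loaded from a jar, or to name the jar explicitly; a sketch with an
illustrative jar name:

// In the driver -- setJarByClass only works when the class lives inside a jar:
job.setJarByClass(ParallelGeneticAlignment.class);
// ...or point at the jar directly:
job.setJar("parallel-genetic-alignment.jar");

// Then launch through the hadoop script so the jar is shipped to the cluster:
//   hadoop jar parallel-genetic-alignment.jar <fully.qualified.ParallelGeneticAlignment>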

+Vinod 

On Jun 27, 2014, at 5:15 AM, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:

 Hi,
 
 I realise my previous question may have been a bit naïve, and I also realise I
 am asking an awful lot here; any advice would be greatly appreciated.
 * I have been using Hadoop 2.4 in local mode and am sticking to the
 mapreduce.* side of the track.
 * I am using a Custom Line reader to read each sequence into a Map
 * I have a partitioner class which is testing the key from the map class.
 * I've tried debugging in eclipse with a breakpoint in the partitioner class
 but getPartition(LongWritable mapKey, Text sequenceString, int numReduceTasks)
 is not being called.
 Could there be any reason for that ?
 
 Because my map and reduce code works in local mode within Eclipse, I wondered
 if I might get the partitioner to work if I changed to Pseudo-Distributed Mode,
 exporting a runnable jar from Eclipse (Kepler).
 
 I have several faults on my own computer's Pseudo-Distributed Mode and on the
 university cluster's Pseudo-Distributed Mode, which I set up. I've googled and
 read extensively but am not seeing a solution to any of these issues.
 
 I have this line:
 14/06/27 11:45:27 WARN mapreduce.JobSubmitter: No job jar file set.  User
 classes may not be found. See Job or Job#setJar(String).
 My driver code is:
 private void doParallelConcordance() throws Exception {
 
 Path inDir = new Path("input_sequences/10_sequences.txt");
 Path outDir = new Path("demo_output");
 
 Job job = Job.getInstance(new Configuration());
 job.setJarByClass(ParallelGeneticAlignment.class);
 job.setOutputKeyClass(Text.class);
 job.setOutputValueClass(IntWritable.class);
 
 job.setInputFormatClass(CustomFileInputFormat.class);
 job.setMapperClass(ConcordanceMapper.class);
 job.setPartitionerClass(ConcordanceSequencePartitioner.class);
 job.setReducerClass(ConcordanceReducer.class);
 
 FileInputFormat.addInputPath(job

Re: Partitioning and setup errors

2014-06-28 Thread Chris MacKenzie
Hi Chris,

I'm away from my books for the weekend. Is that call (extends Configurable
implements Tool) a Hadoop 2 call? Would that mean that I am better off
sticking with Hadoop 1.x?
Warm regards.
Chris
 http://www.chrismackenziephotography.co.uk/
Expert in all aspects of photography
telephone: 0131 332 6967 tel:0131 332 6967

email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://www.chrismackenziephotography.co.uk/
 http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://twitter.com/#!/MacKenzieStudio
http://www.facebook.com/pages/Chris-MacKenzie-Photography/145946284250
http://www.linkedin.com/in/chrismackenziephotography/
http://pinterest.com/ChrisMacKenzieP/

From:  Chris Mawata chris.maw...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Friday, 27 June 2014 23:46
To:  user@hadoop.apache.org
Subject:  Re: Partitioning and setup errors


Probably my fault. I was looking for the
extends Configurable implements Tool
part. I will double check when I get home rather than send you on a wild
goose chase.
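
For reference, that pattern (usually written as extends Configured implements
Tool) is available in both Hadoop 1.x and 2.x, so it is not a reason to stay on
1.x. A minimal sketch, reusing the driver class name from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ParallelGeneticAlignment extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the Configuration populated by ToolRunner,
        // including any -D options passed on the command line.
        Job job = Job.getInstance(getConf(), "concordance");
        job.setJarByClass(ParallelGeneticAlignment.class);
        // ... mapper, reducer, partitioner, input and output paths ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ParallelGeneticAlignment(), args));
    }
}
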
Cheers
Chris

On Jun 27, 2014 8:16 AM, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:
 Hi,
 
 I realise my previous question may have been a bit naïve, and I also realise I
 am asking an awful lot here; any advice would be greatly appreciated.
 * I have been using Hadoop 2.4 in local mode and am sticking to the
 mapreduce.* side of the track.
 * I am using a Custom Line reader to read each sequence into a Map
 * I have a partitioner class which is testing the key from the map class.
 * I've tried debugging in eclipse with a breakpoint in the partitioner class
 but getPartition(LongWritable mapKey, Text sequenceString, int numReduceTasks)
 is not being called.
 Could there be any reason for that ?
 
 Because my map and reduce code works in local mode within Eclipse, I wondered
 if I might get the partitioner to work if I changed to Pseudo-Distributed Mode,
 exporting a runnable jar from Eclipse (Kepler).
 
 I have several faults on my own computer's Pseudo-Distributed Mode and on the
 university cluster's Pseudo-Distributed Mode, which I set up. I've googled and
 read extensively but am not seeing a solution to any of these issues.
 
 I have this line:
 14/06/27 11:45:27 WARN mapreduce.JobSubmitter: No job jar file set.  User
 classes may not be found. See Job or Job#setJar(String).
 My driver code is:
 private void doParallelConcordance() throws Exception {
 
 Path inDir = new Path("input_sequences/10_sequences.txt");
 
 Path outDir = new Path("demo_output");
 
 
 
 Job job = Job.getInstance(new Configuration());
 
 job.setJarByClass(ParallelGeneticAlignment.class);
 
 job.setOutputKeyClass(Text.class);
 
 job.setOutputValueClass(IntWritable.class);
 
 
 
 job.setInputFormatClass(CustomFileInputFormat.class);
 
 job.setMapperClass(ConcordanceMapper.class);
 
 job.setPartitionerClass(ConcordanceSequencePartitioner.class);
 
 job.setReducerClass(ConcordanceReducer.class);
 
 
 
 FileInputFormat.addInputPath(job, inDir);
 
 FileOutputFormat.setOutputPath(job, outDir);
 
 
 
 job.waitForCompletion(true);
 
 }
 
 
 On the university server I am getting this error:
 4/06/27 11:45:40 INFO mapreduce.Job: Task Id :
 attempt_1403860966764_0003_m_00_0, Status : FAILED
 Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
 par.gene.align.concordance.ConcordanceMapper not found
 
 On my machine the error is:
 4/06/27 12:58:03 INFO mapreduce.Job: Task Id :
 attempt_1403864060032_0004_r_00_2, Status : FAILED
 Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
 par.gene.align.concordance.ConcordanceReducer not found
 
 On the university server I get total paths to process:
 14/06/27 11:45:27 INFO input.FileInputFormat: Total input paths to process : 1
 14/06/27 11:45:28 INFO mapreduce.JobSubmitter: number of splits:1
 
 On my machine I get total paths to process:
 14/06/27 12:57:09 INFO input.FileInputFormat: Total input paths to process : 0
 14/06/27 12:57:36 INFO mapreduce.JobSubmitter: number of splits:0
 
 Being new to this community, I thought it polite to introduce myself. I'm
 planning to return to software development via an MSc at Heriot-Watt
 University in Edinburgh. My MSc project is based on Foster's Genetic Sequence
 Alignment. I have written a sequential version; my goal is now to port it to
 Hadoop.
 
 Thanks in advance,
 Regards,
 
 Chris MacKenzie




Re: Partitioning and setup errors

2014-06-27 Thread Chris MacKenzie
Hi Chris,

Thanks for your response. I deeply appreciate it.

I don't know what you mean by that question. I use Configuration in two places:
* In the driver: Job job = Job.getInstance(new Configuration());
* In the CustomLineRecordReader: Configuration job =
context.getConfiguration();
One of the biggest issues I have had is staying true to the mapreduce.*
API.

Best wishes,

Chris MacKenzie

From:  Chris Mawata chris.maw...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Friday, 27 June 2014 14:11
To:  user@hadoop.apache.org
Subject:  Re: Partitioning and setup errors


The new Configuration() is suspicious. Are you setting configuration
information manually?
Chris

On Jun 27, 2014 5:16 AM, Chris MacKenzie
stu...@chrismackenziephotography.co.uk wrote:
 Hi,
 
 I realise my previous question may have been a bit naïve, and I also realise I
 am asking an awful lot here; any advice would be greatly appreciated.
 * I have been using Hadoop 2.4 in local mode and am sticking to the
 mapreduce.* side of the track.
 * I am using a Custom Line reader to read each sequence into a Map
 * I have a partitioner class which is testing the key from the map class.
 * I've tried debugging in eclipse with a breakpoint in the partitioner class
 but getPartition(LongWritable mapKey, Text sequenceString, int numReduceTasks)
 is not being called.
 Could there be any reason for that ?
 
 Because my map and reduce code works in local mode within Eclipse, I wondered
 if I might get the partitioner to work if I changed to Pseudo-Distributed Mode,
 exporting a runnable jar from Eclipse (Kepler).
 
 I have several faults on my own computer's Pseudo-Distributed Mode and on the
 university cluster's Pseudo-Distributed Mode, which I set up. I've googled and
 read extensively but am not seeing a solution to any of these issues.
 
 I have this line:
 14/06/27 11:45:27 WARN mapreduce.JobSubmitter: No job jar file set.  User
 classes may not be found. See Job or Job#setJar(String).
 My driver code is:
 private void doParallelConcordance() throws Exception {
 
 Path inDir = new Path("input_sequences/10_sequences.txt");
 
 Path outDir = new Path("demo_output");
 
 
 
 Job job = Job.getInstance(new Configuration());
 
 job.setJarByClass(ParallelGeneticAlignment.class);
 
 job.setOutputKeyClass(Text.class);
 
 job.setOutputValueClass(IntWritable.class);
 
 
 
 job.setInputFormatClass(CustomFileInputFormat.class);
 
 job.setMapperClass(ConcordanceMapper.class);
 
 job.setPartitionerClass(ConcordanceSequencePartitioner.class);
 
 job.setReducerClass(ConcordanceReducer.class);
 
 
 
 FileInputFormat.addInputPath(job, inDir);
 
 FileOutputFormat.setOutputPath(job, outDir);
 
 
 
 job.waitForCompletion(true);
 
 }
 
 
 On the university server I am getting this error:
 4/06/27 11:45:40 INFO mapreduce.Job: Task Id :
 attempt_1403860966764_0003_m_00_0, Status : FAILED
 Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
 par.gene.align.concordance.ConcordanceMapper not found
 
 On my machine the error is:
 4/06/27 12:58:03 INFO mapreduce.Job: Task Id :
 attempt_1403864060032_0004_r_00_2, Status : FAILED
 Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
 par.gene.align.concordance.ConcordanceReducer not found
 
 On the university server I get total paths to process:
 14/06/27 11:45:27 INFO input.FileInputFormat: Total input paths to process : 1
 14/06/27 11:45:28 INFO mapreduce.JobSubmitter: number of splits:1
 
 On my machine I get total paths to process:
 14/06/27 12:57:09 INFO input.FileInputFormat: Total input paths to process : 0
 14/06/27 12:57:36 INFO mapreduce.JobSubmitter: number of splits:0
 
 Being new to this community, I thought it polite to introduce myself. I'm
 planning to return to software development via an MSc at Heriot-Watt
 University in Edinburgh. My MSc project is based on Foster's Genetic Sequence
 Alignment. I have written a sequential version; my goal is now to port it to
 Hadoop.
 
 Thanks in advance,
 Regards,
 
 Chris MacKenzie




Splitting map and reduce

2014-06-25 Thread Chris MacKenzie
Hi,

This is my first mail to this user group. I hope that the email is well
formed and enables me to learn a great deal about Hadoop.

I have to carry out sequence alignment using Hadoop with the aid of a
critical subsequence. A potential critical subsequence is derived from the
longest unique subsequence in a sequence. A valid critical subsequence must
exist in one or more sequences.

I have run a concordance with a line splitter and have been able to get "Map
input records=10", which leads me to believe I can assess each sequence
independently.

I was hoping to get 10 reduce outputs, but I only got one. Is there a way to
split the reduce output in the same way that I can split the map input?
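
A single reducer is the default, which is why everything ends up in one
part-r-00000 file. The reduce output can be split by raising the number of
reduce tasks (each reducer writes its own part-r-* file), optionally with a
custom partitioner deciding which keys go to which reducer. A sketch with an
illustrative partitioner name:

// Ten reducers -> ten output files, part-r-00000 .. part-r-00009.
job.setNumReduceTasks(10);
// Optional: route each sequence key to a particular reducer.
job.setPartitionerClass(SequencePartitioner.class);

(MultipleOutputs is another option if each sequence should go to a separately
named file from within a single reducer.)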

Thanks in advance, 
Regards,

Chris MacKenzie