Build hadoop from src and running it?

2013-11-05 Thread Karim Awara
Hi,

I checked out the Hadoop Common project 2.2.0 from SVN and built it using
Maven. Now I can't find any conf folder or any configuration files to set
before starting Hadoop and using its HDFS. How can I do so?
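
For reference, the usual sequence on 2.2.0 is roughly the following (a sketch: the standard dist profile is assumed, and in 2.x the configuration lives under etc/hadoop of the generated distribution rather than a top-level conf/ folder):

    # Build a runnable binary distribution (from the top of the source tree).
    mvn clean package -Pdist -DskipTests -Dtar

    # The distribution is generated here (path/version assumed):
    cd hadoop-dist/target/hadoop-2.2.0

    # The configuration files to edit live under etc/hadoop, not conf/:
    ls etc/hadoop    # core-site.xml, hdfs-site.xml, ...

    # After editing the configs, format the NameNode and start HDFS:
    bin/hdfs namenode -format
    sbin/start-dfs.sh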

--
Best Regards,
Karim Ahmed Awara



which is better: from reducer or Driver

2013-11-05 Thread unmesha sreeveni
I am emitting an 'A' value and a 'B' value from the reducer.

I need to do further calculations as well. Which is the better way?
1. Do all remaining computations within the reducer, after emitting, or
2. Do the remaining computations in the driver: read the A and B values from
the part files and do further computations.

Please suggest the better way.

-- 
Thanks & Regards

Unmesha Sreeveni U.B

Junior Developer

Amrita Center For Cyber Security

Amritapuri.
www.amrita.edu/cyber/


Re: C++ example for hadoop-2.2.0

2013-11-05 Thread Salman Toor
Hi,

I have solved the problem! I downloaded the source, compiled it on the 
machine, and can now successfully link to the library.
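
In case it is useful for others, a Makefile along these lines now links cleanly for me (a sketch: wordcount.cpp stands in for the actual source, the install path is the one from this thread, and the trailing -lpthread and -lcrypto are what the 2.2.0 pipes libraries typically need at link time):

    HADOOP_INSTALL = /home/sztoor/hadoop-2.2.0
    CXX = g++
    CXXFLAGS = -O2 -Wall -I$(HADOOP_INSTALL)/include

    # Link against the native pipes/utils archives; libhadooppipes.a pulls in
    # OpenSSL and pthreads, hence the extra libraries at the end.
    wordcount: wordcount.cpp
    	$(CXX) $(CXXFLAGS) $< -L$(HADOOP_INSTALL)/lib/native -lhadooppipes -lhadooputils -lpthread -lcrypto -o $@

Note that the native libraries under lib/native must match the machine's word size; the "skipping incompatible" error below is exactly what a 32/64-bit mismatch looks like.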

Thank you everyone for your quick responses.

Regards..
Salman. 



Salman Toor, PhD
salman.t...@it.uu.se



On Nov 4, 2013, at 3:21 PM, Andre Kelpe wrote:

 No, because I was trying to set up a cluster automatically with the
 tarballs from apache.org.
 
 - André
 
 On Mon, Nov 4, 2013 at 3:05 PM, Salman Toor salman.t...@it.uu.se wrote:
 Hi,
 
 Did you try to compile from source?
 
 /Salman.
 
 
 Salman Toor, PhD
 salman.t...@it.uu.se
 
 
 
 On Nov 4, 2013, at 2:55 PM, Andre Kelpe wrote:
 
 I reported the 32bit/64bit problem a few weeks ago. There hasn't been
 much activity around it though:
 https://issues.apache.org/jira/browse/HADOOP-9911
 
 - André
 
 On Mon, Nov 4, 2013 at 2:20 PM, Salman Toor salman.t...@it.uu.se wrote:
 
 Hi,
 
 OK, so 2.x is not a new version, it's another branch. Good to know! Actually,
 32-bit will be difficult, as the code I got already has some dependencies on
 64-bit.
 
 Otherwise I will continue with the 1.x version. Can you suggest a version in
 the 1.x series that is stable and works in a cluster environment, especially
 with C++?
 
 Regards..
 Salman.
 
 Salman Toor, PhD
 salman.t...@it.uu.se
 
 
 On Nov 4, 2013, at 1:54 PM, Amr Shahin wrote:
 
 
 Well, the 2 series isn't exactly the next version; it's a continuation of
 branch .2.
 
 Also, the error message from gcc indicates that the library you're trying to
 link against isn't compatible, which made me suspicious. Check the
 documentation to see whether Hadoop has 64-bit libraries, or otherwise
 compile against the 32-bit ones.
 
 
 
 On Mon, Nov 4, 2013 at 4:51 PM, Salman Toor salman.t...@it.uu.se wrote:
 
 Hi,
 
 Thanks for your answer!
 
 But are you sure about it? Actually, Hadoop version 1.2 has both 32- and
 64-bit libraries, so I believe the next version should have both... But I am
 not sure, just a random guess :-(
 
 Regards..
 Salman.
 
 Salman Toor, PhD
 salman.t...@it.uu.se
 
 
 On Nov 4, 2013, at 1:38 PM, Amr Shahin wrote:
 
 
 I believe Hadoop isn't compatible with the 64-bit architecture. Try
 installing the 32-bit libraries and compiling against them. This error
 (skipping incompatible /home/sztoor/hadoop-2.2.0/lib/native/libhadooppipes.a
 when searching for -lhadooppipes) indicates so.
 
 
 
 On Mon, Nov 4, 2013 at 2:44 PM, Salman Toor salman.t...@it.uu.se wrote:
 
 Hi,
 
 Can someone give a pointer?
 
 Thanks in advance.
 
 Regards..
 Salman.
 
 Salman Toor, PhD
 salman.t...@it.uu.se
 
 
 On Nov 3, 2013, at 11:31 PM, Salman Toor wrote:
 
 
 Hi,
 
 I am quite new to the Hadoop world. Previously I was running the
 hadoop-1.2.0 stable version on my small cluster and encountered some strange
 problems, like the local path to the mapper file not being copied to HDFS.
 It works fine on a single-node setup, but on a multi-node setup the simple
 word-count Python example didn't work... I read on a blog that it might be a
 problem with the version I am using, so I decided to change versions and
 downloaded Hadoop 2.2.0. This version has YARN together with many new
 features that I hope to learn in the future. Now the simple wordcount
 example works without any problem on the multi-node setup. I am using the
 simple Python example.
 
 
 Now I would like to compile my C++ code. Since the directory structure,
 together with other things, has changed, I have started to get the
 following error:
 
 /usr/bin/ld: skipping incompatible
 /home/sztoor/hadoop-2.2.0/lib/native/libhadooputils.a when searching for
 -lhadooputils
 cannot find -lhadooputils
 
 /usr/bin/ld: skipping incompatible
 /home/sztoor/hadoop-2.2.0/lib/native/libhadooppipes.a when searching for
 -lhadooppipes
 cannot find -lhadooppipes
 --
 
 
 I have managed to run the C++ example successfully with the 1.2.0 version
 on a single-node setup.
 
 I have a 64-bit Ubuntu machine; previously I was using Linux-amd64-64.
 
 Now, in the new version, the lib and include directories are in the
 hadoop-2.2.0 directory. No build.xml is available...
 
 Can someone please give me an example of a makefile based on version 2.2.0?
 Or suggest which version I should go for? Or are there prerequisites I
 should satisfy before compiling my code?
 
 Thanks in advance.
 
 Regards..
 Salman.
 
 Salman Toor, PhD
 salman.t...@it.uu.se
 
 -- 
 André Kelpe
 an...@concurrentinc.com
 http://concurrentinc.com



Re: which is better: from reducer or Driver

2013-11-05 Thread Chris Mawata
If you have multiple reducers, you are doing it in parallel, while in the 
driver it is surely single-threaded, so my bet would be on the reducers.
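
For instance, one common shape for this (a sketch, with hypothetical names and placeholder arithmetic) is to emit per key in reduce() and put the follow-up computation in cleanup(), which the framework calls once per reducer after the last reduce() call:

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch: emit A and B per key, then finish the remaining computation in
    // cleanup(), which runs once after every group has been reduced.
    public class ABReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

      private double totalA = 0.0;  // hypothetical running totals
      private double totalB = 0.0;

      @Override
      protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
          throws IOException, InterruptedException {
        double a = 0.0;
        for (DoubleWritable v : values) {
          a += v.get();              // placeholder for how A is derived
        }
        double b = a / 2.0;          // placeholder for how B is derived
        totalA += a;
        totalB += b;
        context.write(new Text(key + "_A"), new DoubleWritable(a));
        context.write(new Text(key + "_B"), new DoubleWritable(b));
      }

      @Override
      protected void cleanup(Context context) throws IOException, InterruptedException {
        // The "further calculations" happen here instead of in the driver,
        // so no second pass over the part files is needed.
        context.write(new Text("A_B_ratio"), new DoubleWritable(totalA / totalB));
      }
    }

With a single reducer, cleanup() sees the totals for the whole job; with several reducers, each one writes only its own partial result, and a small driver-side merge may still be needed.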


Chris

On 11/5/2013 6:15 AM, unmesha sreeveni wrote:

I am emitting an 'A' value and a 'B' value from the reducer.

I need to do further calculations as well. Which is the better way?
1. Do all remaining computations within the reducer, after emitting, or
2. Do the remaining computations in the driver: read the A and B values 
from the part files and do further computations.

Please suggest the better way.

--
Thanks & Regards

Unmesha Sreeveni U.B
Junior Developer
Amrita Center For Cyber Security
Amritapuri.
www.amrita.edu/cyber/




Re: Error while running Hadoop Source Code

2013-11-05 Thread Basu,Indrashish


Hi,

Can anyone kindly assist with this?

Regards,
Indrashish


On Mon, 04 Nov 2013 10:23:23 -0500, Basu,Indrashish wrote:

Hi All,

Any update on the below post?

I came across an old post regarding the same issue. It explains the
solution as: "The nopipe example needs more documentation. It assumes that
it is run with the InputFormat from
src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java, which
has a very specific input split format. By running with a TextInputFormat,
it will send binary bytes as the input split and won't work right. The
nopipe example should probably be recoded to use libhdfs too, but that is
more complicated to get running as a unit test. Also note that since the
C++ example is using local file reads, it will only work on a cluster if
you have NFS or something working across the cluster."

I would need some more light on the above explanation, so it would help if
anyone could elaborate on what needs to be done exactly.

For context, I am trying to run a sample KMeans algorithm on a GPU
using Hadoop.

Thanks in advance.

Regards,
Indrashish.

On Thu, 31 Oct 2013 20:00:10 -0400, Basu,Indrashish wrote:

Hi,

I am trying to run a sample Hadoop GPU source code (the kmeans algorithm)
on an ARM processor and am getting the error below. Can anyone please
throw some light on this?

rmr: cannot remove output: No such file or directory.
13/10/31 13:43:12 WARN mapred.JobClient: No job jar file set. User
classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/10/31 13:43:12 INFO mapred.FileInputFormat: Total input paths to
process : 1
13/10/31 13:43:13 INFO mapred.JobClient: Running job: job_201310311320_0001
13/10/31 13:43:14 INFO mapred.JobClient:  map 0% reduce 0%
13/10/31 13:43:39 INFO mapred.JobClient: Task Id :
attempt_201310311320_0001_m_00_0, Status : FAILED
java.io.IOException: pipe child exception
        at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:191)
        at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:103)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:363)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.mapred.pipes.BinaryProtocol.writeObject(BinaryProtocol.java:333)
        at org.apache.hadoop.mapred.pipes.BinaryProtocol.mapItem(BinaryProtocol.java:286)
        at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:92)
        ... 3 more

attempt_201310311320_0001_m_00_0: cmd: [bash, -c, exec
'/app/hadoop/tmp/mapred/local/taskTracker/archive/10.227.56.195bin/cpu-kmeans2D/cpu-kmeans2D'
'0' < /dev/null 1>>
/usr/local/hadoop/hadoop-gpu-0.20.1/bin/../logs/userlogs/attempt_201310311320_0001_m_00_0/stdout
2>> /usr/local/hadoop/hadoop-gpu-0.20.1/bin/../logs/userlogs/

Regards,


--
Indrashish Basu
Graduate Student
Department of Electrical and Computer Engineering
University of Florida


only one map or reduce task at a time on one node

2013-11-05 Thread John
hi,

I have a cluster of 7 nodes. Every node has 2 map slots and 1 reduce slot.
Is it possible to force the jobtracker to execute only 2 map tasks or 1
reduce task at a time on a node? I have found this configuration option:
mapred.reduce.slowstart.completed.maps. I think this will do exactly what
I want if I set it to 1.0 and there is only one MapReduce job at a time. But
when there are 2 jobs, I think it doesn't work, because once the second job
has finished its map part, it may execute a reduce task on a node where the
first job is still running. Or am I wrong? Is there a way to allow reduce
tasks to execute on a node only if no map task is running there?

kind regards


Re: only one map or reduce task at a time on one node

2013-11-05 Thread Vinod Kumar Vavilapalli
Why do you want to do this?

+Vinod

On Nov 5, 2013, at 9:17 AM, John wrote:

 Is it possible to force the jobtracker to execute only 2 map tasks or 1
 reduce task at a time?




Re: Error while running Hadoop Source Code

2013-11-05 Thread Vinod Kumar Vavilapalli
It seems like your pipes mapper is exiting before consuming all the input. Did 
you check the task-logs on the web UI?
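
Also worth double-checking is the submit command itself. For a Pipes program the usual shape is something like this (a sketch; the program and HDFS paths are illustrative):

    bin/hadoop pipes \
        -D hadoop.pipes.java.recordreader=true \
        -D hadoop.pipes.java.recordwriter=true \
        -input input -output output \
        -program bin/cpu-kmeans2D

If the binary was built to drive its own C++ record reader (as with the nopipe example discussed earlier in this thread), those two -D flags must instead be false, and the input format has to match what the C++ side expects.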

Thanks,
+Vinod

On Nov 5, 2013, at 7:25 AM, Basu,Indrashish wrote:

 Hi,
 
 Can anyone kindly assist with this?
 
 Regards,
 Indrashish




HDP 2.0 GA?

2013-11-05 Thread John Lilley
I noticed that HDP 2.0 is available for download here:
http://hortonworks.com/products/hdp-2/?b=1#install
Is this the final GA version that tracks Apache Hadoop 2.2?
Sorry, I am just a little confused by the different numbering schemes.
Thanks
John



RE: HDP 2.0 GA?

2013-11-05 Thread Jim Falgout
HDP 2.0.6 is the GA version that matches Apache Hadoop 2.2.



From: John Lilley john.lil...@redpoint.net
Sent: Tuesday, November 05, 2013 12:34 PM
To: user@hadoop.apache.org
Subject: HDP 2.0 GA?

I noticed that HDP 2.0 is available for download here:
http://hortonworks.com/products/hdp-2/?b=1#install
Is this the final GA version that tracks Apache Hadoop 2.2?
Sorry, I am just a little confused by the different numbering schemes.
Thanks
John





Re: only one map or reduce task at a time on one node

2013-11-05 Thread John
Because my node swaps memory if the 2 map slots + 1 reduce slot are all
occupied by my job. Sure, I can reduce the maximum memory for the map/reduce
processes. I tried this already, but I got an out-of-memory exception when I
set the max heap size for the map/reduce processes too low for my MR job.
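
What I would actually like, I think, is to lower the per-node slot counts themselves. As far as I understand, that is a TaskTracker-side setting in mapred-site.xml on each node (a sketch with illustrative values; it caps concurrent tasks per node and needs a TaskTracker restart, though it still cannot make map and reduce slots mutually exclusive):

    <!-- mapred-site.xml on every TaskTracker node (illustrative values).
         Caps how many map and reduce tasks may run concurrently per node. -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>1</value>
    </property>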

kind regards


2013/11/5 Vinod Kumar Vavilapalli vino...@hortonworks.com

 Why do you want to do this?

 +Vinod

 On Nov 5, 2013, at 9:17 AM, John wrote:

 Is it possible to force the jobtracker to execute only 2 map tasks or 1
 reduce task at a time?





Re: HDP 2.0 GA?

2013-11-05 Thread Suresh Srinivas
Please send questions related to a vendor-specific distro to the vendor's
mailing list. In this case: http://hortonworks.com/community/forums/.


On Tue, Nov 5, 2013 at 10:49 AM, Jim Falgout jim.falg...@actian.com wrote:

  HDP 2.0.6 is the GA version that matches Apache Hadoop 2.2.

  --
  From: John Lilley john.lil...@redpoint.net
  Sent: Tuesday, November 05, 2013 12:34 PM
  To: user@hadoop.apache.org
  Subject: HDP 2.0 GA?

  I noticed that HDP 2.0 is available for download here:
  http://hortonworks.com/products/hdp-2/?b=1#install
  Is this the final "GA" version that tracks Apache Hadoop 2.2?
  Sorry, I am just a little confused by the different numbering schemes.
  Thanks
  John






-- 
http://hortonworks.com/download/



AUTO: Jose Luis Mujeriego Gomez is out of the office. (returning 06/11/2013)

2013-11-05 Thread Jose Luis Mujeriego Gomez1

I am out of the office until 06/11/2013.

I will be out of the office with limited email access. I will check email
regularly, but I cannot guarantee that I will be able to answer promptly.

For any urgent matter please contact Dittmar Haegele
(dittmar.haeg...@de.ibm.com) or Tadhg Murphy (murp...@ie.ibm.com).
I will respond to your message when I return.


Note: This is an automated response to your message "Re: Build hadoop from
src and running it?" sent on 05/11/2013 17:32:06.

This is the only notification you will receive while this person is away.



what is the difference between the infoserver and the streaming server in the datanode?

2013-11-05 Thread ch huang
hi, all:

I am reading the source code for DataNode startup. When a DataNode starts,
it starts a streaming server and an info server; I do not know the
difference between the two servers.


Re: which is better: from reducer or Driver

2013-11-05 Thread unmesha sreeveni
I am dealing with multiple mappers and 1 reducer, so which is best?


On Tue, Nov 5, 2013 at 6:28 PM, Chris Mawata chris.maw...@gmail.com wrote:

  If you have multiple reducers, you are doing it in parallel, while in the
 driver it is surely single-threaded, so my bet would be on the reducers.

 Chris


 On 11/5/2013 6:15 AM, unmesha sreeveni wrote:

  I am emitting an 'A' value and a 'B' value from the reducer.

  I need to do further calculations as well. Which is the better way?
  1. Do all remaining computations within the reducer, after emitting, or
  2. Do the remaining computations in the driver: read the A and B values
  from the part files and do further computations.

  Please suggest the better way.

  --
  Thanks & Regards

  Unmesha Sreeveni U.B
  Junior Developer
  Amrita Center For Cyber Security
  Amritapuri. www.amrita.edu/cyber/





-- 
Thanks & Regards

Unmesha Sreeveni U.B

Junior Developer

Amrita Center For Cyber Security

Amritapuri.
www.amrita.edu/cyber/


how to read the xml tags data

2013-11-05 Thread mallik arjun
hi all,

How do I write MapReduce code to read the tag info from XML files?


thanks,
mallik.


Re: how to read the xml tags data

2013-11-05 Thread murali adireddy
Hi Mallik Arjun,

you can use the XmlInputFormat class, which is provided by Apache Mahout,
not by Hadoop.

Here is the link for the code.

https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java

And here is another link that explains how to use an XML input file.

http://xmlandhadoop.blogspot.com/
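
Roughly, the driver wiring looks like this (a sketch: the record tag and paths are placeholders, and xmlinput.start/xmlinput.end are the configuration keys Mahout's XmlInputFormat reads):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.mahout.classifier.bayes.XmlInputFormat;

    // Sketch: each map() call receives the text of one <record>...</record>
    // block as its value; parse the tags inside it with any XML parser.
    public class XmlReadDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("xmlinput.start", "<record>");  // start tag of one record
        conf.set("xmlinput.end", "</record>");   // matching end tag
        Job job = new Job(conf, "xml read");
        job.setJarByClass(XmlReadDriver.class);
        job.setInputFormatClass(XmlInputFormat.class);
        // A real job would also set a Mapper here that parses each record.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }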




Thanks,
Murali AdiReddy


On Wed, Nov 6, 2013 at 12:38 PM, mallik arjun mallik.cl...@gmail.com wrote:

 hi all,

 How do I write MapReduce code to read the tag info from XML files?


 thanks,
 mallik.