RE: Hadoop Case Studies?

2011-02-28 Thread Evert Lammerts
Hi Ted,

For what it's worth, here's a short article listing some of the cases that we 
(SARA, Dutch center for HPC) are supporting on our cluster at the moment:

http://blog.bottledbits.com/2011/01/sara-hadoop-pilot-project-use-cases-on-hadoop-in-the-dutch-center-for-hpc/

Cheers,

Evert Lammerts
Consultant eScience & Cloud Services
SARA Computing & Network Services
Operations, Support & Development

Phone: +31 20 888 4101
Email: evert.lamme...@sara.nl
http://www.sara.nl


 -Original Message-
 From: Ted Dunning [mailto:tdunn...@maprtech.com]
 Sent: Monday, 28 February 2011 5:39
 To: common-user@hadoop.apache.org; tpede...@d.umn.edu
 Subject: Re: Hadoop Case Studies?

 At any large company that makes heavy use of Hadoop, you aren't going
 to
 find any concise description of all the ways that hadoop is used.

 That said, here is a concise description of some of the ways that
 hadoop is
 (was) used at Yahoo:

 http://www.slideshare.net/ydn/hadoop-yahoo-internet-scale-data-
 processing

 On Sun, Feb 27, 2011 at 7:31 PM, Ted Pedersen tpede...@d.umn.edu
 wrote:

  Thanks for all these great ideas. These are really very helpful.
 
  What I'm also hoping to find are articles or papers that describe
 what
  particular companies or organizations have done with Hadoop. How does
  Facebook use Hadoop for example (that's one of the case studies in
 the
  White book), or how does last.fm use Hadoop (another of the case
  studies in the White book).
 
  One interesting resource is the list of powered by Hadoop projects
  available here:
 
  http://wiki.apache.org/hadoop/PoweredBy
 
  Some of these entries provide links to more detailed discussions of
  what an organization is doing, as in the following from Twitter
  http://www.slideshare.net/kevinweil/hadoop-pig-and-twitter-nosql-
 east-2009
 
  So any additional descriptions of what specific organizations are
  doing with Hadoop (to the extent they are willing to share) would be
  really helpful (these sorts of real world cases tend to be
  particularly motivating).
 
  Cordially,
  Ted
 
  On Sun, Feb 27, 2011 at 9:23 PM, Simon gsmst...@gmail.com wrote:
   I think you can also simulate PageRank Algorithm with hadoop.
  
   Simon -
  
   On Sun, Feb 27, 2011 at 9:20 PM, Lance Norskog goks...@gmail.com
  wrote:
  
   This is an exercise that will appeal to undergrads: pull the
 Craiglist
   personals ads from several cities, and do text classification.
 Given a
   training set of all the cities, attempt to classify test ads by
 city.
   (If Peter Harrington is out there, I stole this from you.)
  
   Lance
  
   On Sun, Feb 27, 2011 at 4:55 PM, Ted Dunning
 tdunn...@maprtech.com
   wrote:
Ted,
   
Greetings back at you.  It has been a while.
   
Check out Jimmy Lin and Chris Dyer's book about text processing
 with
hadoop:
   
http://www.umiacs.umd.edu/~jimmylin/book.html
   
   
On Sun, Feb 27, 2011 at 4:34 PM, Ted Pedersen
 tpede...@d.umn.edu
   wrote:
   
Greetings all,
   
I'm teaching an undergraduate Computer Science class that is
 using
Hadoop quite heavily, and would like to include some case
 studies at
various points during this semester.
   
We are using Tom White's Hadoop The Definitive Guide as a
 text, and
that includes a very nice chapter of case studies which might
 even
provide enough material for my purposes.
   
But, I wanted to check and see if there were other case studies
 out
there that might provide motivating and interesting examples of
 how
Hadoop is currently being used. The idea is to find material
 that
  goes
beyond simply saying X uses Hadoop to explaining in more
 detail how
and why X are using Hadoop.
   
Any hints would be very gratefully received.
   
Cordially,
Ted
   
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
   
   
  
  
  
   --
   Lance Norskog
   goks...@gmail.com
  
  
  
  
   --
   Regards,
   Simon
  
 
 
 
  --
  Ted Pedersen
  http://www.d.umn.edu/~tpederse
 


Re: Library Issue

2011-02-28 Thread Adarsh Sharma

Harsh J wrote:

You're facing a permissions issue with a device, not a Hadoop-related
issue. Find a way to let users access the required devices
(/dev/nvidiactl is what's reported in your ST, for starters).
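
One common way to do that (just a sketch; the group name "hadoop" and the exact
device list are assumptions, and you would also want a udev rule so the settings
survive a reboot):

# as root, on every node that has a GPU
chgrp hadoop /dev/nvidiactl /dev/nvidia*
chmod 0660 /dev/nvidiactl /dev/nvidia*

(A blunter alternative is chmod 0666 on the same device files.)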

On Mon, Feb 28, 2011 at 12:05 PM, Adarsh Sharma
adarsh.sha...@orkash.com wrote:
 

Greetings to all,

Today I came across a strange problem with non-root users in Linux
( CentOS ).

I am able to compile & run a Java program properly with the commands
below:


[root@cuda1 hadoop-0.20.2]# javac EnumDevices.java
[root@cuda1 hadoop-0.20.2]# java EnumDevices
Total number of devices: 1
Name: Tesla C1060
Version: 1.3
Clock rate: 1296000 MHz
Threads per block: 512

But I need to run it as another user, hadoop, in CentOS:

[hadoop@ws37-mah-lin hadoop-0.20.2]$ javac EnumDevices.java
[hadoop@ws37-mah-lin hadoop-0.20.2]$ java EnumDevices
NVIDIA: could not open the device file /dev/nvidiactl (Permission 
denied).

Exception in thread main CUDA Driver error: 100
  at jcuda.CUDA.setError(CUDA.java:1874)
  at jcuda.CUDA.init(CUDA.java:62)
  at jcuda.CUDA.init(CUDA.java:42)
  at EnumDevices.main(EnumDevices.java:20)


I settled the above issue by setting permissions on the /dev/nvidia device files
for the hadoop user & group.

Now, I am able to compile and run the program from the command line:

[hadoop@cuda1 hadoop-0.20.2]# javac EnumDevices.java
[hadoop@cuda1 hadoop-0.20.2]# java EnumDevices
Total number of devices: 1
Name: Tesla C1060
Version: 1.3
Clock rate: 1296000 MHz
Threads per block: 512

But I still don't know why it fails in the map-reduce job.

[hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar 
org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1
11/02/28 15:01:45 INFO input.FileInputFormat: Total input paths to 
process : 3

11/02/28 15:01:45 INFO mapred.JobClient: Running job: job_201102281104_0006
11/02/28 15:01:46 INFO mapred.JobClient:  map 0% reduce 0%
11/02/28 15:01:56 INFO mapred.JobClient: Task Id : 
attempt_201102281104_0006_m_00_0, Status : FAILED

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
   at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)

   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)

   ... 3 more
Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path
   at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734)
   at java.lang.Runtime.loadLibrary0(Runtime.java:823)
   at java.lang.System.loadLibrary(System.java:1028)
   at jcuda.driver.CUDADriver.clinit(CUDADriver.java:909)
   at jcuda.CUDA.init(CUDA.java:62)
   at jcuda.CUDA.init(CUDA.java:42)
   at org.myorg.WordCount$TokenizerMapper.init(WordCount.java:28)
   ... 8 more

11/02/28 15:01:56 INFO mapred.JobClient: Task Id : 
attempt_201102281104_0006_m_01_0, Status : FAILED

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
   at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)

   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)

   ... 3 more
Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path
   at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734)
   at java.lang.Runtime.loadLibrary0(Runtime.java:823)
   at java.lang.System.loadLibrary(System.java:1028)
   at jcuda.driver.CUDADriver.clinit(CUDADriver.java:909)
   at jcuda.CUDA.init(CUDA.java:62)
   at jcuda.CUDA.init(CUDA.java:42)
   at org.myorg.WordCount$TokenizerMapper.init(WordCount.java:28)
   ... 8 more

11/02/28 15:02:05 INFO mapred.JobClient: Task Id : 
attempt_201102281104_0006_m_02_1, Status : FAILED


Setting java.library.path for map-reduce job

2011-02-28 Thread Adarsh Sharma

Dear all,

I want to add some extra jars to java.library.path, used while running a
map-reduce program in the Hadoop cluster.


I got an exception, "no jcuda in java.library.path", in each map task.

I run my map-reduce code with the commands below:

javac -classpath 
/home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-core.jar://home/hadoop/project/hadoop-0.20.2/jcuda_1.1_linux64/jcuda.jar:/home/hadoop/project/hadoop-0.20.2/lib/commons-cli-1.2.jar 
-d wordcount_classes1/ WordCount.java


jar -cvf wordcount1.jar -C wordcount_classes1/ .

bin/hadoop jar wordcount1.jar org.myorg.WordCount /user/hadoop/gutenberg 
/user/hadoop/output1



Please guide how to achieve this.



Thanks & best Regards,

Adarsh Sharma


Fwd: TeraSort bug?

2011-02-28 Thread David Saile
Sorry list, please ignore the previous mail!

I really have to apologize for this!


Begin forwarded message:

 From: David Saile da...@uni-koblenz.de
 Date: 28 February 2011 11:28:20 CET
 To: David Saile da...@uni-koblenz.de
 Subject: Re: TeraSort bug?
 
 Hello Ralf,
 
 Unfortunately I have not yet received an answer from the mailing list. Would you
 have time for a short status meeting today?
 
 I will be at the university from about 14:15. But we could also meet at 16:00
 instead, if that suits you better.
 
 Regards
 David
 
   
 On 27.02.2011 at 23:33, David Saile wrote:
 
 Hello Ralf,
 
 The cluster is running reasonably stably, but for the last 2-3 days I have been
 wrestling with the problem that, with TeraSortDelta, almost all tuples are
 assigned to the same reducer (I briefly explained this on Thursday).
 I think the problem is part of TeraSort or Hadoop, though, since I could
 reproduce it with a simple copy job as well.
 
 I have just sent the email attached below to the Hadoop mailing list.
 I hope I will get an answer by tomorrow.
 
 How should I proceed in the meantime? Should I try to implement some kind of
 pipeline job?
 
 Best regards,
 David
 
 
 Begin forwarded message:
 
 From: David Saile da...@uni-koblenz.de
 Date: 27 February 2011 23:27:16 CET
 To: common-user@hadoop.apache.org
 Subject: TeraSort bug?
 Reply-To: common-user@hadoop.apache.org
 
 Hi,
 
 I have a problem concerning the TeraSort benchmark.
 I am running the version that ships with hadoop-0.21.0 and if I use it as 
 described (i.e. TeraGen -> TeraSort -> TeraValidate), everything works fine.
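
 (For reference, the standard sequence is roughly:
 
 bin/hadoop jar hadoop-mapred-examples-0.21.0.jar teragen 5000000 /user/david/tera-in
 bin/hadoop jar hadoop-mapred-examples-0.21.0.jar terasort /user/david/tera-in /user/david/tera-out
 bin/hadoop jar hadoop-mapred-examples-0.21.0.jar teravalidate /user/david/tera-out /user/david/tera-report
 
 where the row count and the /user/david paths are only illustrative and the jar
 name is the examples jar from the 0.21.0 release.)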
 
 However, for some tests I need to run, I added a simple job between TeraGen 
 and TeraSort that does nothing but copy the input. I included its code 
 below. 
 
 If I run this Copy job after TeraGen, TeraSort will partition the input in
 such a way that most tuples go to the last reducer.
 For example, if I run TeraSort with 500MB of input and 20 reducers, I get the
 following distribution:
 - Reducers 0-18 process ~10,000 tuples each
 - Reducer 19 processes ~5,000,000 tuples
 
 Can anyone reproduce this behavior? I would really appreciate any help!
 
 David
 
 
 // TeraInputFormat/TeraOutputFormat are the classes from the hadoop-0.21.0
 // terasort example package.
 import java.io.IOException;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.examples.terasort.TeraInputFormat;
 import org.apache.hadoop.examples.terasort.TeraOutputFormat;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Cluster;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;
 
 public class Copy extends Configured implements Tool {
 
     public int run(String[] args) throws IOException, InterruptedException,
             ClassNotFoundException {
         Job job = Job.getInstance(new Cluster(getConf()), getConf());
 
         // Read the TeraGen output as-is.
         Path inputDirOld = new Path(args[0]);
         TeraInputFormat.addInputPath(job, inputDirOld);
         job.setInputFormatClass(TeraInputFormat.class);
 
         job.setJobName("Copy");
         job.setJarByClass(Copy.class);
         job.setMapOutputKeyClass(Text.class);
         job.setMapOutputValueClass(Text.class);
 
         // No mapper/reducer set, so the identity Mapper/Reducer are used;
         // output is written back in TeraSort's key/value format.
         FileOutputFormat.setOutputPath(job, new Path(args[1]));
         job.setOutputFormatClass(TeraOutputFormat.class);
         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(Text.class);
 
         return job.waitForCompletion(true) ? 0 : 1;
     }
 
     public static void main(String[] args) throws Exception {
         int res = ToolRunner.run(new Configuration(), new Copy(), args);
         System.exit(res);
     }
 }
 
 



why quick sort when spill map output?

2011-02-28 Thread elton sky
Hello forumers,

Before spilling the data in kvbuffer to local disk in the map task, the k/v pairs
are sorted using quicksort. The average complexity of quicksort is O(n log n), but
its worst case is O(n^2).
Why use quicksort?

Regards


Missing files in the trunk ??

2011-02-28 Thread bharath vissapragada
Hi all,

I checked out the map-reduce trunk a few days back, and the following
files are missing:

import org.apache.hadoop.mapreduce.jobhistory.Events;
import org.apache.hadoop.mapreduce.jobhistory.JhCounter;
import org.apache.hadoop.mapreduce.jobhistory.JhCounterGroup;
import org.apache.hadoop.mapreduce.jobhistory.JhCounters;

"ant jar" works well, but Eclipse finds these files missing in the
corresponding packages.

I browsed the trunk online but couldn't trace these files.

Any help is highly appreciated :)

-- 
Regards,
Bharath .V
w:http://research.iiit.ac.in/~bharath.v


RE: Setting java.library.path for map-reduce job

2011-02-28 Thread Kaluskar, Sanjay
You will probably have to use distcache to distribute your jar to all
the nodes too. Read the distcache documentation; Then on each node you
can add the new jar to the java.library.path through
mapred.child.java.opts.

You need to do something like the following in mapred-site.xml, where
fs-uri is the URI of the file system (something like
host.mycompany.com:54310).

<property>
  <name>mapred.cache.files</name>
  <value>hdfs://fs-uri/jcuda/jcuda.jar#jcuda.jar </value>
</property>
<property>
  <name>mapred.create.symlink</name>
  <value>yes</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Djava.library.path=jcuda.jar</value>
</property>
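
A rough per-job equivalent of the same settings, using the Java API in the job
driver (a sketch only; fs-uri is the same placeholder as above, and the
java.library.path value simply mirrors the XML):

// imports: java.net.URI, org.apache.hadoop.conf.Configuration,
//          org.apache.hadoop.filecache.DistributedCache, org.apache.hadoop.mapreduce.Job
Configuration conf = new Configuration();
// ship the jar via the distributed cache and symlink it as jcuda.jar
// in every task's working directory
DistributedCache.addCacheFile(new URI("hdfs://fs-uri/jcuda/jcuda.jar#jcuda.jar"), conf);
DistributedCache.createSymlink(conf);
conf.set("mapred.child.java.opts", "-Djava.library.path=jcuda.jar");
Job job = new Job(conf, "wordcount");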


-Original Message-
From: Adarsh Sharma [mailto:adarsh.sha...@orkash.com] 
Sent: 28 February 2011 16:03
To: common-user@hadoop.apache.org
Subject: Setting java.library.path for map-reduce job

Dear all,

I want to set some extra jars in java.library.path , used while running
map-reduce program in Hadoop Cluster.

I got a exception entitled no jcuda in java.library.path in each map
task.

I run my map-reduce code by below commands :

javac -classpath
/home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-core.jar://home/hadoop/
project/hadoop-0.20.2/jcuda_1.1_linux64/jcuda.jar:/home/hadoop/project/h
adoop-0.20.2/lib/commons-cli-1.2.jar
-d wordcount_classes1/ WordCount.java

jar -cvf wordcount1.jar -C wordcount_classes1/ .

bin/hadoop jar wordcount1.jar org.myorg.WordCount /user/hadoop/gutenberg
/user/hadoop/output1


Please guide how to achieve this.



Thanks  best Regards,

Adarsh Sharma


Re: Setting java.library.path for map-reduce job

2011-02-28 Thread Adarsh Sharma

Thanks Sanjay, it seems I found the root cause.

But now I get the following error:

[hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar 
org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1
Exception in specified URI's java.net.URISyntaxException: Illegal 
character in path at index 36: hdfs://192.168.0.131:54310/jcuda.jar

   at java.net.URI$Parser.fail(URI.java:2809)
   at java.net.URI$Parser.checkChars(URI.java:2982)
   at java.net.URI$Parser.parseHierarchical(URI.java:3066)
   at java.net.URI$Parser.parse(URI.java:3014)
   at java.net.URI.init(URI.java:578)
   at 
org.apache.hadoop.util.StringUtils.stringToURI(StringUtils.java:204)
   at 
org.apache.hadoop.filecache.DistributedCache.getCacheFiles(DistributedCache.java:593)
   at 
org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:638)
   at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)

   at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
   at org.myorg.WordCount.main(WordCount.java:59)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Exception in thread main java.lang.NullPointerException
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
   at 
org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:506)
   at 
org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:640)
   at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)

   at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
   at org.myorg.WordCount.main(WordCount.java:59)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Please check my attached mapred-site.xml


Thanks & best regards,

Adarsh Sharma


Kaluskar, Sanjay wrote:

You will probably have to use distcache to distribute your jar to all
the nodes too. Read the distcache documentation; Then on each node you
can add the new jar to the java.library.path through
mapred.child.java.opts.

You need to do something like the following in mapred-site.xml, where
fs-uri is the URI of the file system (something like
host.mycompany.com:54310).

<property>
  <name>mapred.cache.files</name>
  <value>hdfs://fs-uri/jcuda/jcuda.jar#jcuda.jar </value>
</property>
<property>
  <name>mapred.create.symlink</name>
  <value>yes</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Djava.library.path=jcuda.jar</value>
</property>


-Original Message-
From: Adarsh Sharma [mailto:adarsh.sha...@orkash.com] 
Sent: 28 February 2011 16:03

To: common-user@hadoop.apache.org
Subject: Setting java.library.path for map-reduce job

Dear all,

I want to set some extra jars in java.library.path , used while running
map-reduce program in Hadoop Cluster.

I got a exception entitled no jcuda in java.library.path in each map
task.

I run my map-reduce code by below commands :

javac -classpath
/home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-core.jar://home/hadoop/
project/hadoop-0.20.2/jcuda_1.1_linux64/jcuda.jar:/home/hadoop/project/h
adoop-0.20.2/lib/commons-cli-1.2.jar
-d wordcount_classes1/ WordCount.java

jar -cvf wordcount1.jar -C wordcount_classes1/ .

bin/hadoop jar wordcount1.jar org.myorg.WordCount /user/hadoop/gutenberg
/user/hadoop/output1


Please guide how to achieve this.



Thanks  best Regards,

Adarsh Sharma
  


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.0.131:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/hdd-1/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate
  data files.  May be a comma-separated list of directories on different devices in order to spread disk i/o.
  Directories that do not exist are ignored.
  </description>
</property>


<property>
  <name>mapred.system.dir</name>
  <value>/home/mapred/system</value>
  <description>The shared directory where MapReduce 

Re: why quick sort when spill map output?

2011-02-28 Thread James Seigel
Sorting out of the map phase is core to how hadoop works.  Are you asking why
sort at all, or why someone used quicksort as opposed to some other sort?

Cheers
James


On 2011-02-28, at 3:30 AM, elton sky wrote:

 Hello forumers,
 
 Before spill the data in kvbuffer to local disk in map task, k/v are
 sorted using quick sort. The complexity of quick sort is O(nlogn) and
 worst case is O(n^2).
 Why using quick sort?
 
 Regards



Re:Re: Problem with building hadoop 0.21

2011-02-28 Thread 朱韬
Hi Simon:
   I modified some code related to the scheduler and designed a customized
scheduler. When I built the modified code, the problems described above
came up. I suspected there was something wrong with my code, but after I
built the out-of-the-box code, the same problems still existed. Can you tell me how
to build and deploy a customized hadoop?
 Thank you!

   zhutao
 




At 2011-02-28 11:21:16,Simon gsmst...@gmail.com wrote:

Hey,

Can you let us know why you want to replace all the jar files? That usually
does not work, especially for development code in the code base.
So, just use the one you have successfully compiled, don't replace jar
files.

Hope it can work.

Simon

2011/2/27 朱韬 ryanzhu...@163.com

 Hi,guys:
  I checked out the source code from
  http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk/. Then I compiled using
 this script:
  #!/bin/bash
 export JAVA_HOME=/usr/share/jdk1.6.0_14
 export CFLAGS=-m64
 export CXXFLAGS=-m64
 export ANT_HOME=/opt/apache-ant-1.8.2
 export PATH=$PATH:$ANT_HOME/bin
 ant -Dversion=0.21.0 -Dcompile.native=true
 -Dforrest.home=/home/hadoop/apache-forrest-0.9 clean tar
  It was OK before these steps. Then I replaced
  hadoop-mapred-0.21.0.jar, hadoop-mapred-0.21.0-sources.jar,
  hadoop-mapred-examples-0.21.0.jar, hadoop-mapred-test-0.21.0.jar, and
  hadoop-mapred-tools-0.21.0.jar in Release 0.21.0 with the compiled jar files
  from the above step. Also I added my scheduler to lib. When starting the
  customized hadoop, I encountered the problems below:
 Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/hadoop/security/RefreshUserMappingsProtocol
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 10.61.0.6: starting tasktracker, logging to
 /home/hadoop/hadoop-green-0.1.0/logs/hadoop-hadoop-tasktracker-hdt0.hypercloud.ict.out
 10.61.0.143: starting tasktracker, logging to
 /home/hadoop/hadoop-green-0.1.0/logs/hadoop-hadoop-tasktracker-hdt1.hypercloud.ict.out
 10.61.0.7: starting tasktracker, logging to
 /home/hadoop/hadoop-green-0.1.0/logs/hadoop-hadoop-tasktracker-hdt2.hypercloud.ict.out
 10.61.0.6: Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/hadoop/io/SecureIOUtils$AlreadyExistsException
 10.61.0.6: Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException
 10.61.0.6:  at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
 10.61.0.6:  at java.security.AccessController.doPrivileged(Native
 Method)
 10.61.0.6:  at
 java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 10.61.0.6:  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 10.61.0.6:  at
 sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 10.61.0.6:  at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
 10.61.0.6:  at
 java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
 10.61.0.6: Could not find the main class:
 org.apache.hadoop.mapred.TaskTracker.  Program will exit.
 10.61.0.143: Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/hadoop/io/SecureIOUtils$AlreadyExistsException
 10.61.0.143: Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException
 10.61.0.143:at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
 10.61.0.143:at java.security.AccessController.doPrivileged(Native
 Method)
 10.61.0.143:at
 java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 10.61.0.143:at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 10.61.0.143:at
 sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 10.61.0.143:at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
 10.61.0.143:at
 java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
 10.61.0.143: Could not find the main class:
 org.apache.hadoop.mapred.TaskTracker.  Program will exit.
 10.61.0.7: Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/hadoop/io/SecureIOUtils$AlreadyExistsException
 10.61.0.7: Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException
 10.61.0.7:  at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
 10.61.0.7:  at 

Re: Setting java.library.path for map-reduce job

2011-02-28 Thread Sonal Goyal
Hi Adarsh,

I think your mapred.cache.files property has an extra space at the end. Try
removing that and let us know how it goes.
Thanks and Regards,
Sonal
Hadoop ETL and Data Integration - https://github.com/sonalgoyal/hiho
Nube Technologies http://www.nubetech.co

http://in.linkedin.com/in/sonalgoyal





On Mon, Feb 28, 2011 at 5:06 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 Thanks Sanjay, it seems i found the root cause.

 But I result in following error:

 [hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar
 org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1
 Exception in specified URI's java.net.URISyntaxException: Illegal character
 in path at index 36: hdfs://192.168.0.131:54310/jcuda.jar
   at java.net.URI$Parser.fail(URI.java:2809)
   at java.net.URI$Parser.checkChars(URI.java:2982)
   at java.net.URI$Parser.parseHierarchical(URI.java:3066)
   at java.net.URI$Parser.parse(URI.java:3014)
   at java.net.URI.init(URI.java:578)
   at
 org.apache.hadoop.util.StringUtils.stringToURI(StringUtils.java:204)
   at
 org.apache.hadoop.filecache.DistributedCache.getCacheFiles(DistributedCache.java:593)
   at
 org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:638)
   at
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
   at org.myorg.WordCount.main(WordCount.java:59)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

 Exception in thread main java.lang.NullPointerException
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
   at
 org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:506)
   at
 org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:640)
   at
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
   at org.myorg.WordCount.main(WordCount.java:59)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

 Please check my attached mapred-site.xml


 Thanks  best regards,

 Adarsh Sharma



 Kaluskar, Sanjay wrote:

 You will probably have to use distcache to distribute your jar to all
 the nodes too. Read the distcache documentation; Then on each node you
 can add the new jar to the java.library.path through
 mapred.child.java.opts.

 You need to do something like the following in mapred-site.xml, where
 fs-uri is the URI of the file system (something like
 host.mycompany.com:54310).

 <property>
  <name>mapred.cache.files</name>
  <value>hdfs://fs-uri/jcuda/jcuda.jar#jcuda.jar </value>
 </property>
 <property>
  <name>mapred.create.symlink</name>
  <value>yes</value>
 </property>
 <property>
  <name>mapred.child.java.opts</name>
  <value>-Djava.library.path=jcuda.jar</value>
 </property>


 -Original Message-
 From: Adarsh Sharma [mailto:adarsh.sha...@orkash.com] Sent: 28 February
 2011 16:03
 To: common-user@hadoop.apache.org
 Subject: Setting java.library.path for map-reduce job

 Dear all,

 I want to set some extra jars in java.library.path , used while running
 map-reduce program in Hadoop Cluster.

 I got a exception entitled no jcuda in java.library.path in each map
 task.

 I run my map-reduce code by below commands :

 javac -classpath
 /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-core.jar://home/hadoop/
 project/hadoop-0.20.2/jcuda_1.1_linux64/jcuda.jar:/home/hadoop/project/h
 adoop-0.20.2/lib/commons-cli-1.2.jar
 -d wordcount_classes1/ WordCount.java

 jar -cvf wordcount1.jar -C wordcount_classes1/ .

 bin/hadoop jar wordcount1.jar org.myorg.WordCount /user/hadoop/gutenberg
 /user/hadoop/output1


 Please guide how to achieve this.



 Thanks  best Regards,

 Adarsh Sharma






Re: why quick sort when spill map output?

2011-02-28 Thread MANISH SINGLA
One of the major reasons for using quicksort would be that quicksort
can easily be parallelized, due to its divide and conquer nature.
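
As far as I know the sorter is also pluggable: map.sort.class defaults to
org.apache.hadoop.util.QuickSort, so if the O(n^2) worst case worries you it
should be possible to switch it, e.g. (untested sketch, in mapred-site.xml):

<property>
  <name>map.sort.class</name>
  <value>org.apache.hadoop.util.HeapSort</value>
</property>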

On Mon, Feb 28, 2011 at 6:06 PM, James Seigel ja...@tynt.com wrote:
 Sorting out of the map phase is core to how hadoop works.  Are you asking why 
 sort at all?  or why did someone use quick sort as opposed to _sort?

 Cheers
 James


 On 2011-02-28, at 3:30 AM, elton sky wrote:

 Hello forumers,

 Before spill the data in kvbuffer to local disk in map task, k/v are
 sorted using quick sort. The complexity of quick sort is O(nlogn) and
 worst case is O(n^2).
 Why using quick sort?

 Regards




Re: Setting java.library.path for map-reduce job

2011-02-28 Thread Adarsh Sharma

Sonal Goyal wrote:

Hi Adarsh,

I think your mapred.cache.files property has an extra space at the end. Try
removing that and let us know how it goes.
Thanks and Regards,
Sonal
https://github.com/sonalgoyal/hihoHadoop ETL and Data
Integrationhttps://github.com/sonalgoyal/hiho
Nube Technologies http://www.nubetech.co

http://in.linkedin.com/in/sonalgoyal


  


Thanks a lot Sonal, but it doesn't succeed.
If possible, please tell me the proper steps that need to be followed
after configuring the Hadoop cluster.


I don't understand it; the simple commands succeed, as shown:

[root@cuda1 hadoop-0.20.2]# javac EnumDevices.java
[root@cuda1 hadoop-0.20.2]# java EnumDevices
Total number of devices: 1
Name: Tesla C1060
Version: 1.3
Clock rate: 1296000 MHz
Threads per block: 512


but in the map-reduce job it fails:

11/02/28 18:42:47 INFO mapred.JobClient: Task Id : 
attempt_201102281834_0001_m_01_2, Status : FAILED

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
   at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)

   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)

   ... 3 more
Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path
   at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734)
   at java.lang.Runtime.loadLibrary0(Runtime.java:823)
   at java.lang.System.loadLibrary(System.java:1028)
   at jcuda.driver.CUDADriver.clinit(CUDADriver.java:909)
   at jcuda.CUDA.init(CUDA.java:62)
   at jcuda.CUDA.init(CUDA.java:42)



Thanks & best Regards,

Adarsh Sharma



On Mon, Feb 28, 2011 at 5:06 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote:

  

Thanks Sanjay, it seems i found the root cause.

But I result in following error:

[hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar
org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1
Exception in specified URI's java.net.URISyntaxException: Illegal character
in path at index 36: hdfs://192.168.0.131:54310/jcuda.jar
  at java.net.URI$Parser.fail(URI.java:2809)
  at java.net.URI$Parser.checkChars(URI.java:2982)
  at java.net.URI$Parser.parseHierarchical(URI.java:3066)
  at java.net.URI$Parser.parse(URI.java:3014)
  at java.net.URI.init(URI.java:578)
  at
org.apache.hadoop.util.StringUtils.stringToURI(StringUtils.java:204)
  at
org.apache.hadoop.filecache.DistributedCache.getCacheFiles(DistributedCache.java:593)
  at
org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:638)
  at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
  at org.myorg.WordCount.main(WordCount.java:59)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Exception in thread main java.lang.NullPointerException
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
  at
org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:506)
  at
org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:640)
  at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
  at org.myorg.WordCount.main(WordCount.java:59)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Please check my attached mapred-site.xml


Thanks  best regards,

Adarsh Sharma



Kaluskar, Sanjay wrote:



You will probably have to use distcache to distribute your jar to all
the nodes too. Read 

How to make a CGI with HBase?

2011-02-28 Thread edward choi
Hi,

I am planning to make a search engine for news articles. It will probably
hold several billion news articles, so I thought HBase is the way to
go.

However, I am very new to CGI. All I know is that you use PHP, Python or
JavaScript with HTML to make a web site and communicate with a backend
database such as MySQL.

But I am going to use HBase, not MySQL, and I can't seem to find a scripting
language that provides any form of API to communicate with HBase.

So what do I do?

Do I have to make a web site with pure Java? Is that even possible?

Or is using HBase as the backend database a bad idea in the first place?


Re: Setting java.library.path for map-reduce job

2011-02-28 Thread Sonal Goyal
Adarsh,

Are you trying to distribute both the native library and the jcuda.jar?
Could you please explain your job's dependencies?
Thanks and Regards,
Sonal
Hadoop ETL and Data Integration - https://github.com/sonalgoyal/hiho
Nube Technologies http://www.nubetech.co

http://in.linkedin.com/in/sonalgoyal





On Mon, Feb 28, 2011 at 6:54 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 Sonal Goyal wrote:

 Hi Adarsh,

 I think your mapred.cache.files property has an extra space at the end.
 Try
 removing that and let us know how it goes.
 Thanks and Regards,
 Sonal
 https://github.com/sonalgoyal/hihoHadoop ETL and Data
 Integrationhttps://github.com/sonalgoyal/hiho
 Nube Technologies http://www.nubetech.co

 http://in.linkedin.com/in/sonalgoyal





 Thanks a Lot Sonal but it doesn't succeed.
 Please if possible tell me the proper steps that are need to be followed
 after Configuring Hadoop Cluster.

 I don't believe that a simple commands succeeded as

 [root@cuda1 hadoop-0.20.2]# javac EnumDevices.java
 [root@cuda1 hadoop-0.20.2]# java EnumDevices
 Total number of devices: 1
 Name: Tesla C1060
 Version: 1.3
 Clock rate: 1296000 MHz
 Threads per block: 512


 but in Map-reduce job it fails :

 11/02/28 18:42:47 INFO mapred.JobClient: Task Id :
 attempt_201102281834_0001_m_01_2, Status : FAILED
 java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
   at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
   at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
   ... 3 more
 Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path
   at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734)
   at java.lang.Runtime.loadLibrary0(Runtime.java:823)
   at java.lang.System.loadLibrary(System.java:1028)
   at jcuda.driver.CUDADriver.clinit(CUDADriver.java:909)
   at jcuda.CUDA.init(CUDA.java:62)
   at jcuda.CUDA.init(CUDA.java:42)




 Thanks  best Regards,

 Adarsh Sharma



 On Mon, Feb 28, 2011 at 5:06 PM, Adarsh Sharma adarsh.sha...@orkash.com
 wrote:



 Thanks Sanjay, it seems i found the root cause.

 But I result in following error:

 [hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar
 org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1
 Exception in specified URI's java.net.URISyntaxException: Illegal
 character
 in path at index 36: hdfs://192.168.0.131:54310/jcuda.jar
  at java.net.URI$Parser.fail(URI.java:2809)
  at java.net.URI$Parser.checkChars(URI.java:2982)
  at java.net.URI$Parser.parseHierarchical(URI.java:3066)
  at java.net.URI$Parser.parse(URI.java:3014)
  at java.net.URI.init(URI.java:578)
  at
 org.apache.hadoop.util.StringUtils.stringToURI(StringUtils.java:204)
  at

 org.apache.hadoop.filecache.DistributedCache.getCacheFiles(DistributedCache.java:593)
  at

 org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:638)
  at
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
  at org.myorg.WordCount.main(WordCount.java:59)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

 Exception in thread main java.lang.NullPointerException
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
  at

 org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:506)
  at

 org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:640)
  at
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
  at org.myorg.WordCount.main(WordCount.java:59)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at

 

RE: Hadoop Case Studies?

2011-02-28 Thread Tom Deutsch
Ted - ping me off line and I'll help. Most of what we're doing is 
classified or client confidential, but there are some I can share.


Tom Deutsch
Program Director
CTO Office: Information Management
Hadoop Product Manager / Customer Exec BigInsights
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeut...@us.ibm.com




Evert Lammerts evert.lamme...@sara.nl 
02/28/2011 01:04 AM
Please respond to
common-user@hadoop.apache.org


To
common-user@hadoop.apache.org common-user@hadoop.apache.org, 
tpede...@d.umn.edu tpede...@d.umn.edu
cc

Subject
RE: Hadoop Case Studies?






Hi Ted,

For what it's worth, here's a short article listing some of the cases that 
we (SARA, Dutch center for HPC) are supporting on our cluster at the 
moment:

http://blog.bottledbits.com/2011/01/sara-hadoop-pilot-project-use-cases-on-hadoop-in-the-dutch-center-for-hpc/


Cheers,

Evert Lammerts
Consultant eScience & Cloud Services
SARA Computing & Network Services
Operations, Support & Development

Phone: +31 20 888 4101
Email: evert.lamme...@sara.nl
http://www.sara.nl


 -Original Message-
 From: Ted Dunning [mailto:tdunn...@maprtech.com]
 Sent: Monday, 28 February 2011 5:39
 To: common-user@hadoop.apache.org; tpede...@d.umn.edu
 Subject: Re: Hadoop Case Studies?

 At any large company that makes heavy use of Hadoop, you aren't going
 to
 find any concise description of all the ways that hadoop is used.

 That said, here is a concise description of some of the ways that
 hadoop is
 (was) used at Yahoo:

 http://www.slideshare.net/ydn/hadoop-yahoo-internet-scale-data-
 processing

 On Sun, Feb 27, 2011 at 7:31 PM, Ted Pedersen tpede...@d.umn.edu
 wrote:

  Thanks for all these great ideas. These are really very helpful.
 
  What I'm also hoping to find are articles or papers that describe
 what
  particular companies or organizations have done with Hadoop. How does
  Facebook use Hadoop for example (that's one of the case studies in
 the
  White book), or how does last.fm use Hadoop (another of the case
  studies in the White book).
 
  One interesting resource is the list of powered by Hadoop projects
  available here:
 
  http://wiki.apache.org/hadoop/PoweredBy
 
  Some of these entries provide links to more detailed discussions of
  what an organization is doing, as in the following from Twitter
  http://www.slideshare.net/kevinweil/hadoop-pig-and-twitter-nosql-
 east-2009
 
  So any additional descriptions of what specific organizations are
  doing with Hadoop (to the extent they are willing to share) would be
  really helpful (these sorts of real world cases tend to be
  particularly motivating).
 
  Cordially,
  Ted
 
  On Sun, Feb 27, 2011 at 9:23 PM, Simon gsmst...@gmail.com wrote:
   I think you can also simulate PageRank Algorithm with hadoop.
  
   Simon -
  
   On Sun, Feb 27, 2011 at 9:20 PM, Lance Norskog goks...@gmail.com
  wrote:
  
   This is an exercise that will appeal to undergrads: pull the
 Craiglist
   personals ads from several cities, and do text classification.
 Given a
   training set of all the cities, attempt to classify test ads by
 city.
   (If Peter Harrington is out there, I stole this from you.)
  
   Lance
  
   On Sun, Feb 27, 2011 at 4:55 PM, Ted Dunning
 tdunn...@maprtech.com
   wrote:
Ted,
   
Greetings back at you.  It has been a while.
   
Check out Jimmy Lin and Chris Dyer's book about text processing
 with
hadoop:
   
http://www.umiacs.umd.edu/~jimmylin/book.html
   
   
On Sun, Feb 27, 2011 at 4:34 PM, Ted Pedersen
 tpede...@d.umn.edu
   wrote:
   
Greetings all,
   
I'm teaching an undergraduate Computer Science class that is
 using
Hadoop quite heavily, and would like to include some case
 studies at
various points during this semester.
   
We are using Tom White's Hadoop The Definitive Guide as a
 text, and
that includes a very nice chapter of case studies which might
 even
provide enough material for my purposes.
   
But, I wanted to check and see if there were other case studies
 out
there that might provide motivating and interesting examples of
 how
Hadoop is currently being used. The idea is to find material
 that
  goes
beyond simply saying X uses Hadoop to explaining in more
 detail how
and why X are using Hadoop.
   
Any hints would be very gratefully received.
   
Cordially,
Ted
   
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
   
   
  
  
  
   --
   Lance Norskog
   goks...@gmail.com
  
  
  
  
   --
   Regards,
   Simon
  
 
 
 
  --
  Ted Pedersen
  http://www.d.umn.edu/~tpederse
 



Re: Re: Problem with building hadoop 0.21

2011-02-28 Thread Simon
I mean, can you just make your changes to the 0.21 version of hadoop rather
than putting the 0.21 version jars into the latest code? There might be API
incompatibilities. Or you can try downloading the source code of version 0.21 and
repeating your steps.
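
(Just a sketch of what I mean; please double-check the repository path, since it
is from memory:

svn checkout http://svn.apache.org/repos/asf/hadoop/mapreduce/branches/branch-0.21/

should give you the 0.21 line of the source rather than trunk.)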

Thanks
Simon

2011/2/28 朱韬 ryanzhu...@163.com

 Hi.Simon:
   I modified some coed related to scheduler and designed a  customized
 scheduler .when I built the modified code, then the problems described above
 came up with it. I doubt whether there was something with my code, but after
  I built the out-of-box code, the same problems still existed. Can you tell
 me how to build and deploy  a  customized hadoop?
 Thank you!

   zhutao





 At 2011-02-28 11:21:16,Simon gsmst...@gmail.com wrote:

 Hey,
 
 Can you let us know why you want to replace all the jar files? That
 usually
 does not work, especially for development code in the code base.
 So, just use the one you have successfully compiled, don't replace jar
 files.
 
 Hope it can work.
 
 Simon
 
 2011/2/27 朱韬 ryanzhu...@163.com
 
  Hi,guys:
   I checked out the source code fromhttp://
  svn.apache.org/repos/asf/hadoop/mapreduce/trunk/. Then I compiled using
  this script:
   #!/bin/bash
  export JAVA_HOME=/usr/share/jdk1.6.0_14
  export CFLAGS=-m64
  export CXXFLAGS=-m64
  export ANT_HOME=/opt/apache-ant-1.8.2
  export PATH=$PATH:$ANT_HOME/bin
  ant -Dversion=0.21.0 -Dcompile.native=true
  -Dforrest.home=/home/hadoop/apache-forrest-0.9 clean tar
  It was Ok before these steps. Then I replaced
  hadoop-mapred-0.21.0.jar, hadoop-mapred-0.21.0-sources.jar,
   hadoop-mapred-examples-0.21.0.jar,hadoop-mapred-test-0.21.0.jar,and
  hadoop-mapred-tools-0.21.0.jar inRelease 0.21.0 with the compiled jar
 files
  from the above step. Also I added my scheduler to lib. When starting the
  customed hadoop, I encountered the problems as blow:
  Exception in thread main java.lang.NoClassDefFoundError:
  org/apache/hadoop/security/RefreshUserMappingsProtocol
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
 at
  java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
 at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
  10.61.0.6: starting tasktracker, logging to
 
 /home/hadoop/hadoop-green-0.1.0/logs/hadoop-hadoop-tasktracker-hdt0.hypercloud.ict.out
  10.61.0.143: starting tasktracker, logging to
 
 /home/hadoop/hadoop-green-0.1.0/logs/hadoop-hadoop-tasktracker-hdt1.hypercloud.ict.out
  10.61.0.7: starting tasktracker, logging to
 
 /home/hadoop/hadoop-green-0.1.0/logs/hadoop-hadoop-tasktracker-hdt2.hypercloud.ict.out
  10.61.0.6: Exception in thread main java.lang.NoClassDefFoundError:
  org/apache/hadoop/io/SecureIOUtils$AlreadyExistsException
  10.61.0.6: Caused by: java.lang.ClassNotFoundException:
  org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException
  10.61.0.6:  at
 java.net.URLClassLoader$1.run(URLClassLoader.java:200)
  10.61.0.6:  at java.security.AccessController.doPrivileged(Native
  Method)
  10.61.0.6:  at
  java.net.URLClassLoader.findClass(URLClassLoader.java:188)
  10.61.0.6:  at
 java.lang.ClassLoader.loadClass(ClassLoader.java:307)
  10.61.0.6:  at
  sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  10.61.0.6:  at
 java.lang.ClassLoader.loadClass(ClassLoader.java:252)
  10.61.0.6:  at
  java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
  10.61.0.6: Could not find the main class:
  org.apache.hadoop.mapred.TaskTracker.  Program will exit.
  10.61.0.143: Exception in thread main java.lang.NoClassDefFoundError:
  org/apache/hadoop/io/SecureIOUtils$AlreadyExistsException
  10.61.0.143: Caused by: java.lang.ClassNotFoundException:
  org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException
  10.61.0.143:at
 java.net.URLClassLoader$1.run(URLClassLoader.java:200)
  10.61.0.143:at java.security.AccessController.doPrivileged(Native
  Method)
  10.61.0.143:at
  java.net.URLClassLoader.findClass(URLClassLoader.java:188)
  10.61.0.143:at
 java.lang.ClassLoader.loadClass(ClassLoader.java:307)
  10.61.0.143:at
  sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  10.61.0.143:at
 java.lang.ClassLoader.loadClass(ClassLoader.java:252)
  10.61.0.143:at
  java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
  10.61.0.143: Could not find the main class:
  org.apache.hadoop.mapred.TaskTracker.  Program will exit.
  10.61.0.7: Exception in thread main java.lang.NoClassDefFoundError:
  

Re: How to make a CGI with HBase?

2011-02-28 Thread Bibek Paudel
On Mon, Feb 28, 2011 at 2:37 PM, edward choi mp2...@gmail.com wrote:
 Hi,

 I am planning to make a search engine for news articles. It will probably
 have over several billions of news articles so I thought HBase is the way to
 go.

 However, I am very new to CGI. All I know is that you use php, python or
 java script with HTML to make a web site and communicate with the backend
 database such as MySQL.

 But I am going to use HBase, not MySQL, and I can't seem to find a script
 language that provides any form of API to communicate with HBase.

 So what do I do?

 Do I have to make a web site with pure Java? Is that even possible?

It is possible, if you know things like JSP, Java servlets etc.

For people comfortable with PHP or Python, I think Apache Thrift
(http://wiki.apache.org/thrift/) is an alternative.

-b


 Or is using Hbase as the backend Database a bad idea in the first place?



Re: How to make a CGI with HBase?

2011-02-28 Thread Usman Waheed

HI,

I have been using the Thrift Perl API to connect to HBase for my web app.
At the moment I only perform random reads and scans based on date ranges
and some other search criteria.

It works and I am still testing performance.

-Usman


On Mon, Feb 28, 2011 at 2:37 PM, edward choi mp2...@gmail.com wrote:

Hi,

I am planning to make a search engine for news articles. It will  
probably
have over several billions of news articles so I thought HBase is the  
way to

go.

However, I am very new to CGI. All I know is that you use php, python or
java script with HTML to make a web site and communicate with the  
backend

database such as MySQL.

But I am going to use HBase, not MySQL, and I can't seem to find a  
script

language that provides any form of API to communicate with HBase.

So what do I do?

Do I have to make a web site with pure Java? Is that even possible?


It is possible, if you know things like JSP, Java servelets etc.

For people comfortable with PHP or Python, I think Apache Thrift
(http://wiki.apache.org/thrift/) is an alternative.

-b



Or is using Hbase as the backend Database a bad idea in the first place?




--
Using Opera's revolutionary email client: http://www.opera.com/mail/


Re: Missing files in the trunk ??

2011-02-28 Thread Tom White
These files are generated files. If you run "ant avro-generate
eclipse" then Eclipse should find these files.

Cheers,
Tom

On Mon, Feb 28, 2011 at 2:43 AM, bharath vissapragada
bharathvissapragada1...@gmail.com wrote:
 Hi all,

 I checked out the map-reduce trunk a few days back  and following
 files are missing..

 import org.apache.hadoop.mapreduce.jobhistory.Events;
 import org.apache.hadoop.mapreduce.jobhistory.JhCounter;
 import org.apache.hadoop.mapreduce.jobhistory.JhCounterGroup;
 import org.apache.hadoop.mapreduce.jobhistory.JhCounters;

 ant jar works  well but eclipse finds these files missing in the
 corresponding packages ..

 I browsed the trunk online but couldn't trace these files..

 Any help is highly appreciated :)

 --
 Regards,
 Bharath .V
 w:http://research.iiit.ac.in/~bharath.v



Advice for a new open-source project and a license

2011-02-28 Thread Mark Kerzner
Hi,

I am working on an open-source project that would be using
Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
searchable. Like Nutch, only applied to hard drives, and like Google Desktop
Search, only I want to output information about every file found. Not a big
difference though.

I am looking for advice on the following

   1. Have you heard of a similar project?
   2. What license should I use? I am thinking of Apache V2.0, because it
   relies on other Apache V2.0 projects;
   3. Any other advice?

Thank you. Sincerely,
Mark


Re: Library Issue

2011-02-28 Thread Greg Roelofs
Adarsh Sharma adarsh.sha...@orkash.com wrote:

 But Still don't know why it fails in Map-reduce job.

 [hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar
 org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1
 11/02/28 15:01:45 INFO input.FileInputFormat: Total input paths to
 process : 3
 11/02/28 15:01:45 INFO mapred.JobClient: Running job: job_201102281104_0006
 11/02/28 15:01:46 INFO mapred.JobClient:  map 0% reduce 0%
 11/02/28 15:01:56 INFO mapred.JobClient: Task Id :
 attempt_201102281104_0006_m_00_0, Status : FAILED
 java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
 [...]
 Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path

That would be why.  I don't know if jcuda is an external command, a
shared library, or a jar of some sort, but Hadoop wants it in the Java
library path (probably on all nodes).
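
If it turns out to be a native library (e.g. a libjcuda.so), one approach, assuming
the library is installed at the same location on every tasktracker node (the path
below is only an example), is to point the child JVMs at that directory in
mapred-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <!-- keep your existing heap setting when overriding this -->
  <value>-Xmx200m -Djava.library.path=/usr/local/cuda/lib64</value>
</property>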

Greg


Re: Advice for a new open-source project and a license

2011-02-28 Thread Ted Dunning
Check out http://www.elasticsearch.org/

Not what you are doing, but possibly a
helpful bit of the pie.

Also, Solr integrates Tika and Lucene pretty nicely any more.  No Hbase yet,
but it isn't hard to add that.

On Mon, Feb 28, 2011 at 1:01 PM, Mark Kerzner markkerz...@gmail.com wrote:

 Hi,

 I am working on an open-source project that would be using
 Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
 searchable. Like Nutch, only applied to hard drives, and like Google
 Desktop
 Search, only I want to output information about every file found. Not a big
 difference though.

 I am looking for an advice on the following

   1. Have you heard of a similar project?
   2. What license should I use? I am thinking of Apache V2.0, because it
   relies on other Apache V2.0 projects;
   3. Any other advice?

 Thank you. Sincerely,
 Mark



Re: How to make a CGI with HBase?

2011-02-28 Thread Edward Choi
Thanks for the reply.
Didn't know that Thrift was for such a purpose.
Servlets and JSP are totally new to me. I skimmed through the concepts on the
internet and they look fascinating.
I think I am gonna give servlets and JSP a try.
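
Just to sketch what I have in mind (purely illustrative; the table name "articles",
the "content:body" column and the HBase 0.90-style client calls are my own
assumptions, not anything tested):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ArticleServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        Configuration conf = HBaseConfiguration.create(); // picks up hbase-site.xml from the classpath
        HTable table = new HTable(conf, "articles");      // hypothetical table of news articles
        try {
            // look up one article by its row key, passed as ?id=...
            Get get = new Get(Bytes.toBytes(req.getParameter("id")));
            Result row = table.get(get);
            byte[] body = row.getValue(Bytes.toBytes("content"), Bytes.toBytes("body"));
            resp.setContentType("text/plain");
            resp.getWriter().println(body == null ? "not found" : Bytes.toString(body));
        } finally {
            table.close();
        }
    }
}

(In a real app you would reuse the HTable, or an HTablePool, across requests
instead of opening one per hit.)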

On 2011. 3. 1., at 12:51 AM, Usman Waheed usm...@opera.com wrote:
 Hi,
 
 I have been using the Thrift Perl API to connect to HBase for my web app. At 
 the moment I only perform random reads and scans based on date ranges and 
 some other search criteria.
 It works and I am still testing performance.
 
 -Usman
 
 On Mon, Feb 28, 2011 at 2:37 PM, edward choi mp2...@gmail.com wrote:
 Hi,
 
 I am planning to make a search engine for news articles. It will probably
 have over several billion news articles, so I thought HBase is the way to
 go.
 
 However, I am very new to CGI. All I know is that you use PHP, Python, or
 JavaScript with HTML to make a web site and communicate with the backend
 database such as MySQL.
 
 But I am going to use HBase, not MySQL, and I can't seem to find a script
 language that provides any form of API to communicate with HBase.
 
 So what do I do?
 
 Do I have to make a web site with pure Java? Is that even possible?
 
 It is possible, if you know things like JSP, Java servlets, etc.
 
 For people comfortable with PHP or Python, I think Apache Thrift
 (http://wiki.apache.org/thrift/) is an alternative.
 
 -b
 
 
 Or is using Hbase as the backend Database a bad idea in the first place?
 
 
 
 -- 
 Using Opera's revolutionary email client: http://www.opera.com/mail/


Re: Advice for a new open-source project and a license

2011-02-28 Thread Mark Kerzner
Thank you, Ted, indeed, not exactly what I am doing - which is even more
encouraging. Solr may be used for making the results easily available, so
thank you for pointing that out.

Cheers,
Mark

On Mon, Feb 28, 2011 at 5:28 PM, Ted Dunning tdunn...@maprtech.com wrote:


 Check out http://www.elasticsearch.org/

 Not what you are doing, but possibly a helpful bit of the pie.

 Also, Solr integrates Tika and Lucene pretty nicely any more.  No Hbase
 yet, but it isn't hard to add that.

On Mon, Feb 28, 2011 at 1:01 PM, Mark Kerzner markkerz...@gmail.com wrote:

 Hi,

 I am working on an open-source project that would be using
 Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
 searchable. Like Nutch, only applied to hard drives, and like Google
 Desktop
 Search, only I want to output information about every file found. Not a
 big
 difference though.

 I am looking for advice on the following:

   1. Have you heard of a similar project?
   2. What license should I use? I am thinking of Apache V2.0, because it

   relies on other Apache V2.0 projects;
   3. Any other advice?

 Thank you. Sincerely,
 Mark





Re: Advice for a new open-source project and a license

2011-02-28 Thread Greg Roelofs
Mark Kerzner markkerz...@gmail.com wrote:

 I am working on an open-source project that would be using
 Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
 searchable.

_A_ hard drive?  Hadoop?  Seems like a bad match.

Greg


Re: Advice for a new open-source project and a license

2011-02-28 Thread Mark Kerzner
Well, it's more complex than that. I pack all files (or selected
directories) into zip files, those zip files go into HDFS, and they are
processed from there.

Mark

On Mon, Feb 28, 2011 at 9:53 PM, Greg Roelofs roel...@yahoo-inc.com wrote:

 Mark Kerzner markkerz...@gmail.com wrote:

  I am working on an open-source project that would be using
  Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
  searchable.

 _A_ hard drive?  Hadoop?  Seems like a bad match.

 Greg



Re: Setting java.library.path for map-reduce job

2011-02-28 Thread Adarsh Sharma

Sonal Goyal wrote:

Adarsh,

Are you trying to distribute both the native library and the jcuda.jar?
Could you please explain your job's dependencies?
  


Yes, of course. I am trying to run a JCuda program in a Hadoop cluster, as I 
am able to run it through simple javac & java commands on a 
standalone machine by setting the PATH & LD_LIBRARY_PATH variables to the 
*/usr/local/cuda/lib* & */home/hadoop/project/jcuda_1.1_linux* folders.


I listed the contents & jars in these directories:

[hadoop@cuda1 lib]$ pwd
/usr/local/cuda/lib
[hadoop@cuda1 lib]$ ls -ls
total 158036
    4 lrwxrwxrwx 1 root root       14 Feb 23 19:37 libcublas.so -> libcublas.so.3
    4 lrwxrwxrwx 1 root root       19 Feb 23 19:37 libcublas.so.3 -> libcublas.so.3.2.16
81848 -rwxrwxrwx 1 root root 83720712 Feb 23 19:37 libcublas.so.3.2.16
    4 lrwxrwxrwx 1 root root       14 Feb 23 19:37 libcudart.so -> libcudart.so.3
    4 lrwxrwxrwx 1 root root       19 Feb 23 19:37 libcudart.so.3 -> libcudart.so.3.2.16
  424 -rwxrwxrwx 1 root root   423660 Feb 23 19:37 libcudart.so.3.2.16
    4 lrwxrwxrwx 1 root root       13 Feb 23 19:37 libcufft.so -> libcufft.so.3
    4 lrwxrwxrwx 1 root root       18 Feb 23 19:37 libcufft.so.3 -> libcufft.so.3.2.16
27724 -rwxrwxrwx 1 root root 28351780 Feb 23 19:37 libcufft.so.3.2.16
    4 lrwxrwxrwx 1 root root       14 Feb 23 19:37 libcurand.so -> libcurand.so.3
    4 lrwxrwxrwx 1 root root       19 Feb 23 19:37 libcurand.so.3 -> libcurand.so.3.2.16
 4120 -rwxrwxrwx 1 root root  4209384 Feb 23 19:37 libcurand.so.3.2.16
    4 lrwxrwxrwx 1 root root       16 Feb 23 19:37 libcusparse.so -> libcusparse.so.3
    4 lrwxrwxrwx 1 root root       21 Feb 23 19:37 libcusparse.so.3 -> libcusparse.so.3.2.16
43048 -rwxrwxrwx 1 root root 44024836 Feb 23 19:37 libcusparse.so.3.2.16
  172 -rwxrwxrwx 1 root root   166379 Nov 25 11:29 libJCublas-linux-x86_64.so
  152 -rwxrwxrwx 1 root root   144179 Nov 25 11:29 libJCudaDriver-linux-x86_64.so
   16 -rwxrwxrwx 1 root root     8474 Mar 31  2009 libjcudafft.so
  136 -rwxrwxrwx 1 root root   128672 Nov 25 11:29 libJCudaRuntime-linux-x86_64.so
   80 -rwxrwxrwx 1 root root    70381 Mar 31  2009 libjcuda.so
   44 -rwxrwxrwx 1 root root    38039 Nov 25 11:29 libJCudpp-linux-x86_64.so
   44 -rwxrwxrwx 1 root root    38383 Nov 25 11:29 libJCufft-linux-x86_64.so
   48 -rwxrwxrwx 1 root root    43706 Nov 25 11:29 libJCurand-linux-x86_64.so
  140 -rwxrwxrwx 1 root root   133280 Nov 25 11:29 libJCusparse-linux-x86_64.so


And the second folder as :

[hadoop@cuda1 jcuda_1.1_linux64]$ pwd
/home/hadoop/project/hadoop-0.20.2/jcuda_1.1_linux64
[hadoop@cuda1 jcuda_1.1_linux64]$ ls -ls
total 200
8 drwxrwxrwx 6 hadoop hadoop  4096 Feb 24 01:44 doc
8 drwxrwxrwx 3 hadoop hadoop  4096 Feb 24 01:43 examples
32 -rwxrwxr-x 1 hadoop hadoop 28484 Feb 24 01:43 jcuda.jar
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcublas.so.3
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcublas.so.3.2.16
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcudart.so.3
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcudart.so.3.2.16
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcufft.so.3
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcufft.so.3.2.16
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcurand.so.3
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcurand.so.3.2.16
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcusparse.so.3
4 -rw-rw-r-- 1 hadoop hadoop 0 Mar  1 21:27 libcusparse.so.3.2.16
16 -rwxr-xr-x 1 hadoop hadoop  8474 Mar  1 04:12 libjcudafft.so
80 -rwxr-xr-x 1 hadoop hadoop 70381 Mar  1 04:11 libjcuda.so
8 -rwxrwxr-x 1 hadoop hadoop   811 Feb 24 01:43 README.txt
8 drwxrwxrwx 2 hadoop hadoop  4096 Feb 24 01:43 resources
[hadoop@cuda1 jcuda_1.1_linux64]$

I think Hadoop is not able to recognize *jcuda.jar* in the TaskTracker 
process. Please guide me on how to make it available there.



Thanks & best Regards,
Adarsh Sharma


Thanks and Regards,
Sonal
Hadoop ETL and Data Integration
https://github.com/sonalgoyal/hiho
Nube Technologies
http://www.nubetech.co

http://in.linkedin.com/in/sonalgoyal





On Mon, Feb 28, 2011 at 6:54 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote:

  

Sonal Goyal wrote:



Hi Adarsh,

I think your mapred.cache.files property has an extra space at the end.
Try
removing that and let us know how it goes.
Thanks and Regards,
Sonal
Hadoop ETL and Data Integration
https://github.com/sonalgoyal/hiho
Nube Technologies
http://www.nubetech.co

http://in.linkedin.com/in/sonalgoyal




  

Thanks a lot Sonal, but it doesn't succeed.
Please, if possible, tell me the proper steps that need to be followed
after configuring the Hadoop cluster.

I can't understand it, as the simple standalone commands succeed:

[root@cuda1 hadoop-0.20.2]# javac EnumDevices.java
[root@cuda1 hadoop-0.20.2]# java EnumDevices
Total number of devices: 1
Name: Tesla C1060
Version: 1.3
Clock rate: 1296000 MHz
Threads per block: 512


but in the Map-Reduce job it fails.

Re: Library Issue

2011-02-28 Thread Lance Norskog
NVidia hardware is GPU-based math hardware, the chips in high-end 3D graphics 
cards. CUDA is the name of the NVidia development environment. Your Hadoop job 
wants to run code on an NVidia card.

I'm guessing 'jcuda' is a utility program for the CUDA tools. What is your code 
written in? 

We need more information on what program you are running, where it came from, 
what programs it expects, etc.

Lance

On Feb 28, 2011, at 1:32 PM, Greg Roelofs wrote:

 Adarsh Sharma adarsh.sha...@orkash.com wrote:
 
 But Still don't know why it fails in Map-reduce job.
 
 [hadoop@ws37-mah-lin hadoop-0.20.2]$ bin/hadoop jar wordcount1.jar
 org.myorg.WordCount /user/hadoop/gutenberg /user/hadoop/output1
 11/02/28 15:01:45 INFO input.FileInputFormat: Total input paths to
 process : 3
 11/02/28 15:01:45 INFO mapred.JobClient: Running job: job_201102281104_0006
 11/02/28 15:01:46 INFO mapred.JobClient:  map 0% reduce 0%
 11/02/28 15:01:56 INFO mapred.JobClient: Task Id :
 attempt_201102281104_0006_m_00_0, Status : FAILED
 java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
 [...]
 Caused by: java.lang.UnsatisfiedLinkError: no jcuda in java.library.path
 
 That would be why.  I don't know if jcuda is an external command, a
 shared library, or a jar of some sort, but Hadoop wants it in the Java
 library path (probably on all nodes).
 
 Greg



Re: How to make a CGI with HBase?

2011-02-28 Thread Lance Norskog
Java servlet web development with databases is one learning curve, and HBase is 
another. You might want to learn one at a time :)  If you code the 
site in Java/JSPs, you want a JDBC driver for HBase. You can also code calls to 
HBase directly in a JSP without writing separate Java classes.
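
For what it's worth, here is a rough, untested sketch of the servlet route against
the 0.20-era HBase client API. The "articles" table and the "content:body" column
are made-up names, so substitute your own schema:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ArticleServlet extends HttpServlet {
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // Fetch one article by row key (?id=...) and return its body as plain text.
    // Table name "articles" and column "content:body" are hypothetical.
    HTable table = new HTable(new HBaseConfiguration(), "articles");
    try {
      Result row = table.get(new Get(Bytes.toBytes(req.getParameter("id"))));
      byte[] body = row.getValue(Bytes.toBytes("content"), Bytes.toBytes("body"));
      resp.setContentType("text/plain");
      resp.getWriter().write(body == null ? "not found" : Bytes.toString(body));
    } finally {
      table.close();
    }
  }
}

The same few lines could sit in a JSP scriptlet instead of a servlet class; the
client API is identical either way.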

Lance


On Feb 28, 2011, at 6:19 PM, edward choi wrote:

 Thanks for the reply. 
 Didn't know that Thrift was for such purpose. 
 Servlet and JSP is totally new to me. I skimmed through the concept on the 
 internet and they look fascinating. 
 I think I am gonna give servlet and JSP a try. 
 
  On 2011. 3. 1., at 12:51 AM, Usman Waheed usm...@opera.com wrote:
 HI,
 
 I have been using the Thrift Perl API to connect to Hbase for my web app. At 
 the moment i only perform random reads and scans based on date ranges and 
 some other search criteria.
 It works and I am still testing performance.
 
 -Usman
 
 On Mon, Feb 28, 2011 at 2:37 PM, edward choi mp2...@gmail.com wrote:
 Hi,
 
 I am planning to make a search engine for news articles. It will probably
 have over several billions of news articles so I thought HBase is the way 
 to
 go.
 
 However, I am very new to CGI. All I know is that you use php, python or
 java script with HTML to make a web site and communicate with the backend
 database such as MySQL.
 
 But I am going to use HBase, not MySQL, and I can't seem to find a script
 language that provides any form of API to communicate with HBase.
 
 So what do I do?
 
 Do I have to make a web site with pure Java? Is that even possible?
 
  It is possible, if you know things like JSP, Java servlets, etc.
 
 For people comfortable with PHP or Python, I think Apache Thrift
 (http://wiki.apache.org/thrift/) is an alternative.
 
 -b
 
 
 Or is using Hbase as the backend Database a bad idea in the first place?
 
 
 
 -- 
 Using Opera's revolutionary email client: http://www.opera.com/mail/



Re: Setting java.library.path for map-reduce job

2011-02-28 Thread Sonal Goyal
Hi Adarsh,

Have you placed jcuda.jar in HDFS? Your configuration says

hdfs://192.168.0.131:54310/jcuda.jar
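
If it is not there yet, copying it up and adding it to the task classpath
programmatically is one option. A small, untested sketch, assuming the jar was
copied to /jcuda.jar in HDFS (e.g. with hadoop fs -put jcuda.jar /jcuda.jar) and
a hypothetical driver class name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class JcudaCacheDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ships the HDFS copy of jcuda.jar to every task and puts it on the task
    // classpath; roughly what mapred.cache.files plus
    // mapred.job.classpath.files do when set by hand.
    DistributedCache.addFileToClassPath(new Path("/jcuda.jar"), conf);
    Job job = new Job(conf, "jcuda-job");
    job.setJarByClass(JcudaCacheDriver.class);
    // ... set mapper, reducer, input and output paths as usual ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}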


Thanks and Regards,
Sonal
Hadoop ETL and Data Integration
https://github.com/sonalgoyal/hiho
Nube Technologies
http://www.nubetech.co

http://in.linkedin.com/in/sonalgoyal





On Tue, Mar 1, 2011 at 9:34 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 Sonal Goyal wrote:

 Adarsh,

 Are you trying to distribute both the native library and the jcuda.jar?
 Could you please explain your job's dependencies?



 Yes, of course. I am trying to run a JCuda program in a Hadoop cluster, as I am
 able to run it through simple javac & java commands on a standalone
 machine by setting the PATH & LD_LIBRARY_PATH variables to the */usr/local/cuda/lib*
 & */home/hadoop/project/jcuda_1.1_linux* folders.

 [...]

  I think Hadoop is not able to recognize *jcuda.jar* in the TaskTracker
  process. Please guide me on how to make it available there.


 Thanks & best Regards,
  Adarsh Sharma


  Thanks and Regards,
 Sonal
  Hadoop ETL and Data Integration
  https://github.com/sonalgoyal/hiho
  Nube Technologies
  http://www.nubetech.co

 http://in.linkedin.com/in/sonalgoyal





 On Mon, Feb 28, 2011 at 6:54 PM, Adarsh Sharma adarsh.sha...@orkash.com
 wrote:



 Sonal Goyal wrote:



 Hi Adarsh,

 I think your mapred.cache.files property has an extra space at the end.
 Try
 removing that and let us know how it goes.
 Thanks and Regards,
 Sonal
  Hadoop ETL and Data Integration
  https://github.com/sonalgoyal/hiho
  Nube Technologies
  http://www.nubetech.co