Hadoop user event in Europe (interested?)
Hi, wondering if there is interest in organizing a Hadoop meet-up in Europe (Geneva, Switzerland)? It could be a 2-day event discussing the use of Hadoop in industry/science projects. If this interests you, please let me know. cheers, asif
Re: another quick question
Hi, the tmp directory is local to the machine running the Hadoop system, so if your Hadoop is on a remote machine, the tmp directory has to be on that machine. Your question is not clear to me, though: what exactly do you want to do? asif

On Oct 6, 2010, at 9:55 PM, Maha A. Alabduljalil wrote:
> Hi again, I guess my questions are easy. Since I'm installing Hadoop on my school machine, I have to view the namenode online via hdfs://host-name:50070 instead of the default link provided by the Hadoop Quick Start (i.e. hdfs://localhost:50070). Do you think I should set my hadoop.tmp.dir to the machine I'm currently working on, so I can do it the default way? Thank you, Maha
Re: Quick question
Hi, check if the ports are open outside the school network; otherwise you will have to use ssh tunneling to access the ports serving the web pages (it is likely that these are not open by default). Try something like:

ssh -L50030:hadoop-host-address:50030 ur-usern...@cluster-head-node

Then open localhost:50030 to see the job-tracker page. cheers

On Oct 6, 2010, at 9:14 PM, Maha A. Alabduljalil wrote:
> Hi everyone, I've started up Hadoop (HDFS data and name nodes, JobTracker and TaskTrackers) using the quick start guidance. The web views of the filesystem and jobtracker suddenly started to give "can't be found" errors in Safari. Note that I'm actually accessing Hadoop via ssh to my school account. Could that be the problem? Thank you, Maha
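A sketch of the tunneling suggestion above, forwarding both standard web UI ports in one command; the host names and user are placeholders for your own cluster:

```shell
# Forward the JobTracker (50030) and NameNode (50070) web UIs through
# the school login host; replace user/host names with your own.
ssh -L 50030:hadoop-head:50030 -L 50070:hadoop-head:50070 user@school-gateway
# While the session is open, browse:
#   http://localhost:50030  (JobTracker)
#   http://localhost:50070  (NameNode)
```

The `-L` flags keep working as long as the ssh session stays open, so run this in a separate terminal.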
mapside joins
Hi, does join only work with text-like files? Is it possible to do a map-side join using custom Writables? Say I have Writables custom1 and custom2, and there is one common field (say id) that could join the records in these two objects. Would it be possible to join these two files and output a Writable custom3? thanks
Re: manipulate counters in new api
context.getCounter(Status.failed).increment(1); cheers

On Jul 19, 2010, at 9:48 PM, Gang Luo wrote:
> Hi all, I find the map/reduce methods in the new API look like map/reduce(Object, Iterable, Context). No Reporter appears here as in the old API. How do I add and modify counters in the new API? Thanks, -Gang
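A minimal sketch of the one-liner above in context, using the new `org.apache.hadoop.mapreduce` API; the enum name `Status` and its values are invented for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  // Custom counters are usually declared as an enum; Hadoop groups them
  // under the enum's class name in the job UI.
  public enum Status { FAILED, SUCCEEDED }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (value.getLength() == 0) {
      // In the new API counters are reached through the Context,
      // which replaces the Reporter of the old API.
      context.getCounter(Status.FAILED).increment(1);
      return;
    }
    context.getCounter(Status.SUCCEEDED).increment(1);
    context.write(value, new LongWritable(1));
  }
}
```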
Re: Single Node with multiple mappers?
How is your data being split? Using the mapred.map.tasks property should let you specify how many maps you want to run (provided your input file is big enough to be split into multiple chunks). asif

On Jul 16, 2010, at 11:03 AM, Moritz Krog wrote:
> Hi everyone, I was curious if there is any option to use Hadoop in single-node mode in a way that enables the process to use more system resources. Right now, Hadoop uses one mapper and one reducer, leaving my i7 with about 20% CPU usage (1 core for Hadoop, .5 cores for my OS), basically idling. Raising the number of map tasks doesn't seem to do much, as this parameter seems to be more of a hint anyway. Still, I have lots of CPU time and RAM left. Any hints on how to use them? thanks in advance, Moritz
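Besides the number of map tasks (which, as noted, is only a hint), the per-node concurrency is capped by the task-tracker slot settings. A sketch of the relevant mapred-site.xml fragment, assuming 0.20-era property names; the values are examples only:

```xml
<!-- mapred-site.xml: raise the number of tasks a single TaskTracker
     will run concurrently (defaults are 2 map / 2 reduce slots). -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

On a single-node setup these slot maxima, together with an input large enough to produce several splits, are what let the extra cores be used.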
Re: how to do a reduce-only job
You need to join these files into one; you could either do a map-side join or a reduce-side join. For a map-side join (slightly more involved), look at the example org.apache.hadoop.examples.Join. For a reduce-side join, simply create 2 mappers (one for each file) and one reducer (as long as you keep the key-value types the same for both). You will have to use multiple input formats to do so, e.g.

MultipleInputs.addInputPath(conf, path1, input_format1, mapper_class1)
MultipleInputs.addInputPath(conf, path2, input_format2, mapper_class2)

The javadoc of the class explains it further. cheers

On Jul 15, 2010, at 10:26 PM, David Hawthorne wrote:
> I have two previously created output files of format key[tab]value, where key is text and value is an integer sum of how many times the key appeared. I would like to reduce these output files together into one new output file. I'm having problems finding out how to do this. I've found ways to specify a job with no reducers, but it doesn't look like there's a way to specify a reduce-only job, aside from using the streaming interface with 'cat' as the mapper. I'm not opposed to this, but I also couldn't find a way to specify 'cat' as the mapper and the reducer in my Java class as the reducer. I'm also not sure this would work, as the reducer might simply see the entire line emitted by cat as the key. I could use awk as the reducer, but I've heard that streaming is less performant than Java, and I've already got the Java class written. I could write another Java class with a mapper that splits the value on tab and emits the two fields as a key-value pair, but that seems like extra work and less optimal than being able to run a reduce-only job. So... what are the options? Is there a way to specify a reduce-only job?
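A sketch of the reduce-side approach above as a complete driver, using the old (mapred) API that the quoted MultipleInputs calls belong to. Class and path names are placeholders; since both inputs have the same key[tab]value format here, one mapper class can serve both calls:

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class MergeCounts {

  // KeyValueTextInputFormat splits each line on the tab, so the mapper
  // only has to turn the textual count into an IntWritable.
  public static class ParseMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, IntWritable> {
    public void map(Text key, Text value, OutputCollector<Text, IntWritable> out,
        Reporter reporter) throws IOException {
      out.collect(key, new IntWritable(Integer.parseInt(value.toString())));
    }
  }

  // The reducer sees counts from both files grouped by key and sums them.
  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> out, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) sum += values.next().get();
      out.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MergeCounts.class);
    conf.setJobName("merge-counts");
    // One addInputPath call per file; both emit the same <Text, IntWritable>
    // pair, so a single reducer merges them.
    MultipleInputs.addInputPath(conf, new Path(args[0]),
        KeyValueTextInputFormat.class, ParseMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]),
        KeyValueTextInputFormat.class, ParseMapper.class);
    conf.setReducerClass(SumReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}
```

The mapper here is essentially an identity parse, so the job behaves like the "reduce-only" job the question asks for.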
Error compiling mapreduce
Hi, I am getting this strange error in the target "compile-mapred-classes" when compiling map-reduce on Mac OS X. JAVA_HOME is properly set, and if I remove from compile-mapred-classes everything works fine. Is anything special needed in order to get Jasper to work? thanks

[jsp-compile] java.lang.IllegalStateException: No Java compiler available
[jsp-compile] at org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:224)
[jsp-compile] at org.apache.jasper.JspC.processFile(JspC.java:946)
[jsp-compile] at org.apache.jasper.JspC.execute(JspC.java:1094)
[jsp-compile] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[jsp-compile] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[jsp-compile] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[jsp-compile] at java.lang.reflect.Method.invoke(Method.java:597)
[jsp-compile] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
[jsp-compile] at org.apache.tools.ant.TaskAdapter.execute(TaskAdapter.java:134)
[jsp-compile] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
[jsp-compile] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
[jsp-compile] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[jsp-compile] at java.lang.reflect.Method.invoke(Method.java:597)
[jsp-compile] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
[jsp-compile] at org.apache.tools.ant.Task.perform(Task.java:348)
[jsp-compile] at org.apache.tools.ant.Target.execute(Target.java:357)
[jsp-compile] at org.apache.tools.ant.Target.performTasks(Target.java:385)
[jsp-compile] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
[jsp-compile] at org.apache.tools.ant.Project.executeTarget(Project.java:1298)
[jsp-compile] at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
[jsp-compile] at org.eclipse.ant.internal.ui.antsupport.EclipseDefaultExecutor.executeTargets(EclipseDefaultExecutor.java:32)
[jsp-compile] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
[jsp-compile] at org.eclipse.ant.internal.ui.antsupport.InternalAntRunner.run(InternalAntRunner.java:423)
[jsp-compile] at org.eclipse.ant.internal.ui.antsupport.InternalAntRunner.main(InternalAntRunner.java:137)

BUILD FAILED
/Users/asifjan/Documents/eclipse_workspaces/gaia/hadoop-mapreduce-trunk/build.xml:373: org.apache.jasper.JasperException: java.lang.IllegalStateException: No Java compiler available
exception related to logging (using latest sources)
Hi, I am getting the following exception when running map-reduce jobs:

java.lang.NullPointerException
	at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:69)
	at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:222)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:219)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:813)
	at org.apache.hadoop.mapred.Child.main(Child.java:211)

I am using the latest sources (0.22.0-snapshot) that I have built myself. Any ideas? thanks
How to use MapFile in mapreduce
Hi, any pointers on how to use MapFile with the new mapreduce API? I did find the corresponding output format, org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat, but was not able to see how I can specify a MapFileInputFormat. (Naively, I thought that org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat should work for MapFile as well.) Will I have to implement a RecordReader in order to read from a MapFile? Thanks
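One way to read a MapFile outside of a job is to open it with MapFile.Reader directly, which uses the index file for random access. A sketch, assuming a single MapFile directory produced by MapFileOutputFormat; the path, key, and value types are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // "/out/part-r-00000" stands in for one MapFile directory
    // (a directory containing "data" and "index" files).
    MapFile.Reader reader = new MapFile.Reader(fs, "/out/part-r-00000", conf);
    try {
      Text key = new Text("some-key");     // placeholder key
      IntWritable value = new IntWritable();
      // get() seeks via the index, then reads the matching record;
      // it returns null if the key is absent.
      if (reader.get(key, value) != null) {
        System.out.println(key + " -> " + value);
      }
    } finally {
      reader.close();
    }
  }
}
```

For use as job input, a MapFile directory's "data" file is itself a SequenceFile, which is why SequenceFile-based input is the usual route; this reader approach is for point lookups rather than scans.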
Re: calling C programs from Hadoop
Look at Hadoop Streaming; maybe it is helpful to you. asif

On May 29, 2010, at 8:31 PM, Michael Robinson wrote:
> I am new to Hadoop. I have successfully run Java programs from Hadoop, and I would like to call C programs from Hadoop. Thank you for your help, Michael
> --
> View this message in context: http://lucene.472066.n3.nabble.com/calling-C-programs-from-Hadoop-tp854833p854833.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Asif Jan
Gaia Project
SixSq Sarl / ISDC Astrophysics Data Centre & Geneva Observatory
Chemin des Ecogia 16
CH-1290 Versoix
Switzerland
E-mail : asif@unige.ch
Tel.: +41 22 37 92198
Fax : +41 22 37 92133
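Streaming talks to any executable over stdin/stdout (one key[tab]value record per line), so a compiled C binary can act as the mapper or reducer. A sketch of the invocation; the streaming jar path and the binary names are placeholders for your installation:

```shell
# Run a compiled C mapper/reducer pair via Hadoop Streaming.
# -file ships the local binaries to every task node.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
  -input  /user/me/input \
  -output /user/me/output \
  -mapper ./my_c_mapper \
  -reducer ./my_c_reducer \
  -file my_c_mapper -file my_c_reducer
```

The C programs themselves just read lines from stdin and write tab-separated key/value lines to stdout; no Hadoop-specific code is needed in them.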
How to make latest build work ?
Hi, I need to build a Hadoop installation from the latest source code of hadoop/common. I checked out the latest source and ran the ant target that makes a distribution tar (ant tar). When I try to run the system, I get the error "HDFS not found". Any idea how I can get a functional system from the latest sources? thanks