Help for the problem of running lucene on Hadoop
Hi all, I want to run Lucene on Hadoop. The problem is as follows:

IndexWriter writer = new IndexWriter(FSDirectory.open(new File(index)), new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);

When using Hadoop, must the first parameter be a directory on HDFS? And if so, how should it be used? Thanks in advance!
--
Regards,
Jander
RE: Retrying connect to server
Hi, I get this:

$ jps
6017 DataNode
5805 NameNode
6234 SecondaryNameNode
6354 Jps

What can I do to start the JobTracker? Here are my config files:

$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs at.</description>
  </property>
</configuration>

$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The actual number of replications can be specified when the file is created.</description>
  </property>
</configuration>

$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description>
  </property>
</configuration>

-----Original Message-----
From: James Seigel [mailto:ja...@tynt.com]
Sent: Friday, December 31, 2010 4:56 AM
To: common-user@hadoop.apache.org
Subject: Re: Retrying connect to server

Or 3) the configuration (or lack thereof) on the machine you are trying to run this on has no idea where your DFS or JobTracker is :)

Cheers,
James.

On 2010-12-30, at 8:53 PM, Adarsh Sharma wrote:
Cavus,M.,Fa. Post Direkt wrote:
I run this:

./hadoop jar ../../hadoopjar/hd.jar org.postdirekt.hadoop.WordCount gutenberg gutenberg-output

and I get this. Does anyone know why I get this error?

10/12/30 16:48:59 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
10/12/30 16:49:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 0 time(s).
10/12/30 16:49:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 1 time(s).
10/12/30 16:49:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 2 time(s).
10/12/30 16:49:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 3 time(s).
10/12/30 16:49:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 4 time(s).
10/12/30 16:49:06 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 5 time(s).
10/12/30 16:49:07 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 6 time(s).
10/12/30 16:49:08 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 7 time(s).
10/12/30 16:49:09 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 8 time(s).
10/12/30 16:49:10 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 9 time(s).

Exception in thread "main" java.net.ConnectException: Call to localhost/127.0.0.1:9001 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:932)
    at org.apache.hadoop.ipc.Client.call(Client.java:908)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:224)
    at org.apache.hadoop.mapreduce.Cluster.createRPCProxy(Cluster.java:82)
    at org.apache.hadoop.mapreduce.Cluster.createClient(Cluster.java:94)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:70)
    at org.apache.hadoop.mapreduce.Job.<init>(Job.java:129)
    at org.apache.hadoop.mapreduce.Job.<init>(Job.java:134)
    at org.postdirekt.hadoop.WordCount.main(WordCount.java:19)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:417) at
RE: Retrying connect to server
Hi, I had forgotten to run start-mapred.sh. Thanks, all.
ClassNotFoundException
I looked in my jar file, but I still get a ClassNotFoundException. Why?

$ jar -xvf hd.jar
  inflated: META-INF/MANIFEST.MF
  inflated: org/postdirekt/hadoop/Map.class
  inflated: org/postdirekt/hadoop/Map.java
  inflated: org/postdirekt/hadoop/WordCount.class
  inflated: org/postdirekt/hadoop/WordCount.java
  inflated: org/postdirekt/hadoop/Reduce2.class
  inflated: org/postdirekt/hadoop/Reduce2.java

$ ./hadoop jar ../../hadoopjar/hd.jar org.postdirekt.hadoop.WordCount gutenberg gutenberg-output
10/12/31 10:26:54 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
10/12/31 10:26:54 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
10/12/31 10:26:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/12/31 10:26:54 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
10/12/31 10:26:54 INFO input.FileInputFormat: Total input paths to process : 1
10/12/31 10:26:55 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
10/12/31 10:26:55 INFO mapreduce.JobSubmitter: number of splits:1
10/12/31 10:26:55 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
10/12/31 10:26:55 INFO mapreduce.Job: Running job: job_201012311021_0002
10/12/31 10:26:56 INFO mapreduce.Job: map 0% reduce 0%
10/12/31 10:27:11 INFO mapreduce.Job: Task Id : attempt_201012311021_0002_m_00_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:167)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.m
10/12/31 10:27:24 INFO mapreduce.Job: Task Id : attempt_201012311021_0002_m_00_1, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:167)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.m
10/12/31 10:27:36 INFO mapreduce.Job: Task Id : attempt_201012311021_0002_m_00_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:167)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at
Re: ClassNotFoundException
The answer is in your log output:

10/12/31 10:26:54 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).

Alternatively, use Job.setJarByClass(Class class);

On Fri, Dec 31, 2010 at 3:02 PM, Cavus,M.,Fa. Post Direkt m.ca...@postdirekt.de wrote:
I looked in my jar file, but I still get a ClassNotFoundException. Why?

--
Harsh J
www.harshj.com
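To make the suggestion concrete, here is a minimal driver sketch. It is not the poster's actual code: the class name and paths are placeholders and the mapper/reducer wiring is left as a comment; the point is only where Job.setJarByClass goes, since it tells Hadoop which jar to submit with the job and is what the "No job jar file set" warning is complaining about.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");

    // Ship the jar that contains this driver class with the job, so the task
    // JVMs can load user classes such as org.postdirekt.hadoop.Map.
    job.setJarByClass(WordCount.class);

    // job.setMapperClass(...); job.setReducerClass(...);  // the poster's Map / Reduce2 classes go here
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Job#setJar(String) works too if you prefer to pass the jar path explicitly, but setJarByClass is usually less fragile because it derives the path from the loaded class.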
Re: Help for the problem of running lucene on Hadoop
You'd better build the index on the local file system and then copy the final index into HDFS. It is not recommended to use HDFS as the FileSystem for Lucene (though it can be used for search).

2010/12/31 Jander g jande...@gmail.com:
Hi all, I want to run Lucene on Hadoop. The problem is as follows:
IndexWriter writer = new IndexWriter(FSDirectory.open(new File(index)), new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
When using Hadoop, must the first parameter be a directory on HDFS? And how should it be used? Thanks in advance!
--
Regards,
Jander
Re: Help for the problem of running lucene on Hadoop
You should implement the Directory class yourself. Nutch provides one, named HDFSDirectory. You can use it to build the index, but searching on HDFS is relatively slow, especially for phrase queries. I recommend copying the index down to local disk before performing a search.

On Fri, Dec 31, 2010 at 5:08 PM, Jander g jande...@gmail.com wrote:
Hi all, I want to run Lucene on Hadoop. The problem is as follows:
IndexWriter writer = new IndexWriter(FSDirectory.open(new File(index)), new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
When using Hadoop, must the first parameter be a directory on HDFS? And how should it be used? Thanks in advance!
--
Regards,
Jander
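For illustration, a rough sketch of the "build locally, then copy into HDFS" approach suggested in this thread. The paths, the field name, and the sample document are placeholders, and the Lucene calls simply mirror the 2.x-era constructors from the original question; treat it as an outline rather than the definitive way to do it.

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class LocalIndexThenCopy {
  public static void main(String[] args) throws Exception {
    File localIndex = new File("/tmp/lucene-index");   // local scratch directory (placeholder)

    // Build the index on the local file system, exactly as in the original question.
    IndexWriter writer = new IndexWriter(FSDirectory.open(localIndex),
        new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
    Document doc = new Document();
    doc.add(new Field("body", "hello hadoop", Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);
    writer.close();

    // Copy the finished index into HDFS in a single pass of sequential writes.
    FileSystem hdfs = FileSystem.get(new Configuration());
    hdfs.copyFromLocalFile(new Path(localIndex.getAbsolutePath()),
        new Path("/user/jander/lucene-index"));        // HDFS destination (placeholder)
  }
}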
Re: Multiple Input Data Processing using MapReduce
You can use the map.input.split (something like that, I can't remember...) parameter in the Configuration. This parameter contains the input file path, and you can use it to branch your logic. It can be found in TextInputFormat.java.

On Thu, Oct 14, 2010 at 10:03 PM, Matthew John tmatthewjohn1...@gmail.com wrote:
Hi all, I have recently been working on a task where I need to take in two input file types, compare them, and produce a result using some logic. But as I understand it, simple MapReduce implementations process a single input type. The closest implementation I could think of that is similar to my work is a join in MapReduce, but I am not able to understand much from the example provided in Hadoop. Can someone provide a good pointer to such multiple-input data processing (or joins) in MapReduce? It would also be great if you could send some sample code for the same. Thanks, Matthew
Re: Multiple Input Data Processing using MapReduce
It is map.input.file (.start and .length also relate to the InputSplit for the mapper). For more, see: http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse

With a custom RecordReader, you can put this value in yourself (FileSplit.getPath()) before control heads to the Mapper/MapRunner.

On Fri, Dec 31, 2010 at 6:17 PM, Zhou, Yunqing azure...@gmail.com wrote:
You can use the map.input.split (something like that, I can't remember...) parameter in the Configuration. This parameter contains the input file path, and you can use it to branch your logic.

--
Harsh J
www.harshj.com
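As an illustration of branching a mapper on its input file, here is a sketch. The class, key/value types, and the "typeA" naming convention are placeholders; map.input.file is the property named above, and the fallback through the InputSplit covers versions or input formats that do not populate it.

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
  private boolean fromTypeA;

  @Override
  protected void setup(Context context) {
    // Either read the property directly...
    String file = context.getConfiguration().get("map.input.file");
    if (file == null) {
      // ...or fall back to the InputSplit this map task is processing.
      Path p = ((FileSplit) context.getInputSplit()).getPath();
      file = p.toString();
    }
    fromTypeA = file.contains("typeA");   // placeholder naming convention for the first input type
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (fromTypeA) {
      context.write(new Text("A"), value);   // records from the first input type
    } else {
      context.write(new Text("B"), value);   // records from the second input type
    }
  }
}

MultipleInputs (with a different mapper per input path) is another common way to express the same join-style preprocessing.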
HDFS FS Commands Hanging System
Hi All, I have been working on running Hadoop on a new microprocessor architecture in pseudo-distributed mode. I have been successful in getting SSH configured, and I am also able to start a namenode, secondary namenode, tasktracker, jobtracker and datanode, as evidenced by the response I get from jps. However, when I attempt to interact with the file system in any way, such as with the simple command hadoop fs -ls, the system hangs. So it appears to me that some communication is not occurring properly. Does anyone have suggestions on what I should look into in order to fix this problem? Thanks in advance. -Jon
Re: Is hadoop-streaming.jar part of the Apache distribution?
Found it under /opt/hadoop/contrib/streaming. I am now able to run Hadoop streaming jobs on my laptop.

By the way, here is the documentation I found confusing: http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Hadoop+Streaming
This seems to apply to my install, but it says that the streaming JAR should be in the home directory with the other JARs instead of under contrib.

On Fri, Dec 31, 2010 at 10:54 AM, Ken Goodhope kengoodh...@gmail.com wrote:
It is one of the contrib modules. If you look in the src dir you will see a contrib dir containing all the contrib modules.

On Dec 31, 2010 10:38 AM, W.P. McNeill bill...@gmail.com wrote:
I installed the Apache distribution (http://hadoop.apache.org/) of Hadoop on my laptop and set it up to run in local mode. It's working for me, but I can't find the hadoop-streaming.jar file. It is nowhere under the Hadoop home directory. The root of the Hadoop home directory contains the following JARs:
hadoop-0.20.2-ant.jar
hadoop-0.20.2-examples.jar
hadoop-0.20.2-tools.jar
hadoop-0.20.2-core.jar
hadoop-0.20.2-test.jar
The documentation makes it appear that streaming is part of the default install. I don't see anything that says I have to perform an extra step to get it installed. How do I get streaming installed on my laptop? Thanks.
Re: HDFS FS Commands Hanging System
Hi Michael, Thanks for your response. It doesn't seem to be an issue with safe mode. Even when I try the command dfsadmin -safemode get, the system hangs. I am unable to execute any FS shell commands other than hadoop fs -help. I am wondering whether this is an issue with communication between the daemons? What should I be looking at there? Or could it be something else? When I do jps, I do see all the daemons listed. Any other thoughts? Thanks again and happy new year. -Jon

On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:
Try checking your dfs status:
hadoop dfsadmin -safemode get
It probably says ON.
hadoop dfsadmin -safemode leave
Somebody else can probably say how to make this happen on every reboot.
Michael D. Black
Senior Scientist, Advanced Analytics Directorate
Northrop Grumman Information Systems

From: Jon Lederman [mailto:jon2...@gmail.com]
Sent: Fri 12/31/2010 11:00 AM
To: common-user@hadoop.apache.org
Subject: EXTERNAL: HDFS FS Commands Hanging System

Hi All, I have been working on running Hadoop on a new microprocessor architecture in pseudo-distributed mode. I have been successful in getting SSH configured, and I am also able to start a namenode, secondary namenode, tasktracker, jobtracker and datanode, as evidenced by the response I get from jps. However, when I attempt to interact with the file system in any way, such as with the simple command hadoop fs -ls, the system hangs. So it appears to me that some communication is not occurring properly. Does anyone have suggestions on what I should look into in order to fix this problem? Thanks in advance. -Jon
Re: HDFS FS Commands Hanging System
Hi Jon, Try:

HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /

-Todd

On Fri, Dec 31, 2010 at 11:20 AM, Jon Lederman jon2...@gmail.com wrote:
Hi Michael, Thanks for your response. It doesn't seem to be an issue with safe mode. Even when I try the command dfsadmin -safemode get, the system hangs.

--
Todd Lipcon
Software Engineer, Cloudera
Re: Is hadoop-streaming.jar part of the Apache distribution?
The doc you mentioned is for Hadoop 0.15.2, but you seem to be using 0.20.2. You should probably read the Hadoop docs for your installed version.

Gerald

On Fri, Dec 31, 2010 at 2:02 PM, W.P. McNeill bill...@gmail.com wrote:
Found it under /opt/hadoop/contrib/streaming. I am now able to run Hadoop streaming jobs on my laptop.
Re: Help for the problem of running lucene on Hadoop
This will not work for indexing. Lucene requires random read/write access to a file, and HDFS does not support this. HDFS only allows sequential writes: you start at the beginning and copy the file into block 0, block 1, ..., block N.

For querying, if your HDFS implementation makes a local cache that appears as a file system (I think FUSE does this?) it might work well. But, yes, you should copy it down.

On Fri, Dec 31, 2010 at 4:43 AM, Zhou, Yunqing azure...@gmail.com wrote:
You should implement the Directory class yourself. Nutch provides one, named HDFSDirectory. You can use it to build the index, but searching on HDFS is relatively slow, especially for phrase queries. I recommend copying the index down to local disk before performing a search.

--
Lance Norskog
goks...@gmail.com
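A sketch of the query-time "copy it down" idea, complementing the indexing sketch earlier in the thread. The HDFS and local paths, the field name, and the query string are placeholders, and the searcher/parser constructors mirror the same Lucene 2.x-era API as the original question.

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class CopyDownAndSearch {
  public static void main(String[] args) throws Exception {
    // Pull the index directory out of HDFS onto local disk before searching.
    FileSystem hdfs = FileSystem.get(new Configuration());
    Path hdfsIndex = new Path("/user/jander/lucene-index");   // placeholder HDFS path
    File localIndex = new File("/tmp/lucene-index-copy");     // placeholder local path
    hdfs.copyToLocalFile(hdfsIndex, new Path(localIndex.getAbsolutePath()));

    // Search against the local copy; random reads now hit local disk, not HDFS.
    IndexSearcher searcher = new IndexSearcher(FSDirectory.open(localIndex));
    QueryParser parser = new QueryParser("body", new StandardAnalyzer());
    TopDocs hits = searcher.search(parser.parse("hello"), 10);
    System.out.println("total hits: " + hits.totalHits);
    searcher.close();
  }
}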
Re: Is hadoop-streaming.jar part of the Apache distribution?
I went to the top Google hit for Hadoop streaming and didn't notice that this was the 0.15.2 documentation instead of the one that matches my version. However, the 0.20.2 documentation has the same error: http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming
I verified that this is also the case with the files installed locally in my /opt/local/hadoop-0.20.2/docs folder. Is there a place I should file a documentation bug?

On Fri, Dec 31, 2010 at 12:22 PM, Zhenhua Guo jen...@gmail.com wrote:
The doc you mentioned is for Hadoop 0.15.2, but you seem to be using 0.20.2. You should probably read the Hadoop docs for your installed version.
Re: HDFS FS Commands Hanging System
I suggest you look through the logs to see if there are any errors. The second point I need to make is about which node you run the command hadoop fs -ls on: if you run the command on node A, the configuration item fs.default.name on that node should point to the HDFS namenode.

On Sat, Jan 1, 2011 at 3:20 AM, Jon Lederman jon2...@gmail.com wrote:
Hi Michael, Thanks for your response. It doesn't seem to be an issue with safe mode. Even when I try the command dfsadmin -safemode get, the system hangs.

--
-李平
Re: Is hadoop-streaming.jar part of the Apache distribution?
Hello,

On Sat, Jan 1, 2011 at 5:32 AM, W.P. McNeill bill...@gmail.com wrote:
However, the 0.20.2 documentation has the same error: http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming

Looks like the current release (0.21.0) and trunk also have the same error.

Is there a place I should file a documentation bug?

Yes, there is the Apache JIRA issue tracker for Hadoop MapReduce here: https://issues.apache.org/jira/browse/MAPREDUCE -- [documentation component]

In case you're interested in submitting a patch, the sources for the documentation are available at src/docs/src/documentation/content/xdocs/streaming.xml

--
Harsh J
www.harshj.com
Re: Help for the problem of running lucene on Hadoop
Thanks for all the replies. Now my idea is: run word segmentation on Hadoop and create the inverted index in MySQL. As we know, Hadoop MapReduce supports writing to and reading from MySQL. Does this approach have any problems?

On Sat, Jan 1, 2011 at 7:49 AM, James Seigel ja...@tynt.com wrote:
Check out katta for an example.
J
Sent from my mobile. Please excuse the typos.

On 2010-12-31, at 4:47 PM, Lance Norskog goks...@gmail.com wrote:
This will not work for indexing. Lucene requires random read/write access to a file, and HDFS does not support this.

--
Thanks,
Jander
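To make the MySQL idea above concrete, Hadoop's DB output format is the usual hook for writing reduce output into a relational table. The sketch below assumes a Hadoop version that ships org.apache.hadoop.mapreduce.lib.db; the table name, columns, JDBC URL, and credentials are placeholders, and the map/reduce classes that actually compute the postings are elided.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class IndexToMySql {
  // One row of the inverted index: a term and its serialized posting list.
  public static class TermRow implements Writable, DBWritable {
    String term;
    String postings;

    public void write(PreparedStatement stmt) throws SQLException {
      stmt.setString(1, term);
      stmt.setString(2, postings);
    }
    public void readFields(ResultSet rs) throws SQLException {
      term = rs.getString(1);
      postings = rs.getString(2);
    }
    public void write(DataOutput out) throws IOException { out.writeUTF(term); out.writeUTF(postings); }
    public void readFields(DataInput in) throws IOException { term = in.readUTF(); postings = in.readUTF(); }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder JDBC driver, URL, user, and password.
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://localhost/search", "user", "password");

    Job job = new Job(conf, "inverted-index-to-mysql");
    job.setJarByClass(IndexToMySql.class);
    // ... set the mapper/reducer that emit (TermRow, NullWritable) pairs here ...
    job.setOutputFormatClass(DBOutputFormat.class);
    DBOutputFormat.setOutput(job, "inverted_index", "term", "postings");   // placeholder table and columns
    job.setOutputKeyClass(TermRow.class);
    job.setOutputValueClass(NullWritable.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

One practical caveat worth weighing: pushing a large posting list through JDBC one row at a time can become the bottleneck, so many setups write the index to HDFS first and bulk-load it into MySQL afterwards.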