Help for the problem of running lucene on Hadoop

2010-12-31 Thread Jander g
Hi, all

I want to run Lucene on Hadoop. The problem is as follows:

IndexWriter writer = new IndexWriter(FSDirectory.open(new
File(index)),new StandardAnalyzer(), true,
IndexWriter.MaxFieldLength.LIMITED);

When using Hadoop, must the first parameter be a directory on HDFS? And how
should it be used?

Thanks in advance!

-- 
Regards,
Jander


RE: Retrying connect to server

2010-12-31 Thread Cavus,M.,Fa. Post Direkt
Hi,
I do get this:
$ jps
6017 DataNode
5805 NameNode
6234 SecondaryNameNode
6354 Jps

What can I do to start JobTracker?

Here are my config files:
$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at.</description>
  </property>
</configuration>


$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The actual number of replications can be specified when the
    file is created.</description>
  </property>
</configuration>

$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation.
    </description>
  </property>
</configuration>

-Original Message-
From: James Seigel [mailto:ja...@tynt.com] 
Sent: Friday, December 31, 2010 4:56 AM
To: common-user@hadoop.apache.org
Subject: Re: Retrying connect to server

Or
3) The configuration (or lack thereof) on the machine you are trying to
run this on has no idea where your DFS or JobTracker is :)

Cheers
James.

On 2010-12-30, at 8:53 PM, Adarsh Sharma wrote:

 Cavus,M.,Fa. Post Direkt wrote:
 I process this
 
 ./hadoop jar ../../hadoopjar/hd.jar org.postdirekt.hadoop.WordCount 
 gutenberg gutenberg-output
 
 I get this
 Did anyone know why I get this error?
 
 10/12/30 16:48:59 INFO security.Groups: Group mapping 
 impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; 
 cacheTimeout=30
 10/12/30 16:49:01 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 0 time(s).
 10/12/30 16:49:02 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 1 time(s).
 10/12/30 16:49:03 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 2 time(s).
 10/12/30 16:49:04 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 3 time(s).
 10/12/30 16:49:05 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 4 time(s).
 10/12/30 16:49:06 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 5 time(s).
 10/12/30 16:49:07 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 6 time(s).
 10/12/30 16:49:08 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 7 time(s).
 10/12/30 16:49:09 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 8 time(s).
 10/12/30 16:49:10 INFO ipc.Client: Retrying connect to server: 
 localhost/127.0.0.1:9001. Already tried 9 time(s).
 Exception in thread "main" java.net.ConnectException: Call to 
 localhost/127.0.0.1:9001 failed on connection exception: 
 java.net.ConnectException: Connection refused
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:932)
  at org.apache.hadoop.ipc.Client.call(Client.java:908)
  at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
  at $Proxy0.getProtocolVersion(Unknown Source)
  at 
 org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:224)
  at org.apache.hadoop.mapreduce.Cluster.createRPCProxy(Cluster.java:82)
  at org.apache.hadoop.mapreduce.Cluster.createClient(Cluster.java:94)
  at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:70)
  at org.apache.hadoop.mapreduce.Job.<init>(Job.java:129)
  at org.apache.hadoop.mapreduce.Job.<init>(Job.java:134)
  at org.postdirekt.hadoop.WordCount.main(WordCount.java:19)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
 Caused by: java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
  at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
  at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:417)
  at 

RE: Retrying connect to server

2010-12-31 Thread Cavus,M.,Fa. Post Direkt
Hi,
I've forgotten to start start-mapred.sh

Thanks All


ClassNotFoundException

2010-12-31 Thread Cavus,M.,Fa. Post Direkt
I looked in my jar file, but I still get a ClassNotFoundException. Why?

$ jar -xvf hd.jar
inflated: META-INF/MANIFEST.MF
inflated: org/postdirekt/hadoop/Map.class
inflated: org/postdirekt/hadoop/Map.java
inflated: org/postdirekt/hadoop/WordCount.class
inflated: org/postdirekt/hadoop/WordCount.java
inflated: org/postdirekt/hadoop/Reduce2.class
inflated: org/postdirekt/hadoop/Reduce2.java

$ ./hadoop jar ../../hadoopjar/hd.jar org.postdirekt.hadoop.WordCount
gutenberg gutenberg-output


10/12/31 10:26:54 INFO security.Groups: Group mapping
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
cacheTimeout=30
10/12/31 10:26:54 WARN conf.Configuration: mapred.task.id is deprecated.
Instead, use mapreduce.task.attempt.id
10/12/31 10:26:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the
same.
10/12/31 10:26:54 WARN mapreduce.JobSubmitter: No job jar file set.
User classes may not be found. See Job or Job#setJar(String).
10/12/31 10:26:54 INFO input.FileInputFormat: Total input paths to
process : 1
10/12/31 10:26:55 WARN conf.Configuration: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
10/12/31 10:26:55 INFO mapreduce.JobSubmitter: number of splits:1
10/12/31 10:26:55 INFO mapreduce.JobSubmitter: adding the following
namenodes' delegation tokens:null
10/12/31 10:26:55 INFO mapreduce.Job: Running job: job_201012311021_0002
10/12/31 10:26:56 INFO mapreduce.Job:  map 0% reduce 0%
10/12/31 10:27:11 INFO mapreduce.Job: Task Id :
attempt_201012311021_0002_m_00_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.postdirekt.hadoop.Map
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
at
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContex
tImpl.java:167)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio
n.java:742)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.m
10/12/31 10:27:24 INFO mapreduce.Job: Task Id :
attempt_201012311021_0002_m_00_1, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.postdirekt.hadoop.Map
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
at
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContex
tImpl.java:167)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio
n.java:742)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.m
10/12/31 10:27:36 INFO mapreduce.Job: Task Id :
attempt_201012311021_0002_m_00_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.postdirekt.hadoop.Map
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
at
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContex
tImpl.java:167)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio
n.java:742)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at 

Re: ClassNotFoundException

2010-12-31 Thread Harsh J
The answer is in your log output:

10/12/31 10:26:54 WARN mapreduce.JobSubmitter: No job jar file set.
User classes may not be found. See Job or Job#setJar(String).

Alternatively, use Job.setJarByClass(Class class);
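
For illustration, a minimal driver sketch using setJarByClass follows. The
package and class names (WordCount, Map, Reduce2) come from the jar listing
above; the key/value types and the rest of the wiring are assumptions about a
typical word-count job, not the poster's actual code:

package org.postdirekt.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "wordcount");

    // Without this (or an explicit setJar), the TaskTrackers never learn
    // which jar holds the user classes, and the child JVMs fail with
    // ClassNotFoundException exactly as in the log above.
    job.setJarByClass(WordCount.class);

    job.setMapperClass(Map.class);        // assumes Map extends Mapper
    job.setReducerClass(Reduce2.class);   // assumes Reduce2 extends Reducer
    job.setOutputKeyClass(Text.class);    // assumed output types
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}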

On Fri, Dec 31, 2010 at 3:02 PM, Cavus,M.,Fa. Post Direkt
m.ca...@postdirekt.de wrote:
 I look in my Jar File but I get a ClassNotFoundException why?:

-- 
Harsh J
www.harshj.com


Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Eason.Lee
You'd better build the index on the local filesystem and copy the final
index into HDFS.
It is not recommended to use HDFS as the FileSystem for Lucene (though it
can be used for search).
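
A rough sketch of that workflow: build the index with Lucene's local
FSDirectory, then push the finished directory into HDFS with the FileSystem
API. The paths are placeholders, and the exact IndexWriter/StandardAnalyzer
constructors depend on your Lucene version:

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class BuildLocallyThenCopy {
  public static void main(String[] args) throws Exception {
    // 1. Build the index on the local filesystem, as in the original snippet.
    File localIndex = new File("/tmp/index");   // placeholder local path
    IndexWriter writer = new IndexWriter(FSDirectory.open(localIndex),
        new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
    // ... writer.addDocument(...) calls go here ...
    writer.close();

    // 2. Copy the finished index directory into HDFS.
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);     // uses fs.default.name
    hdfs.copyFromLocalFile(new Path(localIndex.getPath()),
        new Path("/user/jander/index"));        // placeholder HDFS path
  }
}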



2010/12/31 Jander g jande...@gmail.com

 Hi, all

 I want  to run lucene on Hadoop, The problem as follows:

 IndexWriter writer = new IndexWriter(FSDirectory.open(new
 File(index)),new StandardAnalyzer(), true,
 IndexWriter.MaxFieldLength.LIMITED);

 when using Hadoop, whether the first param must be the dir of HDFS? And how
 to use?

 Thanks in advance!

 --
 Regards,
 Jander



Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Zhou, Yunqing
You should implement the Directory class yourself.
Nutch provides one, named HDFSDirectory.
You can use it to build the index, but searching on HDFS is
relatively slow, especially for phrase queries.
I recommend downloading the index to local disk when performing a search.

On Fri, Dec 31, 2010 at 5:08 PM, Jander g jande...@gmail.com wrote:

 Hi, all

 I want  to run lucene on Hadoop, The problem as follows:

 IndexWriter writer = new IndexWriter(FSDirectory.open(new
 File(index)),new StandardAnalyzer(), true,
 IndexWriter.MaxFieldLength.LIMITED);

 when using Hadoop, whether the first param must be the dir of HDFS? And how
 to use?

 Thanks in advance!

 --
 Regards,
 Jander



Re: Multiple Input Data Processing using MapReduce

2010-12-31 Thread Zhou, Yunqing
You can use the map.input.split parameter (something like that, I can't
remember exactly) in the Configuration.
This parameter contains the input file path, so you can use it to branch your logic.
The parameter can be found in TextInputFormat.java.

On Thu, Oct 14, 2010 at 10:03 PM, Matthew John
tmatthewjohn1...@gmail.comwrote:

 Hi all,

 I have recently been working on a task where I need to take in two input
 files (of two types), compare them, and produce a result from them using
 some logic. But as I understand it, simple MapReduce implementations are
 for processing a single input type. The closest implementation I could
 think of that is similar to my work is a join in MapReduce, but I am not
 able to understand much from the example provided in Hadoop. Can someone
 provide a good pointer to such multiple-input data processing (or joins)
 in MapReduce? It would also be great if you could send some sample code
 for the same.

 Thanks ,

 Matthew



Re: Multiple Input Data Processing using MapReduce

2010-12-31 Thread Harsh J
It is map.input.file [.start and .length also relate to the InputSplit
for the mapper]
For more: 
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse

With a custom RecordReader, you can put in this value yourself
(FileSplit.getPath()) before control heads to the Mapper/MapRunner.
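
As an illustration of the FileSplit.getPath() route in the new API, here is a
sketch only; the key/value types and the path convention used to tell the two
inputs apart are made-up assumptions:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class TwoSourceMapper extends Mapper<LongWritable, Text, Text, Text> {

  private boolean firstSource;

  @Override
  protected void setup(Context context) {
    // The split this map task is processing; with FileInputFormat it is a
    // FileSplit, whose path plays the role of map.input.file.
    FileSplit split = (FileSplit) context.getInputSplit();
    firstSource = split.getPath().toString().contains("/input/typeA/");
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (firstSource) {
      context.write(new Text("A"), value);   // parse/tag as the first type
    } else {
      context.write(new Text("B"), value);   // parse/tag as the second type
    }
  }
}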

On Fri, Dec 31, 2010 at 6:17 PM, Zhou, Yunqing azure...@gmail.com wrote:
 You can use map.input.split(something like that, I can't remember..) param
 in Configuration.
 this param contains the input file path, you can use it to branch your logic
 this param can be found in TextInputFormat.java

-- 
Harsh J
www.harshj.com


HDFS FS Commands Hanging System

2010-12-31 Thread Jon Lederman
Hi All,

I have been working on running Hadoop on a new microprocessor architecture in 
pseudo-distributed mode.  I have been successful in getting SSH configured.  I 
am also able to start a namenode, secondary namenode, tasktracker, jobtracker 
and datanode as evidenced by the response I get from jps.

However, when I attempt to interact with the file system in any way such as the 
simple command hadoop fs -ls, the system hangs.  So it appears to me that some 
communication is not occurring properly.  Does anyone have any suggestions what 
I look into in order to fix this problem?

Thanks in advance.

-Jon

Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread W.P. McNeill
Found it under /opt/hadoop/contrib/streaming.  I am now able to run Hadoop
streaming jobs on my laptop.

By the way, here is the documentation I found confusing:

http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Hadoop+Streaming

This seems to apply to my install, but says that the streaming JAR should be
in the home directory with the other JARs instead of under contrib.


On Fri, Dec 31, 2010 at 10:54 AM, Ken Goodhope kengoodh...@gmail.comwrote:

 It is one of the contrib modules. If you look in the src dir you will see a
 contrib dir containing all the contrib modules.
 On Dec 31, 2010 10:38 AM, W.P. McNeill bill...@gmail.com wrote:
  I installed the Apache distribution http://hadoop.apache.org/ of
 Hadoop
 on
  my laptop and set it up to run in local mode. It's working for me, but I
  can't find the hadoop-streaming.jar file. It is nowhere under the Hadoop
  home directory. The root of the Hadoop home directory contains the
  following JARs:
 
  hadoop-0.20.2-ant.jar hadoop-0.20.2-examples.jar hadoop-0.20.2-tools.jar
  hadoop-0.20.2-core.jar hadoop-0.20.2-test.jar
 
  The documentation makes it appear that streaming is part of the default
  install. I don't see anything that says I have to perform an extra step
 to
  get it installed.
 
  How do I get streaming installed on my laptop?
 
  Thanks.



Re: HDFS FS Commands Hanging System

2010-12-31 Thread Jon Lederman
Hi Michael,

Thanks for your response.  It doesn't seem to be an issue with safemode.

Even when I try the command dfsadmin -safemode get, the system hangs.  I am 
unable to execute any FS shell commands other than hadoop fs -help.

I am wondering whether this an issue with communication between the daemons?  
What should I be looking at there?  Or could it be something else?

When I do jps, I do see all the daemons listed.

Any other thoughts.

Thanks again and happy new year.

-Jon
On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:

 Try checking your dfs status
 
 hadoop dfsadmin -safemode get
 
 Probably says ON
 
 hadoop dfsadmin -safemode leave
 
 Somebody else can probably say how to make this happen every reboot
 
 Michael D. Black
 Senior Scientist
 Advanced Analytics Directorate
 Northrop Grumman Information Systems
 
 
 
 
 From: Jon Lederman [mailto:jon2...@gmail.com]
 Sent: Fri 12/31/2010 11:00 AM
 To: common-user@hadoop.apache.org
 Subject: EXTERNAL:HDFS FS Commands Hanging System
 
 
 
 Hi All,
 
 I have been working on running Hadoop on a new microprocessor architecture in 
 pseudo-distributed mode.  I have been successful in getting SSH configured.  
 I am also able to start a namenode, secondary namenode, tasktracker, 
 jobtracker and datanode as evidenced by the response I get from jps.
 
 However, when I attempt to interact with the file system in any way such as 
 the simple command hadoop fs -ls, the system hangs.  So it appears to me that 
 some communication is not occurring properly.  Does anyone have any 
 suggestions what I look into in order to fix this problem?
 
 Thanks in advance.
 
 -Jon 
 



Re: HDFS FS Commands Hanging System

2010-12-31 Thread Todd Lipcon
Hi Jon,

Try:
HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /

-Todd

On Fri, Dec 31, 2010 at 11:20 AM, Jon Lederman jon2...@gmail.com wrote:

 Hi Michael,

 Thanks for your response.  It doesn't seem to be an issue with safemode.

 Even when I try the command dfsadmin -safemode get, the system hangs.  I am
 unable to execute any FS shell commands other than hadoop fs -help.

 I am wondering whether this an issue with communication between the
 daemons?  What should I be looking at there?  Or could it be something else?

 When I do jps, I do see all the daemons listed.

 Any other thoughts.

 Thanks again and happy new year.

 -Jon
 On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:

  Try checking your dfs status
 
  hadoop dfsadmin -safemode get
 
  Probably says ON
 
  hadoop dfsadmin -safemode leave
 
  Somebody else can probably say how to make this happen every reboot
 
  Michael D. Black
  Senior Scientist
  Advanced Analytics Directorate
  Northrop Grumman Information Systems
 
 
  
 
  From: Jon Lederman [mailto:jon2...@gmail.com]
  Sent: Fri 12/31/2010 11:00 AM
  To: common-user@hadoop.apache.org
  Subject: EXTERNAL:HDFS FS Commands Hanging System
 
 
 
  Hi All,
 
  I have been working on running Hadoop on a new microprocessor
 architecture in pseudo-distributed mode.  I have been successful in getting
 SSH configured.  I am also able to start a namenode, secondary namenode,
 tasktracker, jobtracker and datanode as evidenced by the response I get from
 jps.
 
  However, when I attempt to interact with the file system in any way such
 as the simple command hadoop fs -ls, the system hangs.  So it appears to me
 that some communication is not occurring properly.  Does anyone have any
 suggestions what I look into in order to fix this problem?
 
  Thanks in advance.
 
  -Jon
 




-- 
Todd Lipcon
Software Engineer, Cloudera



Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread Zhenhua Guo
The doc you mentioned is for Hadoop 0.15.2. But you seem to use
0.20.2. Probably you should read Hadoop docs for your installed
version.

Gerald

On Fri, Dec 31, 2010 at 2:02 PM, W.P. McNeill bill...@gmail.com wrote:
 Found it under /opt/hadoop/contrib/streaming.  I am now able to run Hadoop
 streaming jobs on my laptop.

 By the way, here is the documentation I found confusing:

 http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Hadoop+Streaming

 This seems to apply to my install, but says that the streaming JAR should be
 in the home directory with the other JARs instead of under contrib.


 On Fri, Dec 31, 2010 at 10:54 AM, Ken Goodhope kengoodh...@gmail.comwrote:

 It is one of the contrib modules. If you look in the src dir you will see a
 contrib dir containing all the contrib modules.
 On Dec 31, 2010 10:38 AM, W.P. McNeill bill...@gmail.com wrote:
  I installed the Apache distribution http://hadoop.apache.org/ of
 Hadoop
 on
  my laptop and set it up to run in local mode. It's working for me, but I
  can't find the hadoop-streaming.jar file. It is nowhere under the Hadoop
  home directory. The root of the Hadoop home directory contains the
  following JARs:
 
  hadoop-0.20.2-ant.jar hadoop-0.20.2-examples.jar hadoop-0.20.2-tools.jar
  hadoop-0.20.2-core.jar hadoop-0.20.2-test.jar
 
  The documentation makes it appear that streaming is part of the default
  install. I don't see anything that says I have to perform an extra step
 to
  get it installed.
 
  How do I get streaming installed on my laptop?
 
  Thanks.




Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Lance Norskog
This will not work for indexing. Lucene requires random read/write to
a file, and HDFS does not support this. HDFS only allows sequential
writes: you start at the beginning and copy the file into block 0,
block 1, ..., block N.

For querying, if your HDFS implementation makes a local cache that
appears as a file system (I think FUSE does this?) it might work well.
But, yes, you should copy it down.
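
For completeness, a small sketch of the "copy it down" approach for querying;
the paths are placeholders and the IndexReader/IndexSearcher calls assume a
Lucene release from that era:

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class CopyDownAndSearch {
  public static void main(String[] args) throws Exception {
    FileSystem hdfs = FileSystem.get(new Configuration());

    // Pull the whole index directory out of HDFS onto local disk first.
    hdfs.copyToLocalFile(new Path("/user/jander/index"),   // placeholder
        new Path("/tmp/index-copy"));                      // placeholder

    // Then search it with the ordinary local-filesystem Directory.
    IndexReader reader =
        IndexReader.open(FSDirectory.open(new File("/tmp/index-copy")));
    IndexSearcher searcher = new IndexSearcher(reader);
    // ... run queries ...
    searcher.close();
    reader.close();
  }
}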

On Fri, Dec 31, 2010 at 4:43 AM, Zhou, Yunqing azure...@gmail.com wrote:
 You should implement the Directory class by your self.
 Nutch provided one, named HDFSDirectory.
 You can use it to build the index, but when doing search on HDFS, it is
 relatively slower, especially on phrase queries.
 I recommend you to download it to disk when performing a search.

 On Fri, Dec 31, 2010 at 5:08 PM, Jander g jande...@gmail.com wrote:

 Hi, all

 I want  to run lucene on Hadoop, The problem as follows:

 IndexWriter writer = new IndexWriter(FSDirectory.open(new
 File(index)),new StandardAnalyzer(), true,
 IndexWriter.MaxFieldLength.LIMITED);

 when using Hadoop, whether the first param must be the dir of HDFS? And how
 to use?

 Thanks in advance!

 --
 Regards,
 Jander





-- 
Lance Norskog
goks...@gmail.com


Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread W.P. McNeill
I went to the top Google hit for Hadoop streaming and didn't notice that
this was the 0.15.2 documentation instead of the one that matches my
version.

However, the 0.20.2 documentation has the same error:
http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming
.

I verified that this is also the case with the files installed locally in my
/opt/local/hadoop-0.20.2/docs folder.

Is there a place I should file a documentation bug?

On Fri, Dec 31, 2010 at 12:22 PM, Zhenhua Guo jen...@gmail.com wrote:

 The doc you mentioned is for Hadoop 0.15.2. But you seem to use
 0.20.2. Probably you should read Hadoop docs for your installed
 version.

 Gerald

 On Fri, Dec 31, 2010 at 2:02 PM, W.P. McNeill bill...@gmail.com wrote:
  Found it under /opt/hadoop/contrib/streaming.  I am now able to run
 Hadoop
  streaming jobs on my laptop.
 
  By the way, here is the documentation I found confusing:
 
 
 http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Hadoop+Streaming
 
  This seems to apply to my install, but says that the streaming JAR should
 be
  in the home directory with the other JARs instead of under contrib.
 
 
  On Fri, Dec 31, 2010 at 10:54 AM, Ken Goodhope kengoodh...@gmail.com
 wrote:
 
  It is one of the contrib modules. If you look in the src dir you will
 see a
  contrib dir containing all the contrib modules.
  On Dec 31, 2010 10:38 AM, W.P. McNeill bill...@gmail.com wrote:
   I installed the Apache distribution http://hadoop.apache.org/ of
  Hadoop
  on
   my laptop and set it up to run in local mode. It's working for me, but
 I
   can't find the hadoop-streaming.jar file. It is nowhere under the
 Hadoop
   home directory. The root of the Hadoop home directory contains the
   following JARs:
  
   hadoop-0.20.2-ant.jar hadoop-0.20.2-examples.jar
 hadoop-0.20.2-tools.jar
   hadoop-0.20.2-core.jar hadoop-0.20.2-test.jar
  
   The documentation makes it appear that streaming is part of the
 default
   install. I don't see anything that says I have to perform an extra
 step
  to
   get it installed.
  
   How do I get streaming installed on my laptop?
  
   Thanks.
 
 



Re: HDFS FS Commands Hanging System

2010-12-31 Thread li ping
I suggest you look through the logs to see if there is any error.
The second point I need to make is about which node you run the
command hadoop fs -ls on. If you run the command on node A, the
configuration item fs.default.name on that node should point to your HDFS.

On Sat, Jan 1, 2011 at 3:20 AM, Jon Lederman jon2...@gmail.com wrote:

 Hi Michael,

 Thanks for your response.  It doesn't seem to be an issue with safemode.

 Even when I try the command dfsadmin -safemode get, the system hangs.  I am
 unable to execute any FS shell commands other than hadoop fs -help.

 I am wondering whether this an issue with communication between the
 daemons?  What should I be looking at there?  Or could it be something else?

 When I do jps, I do see all the daemons listed.

 Any other thoughts.

 Thanks again and happy new year.

 -Jon
 On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:

  Try checking your dfs status
 
  hadoop dfsadmin -safemode get
 
  Probably says ON
 
  hadoop dfsadmin -safemode leave
 
  Somebody else can probably say how to make this happen every reboot
 
  Michael D. Black
  Senior Scientist
  Advanced Analytics Directorate
  Northrop Grumman Information Systems
 
 
  
 
  From: Jon Lederman [mailto:jon2...@gmail.com]
  Sent: Fri 12/31/2010 11:00 AM
  To: common-user@hadoop.apache.org
  Subject: EXTERNAL:HDFS FS Commands Hanging System
 
 
 
  Hi All,
 
  I have been working on running Hadoop on a new microprocessor
 architecture in pseudo-distributed mode.  I have been successful in getting
 SSH configured.  I am also able to start a namenode, secondary namenode,
 tasktracker, jobtracker and datanode as evidenced by the response I get from
 jps.
 
  However, when I attempt to interact with the file system in any way such
 as the simple command hadoop fs -ls, the system hangs.  So it appears to me
 that some communication is not occurring properly.  Does anyone have any
 suggestions what I look into in order to fix this problem?
 
  Thanks in advance.
 
  -Jon
 




-- 
-李平


Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread Harsh J
Hello,

On Sat, Jan 1, 2011 at 5:32 AM, W.P. McNeill bill...@gmail.com wrote:
 However, the 0.20.2 documentation has the same error:
 http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming
 .

Looks like the current release (0.21.0) and trunk also have the same error.

 Is there a place I should file a documentation bug?

Yes, there is the Apache JIRA issue-tracker available for Hadoop
MapReduce here: https://issues.apache.org/jira/browse/MAPREDUCE --
[documentation component]

In case you're interested in submitting a patch, the source for the
documentation is available at
src/docs/src/documentation/content/xdocs/streaming.xml

-- 
Harsh J
www.harshj.com


Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Jander g
Thanks for all the replies above.

Now my idea is: run word segmentation on Hadoop and create the
inverted index in MySQL. As we know, Hadoop MapReduce supports writing
to and reading from MySQL.

Does this approach have any problems?
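
Writing to MySQL from a job is possible via DBOutputFormat. Below is a rough
driver sketch only: the JDBC URL, table, and column names are made up, and it
assumes the reducer's output key implements DBWritable with those three fields:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;

public class IndexToMySql {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Driver class, JDBC URL, user and password are all placeholders.
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://dbhost:3306/search", "user", "password");

    Job job = new Job(conf, "inverted-index-to-mysql");
    job.setJarByClass(IndexToMySql.class);

    // Rows go to a hypothetical inverted_index(term, doc_id, freq) table;
    // the key class emitted by the reducer must implement DBWritable.
    job.setOutputFormatClass(DBOutputFormat.class);
    DBOutputFormat.setOutput(job, "inverted_index", "term", "doc_id", "freq");

    // ... set mapper, reducer, and input format/paths as usual, then:
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}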

On Sat, Jan 1, 2011 at 7:49 AM, James Seigel ja...@tynt.com wrote:

 Check out katta for an example

 J

 Sent from my mobile. Please excuse the typos.

 On 2010-12-31, at 4:47 PM, Lance Norskog goks...@gmail.com wrote:

  This will not work for indexing. Lucene requires random read/write to
  a file and HDFS does not support this. HDFS only allows sequential
  writes: you start at the beginninig and copy the file in to block 0,
  block 1,...block N.
 
  For querying, if your HDFS implementation makes a local cache that
  appears as a file system (I think FUSE does this?) it might work well.
  But, yes, you should copy it down.
 
  On Fri, Dec 31, 2010 at 4:43 AM, Zhou, Yunqing azure...@gmail.com
 wrote:
  You should implement the Directory class by your self.
  Nutch provided one, named HDFSDirectory.
  You can use it to build the index, but when doing search on HDFS, it is
  relatively slower, especially on phrase queries.
  I recommend you to download it to disk when performing a search.
 
  On Fri, Dec 31, 2010 at 5:08 PM, Jander g jande...@gmail.com wrote:
 
  Hi, all
 
  I want  to run lucene on Hadoop, The problem as follows:
 
  IndexWriter writer = new IndexWriter(FSDirectory.open(new
  File(index)),new StandardAnalyzer(), true,
  IndexWriter.MaxFieldLength.LIMITED);
 
  when using Hadoop, whether the first param must be the dir of HDFS? And
 how
  to use?
 
  Thanks in advance!
 
  --
  Regards,
  Jander
 
 
 
 
 
  --
  Lance Norskog
  goks...@gmail.com




-- 
Thanks,
Jander