Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Jander g
Thanks for all the replies above.

My idea now is to run word segmentation on Hadoop and build the inverted
index in MySQL; as we know, Hadoop MapReduce supports reading from and
writing to MySQL.

Are there any problems with this approach?
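
For concreteness, here is a rough, untested sketch of the output side of such a
job, using the old-API classes in org.apache.hadoop.mapred.lib.db; the driver
class, table name, column names and JDBC URL below are only placeholders:

JobConf job = new JobConf(MyIndexJob.class);
// Tell the job where MySQL lives; the connector jar also has to be visible to
// the tasks (e.g. via DistributedCache or the job's lib/ directory).
DBConfiguration.configureDB(job, "com.mysql.jdbc.Driver",
    "jdbc:mysql://dbhost:3306/searchdb", "dbuser", "dbpass");
// Each reduce output record becomes one row of the hypothetical "postings"
// table; the output value class must implement DBWritable.
DBOutputFormat.setOutput(job, "postings", "term", "doc_id", "freq");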

On Sat, Jan 1, 2011 at 7:49 AM, James Seigel  wrote:

> Check out katta for an example
>
> J
>
> Sent from my mobile. Please excuse the typos.
>
> On 2010-12-31, at 4:47 PM, Lance Norskog  wrote:
>
> > This will not work for indexing. Lucene requires random read/write to
> > a file and HDFS does not support this. HDFS only allows sequential
> > writes: you start at the beginning and copy the file into block 0,
> > block 1,...block N.
> >
> > For querying, if your HDFS implementation makes a local cache that
> > appears as a file system (I think FUSE does this?) it might work well.
> > But, yes, you should copy it down.
> >
> > On Fri, Dec 31, 2010 at 4:43 AM, Zhou, Yunqing 
> wrote:
> >> You should implement the Directory class yourself.
> >> Nutch provides one, named HDFSDirectory.
> >> You can use it to build the index, but searching directly on HDFS is
> >> relatively slow, especially for phrase queries.
> >> I recommend downloading the index to local disk before performing a search.
> >>
> >> On Fri, Dec 31, 2010 at 5:08 PM, Jander g  wrote:
> >>
> >>> Hi, all
> >>>
> >>> I want to run Lucene on Hadoop. The problem is as follows:
> >>>
> >>> IndexWriter writer = new IndexWriter(FSDirectory.open(new
> >>> File("index")),new StandardAnalyzer(), true,
> >>> IndexWriter.MaxFieldLength.LIMITED);
> >>>
> >>> When using Hadoop, must the first parameter be a directory on HDFS? And
> >>> how do I use it?
> >>>
> >>> Thanks in advance!
> >>>
> >>> --
> >>> Regards,
> >>> Jander
> >>>
> >>
> >
> >
> >
> > --
> > Lance Norskog
> > goks...@gmail.com
>



-- 
Thanks,
Jander


Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread Harsh J
Hello,

On Sat, Jan 1, 2011 at 5:32 AM, W.P. McNeill  wrote:
> However, the 0.20.2 documentation has the same error:
> http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming
> .

Looks like the current release (0.21.0) and trunk also have the same error.

> Is there a place I should file a documentation bug?

Yes, there is the Apache JIRA issue-tracker available for Hadoop
MapReduce here: https://issues.apache.org/jira/browse/MAPREDUCE --
["documentation" component]

In case you're interested in submitting a patch, the source for the
documentation is available at
src/docs/src/documentation/content/xdocs/streaming.xml

-- 
Harsh J
www.harshj.com


Re: HDFS FS Commands Hanging System

2010-12-31 Thread li ping
I suggest looking through the logs to see if there are any errors.
The second point I need to make is about which node you run the
command "hadoop fs -ls" on. Whichever node you run it on, that node's
configuration item "fs.default.name" must point to your HDFS namenode.

On Sat, Jan 1, 2011 at 3:20 AM, Jon Lederman  wrote:

> Hi Michael,
>
> Thanks for your response.  It doesn't seem to be an issue with safemode.
>
> Even when I try the command dfsadmin -safemode get, the system hangs.  I am
> unable to execute any FS shell commands other than hadoop fs -help.
>
> I am wondering whether this is an issue with communication between the
> daemons.  What should I be looking at there?  Or could it be something else?
>
> When I do jps, I do see all the daemons listed.
>
> Any other thoughts?
>
> Thanks again and happy new year.
>
> -Jon
> On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:
>
> > Try checking your dfs status
> >
> > hadoop dfsadmin -safemode get
> >
> > Probably says "ON"
> >
> > hadoop dfsadmin -safemode leave
> >
> > Somebody else can probably say how to make this happen every reboot
> >
> > Michael D. Black
> > Senior Scientist
> > Advanced Analytics Directorate
> > Northrop Grumman Information Systems
> >
> >
> > 
> >
> > From: Jon Lederman [mailto:jon2...@gmail.com]
> > Sent: Fri 12/31/2010 11:00 AM
> > To: common-user@hadoop.apache.org
> > Subject: EXTERNAL:HDFS FS Commands Hanging System
> >
> >
> >
> > Hi All,
> >
> > I have been working on running Hadoop on a new microprocessor
> architecture in pseudo-distributed mode.  I have been successful in getting
> SSH configured.  I am also able to start a namenode, secondary namenode,
> tasktracker, jobtracker and datanode as evidenced by the response I get from
> jps.
> >
> > However, when I attempt to interact with the file system in any way such
> as the simple command hadoop fs -ls, the system hangs.  So it appears to me
> that some communication is not occurring properly.  Does anyone have any
> suggestions about what I should look into in order to fix this problem?
> >
> > Thanks in advance.
> >
> > -Jon
> >
>
>


-- 
-李平


Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread W.P. McNeill
I went to the top Google hit for "Hadoop streaming" and didn't notice that
this was the 0.15.2 documentation instead of the one that matches my
version.

However, the 0.20.2 documentation has the same error:
http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming
.

I verified that this is also the case with the files installed locally in my
/opt/local/hadoop-0.20.2/docs folder.

Is there a place I should file a documentation bug?

On Fri, Dec 31, 2010 at 12:22 PM, Zhenhua Guo  wrote:

> The doc you mentioned is for Hadoop 0.15.2. But you seem to use
> 0.20.2. Probably you should read Hadoop docs for your installed
> version.
>
> Gerald
>
> On Fri, Dec 31, 2010 at 2:02 PM, W.P. McNeill  wrote:
> > Found it under /opt/hadoop/contrib/streaming.  I am now able to run
> Hadoop
> > streaming jobs on my laptop.
> >
> > By the way, here is the documentation I found confusing:
> >
> >
> http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Hadoop+Streaming
> >
> > This seems to apply to my install, but says that the streaming JAR should
> be
> > in the home directory with the other JARs instead of under contrib.
> >
> >
> >> On Fri, Dec 31, 2010 at 10:54 AM, Ken Goodhope wrote:
> >
> >> It is one of the contrib modules. If you look in the src dir you will
> see a
> >> contrib dir containing all the contrib modules.
> >> On Dec 31, 2010 10:38 AM, "W.P. McNeill"  wrote:
> >> > I installed the Apache distribution  of
> >> Hadoop
> >> on
> >> > my laptop and set it up to run in local mode. It's working for me, but
> I
> >> > can't find the hadoop-streaming.jar file. It is nowhere under the
> Hadoop
> >> > home directory. The root of the Hadoop home directory contains the
> >> > following JARs:
> >> >
> >> > hadoop-0.20.2-ant.jar hadoop-0.20.2-examples.jar
> hadoop-0.20.2-tools.jar
> >> > hadoop-0.20.2-core.jar hadoop-0.20.2-test.jar
> >> >
> >> > The documentation makes it appear that streaming is part of the
> default
> >> > install. I don't see anything that says I have to perform an extra
> step
> >> to
> >> > get it installed.
> >> >
> >> > How do I get streaming installed on my laptop?
> >> >
> >> > Thanks.
> >>
> >
>


Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread James Seigel
Check out katta for an example

J

Sent from my mobile. Please excuse the typos.

On 2010-12-31, at 4:47 PM, Lance Norskog  wrote:

> This will not work for indexing. Lucene requires random read/write to
> a file and HDFS does not support this. HDFS only allows sequential
> writes: you start at the beginning and copy the file into block 0,
> block 1,...block N.
>
> For querying, if your HDFS implementation makes a local cache that
> appears as a file system (I think FUSE does this?) it might work well.
> But, yes, you should copy it down.
>
> On Fri, Dec 31, 2010 at 4:43 AM, Zhou, Yunqing  wrote:
>> You should implement the Directory class yourself.
>> Nutch provides one, named HDFSDirectory.
>> You can use it to build the index, but searching directly on HDFS is
>> relatively slow, especially for phrase queries.
>> I recommend downloading the index to local disk before performing a search.
>>
>> On Fri, Dec 31, 2010 at 5:08 PM, Jander g  wrote:
>>
>>> Hi, all
>>>
>>> I want to run Lucene on Hadoop. The problem is as follows:
>>>
>>> IndexWriter writer = new IndexWriter(FSDirectory.open(new
>>> File("index")),new StandardAnalyzer(), true,
>>> IndexWriter.MaxFieldLength.LIMITED);
>>>
>>> When using Hadoop, must the first parameter be a directory on HDFS? And how
>>> do I use it?
>>>
>>> Thanks in advance!
>>>
>>> --
>>> Regards,
>>> Jander
>>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com


Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Lance Norskog
This will not work for indexing. Lucene requires random read/write to
a file and HDFS does not support this. HDFS only allows sequential
writes: you start at the beginning and copy the file into block 0,
block 1,...block N.

For querying, if your HDFS implementation makes a local cache that
appears as a file system (I think FUSE does this?) it might work well.
But, yes, you should copy it down.

On Fri, Dec 31, 2010 at 4:43 AM, Zhou, Yunqing  wrote:
> You should implement the Directory class yourself.
> Nutch provides one, named HDFSDirectory.
> You can use it to build the index, but searching directly on HDFS is
> relatively slow, especially for phrase queries.
> I recommend downloading the index to local disk before performing a search.
>
> On Fri, Dec 31, 2010 at 5:08 PM, Jander g  wrote:
>
>> Hi, all
>>
>> I want to run Lucene on Hadoop. The problem is as follows:
>>
>> IndexWriter writer = new IndexWriter(FSDirectory.open(new
>> File("index")),new StandardAnalyzer(), true,
>> IndexWriter.MaxFieldLength.LIMITED);
>>
>> When using Hadoop, must the first parameter be a directory on HDFS? And how
>> do I use it?
>>
>> Thanks in advance!
>>
>> --
>> Regards,
>> Jander
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread Zhenhua Guo
The doc you mentioned is for Hadoop 0.15.2. But you seem to use
0.20.2. Probably you should read Hadoop docs for your installed
version.

Gerald

On Fri, Dec 31, 2010 at 2:02 PM, W.P. McNeill  wrote:
> Found it under /opt/hadoop/contrib/streaming.  I am now able to run Hadoop
> streaming jobs on my laptop.
>
> By the way, here is the documentation I found confusing:
>
> http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Hadoop+Streaming
>
> This seems to apply to my install, but says that the streaming JAR should be
> in the home directory with the other JARs instead of under contrib.
>
>
> On Fri, Dec 31, 2010 at 10:54 AM, Ken Goodhope wrote:
>
>> It is one of the contrib modules. If you look in the src dir you will see a
>> contrib dir containing all the contrib modules.
>> On Dec 31, 2010 10:38 AM, "W.P. McNeill"  wrote:
>> > I installed the Apache distribution  of
>> Hadoop
>> on
>> > my laptop and set it up to run in local mode. It's working for me, but I
>> > can't find the hadoop-streaming.jar file. It is nowhere under the Hadoop
>> > home directory. The root of the Hadoop home directory contains the
>> > following JARs:
>> >
>> > hadoop-0.20.2-ant.jar hadoop-0.20.2-examples.jar hadoop-0.20.2-tools.jar
>> > hadoop-0.20.2-core.jar hadoop-0.20.2-test.jar
>> >
>> > The documentation makes it appear that streaming is part of the default
>> > install. I don't see anything that says I have to perform an extra step
>> to
>> > get it installed.
>> >
>> > How do I get streaming installed on my laptop?
>> >
>> > Thanks.
>>
>


Re: HDFS FS Commands Hanging System

2010-12-31 Thread Todd Lipcon
Hi Jon,

Try:
HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /

-Todd

On Fri, Dec 31, 2010 at 11:20 AM, Jon Lederman  wrote:

> Hi Michael,
>
> Thanks for your response.  It doesn't seem to be an issue with safemode.
>
> Even when I try the command dfsadmin -safemode get, the system hangs.  I am
> unable to execute any FS shell commands other than hadoop fs -help.
>
> I am wondering whether this is an issue with communication between the
> daemons.  What should I be looking at there?  Or could it be something else?
>
> When I do jps, I do see all the daemons listed.
>
> Any other thoughts?
>
> Thanks again and happy new year.
>
> -Jon
> On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:
>
> > Try checking your dfs status
> >
> > hadoop dfsadmin -safemode get
> >
> > Probably says "ON"
> >
> > hadoop dfsadmin -safemode leave
> >
> > Somebody else can probably say how to make this happen every reboot
> >
> > Michael D. Black
> > Senior Scientist
> > Advanced Analytics Directorate
> > Northrop Grumman Information Systems
> >
> >
> > 
> >
> > From: Jon Lederman [mailto:jon2...@gmail.com]
> > Sent: Fri 12/31/2010 11:00 AM
> > To: common-user@hadoop.apache.org
> > Subject: EXTERNAL:HDFS FS Commands Hanging System
> >
> >
> >
> > Hi All,
> >
> > I have been working on running Hadoop on a new microprocessor
> architecture in pseudo-distributed mode.  I have been successful in getting
> SSH configured.  I am also able to start a namenode, secondary namenode,
> tasktracker, jobtracker and datanode as evidenced by the response I get from
> jps.
> >
> > However, when I attempt to interact with the file system in any way such
> as the simple command hadoop fs -ls, the system hangs.  So it appears to me
> that some communication is not occurring properly.  Does anyone have any
> suggestions about what I should look into in order to fix this problem?
> >
> > Thanks in advance.
> >
> > -Jon
> >
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HDFS FS Commands Hanging System

2010-12-31 Thread Jon Lederman
Hi Michael,

Thanks for your response.  It doesn't seem to be an issue with safemode.

Even when I try the command dfsadmin -safemode get, the system hangs.  I am 
unable to execute any FS shell commands other than hadoop fs -help.

I am wondering whether this is an issue with communication between the daemons.
What should I be looking at there?  Or could it be something else?

When I do jps, I do see all the daemons listed.

Any other thoughts?

Thanks again and happy new year.

-Jon
On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:

> Try checking your dfs status
> 
> hadoop dfsadmin -safemode get
> 
> Probably says "ON"
> 
> hadoop dfsadmin -safemode leave
> 
> Somebody else can probably say how to make this happen every reboot
> 
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Northrop Grumman Information Systems
> 
> 
> 
> 
> From: Jon Lederman [mailto:jon2...@gmail.com]
> Sent: Fri 12/31/2010 11:00 AM
> To: common-user@hadoop.apache.org
> Subject: EXTERNAL:HDFS FS Commands Hanging System
> 
> 
> 
> Hi All,
> 
> I have been working on running Hadoop on a new microprocessor architecture in 
> pseudo-distributed mode.  I have been successful in getting SSH configured.  
> I am also able to start a namenode, secondary namenode, tasktracker, 
> jobtracker and datanode as evidenced by the response I get from jps.
> 
> However, when I attempt to interact with the file system in any way such as 
> the simple command hadoop fs -ls, the system hangs.  So it appears to me that 
> some communication is not occurring properly.  Does anyone have any 
> suggestions about what I should look into in order to fix this problem?
> 
> Thanks in advance.
> 
> -Jon 
> 



Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread W.P. McNeill
Found it under /opt/hadoop/contrib/streaming.  I am now able to run Hadoop
streaming jobs on my laptop.
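
For anyone else searching the archives, a minimal invocation looks something
like this (the jar name may differ slightly by build, and the input/output
paths and cat/wc mapper/reducer are just a smoke test):

hadoop jar /opt/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
  -input myInputDir -output myOutputDir \
  -mapper /bin/cat -reducer /usr/bin/wc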

By the way, here is the documentation I found confusing:

http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Hadoop+Streaming

This seems to apply to my install, but says that the streaming JAR should be
in the home directory with the other JARs instead of under contrib.


On Fri, Dec 31, 2010 at 10:54 AM, Ken Goodhope wrote:

> It is one of the contrib modules. If you look in the src dir you will see a
> contrib dir containing all the contrib modules.
> On Dec 31, 2010 10:38 AM, "W.P. McNeill"  wrote:
> > I installed the Apache distribution  of
> Hadoop
> on
> > my laptop and set it up to run in local mode. It's working for me, but I
> > can't find the hadoop-streaming.jar file. It is nowhere under the Hadoop
> > home directory. The root of the Hadoop home directory contains the
> > following JARs:
> >
> > hadoop-0.20.2-ant.jar hadoop-0.20.2-examples.jar hadoop-0.20.2-tools.jar
> > hadoop-0.20.2-core.jar hadoop-0.20.2-test.jar
> >
> > The documentation makes it appear that streaming is part of the default
> > install. I don't see anything that says I have to perform an extra step
> to
> > get it installed.
> >
> > How do I get streaming installed on my laptop?
> >
> > Thanks.
>


Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread Ken Goodhope
It is one of the contrib modules. If you look in the src dir you will see a
contrib dir containing all the contrib modules.
On Dec 31, 2010 10:38 AM, "W.P. McNeill"  wrote:
> I installed the Apache distribution  of Hadoop
on
> my laptop and set it up to run in local mode. It's working for me, but I
> can't find the hadoop-streaming.jar file. It is nowhere under the Hadoop
> home directory. The root of the Hadoop home directory contains the
> following JARs:
>
> hadoop-0.20.2-ant.jar hadoop-0.20.2-examples.jar hadoop-0.20.2-tools.jar
> hadoop-0.20.2-core.jar hadoop-0.20.2-test.jar
>
> The documentation makes it appear that streaming is part of the default
> install. I don't see anything that says I have to perform an extra step to
> get it installed.
>
> How do I get streaming installed on my laptop?
>
> Thanks.


Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread W.P. McNeill
I installed the Apache distribution  of Hadoop on
my laptop and set it up to run in local mode.  It's working for me, but I
can't find the hadoop-streaming.jar file.  It is nowhere under the Hadoop
home directory.  The root of the Hadoop home directory contains the
following JARs:

hadoop-0.20.2-ant.jar hadoop-0.20.2-examples.jar hadoop-0.20.2-tools.jar
hadoop-0.20.2-core.jar hadoop-0.20.2-test.jar

The documentation makes it appear that streaming is part of the default
install.  I don't see anything that says I have to perform an extra step to
get it installed.

How do I get streaming installed on my laptop?

Thanks.


RE:HDFS FS Commands Hanging System

2010-12-31 Thread Black, Michael (IS)
Try checking your dfs status
 
hadoop dfsadmin -safemode get
 
Probably says "ON"
 
hadoop dfsadmin -safemode leave
 
Somebody else can probably say how to make this happen every reboot
 
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
 



From: Jon Lederman [mailto:jon2...@gmail.com]
Sent: Fri 12/31/2010 11:00 AM
To: common-user@hadoop.apache.org
Subject: EXTERNAL:HDFS FS Commands Hanging System



Hi All,

I have been working on running Hadoop on a new microprocessor architecture in 
pseudo-distributed mode.  I have been successful in getting SSH configured.  I 
am also able to start a namenode, secondary namenode, tasktracker, jobtracker 
and datanode as evidenced by the response I get from jps.

However, when I attempt to interact with the file system in any way such as the 
simple command hadoop fs -ls, the system hangs.  So it appears to me that some 
communication is not occurring properly.  Does anyone have any suggestions
about what I should look into in order to fix this problem?

Thanks in advance.

-Jon 



HDFS FS Commands Hanging System

2010-12-31 Thread Jon Lederman
Hi All,

I have been working on running Hadoop on a new microprocessor architecture in 
pseudo-distributed mode.  I have been successful in getting SSH configured.  I 
am also able to start a namenode, secondary namenode, tasktracker, jobtracker 
and datanode as evidenced by the response I get from jps.

However, when I attempt to interact with the file system in any way such as the 
simple command hadoop fs -ls, the system hangs.  So it appears to me that some 
communication is not occurring properly.  Does anyone have any suggestions
about what I should look into in order to fix this problem?

Thanks in advance.

-Jon

Re: Multiple Input Data Processing using MapReduce

2010-12-31 Thread Harsh J
It is map.input.file [.start and .length also relate to the InputSplit
for the mapper]
For more: 
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse

With a custom RR, you can put in this value yourself
(FileSplit.getPath()) before control heads to the Mapper/MapRunner.
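
As a rough, untested illustration, the same information is reachable from a
new-API Mapper through its InputSplit, assuming a FileInputFormat-based job so
the split is an org.apache.hadoop.mapreduce.lib.input.FileSplit; the "typeA"
path test below is only a placeholder:

public class BranchingMapper extends Mapper<LongWritable, Text, Text, Text> {
  private boolean firstInputType;

  @Override
  protected void setup(Context context) {
    // Which file did this task's split come from?
    FileSplit split = (FileSplit) context.getInputSplit();
    firstInputType = split.getPath().toString().contains("typeA");
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (firstInputType) {
      // parse and emit records of the first input type
    } else {
      // parse and emit records of the second input type
    }
  }
}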

On Fri, Dec 31, 2010 at 6:17 PM, Zhou, Yunqing  wrote:
> You can use the "map.input.split" (something like that, I can't remember...)
> param in Configuration.
> This param contains the input file path; you can use it to branch your logic.
> The param can be found in TextInputFormat.java.

-- 
Harsh J
www.harshj.com


Re: Multiple Input Data Processing using MapReduce

2010-12-31 Thread Zhou, Yunqing
You can use the "map.input.split" (something like that, I can't remember...)
param in Configuration.
This param contains the input file path; you can use it to branch your logic.
The param can be found in TextInputFormat.java.

On Thu, Oct 14, 2010 at 10:03 PM, Matthew John
wrote:

> Hi all,
>
>  I have recently been working on a task where I need to take in two input
> (types of) files, compare them, and produce a result using some logic.
> But as I understand it, simple MapReduce implementations process a
> single input type. The closest implementation I could think of similar to my
> work is a MapReduce join, but I am not able to understand much from the
> example provided in Hadoop. Can someone provide a good pointer to such
> multiple-input data processing (or joins) in MapReduce? It would also be
> great if you could send some sample code for the same.
>
> Thanks,
>
> Matthew
>


Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Zhou, Yunqing
You should implement the Directory class yourself.
Nutch provides one, named HDFSDirectory.
You can use it to build the index, but searching directly on HDFS is
relatively slow, especially for phrase queries.
I recommend downloading the index to local disk before performing a search.
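
For the search side, the download step is only a couple of lines; roughly
(untested, and the HDFS/local paths are placeholders):

// Copy the index directory built on HDFS down to local disk, then open it
// with an ordinary Lucene FSDirectory for searching.
FileSystem fs = FileSystem.get(new Configuration());
fs.copyToLocalFile(new Path("/user/jander/index"),    // on HDFS
                   new Path("/tmp/lucene-index"));    // on local disk
IndexSearcher searcher =
    new IndexSearcher(FSDirectory.open(new File("/tmp/lucene-index")));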

On Fri, Dec 31, 2010 at 5:08 PM, Jander g  wrote:

> Hi, all
>
> I want to run Lucene on Hadoop. The problem is as follows:
>
> IndexWriter writer = new IndexWriter(FSDirectory.open(new
> File("index")),new StandardAnalyzer(), true,
> IndexWriter.MaxFieldLength.LIMITED);
>
> When using Hadoop, must the first parameter be a directory on HDFS? And how
> do I use it?
>
> Thanks in advance!
>
> --
> Regards,
> Jander
>


Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Eason.Lee
You'd better build the index on the local filesystem and copy the final
index into HDFS. Using HDFS as the FileSystem for Lucene is not
recommended (though it can be used for search).
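
Roughly (untested; this reuses the IndexWriter call from your mail, with
placeholder paths):

// Build the index on the local filesystem first...
IndexWriter writer = new IndexWriter(
    FSDirectory.open(new File("/tmp/index")),
    new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
// ... writer.addDocument(...) calls go here ...
writer.close();

// ...then copy the finished index directory up into HDFS.
FileSystem fs = FileSystem.get(new Configuration());
fs.copyFromLocalFile(new Path("/tmp/index"), new Path("/user/jander/index"));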



2010/12/31 Jander g 

> Hi, all
>
> I want to run Lucene on Hadoop. The problem is as follows:
>
> IndexWriter writer = new IndexWriter(FSDirectory.open(new
> File("index")),new StandardAnalyzer(), true,
> IndexWriter.MaxFieldLength.LIMITED);
>
> When using Hadoop, must the first parameter be a directory on HDFS? And how
> do I use it?
>
> Thanks in advance!
>
> --
> Regards,
> Jander
>


Re: ClassNotFoundException

2010-12-31 Thread Harsh J
The answer is in your log output:

10/12/31 10:26:54 WARN mapreduce.JobSubmitter: No job jar file set.
User classes may not be found. See Job or Job#setJar(String).

Alternatively, use Job.setJarByClass(Class class);
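
i.e., something along these lines in the driver (sketch only; it assumes the
usual new-API job setup around it):

Job job = new Job(conf, "wordcount");
// Point Hadoop at the jar containing your classes (hd.jar here), so the
// TaskTrackers can load org.postdirekt.hadoop.Map and friends at runtime.
job.setJarByClass(org.postdirekt.hadoop.WordCount.class);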

On Fri, Dec 31, 2010 at 3:02 PM, Cavus,M.,Fa. Post Direkt
 wrote:
> I looked in my JAR file, but I still get a ClassNotFoundException. Why?:

-- 
Harsh J
www.harshj.com


ClassNotFoundException

2010-12-31 Thread Cavus,M.,Fa. Post Direkt
I looked in my JAR file, but I still get a ClassNotFoundException. Why?:

$ jar -xvf hd.jar
dekomprimiert: META-INF/MANIFEST.MF
dekomprimiert: org/postdirekt/hadoop/Map.class
dekomprimiert: org/postdirekt/hadoop/Map.java
dekomprimiert: org/postdirekt/hadoop/WordCount.class
dekomprimiert: org/postdirekt/hadoop/WordCount.java
dekomprimiert: org/postdirekt/hadoop/Reduce2.class
dekomprimiert: org/postdirekt/hadoop/Reduce2.java

$ ./hadoop jar ../../hadoopjar/hd.jar org.postdirekt.hadoop.WordCount
gutenberg gutenberg-output


10/12/31 10:26:54 INFO security.Groups: Group mapping
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
cacheTimeout=30
10/12/31 10:26:54 WARN conf.Configuration: mapred.task.id is deprecated.
Instead, use mapreduce.task.attempt.id
10/12/31 10:26:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the
same.
10/12/31 10:26:54 WARN mapreduce.JobSubmitter: No job jar file set.
User classes may not be found. See Job or Job#setJar(String).
10/12/31 10:26:54 INFO input.FileInputFormat: Total input paths to
process : 1
10/12/31 10:26:55 WARN conf.Configuration: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
10/12/31 10:26:55 INFO mapreduce.JobSubmitter: number of splits:1
10/12/31 10:26:55 INFO mapreduce.JobSubmitter: adding the following
namenodes' delegation tokens:null
10/12/31 10:26:55 INFO mapreduce.Job: Running job: job_201012311021_0002
10/12/31 10:26:56 INFO mapreduce.Job:  map 0% reduce 0%
10/12/31 10:27:11 INFO mapreduce.Job: Task Id :
attempt_201012311021_0002_m_00_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.postdirekt.hadoop.Map
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
at
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContex
tImpl.java:167)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio
n.java:742)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.m
10/12/31 10:27:24 INFO mapreduce.Job: Task Id :
attempt_201012311021_0002_m_00_1, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.postdirekt.hadoop.Map
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
at
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContex
tImpl.java:167)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio
n.java:742)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.m
10/12/31 10:27:36 INFO mapreduce.Job: Task Id :
attempt_201012311021_0002_m_00_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.postdirekt.hadoop.Map
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1128)
at
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContex
tImpl.java:167)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:612)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio
n.java:742)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: org.postdirekt.hadoop.Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java

RE: Retrying connect to server

2010-12-31 Thread Cavus,M.,Fa. Post Direkt
Hi,
I had forgotten to run start-mapred.sh.

Thanks, all

-Original Message-
From: Cavus,M.,Fa. Post Direkt [mailto:m.ca...@postdirekt.de] 
Sent: Friday, December 31, 2010 10:20 AM
To: common-user@hadoop.apache.org
Subject: RE: Retrying connect to server

Hi,
I do get this:
$ jps
6017 DataNode
5805 NameNode
6234 SecondaryNameNode
6354 Jps

What can I do to start JobTracker?

Here are my config files:
$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at.</description>
  </property>
</configuration>

$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The actual number of replications can be specified when the
    file is created.</description>
  </property>
</configuration>

$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation.
    </description>
  </property>
</configuration>

-Original Message-
From: James Seigel [mailto:ja...@tynt.com] 
Sent: Friday, December 31, 2010 4:56 AM
To: common-user@hadoop.apache.org
Subject: Re: Retrying connect to server

Or
3) The configuration (or lack thereof) on the machine you are trying to
run this from has no idea where your DFS or JobTracker is :)

Cheers
James.

On 2010-12-30, at 8:53 PM, Adarsh Sharma wrote:

> Cavus,M.,Fa. Post Direkt wrote:
>> I process this
>> 
>> ./hadoop jar ../../hadoopjar/hd.jar org.postdirekt.hadoop.WordCount 
>> gutenberg gutenberg-output
>> 
>> I get this
>> Does anyone know why I get this error?
>> 
>> 10/12/30 16:48:59 INFO security.Groups: Group mapping 
>> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; 
>> cacheTimeout=30
>> 10/12/30 16:49:01 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 0 time(s).
>> 10/12/30 16:49:02 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 1 time(s).
>> 10/12/30 16:49:03 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 2 time(s).
>> 10/12/30 16:49:04 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 3 time(s).
>> 10/12/30 16:49:05 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 4 time(s).
>> 10/12/30 16:49:06 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 5 time(s).
>> 10/12/30 16:49:07 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 6 time(s).
>> 10/12/30 16:49:08 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 7 time(s).
>> 10/12/30 16:49:09 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 8 time(s).
>> 10/12/30 16:49:10 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 9 time(s).
>> Exception in thread "main" java.net.ConnectException: Call to 
>> localhost/127.0.0.1:9001 failed on connection exception: 
>> java.net.ConnectException: Connection refused
>>  at org.apache.hadoop.ipc.Client.wrapException(Client.java:932)
>>  at org.apache.hadoop.ipc.Client.call(Client.java:908)
>>  at 
>> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
>>  at $Proxy0.getProtocolVersion(Unknown Source)
>>  at 
>> org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228)
>>  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:224)
>>  at org.apache.hadoop.mapreduce.Cluster.createRPCProxy(Cluster.java:82)
>>  at org.apache.hadoop.mapreduce.Cluster.createClient(Cluster.java:94)
>>  at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:70)
>>  at org.apache.hadoop.mapreduce.Job.<init>(Job.java:129)
>>  at org.apache.hadoop.mapreduce.Job.<init>(Job.java:134)
>>  at org.postdirekt.hadoop.WordCount.main(WordCount.java:19)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>  at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>  at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>  at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
>> Caused by: java.net.ConnectException: Connection refused
>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>  at 
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>  at 
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
>>  at 
>> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:417)
>>  at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:207)
>>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1025)
>>  at org.apache.hadoop.ipc.Client.call(Client.java:885)
>>  ... 15 more
>>  
> This is the most common issue that occurs after configuring a Hadoop cluster.
> 
> 

RE: Retrying connect to server

2010-12-31 Thread Cavus,M.,Fa. Post Direkt
Hi,
I do get this:
$ jps
6017 DataNode
5805 NameNode
6234 SecondaryNameNode
6354 Jps

What can I do to start JobTracker?

Here are my config files:
$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs
    at.</description>
  </property>
</configuration>

$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The actual number of replications can be specified when the
    file is created.</description>
  </property>
</configuration>

$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation.
    </description>
  </property>
</configuration>

-Original Message-
From: James Seigel [mailto:ja...@tynt.com] 
Sent: Friday, December 31, 2010 4:56 AM
To: common-user@hadoop.apache.org
Subject: Re: Retrying connect to server

Or
3) The configuration (or lack thereof) on the machine you are trying to
run this from has no idea where your DFS or JobTracker is :)

Cheers
James.

On 2010-12-30, at 8:53 PM, Adarsh Sharma wrote:

> Cavus,M.,Fa. Post Direkt wrote:
>> I process this
>> 
>> ./hadoop jar ../../hadoopjar/hd.jar org.postdirekt.hadoop.WordCount 
>> gutenberg gutenberg-output
>> 
>> I get this
>> Does anyone know why I get this error?
>> 
>> 10/12/30 16:48:59 INFO security.Groups: Group mapping 
>> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; 
>> cacheTimeout=30
>> 10/12/30 16:49:01 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 0 time(s).
>> 10/12/30 16:49:02 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 1 time(s).
>> 10/12/30 16:49:03 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 2 time(s).
>> 10/12/30 16:49:04 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 3 time(s).
>> 10/12/30 16:49:05 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 4 time(s).
>> 10/12/30 16:49:06 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 5 time(s).
>> 10/12/30 16:49:07 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 6 time(s).
>> 10/12/30 16:49:08 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 7 time(s).
>> 10/12/30 16:49:09 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 8 time(s).
>> 10/12/30 16:49:10 INFO ipc.Client: Retrying connect to server: 
>> localhost/127.0.0.1:9001. Already tried 9 time(s).
>> Exception in thread "main" java.net.ConnectException: Call to 
>> localhost/127.0.0.1:9001 failed on connection exception: 
>> java.net.ConnectException: Connection refused
>>  at org.apache.hadoop.ipc.Client.wrapException(Client.java:932)
>>  at org.apache.hadoop.ipc.Client.call(Client.java:908)
>>  at 
>> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
>>  at $Proxy0.getProtocolVersion(Unknown Source)
>>  at 
>> org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:228)
>>  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:224)
>>  at org.apache.hadoop.mapreduce.Cluster.createRPCProxy(Cluster.java:82)
>>  at org.apache.hadoop.mapreduce.Cluster.createClient(Cluster.java:94)
>>  at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:70)
>>  at org.apache.hadoop.mapreduce.Job.<init>(Job.java:129)
>>  at org.apache.hadoop.mapreduce.Job.<init>(Job.java:134)
>>  at org.postdirekt.hadoop.WordCount.main(WordCount.java:19)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>  at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>  at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>  at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
>> Caused by: java.net.ConnectException: Connection refused
>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>  at 
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>  at 
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
>>  at 
>> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:417)
>>  at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:207)
>>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1025)
>>  at org.apache.hadoop.ipc.Client.call(Client.java:885)
>>  ... 15 more
>>  
> This is the most common issue that occurs after configuring a Hadoop cluster.
> 
> Reason :
> 
> 1. Your NameNode or JobTracker is not running. Verify through the web UI and
> jps commands.
> 2. DNS resolution. You must have IP/hostname entries for all nodes in the
> /etc/hosts file.
> 
> 
> 
> Best Regards
> 
> Adarsh Sharma



Help for the problem of running lucene on Hadoop

2010-12-31 Thread Jander g
Hi, all

I want to run Lucene on Hadoop. The problem is as follows:

IndexWriter writer = new IndexWriter(FSDirectory.open(new
File("index")),new StandardAnalyzer(), true,
IndexWriter.MaxFieldLength.LIMITED);

When using Hadoop, must the first parameter be a directory on HDFS? And how
do I use it?

Thanks in advance!

-- 
Regards,
Jander