Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Zhou, Yunqing
You should implement the Directory class yourself.
Nutch provides one, named HDFSDirectory.
You can use it to build the index, but searching directly on HDFS is
relatively slow, especially for phrase queries.
I recommend copying the index to local disk before performing searches.
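
For example, here is a minimal sketch of that copy-then-search pattern (the
HDFS path /user/hadoop/index and the local path /tmp/lucene-index are
assumptions, and it targets the same Lucene 2.9/3.0-era API as the snippet
quoted below):

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

// Copy the index that was built on HDFS down to the local filesystem.
fs.copyToLocalFile(new Path("/user/hadoop/index"),  // assumed HDFS location
                   new Path("/tmp/lucene-index"));  // assumed local location

// Search the local copy; phrase queries run much faster here than they
// do when the index is read directly from HDFS.
IndexSearcher searcher =
    new IndexSearcher(FSDirectory.open(new File("/tmp/lucene-index")));
// ... run queries ...
searcher.close();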

On Fri, Dec 31, 2010 at 5:08 PM, Jander g wrote:

> Hi, all
>
> I want to run Lucene on Hadoop. The problem is as follows:
>
> IndexWriter writer = new IndexWriter(FSDirectory.open(new File("index")),
>     new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
>
> When using Hadoop, must the first parameter be a directory on HDFS? And
> how should it be used?
>
> Thanks in advance!
>
> --
> Regards,
> Jander
>


Re: Multiple Input Data Processing using MapReduce

2010-12-31 Thread Zhou, Yunqing
You can use "map.input.split"(something like that, I can't remember..) param
in Configuration.
this param contains the input file path, you can use it to branch your logic
this param can be found in TextInputFormat.java
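
A minimal sketch of that branching, using the old org.apache.hadoop.mapred
API (the "typeA" filename convention below is an assumption):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class BranchingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private boolean isTypeA;

    @Override
    public void configure(JobConf job) {
        // "map.input.file" holds the path of the file backing this split.
        String inputFile = job.get("map.input.file", "");
        isTypeA = inputFile.contains("typeA");  // assumed naming convention
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        if (isTypeA) {
            // parse a record of the first input type
            output.collect(new Text("A"), value);
        } else {
            // parse a record of the second input type
            output.collect(new Text("B"), value);
        }
    }
}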

On Thu, Oct 14, 2010 at 10:03 PM, Matthew John wrote:

> Hi all,
>
> I have been recently working on a task where I need to take in two input
> file types, compare them, and produce a result from them using some logic.
> But as I understand it, a simple MapReduce implementation processes a
> single input type. The closest match I could think of for my task is a
> MapReduce join, but I am not able to understand much from the example
> provided with Hadoop. Can someone provide a good pointer to such
> multiple-input data processing (or joins) in MapReduce? It would also be
> great if you could send some sample code for the same.
>
> Thanks ,
>
> Matthew
>