Check out Katta: it can pull indexes from HDFS and deploy them into your search cluster. Katta also handles index directories that have been packed into a zip file, and it can pull indexes from any file system that Hadoop supports: hdfs, s3, hftp, file, etc.
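The zip handling mentioned above is just ordinary archive packing of the index directory before it is pushed out. As a rough illustration (class and method names here are made up for the sketch, not Katta's API), packing an index directory with only the JDK might look like:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class PackIndex {
    // Pack every regular file under indexDir into a single zip archive,
    // preserving relative paths. This mirrors the "index directory packed
    // into a zip file" layout that a deployer can unpack on the search
    // node; names here are illustrative only.
    public static void pack(Path indexDir, Path zipFile) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile));
             Stream<Path> files = Files.walk(indexDir)) {
            for (Path p : (Iterable<Path>) files.filter(Files::isRegularFile)::iterator) {
                zos.putNextEntry(new ZipEntry(indexDir.relativize(p).toString()));
                Files.copy(p, zos);
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake a tiny "index" directory with a couple of segment-like files.
        Path dir = Files.createTempDirectory("index");
        Files.writeString(dir.resolve("_0.cfs"), "segment data");
        Files.writeString(dir.resolve("segments_1"), "segments file");
        Path zip = dir.resolveSibling("index.zip");
        pack(dir, zip);
        System.out.println(Files.exists(zip) && Files.size(zip) > 0);
    }
}
```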
We have been doing this with our Solr (SOLR-1301) indexes and getting an 80% reduction in size, which is a big gain for us. I need to feed a 2-line change back into SOLR-1301: in some situations the close method can currently fail to heartbeat while the optimize is happening.

On Tue, Oct 6, 2009 at 9:30 PM, ctam <ctamra...@gmail.com> wrote:
>
> hi Ning, I am also looking at different approaches to indexing with Hadoop.
> I could index into HDFS using the contrib package for Hadoop, but since HDFS
> is not designed for random access, what would be the recommended ways to
> move the indexes to the local file system?
>
> Also, what would be the best approach to begin with? Should we look into
> Katta or the Solr integrations?
>
> thanks in advance.
>
>
> Ning Li-5 wrote:
> >
> >> I'm missing why you would ever want the Lucene index in HDFS for
> >> reading.
> >
> > The Lucene indexes are written to HDFS, but that does not mean you
> > conduct search on the indexes stored in HDFS directly. HDFS is not
> > designed for random access. Usually the indexes are copied to the
> > nodes where search will be served. With
> > http://issues.apache.org/jira/browse/HADOOP-4801, however, it may
> > become feasible to search on HDFS directly.
> >
> > Cheers,
> > Ning
> >
> >
> > On Mon, Mar 16, 2009 at 4:52 PM, Ian Soboroff <ian.sobor...@nist.gov> wrote:
> >>
> >> Does anyone have stats on how multiple readers on an optimized Lucene
> >> index in HDFS compare with a ParallelMultiReader (or whatever it's
> >> called) over RPC on a local filesystem?
> >>
> >> I'm missing why you would ever want the Lucene index in HDFS for
> >> reading.
> >>
> >> Ian
> >>
> >> Ning Li <ning.li...@gmail.com> writes:
> >>
> >>> I should have pointed out that the Nutch index build and contrib/index
> >>> target different applications. The latter is for applications that
> >>> simply want to build a Lucene index from a set of documents - e.g. no
> >>> link analysis.
> >>>
> >>> As to writing Lucene indexes, both work the same way - write the final
> >>> results to the local file system and then copy to HDFS. In contrib/index,
> >>> the intermediate results are kept in memory and not written to HDFS.
> >>>
> >>> Hope this clarifies things.
> >>>
> >>> Cheers,
> >>> Ning
> >>>
> >>>
> >>> On Mon, Mar 16, 2009 at 2:57 PM, Ian Soboroff <ian.sobor...@nist.gov> wrote:
> >>>>
> >>>> I understand why you would index in the reduce phase, because the
> >>>> anchor text gets shuffled to be next to the document. However, when
> >>>> you index in the map phase, don't you just have to reindex later?
> >>>>
> >>>> The main point to the OP is that HDFS is a bad FS for writing Lucene
> >>>> indexes because of how Lucene works. The simple approach is to write
> >>>> your index outside of HDFS in the reduce phase, and then merge the
> >>>> indexes from each reducer manually.
> >>>>
> >>>> Ian
> >>>>
> >>>> Ning Li <ning.li...@gmail.com> writes:
> >>>>
> >>>>> Or you can check out the index contrib. The difference between the
> >>>>> two is that:
> >>>>> - In Nutch's indexing map/reduce job, indexes are built in the
> >>>>> reduce phase. Afterwards, they are merged into a smaller number of
> >>>>> shards if necessary. The last time I checked, the merge process does
> >>>>> not use map/reduce.
> >>>>> - In contrib/index, small indexes are built in the map phase. They
> >>>>> are merged into the desired number of shards in the reduce phase. In
> >>>>> addition, they can be merged into existing shards.
> >>>>>
> >>>>> Cheers,
> >>>>> Ning
> >>>>>
> >>>>>
> >>>>> On Fri, Mar 13, 2009 at 1:34 AM, 王红宝 <imcap...@126.com> wrote:
> >>>>>> you can see the nutch code.
> >>>>>>
> >>>>>> 2009/3/13 Mark Kerzner <markkerz...@gmail.com>
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> How do I allow multiple nodes to write to the same index file in
> >>>>>>> HDFS?
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>> Mark
> >>>>>>>
>
> --
> View this message in context:
> http://www.nabble.com/Creating-Lucene-index-in-Hadoop-tp22490120p25780366.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
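(For what it's worth, the "write locally, then copy up" pattern Ning describes can be sketched with plain JDK I/O. In the sketch below a local shared directory stands in for the HDFS destination - a real job would do the copy through Hadoop's FileSystem API instead - and all names are illustrative.)

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class PublishIndex {
    // Each reducer writes its index segment to fast local disk first, then
    // copies the finished files to shared storage in one pass. sharedDir
    // stands in for an HDFS path; in a real job this copy would go through
    // Hadoop's FileSystem API rather than java.nio.
    public static void publish(Path localIndexDir, Path sharedDir) throws IOException {
        Files.createDirectories(sharedDir);
        try (Stream<Path> files = Files.list(localIndexDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                Files.copy(p, sharedDir.resolve(p.getFileName().toString()),
                           StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate one reducer's locally written index, then "publish" it.
        Path local = Files.createTempDirectory("reducer-index");
        Files.writeString(local.resolve("segments_1"), "segments");
        Path shared = Files.createTempDirectory("shared").resolve("shard-0");
        publish(local, shared);
        System.out.println(Files.exists(shared.resolve("segments_1")));
    }
}
```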