trying to select technology
Hello All, I am planning to start a project where I have to store a large volume of XML and text files. On top of that, I have to implement an efficient algorithm for searching over thousands or millions of files, and also build indexes to make subsequent searches faster. I looked into Oracle Database, but it delivered very poor results. Can I use Hadoop for this? Which Hadoop project would be the best fit? Is there anything from Google I can use? Thanks a lot in advance. -- View this message in context: http://old.nabble.com/trying-to-select-technology-tp31743063p31743063.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: trying to select technology
Take a look at Lucene and Apache Solr. Cheers, Jagaran From: cs230 chintanjs...@gmail.com To: core-u...@hadoop.apache.org Sent: Tue, 31 May, 2011 10:50:49 AM Subject: trying to select technology
Re: trying to select technology
Sounds like you're looking for a full-text inverted index. Lucene is a good open-source implementation of that. I believe it has an option for storing the original full text alongside the index. --Matt
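For anyone new to the term, here is a minimal Python sketch of what a full-text inverted index does. It is a toy illustration only, not how Lucene is implemented: the class name `InvertedIndex`, the `store_text` flag, and the simple lowercase tokenizer are all assumptions made for this example.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each term to the set of documents containing it."""

    def __init__(self, store_text=True):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.stored = {}                  # document id -> original text (optional)
        self.store_text = store_text

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase alphanumeric runs are good enough as terms.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)
        if self.store_text:
            # Keep the original text next to the index.
            self.stored[doc_id] = text

    def search(self, query):
        """Return sorted ids of documents that contain every query term."""
        terms = self.tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty postings for unknown terms.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = InvertedIndex()
index.add("a.xml", "<doc>Hadoop stores large files</doc>")
index.add("b.txt", "Lucene indexes text files")
print(index.search("files"))         # ['a.xml', 'b.txt']
print(index.search("lucene files"))  # ['b.txt']
```

The point is that a query only touches the postings for its own terms, so lookup cost does not grow with the total size of the stored files the way a scan would.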
Re: trying to select technology
To pile on, thousands or millions of documents are well within the range that Lucene handles comfortably. Solr may be an even better option than bare Lucene, since it takes care of much of the boilerplate, such as document parsing and index update scheduling. (In reply to Matthew Foley ma...@yahoo-inc.com, Tue, May 31, 2011 at 11:56 AM.)
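To give a rough idea of what talking to Solr looks like from client code, here is a hedged Python sketch that only builds the HTTP requests and does not send them. It assumes a recent Solr release that accepts JSON on its `/update` handler; the host, port, the core name `docs`, and the field names `id` and `body` are all hypothetical and would have to match your own installation and schema.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Solr instance with a core named "docs".
SOLR_URL = "http://localhost:8983/solr/docs"


def build_update_request(docs, base_url=SOLR_URL):
    """Build (but do not send) a request that adds documents and commits."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        base_url + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_select_url(query, rows=10, base_url=SOLR_URL):
    """Build the URL for a search that returns JSON results."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return base_url + "/select?" + params


# Hypothetical documents; the fields must exist in your Solr schema.
request = build_update_request([
    {"id": "a.xml", "body": "Hadoop stores large files"},
    {"id": "b.txt", "body": "Lucene indexes text files"},
])
print(request.get_method(), request.full_url)
print(build_select_url("body:files", rows=5))
```

With a server actually running, the request could be sent with `urllib.request.urlopen(request)`; Solr then takes care of analysis and index maintenance on its side.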
Re: trying to select technology
Hi, I think you should check out MarkLogic, a product with database and search capabilities especially designed for XML and unstructured data. We also allow you to run Hadoop MapReduce jobs on top of data stored in MarkLogic. For more information on MarkLogic, please check out: http://www.marklogic.com/products/overview.html Thanks, Jane
Re: trying to select technology
My suggestion: ElasticSearch: http://elasticsearch.org -Original Message- From: Jane Chen Sent: Wednesday, June 01, 2011 12:19 PM To: core-u...@hadoop.apache.org ; common-user@hadoop.apache.org Subject: Re: trying to select technology