trying to select technology

2011-05-31 Thread cs230

Hello All,

I am planning to start a project where I have to store a large number of XML
and text files. On top of that, I have to implement an efficient algorithm for
searching over thousands or millions of files, and also build indexes to
make subsequent searches faster.

I looked into Oracle Database, but it delivered very poor results. Can I use
Hadoop for this? Which Hadoop project would be the best fit?

Is there anything from Google I can use? 

Thanks a lot in advance.
-- 
View this message in context: 
http://old.nabble.com/trying-to-select-technology-tp31743063p31743063.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: trying to select technology

2011-05-31 Thread jagaran das
Take a look at Lucene and Apache Solr.

Cheers,
Jagaran 





Re: trying to select technology

2011-05-31 Thread Matthew Foley
Sounds like you're looking for a full-text inverted index.  Lucene is a good
open-source implementation of that.  I believe it has an option for storing the
original full text as well as the indexes.
--Matt
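
As a rough illustration of what a full-text inverted index does, here is a toy
sketch in plain Python. It is not Lucene's API; the names `tokenize`,
`InvertedIndex`, `add`, and `search` are made up for this example. Each term
maps to the set of documents that contain it, and the original text can
optionally be kept alongside the index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: term -> set of document ids (hypothetical API)."""

    def __init__(self, store_text=True):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.store_text = store_text      # keep the original text as well?
        self.stored = {}                  # doc id -> original text

    def add(self, doc_id, text):
        """Index one document and optionally store its full text."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)
        if self.store_text:
            self.stored[doc_id] = text

    def search(self, query):
        """Return ids of documents that contain every query term (AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

A search only touches the posting sets of the query terms instead of scanning
every file, which is why the lookup stays fast as the collection grows. A real
engine adds ranking, on-disk storage, and much more on top of this idea.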





Re: trying to select technology

2011-05-31 Thread Ted Dunning
To pile on, thousands or millions of documents are well within the range
that Lucene handles well.

Solr may be an even better option than bare Lucene since it handles lots of
the boilerplate problems like document parsing and index update scheduling.
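
To give a feel for the kind of boilerplate meant here, the following is a small
Python sketch of two such chores: pulling the text out of an XML document, and
replacing a document's old index entries when it changes. This is not Solr or
Lucene code; `extract_text`, `UpdatableIndex`, `upsert`, `delete`, and `lookup`
are hypothetical names, and the whitespace tokenization is a simplifying
assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_text(xml_string):
    """Concatenate all text nodes of an XML document, dropping the markup."""
    root = ET.fromstring(xml_string)
    return " ".join(piece.strip() for piece in root.itertext() if piece.strip())


class UpdatableIndex:
    """Toy term index that supports re-indexing a changed document."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.terms_by_doc = {}            # doc id -> terms indexed for it

    def upsert(self, doc_id, xml_string):
        """Add a document, or replace its entries if it was indexed before."""
        self.delete(doc_id)
        terms = set(extract_text(xml_string).lower().split())
        for term in terms:
            self.postings[term].add(doc_id)
        self.terms_by_doc[doc_id] = terms

    def delete(self, doc_id):
        """Remove every posting that belongs to the given document."""
        for term in self.terms_by_doc.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def lookup(self, term):
        """Return the ids of documents that contain the term."""
        return set(self.postings.get(term.lower(), set()))
```

Without the delete step in `upsert`, stale terms from the old version of a
document would keep matching searches after the document had changed.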






Re: trying to select technology

2011-05-31 Thread Jane Chen
Hi,

I think you should check out MarkLogic, a product with database and search 
capabilities especially designed for XML and unstructured data.  We also allow 
you to run Hadoop MapReduce jobs on top of data stored in MarkLogic.

For more information on MarkLogic, please check out: 
http://www.marklogic.com/products/overview.html

Thanks,
Jane

 
 


Re: trying to select technology

2011-05-31 Thread medcl

My suggestion: ElasticSearch (http://elasticsearch.org)

