Re: trying to select technology

2011-05-31 Thread Jane Chen
Hi,

I think you should check out MarkLogic, a product with database and search
capabilities designed specifically for XML and unstructured data. We also let
you run Hadoop MapReduce jobs on data stored in MarkLogic.

For more information on MarkLogic, please check out: 
http://www.marklogic.com/products/overview.html

Thanks,
Jane
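As a rough illustration of how a MapReduce job could build the search index asked about in the question quoted below, here is a minimal in-memory sketch in Python. It only simulates the map, shuffle and reduce phases with plain dictionaries and does not use the Hadoop or MarkLogic APIs. The function names (`extract_text`, `map_document`, `reduce_term`, `build_inverted_index`), the regex tokenizer and the sample documents are hypothetical, and it assumes any file whose name ends in `.xml` is well-formed XML.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_text(doc_id, raw):
    """Return indexable text; assumes names ending in ".xml" are well-formed XML."""
    if doc_id.endswith(".xml"):
        # Index element text only, not tag names.
        return " ".join(ET.fromstring(raw).itertext())
    return raw


def map_document(doc_id, raw):
    """Map phase: emit one (term, doc_id) pair per distinct term in a document."""
    text = extract_text(doc_id, raw).lower()
    for term in set(re.findall(r"[a-z0-9]+", text)):
        yield term, doc_id


def reduce_term(term, doc_ids):
    """Reduce phase: turn all doc ids seen for one term into a sorted postings list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(documents):
    """Run map, a dict-based stand-in for the shuffle, then reduce over doc_id -> raw text."""
    grouped = defaultdict(list)
    for doc_id, raw in documents.items():
        for term, value in map_document(doc_id, raw):
            grouped[term].append(value)
    return dict(reduce_term(term, ids) for term, ids in grouped.items())


# Made-up sample documents.
docs = {
    "a.xml": "<note><body>Hadoop stores files</body></note>",
    "b.txt": "Search over stored files",
}
index = build_inverted_index(docs)
print(index["files"])   # ['a.xml', 'b.txt']
print(index["hadoop"])  # ['a.xml']
```

In a real Hadoop job the grouping step is done by the framework between mappers and reducers, which is what lets the same two functions scale across many files.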

--- On Tue, 5/31/11, cs230 chintanjs...@gmail.com wrote:

 From: cs230 chintanjs...@gmail.com
 Subject: trying to select technology
 To: core-u...@hadoop.apache.org
 Date: Tuesday, May 31, 2011, 10:50 AM
 
 Hello All,
 
 I am planning to start a project that requires extensive
 storage of XML and text files. On top of that, I have to
 implement an efficient algorithm for searching over
 thousands or millions of files, and also build indexes to
 make later searches faster.
 
 I looked into Oracle Database, but it performed very
 poorly. Can I use Hadoop for this? Which Hadoop project
 would be the best fit?
 
 Is there anything from Google I can use? 
 
 Thanks a lot in advance.
 -- 
 View this message in context: 
 http://old.nabble.com/trying-to-select-technology-tp31743063p31743063.html
 Sent from the Hadoop core-user mailing list archive at
 Nabble.com.
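For the second half of the question, making repeat searches fast, the standard structure is an inverted index: each term maps to a sorted list of the documents containing it (a postings list), and a multi-term query intersects those lists instead of rescanning every file. The Python sketch below shows a conjunctive (AND) query using a linear-time merge of sorted lists. The tiny index and the function names (`intersect`, `and_query`) are hypothetical, and a production search engine adds tokenization, ranking and compression on top.

```python
def intersect(p1, p2):
    """Merge-intersect two sorted postings lists in O(len(p1) + len(p2))."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, query):
    """Return doc ids containing every query term (empty list if any term is missing)."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the shortest postings list so intermediate results stay small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
        if not result:
            break
    return result


# Hypothetical index: term -> sorted list of document ids.
index = {
    "hadoop": [1, 3, 4, 7],
    "xml": [2, 3, 7, 9],
    "search": [3, 7, 8],
}
print(and_query(index, "hadoop xml search"))  # [3, 7]
```

Processing the shortest list first is a common heuristic: the result can never be longer than the smallest postings list, so later merges stay cheap.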
 
 


Re: HDFS disk consumption.

2010-12-29 Thread Jane Chen
You are right. There's only one replica. When does the space used by deleted
files get reclaimed?

--- On Tue, 12/28/10, Hemanth Yamijala yhema...@gmail.com wrote:

 From: Hemanth Yamijala yhema...@gmail.com
 Subject: Re: HDFS disk consumption.
 To: common-user@hadoop.apache.org
 Date: Tuesday, December 28, 2010, 8:43 PM
 Hi,
 
 On Wed, Dec 29, 2010 at 5:51 AM, Jane Chen jxchen_us_1...@yahoo.com wrote:
  Is setting dfs.replication to 1 sufficient to stop
  replication? How do I verify that? I have a pseudo-distributed
  cluster running 0.21.0. It seems that HDFS disk consumption
  is triple the amount of data stored.
 
 Setting it to 1 is sufficient to stop replication. Can you
 check whether the NameNode web UI shows the replicas of the
 blocks for a file?
 
 
  Thanks,
  Jane
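Besides the NameNode web UI that Hemanth suggests, the listing printed by `hadoop fs -ls` includes each file's replication factor in its second column (directories show `-` there). Below is a minimal Python sketch that parses such listing lines and flags files whose replication exceeds an expected value. The sample lines are made up, the function names are hypothetical, and the assumed column layout (permissions, replication, owner, group, size, date, time, path) should be checked against the output of the Hadoop version actually in use.

```python
def parse_ls_line(line):
    """Return (replication, path) for a file line from `hadoop fs -ls`, or None.

    Assumed layout: perms replication owner group size date time path.
    Directory lines (perms starting with 'd') and the "Found N items"
    header are skipped.
    """
    parts = line.split(None, 7)  # maxsplit keeps spaces inside the path
    if len(parts) < 8 or not parts[0].startswith("-"):
        return None
    return int(parts[1]), parts[7]


def over_replicated(lines, expected=1):
    """List (path, replication) for files whose replication exceeds `expected`."""
    result = []
    for line in lines:
        parsed = parse_ls_line(line)
        if parsed and parsed[0] > expected:
            result.append((parsed[1], parsed[0]))
    return result


# Made-up sample listing in the assumed format.
sample = """Found 3 items
-rw-r--r--   1 jane supergroup   67108864 2010-12-28 10:00 /user/jane/new.xml
-rw-r--r--   3 jane supergroup   67108864 2010-12-27 09:30 /user/jane/old.xml
drwxr-xr-x   - jane supergroup          0 2010-12-28 10:05 /user/jane/logs""".splitlines()

print(over_replicated(sample))  # [('/user/jane/old.xml', 3)]
```

One thing such a check can surface: `dfs.replication` is applied when a file is written, so files created before the setting was changed keep their old factor until it is changed with `hadoop fs -setrep`.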
 
 
 
 
 





HDFS disk consumption.

2010-12-28 Thread Jane Chen
Is setting dfs.replication to 1 sufficient to stop replication? How do I
verify that? I have a pseudo-distributed cluster running 0.21.0. It seems that
HDFS disk consumption is triple the amount of data stored.

Thanks,
Jane
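The "triples" observation matches the basic arithmetic of HDFS storage: raw block usage on the DataNodes is roughly the logical data size multiplied by each file's replication factor, and the default dfs.replication is 3. A back-of-the-envelope Python sketch follows. The 10 GiB figure is a made-up input, the function names are hypothetical, and the estimate ignores the small per-block checksum (.meta) files and any non-HDFS data on the same disks, so a ratio near 3 is suggestive rather than proof of the cause.

```python
def raw_bytes(logical_bytes, replication):
    """Approximate DataNode disk usage: every block is stored `replication` times.

    Ignores checksum (.meta) files and non-HDFS usage on the same disks.
    """
    return logical_bytes * replication


def implied_replication(logical_bytes, observed_raw_bytes):
    """Ratio of observed raw usage to logical size; about 3.0 matches the default factor."""
    return observed_raw_bytes / logical_bytes


GiB = 1024 ** 3
logical = 10 * GiB                              # made-up amount of data written to HDFS
print(raw_bytes(logical, 1) / GiB)              # 10.0 with dfs.replication = 1
print(raw_bytes(logical, 3) / GiB)              # 30.0 with the default of 3
print(implied_replication(logical, 30 * GiB))   # 3.0
```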