Re: streaming split sizes

2009-01-20 Thread Delip Rao
Hi Dmitry, Not a direct answer to your question, but I think the right approach would be to not load your database into memory during configure() but instead look it up from map() via HBase or something similar. That way you don't have to worry about the split sizes. In fact, using fewer…
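A minimal sketch of the map-side lookup being suggested, written against the modern HBase client API (which postdates this thread); the table name, column family, and qualifier are hypothetical:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MapSideLookup {
    private final Connection conn;
    private final Table table;

    public MapSideLookup() throws IOException {
        // Open the connection once per task (e.g., in configure/setup), not per record.
        Configuration conf = HBaseConfiguration.create();
        conn = ConnectionFactory.createConnection(conf);
        table = conn.getTable(TableName.valueOf("kv_store")); // hypothetical table
    }

    // Called from map(): fetch one value by key instead of holding the
    // whole database in memory.
    public byte[] lookup(String key) throws IOException {
        Result r = table.get(new Get(Bytes.toBytes(key)));
        return r.getValue(Bytes.toBytes("f"), Bytes.toBytes("v")); // hypothetical family/qualifier
    }

    public void close() throws IOException {
        table.close();
        conn.close();
    }
}

Opening the connection once per task keeps the lookup cheap, and memory stays flat regardless of how the input is split.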

Re: Indexed Hashtables

2009-01-16 Thread Delip Rao
…, Delip Rao delip...@gmail.com wrote: Hi, I need to look up a large number of key/value pairs in my map(). Is there any indexed hashtable available as part of the Hadoop I/O API? I find HBase overkill for my application; something along the lines of HashStore (www.cellspark.com/hashstore.html) should be fine.

Re: @hadoop on twitter

2009-01-16 Thread Delip Rao
Is it twitter.com/hadoop? On Fri, Jan 16, 2009 at 10:04 AM, Tom White t...@cloudera.com wrote: Thanks flip. I've signed up for the hadoop account - it'd be great to get some help with getting it going. Tom On Wed, Jan 14, 2009 at 6:33 AM, Philip (flip) Kromer f...@infochimps.org wrote: Hey…

Indexed Hashtables

2009-01-14 Thread Delip Rao
Hi, I need to look up a large number of key/value pairs in my map(). Is there any indexed hashtable available as part of the Hadoop I/O API? I find HBase overkill for my application; something along the lines of HashStore (www.cellspark.com/hashstore.html) should be fine. Thanks, Delip
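One option that fits this ask without HBase: Hadoop's own MapFile (org.apache.hadoop.io.MapFile), a sorted, indexed on-disk key/value file supporting random lookups; only the (small) index is held in memory. A sketch, where the directory path and key/value types are assumptions:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookup {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String dir = "/user/huser/lookup.map"; // hypothetical MapFile directory

        // Write a small MapFile; keys must be appended in sorted order.
        MapFile.Writer writer = new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
        writer.append(new Text("apple"), new Text("1"));
        writer.append(new Text("banana"), new Text("2"));
        writer.close();

        // Random lookup by key: the index lives in memory, values stay on disk.
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        Text value = new Text();
        if (reader.get(new Text("banana"), value) != null) {
            System.out.println("banana -> " + value);
        }
        reader.close();
    }
}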

Re: Run Map-Reduce multiple times

2008-12-26 Thread Delip Rao
...@attributor.com wrote: In 0.19 there is a chaining facility; I haven't looked at it yet, but it may provide an alternative to the rather standard pattern of looping. You may also want to check what Mahout is doing, as it is a common problem in that space. Delip Rao wrote: Thanks Chris! I ended up…
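The chaining facility mentioned is presumably ChainMapper (org.apache.hadoop.mapred.lib) from the old mapred API in 0.19, which runs several map stages back-to-back inside a single job rather than looping over separate jobs. A sketch under that assumption, with placeholder mapper classes:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.ChainMapper;

public class ChainedJob {

    // Placeholder first stage: passes each line through under a constant key.
    public static class FirstMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            out.collect(new Text("k"), value);
        }
    }

    // Placeholder second stage: consumes the first stage's output in the same task.
    public static class SecondMapper extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            out.collect(key, new Text(value.toString().toUpperCase()));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(ChainedJob.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        ChainMapper.addMapper(job, FirstMapper.class,
                LongWritable.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));
        ChainMapper.addMapper(job, SecondMapper.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

        JobClient.runJob(job);
    }
}

Note this only helps when the stages are map-side transforms; iterations that need a full shuffle between passes still require separate jobs.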

Re: Simple data transformations in Hadoop?

2008-12-13 Thread Delip Rao
On Sat, Dec 13, 2008 at 9:32 PM, Stuart White stuart.whi...@gmail.com wrote: (I'm quite new to Hadoop and map/reduce, so some of these questions might not make complete sense.) I want to perform simple data transforms on large datasets, and it seems Hadoop is an appropriate tool. As a simple…
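For record-at-a-time transforms, a map-only job (zero reducers) is usually all that's needed; a sketch using the mapred API of the era, with the transform itself as a placeholder:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TransformJob {

    public static class TransformMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, NullWritable, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<NullWritable, Text> out, Reporter reporter)
                throws IOException {
            // Placeholder transform: uppercase each input line.
            out.collect(NullWritable.get(), new Text(line.toString().toUpperCase()));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf job = new JobConf(TransformJob.class);
        job.setMapperClass(TransformMapper.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(0); // map-only: mapper output is written directly to HDFS
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        JobClient.runJob(job);
    }
}

With setNumReduceTasks(0) there is no sort or shuffle, so the job cost is essentially one pass over the input.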

Gzip compressed input?

2008-12-11 Thread Delip Rao
I am having trouble reading gzip-compressed input. Is this a known problem? Any workarounds? (I am using gzip 1.3.3) Thanks, Delip
$ hadoop dfs -ls input
Found 1 items
-rw-r--r-- 3 huser supergroup 17532230 2008-12-11 23:52 /user/huser/input/words.gz
$ hadoop jar hadoop-0.19.0-examples.jar…
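For context: TextInputFormat resolves a decompressor from the file extension via CompressionCodecFactory and decompresses .gz input transparently, so plain gzip files should just work; the usual catch is that gzip is not splittable, so each .gz file goes to a single mapper. A small sketch to check which codec Hadoop resolves for the file listed above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        // Resolve the codec from the file extension, as TextInputFormat does.
        CompressionCodec codec =
                factory.getCodec(new Path("/user/huser/input/words.gz"));
        System.out.println(codec == null ? "no codec" : codec.getClass().getName());
        // Expected for .gz: org.apache.hadoop.io.compress.GzipCodec
    }
}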

File I/O from Mapper.configure

2008-12-08 Thread Delip Rao
Hi, How do I read regular text files on HDFS from configure() in my Mapper? I am doing the following, and my jobs appear to fail randomly (i.e., they work sometimes but mostly fail). FileSystem fs = FileSystem.get(conf); Path path = new…
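A sketch of the pattern being attempted, completing the truncated snippet under assumptions (the lookup-file path and its tab-separated format are hypothetical):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LookupMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    public void configure(JobConf conf) {
        try {
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/user/huser/lookup.txt"); // hypothetical file
            BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(path)));
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) lookup.put(parts[0], parts[1]);
            }
            in.close();
        } catch (IOException e) {
            // Fail the task loudly instead of proceeding with a half-loaded table.
            throw new RuntimeException("failed to load lookup file", e);
        }
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        String v = lookup.get(value.toString());
        if (v != null) out.collect(value, new Text(v));
    }
}

If direct HDFS reads from configure() stay flaky, the era-appropriate DistributedCache is the more common way to ship side files to every task.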

Run Map-Reduce multiple times

2008-12-07 Thread Delip Rao
Hi, I need to run my map-reduce routines for several iterations so that the output of an iteration becomes the input to the next iteration. Is there a standard pattern to do this instead of calling JobClient.runJob() in a loop? Thanks, Delip
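For reference, the straightforward loop looks roughly like this (iteration count and paths are placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class IterativeDriver {
    public static void main(String[] args) throws Exception {
        int iterations = 5;                  // placeholder
        Path input = new Path("data/input"); // placeholder
        for (int i = 1; i <= iterations; i++) {
            JobConf job = new JobConf(IterativeDriver.class);
            job.setJobName("iteration-" + i);
            Path output = new Path("data/iter-" + i);
            FileInputFormat.setInputPaths(job, input);
            FileOutputFormat.setOutputPath(job, output);
            // ... set mapper/reducer classes for the iteration here ...
            JobClient.runJob(job); // blocks until the job completes
            input = output;        // this iteration's output feeds the next
        }
    }
}

As the reply below notes, naming each iteration's output separately lets you keep or delete intermediate results explicitly (e.g., via FileSystem.delete).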

Re: Run Map-Reduce multiple times

2008-12-07 Thread Delip Rao
…the previous iteration to be the input path of the next iteration. This at least lets you control whether you keep the results of intermediate iterations around or erase them... -Chris On Mon, Dec 8, 2008 at 1:25 AM, Delip Rao [EMAIL PROTECTED] wrote: Hi, I need to run my map-reduce…