Re: hadoop file system browser

2008-01-24 Thread Enis Soztutar
Yes, you can solve the bottleneck by starting a WebDAV server on each 
client. But that would add the burden of managing those servers, etc., and 
it may not be the intended use case for WebDAV. We can discuss the 
architecture further in the relevant issue.


Alban Chevignard wrote:

Thanks for the clarification. I agree that running a single WebDAV
server for all clients would make it a bottleneck. But I can't see
anything in the current WebDAV server implementation that precludes
running an instance of it on each client. It seems to me that would
solve any bottleneck issue.

-Alban

On Jan 23, 2008 2:53 AM, Enis Soztutar <[EMAIL PROTECTED]> wrote:
  

As you know, the DFS client connects to the individual datanodes to
read/write data and has minimal interaction with the namenode, which lets
I/O throughput scale linearly (theoretically 1:1). The current
implementation of the WebDAV interface, however, is just a server running
on a single machine that translates WebDAV requests into calls to the
namenode. Thus the whole traffic passes through this one WebDAV server,
which makes it a bottleneck. I was planning to integrate the WebDAV server
with the namenode/datanodes and forward requests to the other datanodes, so
that we can do I/O in parallel, but my focus on WebDAV has faded for now.




Alban Chevignard wrote:


What are the scalability issues associated with the current WebDAV interface?

Thanks,
-Alban

On Jan 22, 2008 7:27 AM, Enis Soztutar <[EMAIL PROTECTED]> wrote:

  

The WebDAV interface for Hadoop works as it is, but it needs a major
redesign to be scalable; it is still useful, however. It has even been
used from Windows Explorer, by defining the WebDAV server as a remote service.


Ted Dunning wrote:



There has been significant work on building a web-DAV interface for HDFS.  I
haven't heard any news for some time, however.


On 1/21/08 11:32 AM, "Dawid Weiss" <[EMAIL PROTECTED]> wrote:



  

The Eclipse plug-in also features a DFS browser.


  

Yep, that's all true. I don't mean to self-promote, because there really isn't
that much to advertise ;) I was just quite attached to a file manager-like user
interface; the muCommander clone I posted served me as a browser, but also for
rudimentary file operations (copying to/from, deleting folders, etc.). In my
experience it's been quite handy.

It would probably be a good idea to implement a commons-vfs plugin for Hadoop
so that the HDFS filesystem is transparent to use for other apps.

Dawid


hadoop and local files

2008-01-24 Thread jerrro

Hello,

When launching a map-reduce job, I am interested in copying a certain file
to the datanodes - not into HDFS, but onto the local file system - so I can
access that file from my job on the datanode. (The file is around 500KB, so
I don't think there will be much overhead.) Is there a way to tell Hadoop to
do that (I heard it is possible, but I am not sure how)? Also, how do I know
where the file is copied to? (I understood it may be copied to /tmp or
somewhere similar on the datanode.)

Thanks.



Jerr.

-- 
View this message in context: 
http://www.nabble.com/hadoop-and-local-files-tp15068393p15068393.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Re: hadoop and local files

2008-01-24 Thread Johannes Zillmann

Hi Jerrro,

take a look at 
http://hadoop.apache.org/core/docs/r0.15.3/mapred_tutorial.html#DistributedCache
The DistributedCache looks like what you are searching for. I think the 
interesting part is the example 
http://hadoop.apache.org/core/docs/r0.15.3/mapred_tutorial.html#Example%3A+WordCount+v2.0
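
For reference, here is a minimal sketch of how this is typically wired up,
assuming the 0.15-era org.apache.hadoop.filecache.DistributedCache API; the
file path and class names below are placeholders, not something taken from
this thread:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class CacheExample {
  // Driver side: ask the framework to ship an HDFS file to every node
  // that runs tasks for this job.
  public static void configureJob(JobConf job) throws Exception {
    DistributedCache.addCacheFile(new URI("/user/jerr/lookup.dat"), job);
  }

  // Task side: find out where the framework localized the file on disk.
  public static class MyMapper extends MapReduceBase {
    private Path localLookup;

    public void configure(JobConf job) {
      try {
        Path[] cached = DistributedCache.getLocalCacheFiles(job);
        if (cached != null && cached.length > 0) {
          localLookup = cached[0];  // read it with ordinary java.io from here
        }
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
  }
}

The framework copies the cached file onto the local disk of each node that
runs tasks before the tasks start, which is the "local copy on the datanode"
behaviour asked about.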


Johannes


jerrro wrote:

Hello,

When launching a map-reduce job, I am interested in copying a certain file
to the datanodes - not into HDFS, but onto the local file system - so I can
access that file from my job on the datanode. (The file is around 500KB, so
I don't think there will be much overhead.) Is there a way to tell Hadoop to
do that (I heard it is possible, but I am not sure how)? Also, how do I know
where the file is copied to? (I understood it may be copied to /tmp or
somewhere similar on the datanode.)

Thanks.



Jerr.

  



--
~~~ 
101tec GmbH


Halle (Saale), Saxony-Anhalt, Germany
http://www.101tec.com



RE: hadoop and local files

2008-01-24 Thread Hairong Kuang
You can either pack the files with your job jar or use the distributed
cache if the file size is big. See http://wiki.apache.org/hadoop/FAQ#8. 
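
As a small illustration of the "pack it with your job jar" option (this is
plain Java classpath loading, not a Hadoop API; the resource name lookup.dat
is just a placeholder): a file bundled inside the jar can be read from the
classpath by any map or reduce task.

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

public class JarResourceExample {
  // Reads the first line of a file that was packed into the job jar.
  public static String readFirstLine() throws Exception {
    InputStream in =
        JarResourceExample.class.getResourceAsStream("/lookup.dat");
    if (in == null) {
      throw new IllegalStateException("lookup.dat not found on the classpath");
    }
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    try {
      // The file travels with the job jar, so no separate copy to HDFS
      // or to the datanodes' local disks is needed.
      return reader.readLine();
    } finally {
      reader.close();
    }
  }
}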

Hairong  

-Original Message-
From: jerrro [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 24, 2008 8:06 AM
To: [EMAIL PROTECTED]
Subject: hadoop and local files


Hello,

When launching a map-reduce job, I am interested in copying a certain file
to the datanodes - not into HDFS, but onto the local file system - so I can
access that file from my job on the datanode. (The file is around 500KB, so
I don't think there will be much overhead.) Is there a way to tell Hadoop to
do that (I heard it is possible, but I am not sure how)? Also, how do I know
where the file is copied to? (I understood it may be copied to /tmp or
somewhere similar on the datanode.)

Thanks.



Jerr.

--
View this message in context:
http://www.nabble.com/hadoop-and-local-files-tp15068393p15068393.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Re: hadoop file system browser

2008-01-24 Thread Vetle Roeim
On Tue, 22 Jan 2008 22:03:03 +0100, Jeff Hammerbacher  
<[EMAIL PROTECTED]> wrote:



we use FUSE: who wants a gui when you could  have a shell?
http://issues.apache.org/jira/browse/HADOOP-4


Does this work with newer versions of Hadoop?


[...]
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/


Re: hadoop file system browser

2008-01-24 Thread Jason Venner

With very minor changes it works with 0.15.2, read only.


Vetle Roeim wrote:
On Tue, 22 Jan 2008 22:03:03 +0100, Jeff Hammerbacher 
<[EMAIL PROTECTED]> wrote:



we use FUSE: who wants a gui when you could  have a shell?
http://issues.apache.org/jira/browse/HADOOP-4


Does this work with newer versions of Hadoop?


[...]

--
Jason Venner
Attributor - Publish with Confidence 
Attributor is hiring Hadoop Wranglers, contact if interested


Re: hadoop file system browser

2008-01-24 Thread Pete Wyckoff

Right now it's tested with 0.14.4. It also includes rmdir, rm, mkdir, mv.
I've implemented write, but it has to wait for appends to work in Hadoop
because of the FUSE protocol.

Our strategy thus far has been to use FUSE on a single box and then NFS-export
it to other machines. We don't do heavy, heavy operations on it, so it isn't a
performance problem. The things I think are most useful anyway are ls, find,
du, mkdir, rmdir, rm and mv - none of which tax FUSE much.

-- pete


On 1/24/08 10:39 AM, "Vetle Roeim" <[EMAIL PROTECTED]> wrote:

> On Tue, 22 Jan 2008 22:03:03 +0100, Jeff Hammerbacher
> <[EMAIL PROTECTED]> wrote:
> 
>> > we use FUSE: who wants a gui when you could  have a shell?
>> > http://issues.apache.org/jira/browse/HADOOP-4
> 
> Does this work with newer versions of Hadoop?
> 
> 
> [...]
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
> 




Re: hadoop file system browser

2008-01-24 Thread Pete Wyckoff

Another note: we implemented the trash feature in FUSE. This can be turned
on and off with ioctl. We also don't allow removal of certain directories,
which again can be configured with ioctl (but isn't yet).

-- pete



On 1/24/08 10:39 AM, "Vetle Roeim" <[EMAIL PROTECTED]> wrote:

> On Tue, 22 Jan 2008 22:03:03 +0100, Jeff Hammerbacher
> <[EMAIL PROTECTED]> wrote:
> 
>> > we use FUSE: who wants a gui when you could  have a shell?
>> > http://issues.apache.org/jira/browse/HADOOP-4
> 
> Does this work with newer versions of Hadoop?
> 
> 
> [...]
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
> 




Re: hadoop file system browser

2008-01-24 Thread Vetle Roeim

Great! Where can I get it? :)

On Thu, 24 Jan 2008 19:48:57 +0100, Pete Wyckoff <[EMAIL PROTECTED]>  
wrote:




Right now it's tested with 0.14.4. It also includes rmdir, rm, mkdir, mv.
I've implemented write, but it has to wait for appends to work in Hadoop
because of the FUSE protocol.

Our strategy thus far has been to use FUSE on a single box and then NFS-export
it to other machines. We don't do heavy, heavy operations on it, so it isn't a
performance problem. The things I think are most useful anyway are ls, find,
du, mkdir, rmdir, rm and mv - none of which tax FUSE much.

-- pete


On 1/24/08 10:39 AM, "Vetle Roeim" <[EMAIL PROTECTED]> wrote:


On Tue, 22 Jan 2008 22:03:03 +0100, Jeff Hammerbacher
<[EMAIL PROTECTED]> wrote:


> we use FUSE: who wants a gui when you could  have a shell?
> http://issues.apache.org/jira/browse/HADOOP-4


Does this work with newer versions of Hadoop?


[...]
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/








--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/


Re: hadoop file system browser

2008-01-24 Thread Jason Venner
The only change needed for 0.15.2 was to change the references to 
info[0].mCreationTime into references to info[0].mLastMod


Pete Wyckoff wrote:

Right now it's tested with 0.14.4. It also includes rmdir, rm, mkdir, mv.
I've implemented write, but it has to wait for appends to work in Hadoop
because of the FUSE protocol.

Our strategy thus far has been to use FUSE on a single box and then NFS-export
it to other machines. We don't do heavy, heavy operations on it, so it isn't a
performance problem. The things I think are most useful anyway are ls, find,
du, mkdir, rmdir, rm and mv - none of which tax FUSE much.

-- pete


On 1/24/08 10:39 AM, "Vetle Roeim" <[EMAIL PROTECTED]> wrote:

  

On Tue, 22 Jan 2008 22:03:03 +0100, Jeff Hammerbacher
<[EMAIL PROTECTED]> wrote:



we use FUSE: who wants a gui when you could  have a shell?
http://issues.apache.org/jira/browse/HADOOP-4


Does this work with newer versions of Hadoop?


[...]
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/






  


--
Jason Venner
Attributor - Publish with Confidence 
Attributor is hiring Hadoop Wranglers, contact if interested


Re: hadoop file system browser

2008-01-24 Thread Pete Wyckoff

I can post it again, but it doesn't include the ioctl commands, so the trash
feature cannot be configured. I can still create a flag and default it to
false. The directory protection isn't configurable either, so I can also put
that behind a flag defaulting to false. The main directory we protect here is
/user/facebook, for data (and job :) ) protection purposes.


On 1/24/08 10:55 AM, "Vetle Roeim" <[EMAIL PROTECTED]> wrote:

>> >




Re: hadoop file system browser

2008-01-24 Thread Vetle Roeim
Yes, please post it again. :) Lack of trash and directory protection  
shouldn't be an issue for my needs.


On Thu, 24 Jan 2008 20:11:26 +0100, Pete Wyckoff <[EMAIL PROTECTED]>  
wrote:




I can post it again, but it doesn't include the ioctl commands, so the trash
feature cannot be configured. I can still create a flag and default it to
false. The directory protection isn't configurable either, so I can also put
that behind a flag defaulting to false. The main directory we protect here is
/user/facebook, for data (and job :) ) protection purposes.


On 1/24/08 10:55 AM, "Vetle Roeim" <[EMAIL PROTECTED]> wrote:


>







--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/


Re: hadoop file system browser

2008-01-24 Thread Pete Wyckoff

I attached the newest version to:
https://issues.apache.org/jira/browse/HADOOP-4

Still a work in progress and any help is appreciated. Not much by way of
instructions, but here are some:

1. download and install FUSE and do a modprobe fuse
2. modify fuse_dfs.c's Makefile to have the right paths for fuse, hdfs.h and
jni
3. ensure you have hadoop in your class path and the jni stuff in your
library path
4. mkdir /tmp/hdfs
5. ./fuse_dfs dfs://hadoop_namenode:9000 /tmp/hdfs -d

You will probably be missing things in your class path and LD_LIBRARY_PATH
when you do step 5, so just add them and iterate. To run this as production
quality, you basically need fuse_dfs in root's path and a line added to
/etc/fstab. For people interested, I can give you my config line.

-- pete


On 1/24/08 10:55 AM, "Vetle Roeim" <[EMAIL PROTECTED]> wrote:

>> >




Re: hadoop file system browser

2008-01-24 Thread Vetle Roeim

Thanks!

On Thu, 24 Jan 2008 20:29:20 +0100, Pete Wyckoff <[EMAIL PROTECTED]>  
wrote:




I attached the newest version to:
https://issues.apache.org/jira/browse/HADOOP-4

Still a work in progress and any help is appreciated. Not much by way of
instructions, but here are some:

1. download and install FUSE and do a modprobe fuse
2. modify fuse_dfs.c's Makefile to have the right paths for fuse, hdfs.h and
jni
3. ensure you have hadoop in your class path and the jni stuff in your
library path
4. mkdir /tmp/hdfs
5. ./fuse_dfs dfs://hadoop_namenode:9000 /tmp/hdfs -d

You will probably be missing things in your class path and LD_LIBRARY_PATH
when you do step 5, so just add them and iterate. To run this as production
quality, you basically need fuse_dfs in root's path and a line added to
/etc/fstab. For people interested, I can give you my config line.

-- pete


On 1/24/08 10:55 AM, "Vetle Roeim" <[EMAIL PROTECTED]> wrote:


>







--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/


Region offline issues

2008-01-24 Thread Marc Harris
Is anyone else having the same problems as me with regard to frequently
seeing "NotServingRegionException" and "IllegalStateException: region
offline" exceptions when trying to load data into an hbase instance?

My setup uses
- hadoop (2008-01-14 snapshot)
- a single-server hadoop cluster, as described in
http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
- a single-server hbase cluster, as described in
http://wiki.apache.org/hadoop/Hbase/10Minutes (running on the same
hardware as the hadoop cluster)
- client java application running on the same hardware

I have 1 table containing 4 column families, and I am attempting to
write about 5,000,000 rows of data, averaging at most a few KB each, in a
single thread.

What happens then is that several tens of thousands of rows write
perfectly, and then, I guess, the server starts to split regions and
starts throwing the above-mentioned exceptions all the time. The client
then waits and retries constantly, and my upload rate drops from several
thousand rows per minute to fewer than 50 rows per minute, with a couple
of errors (i.e. exceptions that made their way all the way out to the
client application) per minute. Regions, and therefore hbase itself, seem
to spend more time offline than online.

Did anyone else have a similar experience, and does anyone have a
suggestion for how to improve the reliability of my setup?

- Marc



conf files needed by a java client

2008-01-24 Thread Marc Harris
Does an hbase client java application just need a correctly configured
hbase-site.xml or does it need a hadoop-site.xml as well? By client
application I mean not a map-reduce job but something similar to the
sample application on the hbase FAQ page
http://wiki.apache.org/hadoop/Hbase/FAQ#1

Thanks,
- Marc



Re: Region offline issues

2008-01-24 Thread Bryan Duxbury
When there are splits going on, NSREs are expected. I would say that  
it is fairly unexpected for them to bubble all the way up to the  
client application, though.


Is there anything else in your master or regionserver logs? Are you  
running at DEBUG log level for HBase? I'd like to try and figure this  
one out if possible.


I will say that I am running much bigger imports than what it sounds  
like you're doing, and it's working, albeit on a 13-node cluster, not  
a single machine. It's possible you're just trying to write too fast  
for your hardware to keep up, since it is playing every role, but I'd  
still expect it to keep working.


-Bryan

On Jan 24, 2008, at 12:23 PM, Marc Harris wrote:

Is anyone else having the same problems as me with regard to frequently
seeing "NotServingRegionException" and "IllegalStateException: region
offline" exceptions when trying to load data into an hbase instance?

My setup uses
- hadoop (2008-01-14 snapshot)
- a single-server hadoop cluster, as described in
http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
- a single-server hbase cluster, as described in
http://wiki.apache.org/hadoop/Hbase/10Minutes (running on the same
hardware as the hadoop cluster)
- client java application running on the same hardware

I have 1 table containing 4 column families, and I am attempting to
write about 5,000,000 rows of data, averaging at most a few KB each, in a
single thread.

What happens then is that several tens of thousands of rows write
perfectly, and then, I guess, the server starts to split regions and
starts throwing the above-mentioned exceptions all the time. The client
then waits and retries constantly, and my upload rate drops from several
thousand rows per minute to fewer than 50 rows per minute, with a couple
of errors (i.e. exceptions that made their way all the way out to the
client application) per minute. Regions, and therefore hbase itself, seem
to spend more time offline than online.

Did anyone else have a similar experience, and does anyone have a
suggestion for how to improve the reliability of my setup?

- Marc





Re: Region offline issues

2008-01-24 Thread stack
I've seen the ISEs myself (HADOOP-2692).  As Bryan says, the NSREs are 
part of 'normal' operation; they only show if running at DEBUG level, 
unless we run out of retries, and then the NSRE is thrown as an error.


FYI, 5M rows single-threaded will take forever to load.  I'd suggest you 
set up a MR job.


Lots of splitting will have regions offline for a while, and if everything is 
running on the one server, it could take a while for them to come back online 
(digging in the logs, you should be able to figure out the story).


Also of note, when a regionserver judges itself overloaded, it'll block 
updates until it's had a chance to catch its breath.  If these intervals go 
on too long, this could be another reason your clients fail.


St.Ack

Bryan Duxbury wrote:
When there are splits going on, NSREs are expected. I would say that 
it is fairly unexpected for them to bubble all the way up to the 
client application, though.


Is there anything else in your master or regionserver logs? Are you 
running at DEBUG log level for HBase? I'd like to try and figure this 
one out if possible.


I will say that I am running much bigger imports than what it sounds 
like you're doing, and it's working, albeit on a 13-node cluster, not 
a single machine. It's possible you're just trying to write too fast 
for your hardware to keep up, since it is playing every role, but I'd 
still expect it to keep working.


-Bryan

On Jan 24, 2008, at 12:23 PM, Marc Harris wrote:


Is anyone else having the same problems as me with regard to frequently
seeing "NotServingRegionException" and "IllegalStateException: region
offline" exceptions when trying to load data into an hbase instance?

My setup uses
- hadoop (2008-01-14 snapshot)
- a single-server hadoop cluster, as described in
http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
- a single-server hbase cluster, as described in
http://wiki.apache.org/hadoop/Hbase/10Minutes (running on the same
hardware as the hadoop cluster)
- client java application running on the same hardware

I have 1 table containing 4 column families, and I am attempting to
write about 5,000,000 rows of data, averaging at most a few KB each, in a
single thread.

What happens then is that several tens of thousands of rows write
perfectly, and then, I guess, the server starts to split regions and
starts throwing the above-mentioned exceptions all the time. The client
then waits and retries constantly, and my upload rate drops from several
thousand rows per minute to fewer than 50 rows per minute, with a couple
of errors (i.e. exceptions that made their way all the way out to the
client application) per minute. Regions, and therefore hbase itself, seem
to spend more time offline than online.

Did anyone else have a similar experience, and does anyone have a
suggestion for how to improve the reliability of my setup?

- Marc







Re: conf files needed by a java client

2008-01-24 Thread stack
Yes, unless you copy all of your hadoop-site.xml settings into hbase-site.xml.  
Hbase on startup -- server or client -- will add $HBASE_CONF_DIR and 
$HADOOP_CONF_DIR to its CLASSPATH. Any hadoop-*.xml and hbase-*.xml 
configuration files found therein will be loaded.  Hbase then uses hadoop 
settings such as fs.default.name to figure out what filesystem to write 
to, etc.
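
As a quick sanity check, here is a minimal sketch, assuming the
HBaseConfiguration class of that era and the hbase.master property name
(both are assumptions, not taken from the thread), showing how a client can
verify which configuration files are being picked up from its classpath:

import org.apache.hadoop.hbase.HBaseConfiguration;

public class ConfCheck {
  public static void main(String[] args) {
    // HBaseConfiguration extends Configuration: the superclass loads the
    // hadoop-*.xml files found on the classpath, and HBaseConfiguration
    // adds the hbase-*.xml files on top of them.
    HBaseConfiguration conf = new HBaseConfiguration();
    System.out.println("fs.default.name = " + conf.get("fs.default.name"));
    System.out.println("hbase.master    = " + conf.get("hbase.master"));
  }
}

If fs.default.name still shows the local-filesystem default, hadoop-site.xml
is not on the client's classpath.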


St.Ack

Marc Harris wrote:

Does an hbase client java application just need a correctly configured
hbase-site.xml or does it need a hadoop-site.xml as well? By client
application I mean not a map-reduce job but something similar to the
sample application on the hbase FAQ page
http://wiki.apache.org/hadoop/Hbase/FAQ#1

Thanks,
- Marc


  




questions about the configuration file

2008-01-24 Thread Yunhong Gu1


Hi,

I have some questions on the network settings.

What is the different between the following two entries?
The first one is obvious, but what does the second one mean? How is it 
related to DNS?



<property>
  <name>dfs.datanode.bindAddress</name>
  <value>0.0.0.0</value>
  <description>
    the address where the datanode will listen to.
  </description>
</property>

<property>
  <name>dfs.datanode.dns.interface</name>
  <value>default</value>
  <description>The name of the Network Interface from which a data node should
  report its IP address.
  </description>
</property>



For example, if my server has two IPs:

eth2192.168.0.1
eth310.0.0.1
lo  127.0.0.1

I want to use 10.0.0.1. So I set the first entry as 10.0.0.1. How about 
the second one? eth3?


I use IP addresses, and my servers do not use domain names to talk to other 
servers (10.0.0.x).


Does the host name matter with regard to the above setting? Does 
/etc/hosts matter?


Does DNS matter if I don't use domain names at all? I ask this question 
because Hadoop always uses host names in its reports, and I worry that it 
cannot locate the slave servers by host name because no DNS is set up.


Thanks
Yunhong


MapReduce usage with Lucene Indexing

2008-01-24 Thread roger dimitri
Hi,
   I am very new to Hadoop, and I have a project where I need to use Lucene to 
index some input given either as a huge collection of Java objects or one 
huge Java object. 
  I read about Hadoop's MapReduce utilities, and I want to leverage that feature 
in the case described above. 
  Can someone please tell me how I can approach this problem? All of the 
Hadoop MapReduce examples out there show only file-based input and don't 
explicitly deal with data coming in as a huge Java object, so to speak.

Any help is greatly appreciated.

Thanks,
Roger




  

Never miss a thing.  Make Yahoo your home page. 
http://www.yahoo.com/r/hs

Re: MapReduce usage with Lucene Indexing

2008-01-24 Thread Bradford Stephens
I'm actually going to be doing something similar, with Nutch. I just
started learning about Hadoop this week, so I'm interested in what
everyone has to say :)

On Jan 24, 2008 5:00 PM, roger dimitri <[EMAIL PROTECTED]> wrote:
> Hi,
>I am very new to Hadoop, and I have a project where I need to use Lucene 
> to index some input given either as a a huge collection of Java objects or 
> one huge java object.
>   I read about Hadoop's MapReduce utilities and I want to leverage that 
> feature in my case described above.
>   Can some one please tell me how I can approach the problem described above. 
> Because all the Hadoop's MapReduce examples out there show only File based 
> input and don't explicitly deal with data coming in as a huge Java object or 
> so to speak.
>
> Any help is greatly appreciated.
>
> Thanks,
> Roger
>
>
>
>
>   
> 
> Never miss a thing.  Make Yahoo your home page.
> http://www.yahoo.com/r/hs


how to stop regionserver

2008-01-24 Thread ma qiang
Hi all,
 When I start my hbase, the error prints as follows: localhost:
regionserver running as process 6893. Stop it first.

Can you tell me how to solve this problem? Why is the regionserver
still running after I stop my hbase?

Best Wishes


Re: how to stop regionserver

2008-01-24 Thread stack
It's safe to 'kill' it if it won't go down.  Check the logs to see if you can 
figure out why it didn't go down when the master went down.

St.Ack

ma qiang wrote:

Hi all,
 When I start my hbase, the error prints as follows: localhost:
regionserver running as process 6893. Stop it first.

Can you tell me how to solve this problem? Why is the regionserver
still running after I stop my hbase?

Best Wishes
  




Re: MapReduce usage with Lucene Indexing

2008-01-24 Thread Rajagopal Natarajan
On Jan 25, 2008 6:30 AM, roger dimitri <[EMAIL PROTECTED]> wrote:

> Hi,
>   I am very new to Hadoop, and I have a project where I need to use Lucene
> to index some input given either as a a huge collection of Java objects or
> one huge java object.
>  I read about Hadoop's MapReduce utilities and I want to leverage that
> feature in my case described above.
>  Can some one please tell me how I can approach the problem described
> above. Because all the Hadoop's MapReduce examples out there show only File
> based input and don't explicitly deal with data coming in as a huge Java
> object or so to speak.


Something that just came off the top of my head: when your input is a
collection of smaller objects, each independent of the others, you could
serialize all the objects and write them to a file, specify a RecordReader,
and have the reducer deserialize each object and perform the indexing. I'll
have to look into the details of java.io.Serializable and the Lucene API to
be able to comment further on it.
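
A rough sketch of the first half of that idea, assuming 0.15-era APIs (the
class name and output path are placeholders): each object is serialized with
plain Java serialization and written as one record of a SequenceFile, which a
MapReduce job could then consume with SequenceFileInputFormat and index in the
map or reduce phase.

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

public class ObjectsToSequenceFile {
  // Writes each serializable object as a (sequence number, bytes) record.
  public static void write(List<? extends Serializable> objects, Path out,
                           Configuration conf) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, LongWritable.class, BytesWritable.class);
    try {
      long id = 0;
      for (Serializable obj : objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        oos.writeObject(obj);                     // plain Java serialization
        oos.close();
        writer.append(new LongWritable(id++),
                      new BytesWritable(bytes.toByteArray()));
      }
    } finally {
      writer.close();
    }
  }
}

The map side would then do the reverse: wrap each BytesWritable value in an
ObjectInputStream, deserialize the object, and feed it to the Lucene indexing
code.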

-- 
N. Rajagopal,
Visit me at http://www.raja-gopal.com