Hi Sonal,
thank you, I have just implemented a solution similar to yours (without
copying to a temp file as suggested in my initial post), and it seems to
work.
Best Regards,
Jérôme
2011/1/7 Sonal Goyal sonalgoy...@gmail.com
Jerome,
You can take a look at FileStreamInputFormat at
https://github.com/sonalgoyal/hiho/tree/hihoApache0.20/src/co/nubetech/hiho/mapreduce/lib/input
This provides an input stream per file. In our case, we are using the input
stream to load data into the database directly. Maybe you can use this or a
similar approach for working with your videos.
HTH
Thanks and Regards,
Sonal
Connect Hadoop with databases, Salesforce, FTP servers and others:
https://github.com/sonalgoyal/hiho
Nube Technologies http://www.nubetech.co
http://in.linkedin.com/in/sonalgoyal
On Thu, Jan 6, 2011 at 4:23 PM, Jérôme Thièvre jthie...@gmail.com wrote:
Hi,
we are currently using Hadoop (version 0.20.2) to manage some web
archiving processes like fulltext indexing, and it works very well with
small records that contain HTML.
Now, we would like to work with other types of web data, such as videos.
This kind of data can be really large, and of course such records don't
fit in memory.
Is it possible to manage records whose content resides on disk rather
than in memory?
A possibility would be to implement a Writable that reads its content
from a DataInput but doesn't load it into memory; instead, it would copy
that content to a temporary file on the local file system and allow
streaming it back through an InputStream (an InputStreamWritable).
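For what it's worth, the spooling idea described above can be sketched with
plain java.io. This is only an illustration of the technique, not Hadoop
code: the class name SpooledRecord and the length-prefixed framing are
assumptions, and a real implementation would implement
org.apache.hadoop.io.Writable and manage temp-file cleanup properly.

```java
import java.io.*;

// Illustrative sketch of an "InputStreamWritable": instead of holding the
// record bytes in memory, the readFields-style logic spools the
// length-prefixed payload from a DataInput into a temporary file, and the
// record is later consumed as an InputStream. Names are hypothetical; a
// real version would implement org.apache.hadoop.io.Writable.
class SpooledRecord {
    private File spool;   // temp file holding the record body
    private long length;  // payload size in bytes

    // Mirror of Writable.readFields(DataInput): copy the payload to disk
    // in fixed-size chunks instead of into one big byte[].
    public void readFields(DataInput in) throws IOException {
        length = in.readLong();  // assumed length-prefixed framing
        spool = File.createTempFile("record-", ".spool");
        spool.deleteOnExit();
        byte[] buf = new byte[64 * 1024];
        try (OutputStream out =
                new BufferedOutputStream(new FileOutputStream(spool))) {
            long remaining = length;
            while (remaining > 0) {
                int chunk = (int) Math.min(buf.length, remaining);
                in.readFully(buf, 0, chunk);
                out.write(buf, 0, chunk);
                remaining -= chunk;
            }
        }
    }

    // Mirror of Writable.write(DataOutput): stream the spooled file back
    // out with the same framing, again without buffering it in memory.
    public void write(DataOutput out) throws IOException {
        out.writeLong(length);
        byte[] buf = new byte[64 * 1024];
        try (InputStream in =
                new BufferedInputStream(new FileInputStream(spool))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    // Stream access for the map/reduce code; the record content never has
    // to reside in memory as a whole.
    public InputStream openStream() throws IOException {
        return new BufferedInputStream(new FileInputStream(spool));
    }

    public long getLength() {
        return length;
    }
}
```

The map or reduce code would then call openStream() and process the video
bytes incrementally, at the cost of one extra write and read of the data on
the task's local disk.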
Has anybody tested a similar approach, and if not, do you think this
method could cause significant problems (in particular for performance)?
Thanks,
Jérôme Thièvre