Why use Storm, unless you need to transform the data in a global way (e.g. a
streaming join or aggregation), the data is too big for a single box to
handle, or this is going to be an ongoing process? Just write a little app
that takes a file on the command line and uploads it to HBase. Then you can
use find and xargs to upload the files in parallel:
find ./dir/to/data -type f | xargs -n 1 -P <NUM_IN_PARALLEL> <COMMAND_TO_UPLOAD>
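If it helps, here is a minimal sketch of what that upload app could look like
with the plain HBase Java client. The table name ("files"), column family
("f"), qualifier ("data"), and the choice of the file path as the row key are
just assumptions for illustration, so adjust them to your schema.

    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FileUploader {
        public static void main(String[] args) throws Exception {
            // The file to upload is passed on the command line.
            String path = args[0];
            byte[] contents = Files.readAllBytes(Paths.get(path));

            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("files"))) {
                // Use the file path as the row key and store the bytes in one cell.
                Put put = new Put(Bytes.toBytes(path));
                put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("data"), contents);
                table.put(put);
            }
        }
    }

You could then wrap the java invocation of FileUploader in a small script and
use that script as the <COMMAND_TO_UPLOAD> in the xargs line above.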
If you need multiple machines to do this, then run the same thing on several
machines, each over a different subset of the data. Storm just feels like
overkill for a simple upload.
- Bobby
On Tuesday, June 9, 2015 11:32 AM, "Rajabhathor, Selvaraj (Contractor)"
<[email protected]> wrote:
Hi
I am working on a POC to migrate existing Unix-based files into HBase.
We believe we can use Storm to migrate these files - I have successfully
implemented a POC to read one file and emit it across a test topology that I
defined.
My next goal is to "loop through" a directory and emit several files in
a similar fashion.
How can I implement such a Topology using Storm?
Thanks
Raj
Regards,
Raj Rajabhathor
Big Data Architect,
Capco Contractor @ FannieMae
(917) 952-5597 (cell)
703-833-2539 (direct)
This e-mail and its attachments are confidential and solely for the intended
addressee(s). Do not share or use them without Fannie Mae's approval. If
received in error, delete them and contact the sender.