Hey people,

I have a plain text file that I want to parse with M/R, one line at a time.
By "line" I mean a sentence of plain text that ends with a DOT (period), not
a newline-terminated line.
Can I use M/R for this kind of job? As far as I know, doing it this way means
I have to write my own InputFormat.
Can someone guide me or share their experience with this kind of problem?

For better context, suppose my file looks like this:

"Depending on your data processing needs, your Hadoop workload can vary
widely
over time. You may have a few large data processing jobs that occasionally
take advantage
of hundreds of nodes, but those same nodes will sit idle the rest of the
time.
You may be new to Hadoop and want to get familiar with it first before
investing in
a dedicated cluster. You may own a startup that needs to conserve cash and
wants
to avoid the capital expense of a Hadoop cluster."

I want to produce K-V pairs like this:

K1 - V1 --> Depending on your data processing needs, your Hadoop workload
can vary widely over time.
K2 - V2 --> You may have a few large data processing jobs that occasionally
take advantage of hundreds of nodes, but those same nodes will sit idle the
rest of the time.
K3 - V3 --> You may be new to Hadoop and want to get familiar with it first
before investing in a dedicated cluster.
K4 - V4 --> You may own a startup that needs to conserve cash and wants to
avoid the capital expense of a Hadoop cluster.
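
To make the record boundary concrete, here is a minimal sketch in plain Java
(no Hadoop classes) of the splitting I have in mind. The class name
SentenceSplitter and the method splitSentences are hypothetical, made up only
for this illustration. It treats every DOT as the end of a record, so text
such as "3.5" or "e.g." would be split too, and it says nothing about
sentences that cross input split boundaries, which a real InputFormat /
RecordReader would have to handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates the desired record boundaries only.
class SentenceSplitter {

    // Splits text into records that each end with a DOT, joining wrapped lines.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            current.append(c);
            if (c == '.') {
                addIfNotBlank(sentences, current.toString());
                current.setLength(0);
            }
        }
        // Trailing text without a final DOT is kept as a last record.
        addIfNotBlank(sentences, current.toString());
        return sentences;
    }

    // Collapses line breaks and runs of whitespace, then stores the record.
    private static void addIfNotBlank(List<String> out, String raw) {
        String cleaned = raw.replaceAll("\\s+", " ").trim();
        if (!cleaned.isEmpty()) {
            out.add(cleaned);
        }
    }

    public static void main(String[] args) {
        String text = "You may be new to Hadoop and want to get familiar with it first before\n"
                + "investing in\n"
                + "a dedicated cluster. You may own a startup that needs to conserve cash and\n"
                + "wants\n"
                + "to avoid the capital expense of a Hadoop cluster.";
        List<String> records = splitSentences(text);
        for (int i = 0; i < records.size(); i++) {
            System.out.println("K" + (i + 1) + " - V" + (i + 1) + " --> " + records.get(i));
        }
    }
}
```

In an M/R job each returned string would be the value of one record; what to
use as the key (for example a running index or a byte offset) is still open.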

Thanks.
Praveenesh
