Re: can hadoop read files backwards

Elia Mazzawi Fri, 18 Jul 2008 13:06:37 -0700

Well, here is the problem I'm trying to solve.

I have a data set that looks like this:


ID    Type   Timestamp

A1    X      1215647404
A2    X      1215647405
A3    X      1215647406
A1    Y      1215647409

I want to count how many A1 Y records show up within 5 seconds of an A1 X.

I was planning to have the data sorted by ID and then by timestamp,
then read it backwards (or have it sorted by reverse timestamp),

and go through it caching all Y's for the same ID for 5 seconds, to either find a matching X or not.
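
For what it's worth, the same count can probably be done reading forwards instead of backwards, by remembering the last X seen for each ID rather than caching Y's. This is only a rough sketch, not tested on Hadoop. The function name `count_y_after_x` and the `window` parameter are made up for illustration, and it assumes "within 5 seconds" means the Y arrives 0 to 5 seconds (inclusive) after the most recent X for the same ID, with each ID's records arriving in timestamp order.

```python
def count_y_after_x(records, window=5):
    """Count Y records that follow an X of the same ID within `window` seconds.

    `records` is an iterable of (id, type, timestamp) tuples. Assumption:
    the records for each ID arrive in ascending timestamp order.
    """
    matches = 0
    last_x = {}  # id -> timestamp of the most recent X seen for that id
    for rec_id, rec_type, ts in records:
        if rec_type == "X":
            last_x[rec_id] = ts
        elif rec_type == "Y" and rec_id in last_x:
            # Inclusive window: a Y exactly `window` seconds later still counts.
            if 0 <= ts - last_x[rec_id] <= window:
                matches += 1
    return matches


# The sample data from above, sorted by ID and then by timestamp.
sample = [
    ("A1", "X", 1215647404),
    ("A1", "Y", 1215647409),
    ("A2", "X", 1215647405),
    ("A3", "X", 1215647406),
]
print(count_y_after_x(sample))  # prints 1
```

Reading forwards means only one timestamp per ID has to be kept in memory, instead of a cache of Y's waiting for their X.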


the results don't need to be 100% accurate.

So if hadoop hands me the lines of each file in their original order, then this will work.

It seems hadoop is really good at solving problems that depend on 1 line at a time, but not on multiple lines?

But hadoop has to be able to get data in order and work on multiple lines, otherwise how could it be setting records in data sorts?


I'd appreciate other suggestions to go about doing this.
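
One possible way to do this without relying on input order at all: MapReduce groups everything emitted under the same key before the reduce step, so if the map step emits the ID as the key, a reducer sees all the records for one ID together and can sort them by timestamp itself. The sketch below only simulates that grouping in plain Python. It is not Hadoop API code, and `map_line`, `reduce_id`, `run_job` and `window` are hypothetical names. It uses the same inclusive 5-second assumption as before, and assumes whitespace-separated input lines without the header row.

```python
from collections import defaultdict


def map_line(line):
    """Map step: key each record by its ID so one reducer gets a whole ID."""
    rec_id, rec_type, ts = line.split()
    return rec_id, (int(ts), rec_type)


def reduce_id(rec_id, values, window=5):
    """Reduce step: sort one ID's records by time, then count Y's after an X."""
    matches = 0
    last_x = None
    for ts, rec_type in sorted(values):  # local sort; input order is irrelevant
        if rec_type == "X":
            last_x = ts
        elif rec_type == "Y" and last_x is not None and ts - last_x <= window:
            matches += 1
    return rec_id, matches


def run_job(lines):
    """Stand-in for the framework: group map output by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        key, value = map_line(line)
        groups[key].append(value)
    return dict(reduce_id(key, values) for key, values in groups.items())


# Lines deliberately given out of order.
lines = [
    "A1   Y   1215647409",
    "A3    X   1215647406",
    "A1    X   1215647404",
    "A2    X   1215647405",
]
print(run_job(lines))  # prints {'A1': 1, 'A3': 0, 'A2': 0}
```

Sorting inside the reducer holds all of one ID's records in memory, which should be fine if each ID only has a handful of records, but would need rethinking for very busy IDs.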

Jim R. Wilson wrote:

does wordcount get the lines in order? or are they random? can i have
hadoop return them in reverse order?


You can't really depend on the order that the lines are given - it's
best to think of them as random.  The purpose of MapReduce/Hadoop is
to distribute a problem among a number of cooperating nodes.

The idea is that any given line can be interpreted separately,
completely independent of any other line.  So in wordcount, this makes
sense.  For example, say you and I are nodes. Each of us gets half the
lines in a file and we can count the words we see and report on them -
it doesn't matter what order we're given the lines, or which lines
we're given, or even whether we get the same number of lines (if
you're faster at it, or maybe you get shorter lines, you may get more
lines to process in the interest of saving time).
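
To make the "you and I are nodes" picture concrete, here is a small sketch in plain Python (not Hadoop code; `count_words` and the sample lines are made up for illustration). Two partial counts are computed separately and then merged, and the merged result does not depend on how the lines were split or ordered.

```python
from collections import Counter


def count_words(lines):
    """Count words across the given lines; each line is handled independently."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


lines = ["a b a", "b c", "a c c", "b"]

# Two "nodes" each take part of the file, in whatever split they are given.
node_one = count_words(lines[:1])
node_two = count_words(lines[1:])

# Merging the partial counts gives the same totals as counting everything at once.
merged = node_one + node_two
print(merged == count_words(lines))  # prints True
```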

So if the project you're working on requires getting the lines in a
particular order, then you probably need to rethink your approach. It
may be that hadoop isn't right for your problem, or maybe that the
problem just needs to be attacked in a different way.  Without knowing
more about what you're trying to achieve, I can't offer any specifics.

Good luck!

-- Jim

On Thu, Jul 17, 2008 at 4:41 PM, Elia Mazzawi
<[EMAIL PROTECTED]> wrote:

I have a program based on wordcount.java,
and my input files are each smaller than 64 MB (so I believe each file is
one task).

Does wordcount get the lines in order, or are they random? Can I have
hadoop return them in reverse order?

Jim R. Wilson wrote:

It sounds to me like you're talking about hadoop streaming (correct me
if I'm wrong there).  In that case, there's really no "order" to the
lines being doled out as I understand it.  Any given line could be
handed to any given mapper task running on any given node.

I may be wrong, of course, someone closer to the project could give
you the right answer in that case.

-- Jim R. Wilson (jimbojw)

On Thu, Jul 17, 2008 at 4:06 PM, Elia Mazzawi
<[EMAIL PROTECTED]> wrote:

Is there a way to have hadoop hand over the lines of a file backwards to
my mapper?

As in, give the last line first.
