TupleWritable is not a general-purpose type. It's used for map-side joins, where the arity of a tuple is fixed by construction. It is intended as a transient type for a few very specific applications.

It sounds like you don't need a general list type, since you don't need to encode the type of the objects your list contains. Writables are *not* supposed to read to the end of the stream they're given; each one must consume exactly one full instance from the stream (i.e. all of "its" bytes, even if it ultimately discards them). Given these constraints, Writable types of variable size almost always encode their length explicitly. As Joman mentioned, your constructor must initialize all its fields. Further, readFields must not retain any state from the value the object formerly contained, so you need to clear the list before you add more values to it. Because the same list is then overwritten on every call, your getNameList method will need to return a shallow copy of the list if the caller stores a reference to it.

This should work:

  public void readFields(DataInput in) throws IOException {
    nameList.clear();
    score = in.readDouble();
    final int len = WritableUtils.readVInt(in);
    for (int i = 0; i < len; ++i) {
      nameList.add(Text.readString(in));
    }
  }

  public void write(DataOutput out) throws IOException {
    out.writeDouble(score);
    WritableUtils.writeVInt(out, nameList.size());
    for (String name : nameList) {
      Text.writeString(out, name);
    }
  }
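
To check the round trip without a cluster, here is a minimal sketch that uses only java.io. It is an illustration under assumptions, not the Hadoop API: ScoredNames and ScoredNamesDemo are hypothetical names, and writeInt/writeUTF stand in for WritableUtils.writeVInt and Text.writeString. It writes two records back to back and reads them into one reused instance, which is where a missing clear() or a read-to-EOF loop would show up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the MyWritable discussed above. It does not
// implement org.apache.hadoop.io.Writable, so it runs with the JDK alone.
final class ScoredNames {
  private double score;
  private final List<String> nameList = new ArrayList<>();  // never null

  ScoredNames() {}

  ScoredNames(double score, List<String> names) {
    this.score = score;
    nameList.addAll(names);
  }

  double getScore() { return score; }

  // Returns a copy: the internal list is cleared on the next readFields call.
  List<String> getNameList() { return new ArrayList<>(nameList); }

  void write(DataOutput out) throws IOException {
    out.writeDouble(score);
    out.writeInt(nameList.size());  // explicit length prefix
    for (String name : nameList) {
      out.writeUTF(name);
    }
  }

  void readFields(DataInput in) throws IOException {
    nameList.clear();               // drop state from the previous record
    score = in.readDouble();
    final int len = in.readInt();
    for (int i = 0; i < len; ++i) {
      nameList.add(in.readUTF());
    }
  }
}

final class ScoredNamesDemo {
  // Serializes the records back to back into one buffer.
  static byte[] toBytes(ScoredNames... records) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      for (ScoredNames r : records) {
        r.write(out);
      }
      out.flush();
      return buf.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads `count` records into one reused instance, copying each name list.
  static List<List<String>> readAll(byte[] bytes, int count) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
      ScoredNames reused = new ScoredNames();
      List<List<String>> result = new ArrayList<>();
      for (int i = 0; i < count; ++i) {
        reused.readFields(in);
        result.add(reused.getNameList());
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] bytes = toBytes(
        new ScoredNames(1.5, Arrays.asList("a", "b")),
        new ScoredNames(2.5, Arrays.asList("c")));
    System.out.println(readAll(bytes, 2));
  }
}
```

Running main should print [[a, b], [c]]. If getNameList returned the internal list instead of a copy, both entries would alias the same list and show only the last record's names.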

You can improve your performance by (re)using a collection of Text instead of String (since the latter is immutable), but that requires more work.

-C

On Oct 19, 2008, at 3:39 PM, Yih Sun Khoo wrote:

I think that when a TupleWritable is part of a custom writable,
you cannot just say tupleWritable.readFields(in) and
tupleWritable.write(out).

I might be wrong. Has anyone successfully implemented a TupleWritable with,
say, a DoubleWritable in a custom writable?

On Sun, Oct 19, 2008 at 3:33 AM, Joman Chu <[EMAIL PROTECTED]> wrote:

hrm, try implementing the read(DataInput in) method, as well as a
blank constructor MyWritable() that fills dummy values into your
instance variables. For example this should be all you need for
read(DataInput in),

public static MyWritable read(DataInput in) throws IOException {
      MyWritable w = new MyWritable();
      w.readFields(in);
      return w;
}

EDIT: I was able to more or less replicate your error. In my constructor, I
had my instance variables assigned to null. Make sure you assign them
to new instances of whatever Writable you are using.
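
The failure described in the EDIT can be reproduced without Hadoop. The sketch below is a stand-in under assumptions: DoubleField, Holder, and HolderDemo are hypothetical names that mimic the Writable pattern, where readFields delegates to member objects and so needs them to exist before the first read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for DoubleWritable: a mutable holder that fills itself.
final class DoubleField {
  double value;

  void readFields(DataInput in) throws IOException {
    value = in.readDouble();
  }
}

// Hypothetical composite value that delegates to its member field.
final class Holder {
  private final DoubleField score;

  // The no-argument constructor creates the member so readFields can delegate.
  Holder() {
    this(new DoubleField());
  }

  // Package-private so the demo can pass null and reproduce the failure.
  Holder(DoubleField score) {
    this.score = score;
  }

  double getScore() { return score.value; }

  void readFields(DataInput in) throws IOException {
    score.readFields(in);  // NullPointerException if score was left null
  }

  // Same shape as the static read(DataInput) factory suggested above.
  static Holder read(DataInput in) throws IOException {
    Holder h = new Holder();
    h.readFields(in);
    return h;
  }
}

final class HolderDemo {
  // Builds an in-memory stream holding a single encoded double.
  static DataInputStream streamOf(double d) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      new DataOutputStream(buf).writeDouble(d);
      return new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static double readScore(double d) {
    try {
      return Holder.read(streamOf(d)).getScore();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns true if reading into a Holder with a null member throws an NPE.
  static boolean nullMemberFails() {
    try {
      new Holder(null).readFields(streamOf(0.0));
      return false;
    } catch (NullPointerException expected) {
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(readScore(3.25));
    System.out.println(nullMemberFails());
  }
}
```

The framework creates value objects through the no-argument constructor before calling readFields, so that constructor is the place where every member has to be created.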


Joman Chu
http://www.notatypewriter.com/
AIM: ARcanUSNUMquam



On Sun, Oct 19, 2008 at 5:10 AM, Yih Sun Khoo <[EMAIL PROTECTED]> wrote:
Joman, to add a little more to one of my previous mails about the
readFields method:

Have you ever had something like this?

public class MyWritable implements Writable {
  private DoubleWritable doubleWritable;
  private TupleWritable tupleWritable;

  public void readFields(DataInput in) throws IOException {
      doubleWritable.readFields(in);
      tupleWritable.readFields(in);
  }

  public void write(DataOutput out) throws IOException {
      doubleWritable.write(out);
      tupleWritable.write(out);
  }


}

On Sun, Oct 19, 2008 at 1:59 AM, Joman Chu <[EMAIL PROTECTED]> wrote:

I've never used TupleWritable, so hopefully somebody else can help you
with that.
Joman Chu
http://www.notatypewriter.com/
AIM: ARcanUSNUMquam



On Sun, Oct 19, 2008 at 4:40 AM, Yih Sun Khoo <[EMAIL PROTECTED]> wrote:
Also, I've noticed TupleWritable to be quite useful.
What are good techniques for using TupleWritable in a mapping phase for a
"list of Text" when you do not know the size of that "list" ahead of time?

Say I had a custom writable which implemented TupleWritable, and the custom
writable contained a setter method
mycustomwritable.setTupleWritable( ...  )

Where the ellipsis is, there lies the TupleWritable. However, I'm wondering:
since TupleWritable can be constructed using TupleWritable(Writable[]), how
do I dynamically resize the Writable[] and add Text elements to it when I
don't know the size of the Writable[] in advance? Does this make sense?


On Sun, Oct 19, 2008 at 1:32 AM, Yih Sun Khoo <[EMAIL PROTECTED]> wrote:

Let's say that in the reduce phase your value happens to hold an
ArrayListWritable. In this example, value is of type ArrayListWritable.
Maybe I've not thought about this or done this before, but how does one
"read data in from the DataInput stream" in the reduce phase, so that the
ArrayListWritable which is a value already passed to the reducer can be
used as an ArrayListWritable?


On Sun, Oct 19, 2008 at 1:25 AM, Joman Chu <[EMAIL PROTECTED]> wrote:

Since the ArrayListWritable extends ArrayList, you have access to all
the ArrayList methods as well. Once you read data in from the
DataInput stream, you should be able to use ArrayListWritable just
like a regular ArrayList.
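
A minimal sketch of that idea, using only the JDK. StringListRecord is a hypothetical stand-in, not the cloud9 ArrayListWritable, and it makes no claim about that class's wire format. Because the class extends ArrayList, size() and get() work as soon as readFields returns.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical list type that serializes itself with a length prefix.
final class StringListRecord extends ArrayList<String> {
  private static final long serialVersionUID = 1L;

  void write(DataOutput out) throws IOException {
    out.writeInt(size());          // element count first
    for (String s : this) {
      out.writeUTF(s);
    }
  }

  void readFields(DataInput in) throws IOException {
    clear();                       // forget the previous contents
    final int len = in.readInt();
    for (int i = 0; i < len; ++i) {
      add(in.readUTF());
    }
  }
}

final class StringListRecordDemo {
  // Writes the items to a buffer and reads them back into a fresh list.
  static StringListRecord roundTrip(String... items) {
    try {
      StringListRecord src = new StringListRecord();
      for (String s : items) {
        src.add(s);
      }
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      src.write(new DataOutputStream(buf));

      StringListRecord dst = new StringListRecord();
      dst.readFields(
          new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
      return dst;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    StringListRecord names = roundTrip("x", "y", "z");
    // Plain ArrayList methods, no extra accessor needed.
    System.out.println(names.size() + " " + names.get(0));
  }
}
```

In a reducer the framework has already called readFields on the value by the time reduce sees it, so only the ArrayList methods are needed there.
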
Joman Chu
http://www.notatypewriter.com/
AIM: ARcanUSNUMquam



On Sun, Oct 19, 2008 at 4:16 AM, Yih Sun Khoo <[EMAIL PROTECTED]> wrote:
Hmm, what method from ArrayListWritable allows you to access the different
elements of the ArrayList? Would it be readFields? For example, in a
reduce phase, if I needed to know the size of the array list, it would be
easy if I were dealing with an ArrayList, because I could just say
arraylist.size(). How would I accomplish that with the writable
counterpart?

On Sun, Oct 19, 2008 at 1:04 AM, Joman Chu <[EMAIL PROTECTED]> wrote:

Hi,

For the ArrayList object, try taking a look at the implementation of
ArrayListWritable by Jimmy Lin at UMD here:

https://subversion.umiacs.umd.edu/umd-hadoop/core/trunk/src/edu/umd/cloud9/io/ArrayListWritable.java

But basically, in the readFields method, I prefer using each Writable
object's readFields method to read the data in. For example, for your
double variable, I would use a DoubleWritable object, and in
MyWritable.readFields(DataInput in) I would call
nameofdoublewritable.readFields(in). In the
MyWritable.write(DataOutput out) method, I would call
nameofdoublewritable.write(out).

Have a good one,

Joman Chu
http://www.notatypewriter.com/
AIM: ARcanUSNUMquam



On Sun, Oct 19, 2008 at 3:30 AM, Yih Sun Khoo <[EMAIL PROTECTED]> wrote:
I don't quite know how to write the read and write functions, but I want
to write my own writable, which should have a DoubleWritable/double value
followed by a list of Strings/Text. This Writable will be used as a
value.
Is the code below the best way to go about writing such a writable?

import java.io.DataInput;
import java.io.DataOutput;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
  private double score;
  private ArrayList<String> nameList;

  public void setScore(double score) {
      this.score= score;
  }

  public void setNameList(ArrayList<String> nameList) {
      this.nameList= nameList;
  }

  public double getScore() {
      return score;
  }

  public ArrayList<String> getNameList() {
      return nameList;
  }

  public void readFields(DataInput in) throws IOException {
      score= in.readDouble();
      try {
          do {
              nameList.add(in.readUTF());
          } while (true);
      } catch (EOFException eofe) {
          // continue; done
      }
  }

  public void write(DataOutput out) throws IOException {
      out.writeDouble(score);
      for (String name: nameList) {
          out.writeUTF(name);
      }
  }
}










