You can easily make a custom Writable by delegating to the existing Writable types. For example, if your writable is just a bunch of strings, hold Text fields in your class and delegate to them in your read/write methods:

class MyWritable implements Writable {
    // Initialize the fields up front: readFields() is called on an
    // instance built via the default constructor, so null fields would
    // throw a NullPointerException.
    private Text fieldA = new Text();
    private Text fieldB = new Text();
    ...

    public void write(DataOutput dataOutput) throws IOException {
        fieldA.write(dataOutput);
        fieldB.write(dataOutput);
    }

    public void readFields(DataInput dataInput) throws IOException {
        fieldA.readFields(dataInput);
        fieldB.readFields(dataInput);
    }
}

dave
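Applied to the IIS-log case discussed further down the thread, the same delegation pattern might look like the following minimal sketch. The class name, fields, and accessors are illustrative, not something from the thread:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class LogEntryWritable implements Writable {

    private Text date = new Text();
    private Text time = new Text();
    private Text user = new Text();
    private Text url = new Text();
    private IntWritable port = new IntWritable();

    // The mapper parses the raw line and populates the fields via setters.
    public void setDate(String d) { date.set(d); }
    public void setTime(String t) { time.set(t); }
    public void setUser(String u) { user.set(u); }
    public void setUrl(String u)  { url.set(u); }
    public void setPort(int p)    { port.set(p); }

    public String getUser() { return user.toString(); }

    public void write(DataOutput out) throws IOException {
        // Serialize the fields in a fixed order...
        date.write(out);
        time.write(out);
        user.write(out);
        url.write(out);
        port.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        // ...and deserialize them in exactly the same order.
        date.readFields(in);
        time.readFields(in);
        user.readFields(in);
        url.readFields(in);
        port.readFields(in);
    }
}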
On Wed, Feb 2, 2011 at 3:34 PM, Adeel Qureshi <adeelmahm...@gmail.com> wrote:

Huh, this is interesting. Obviously I am not thinking about this whole thing right. So in your mapper you parse the line into tokens and set the appropriate values on your writable via the constructor or setters, and you let Hadoop do all the serialization and deserialization, telling it how to do that through the read and write methods. Okay, that makes more sense. One last thing I still don't understand is the proper implementation of the read and write methods: if I have a bunch of strings in my writable, what should the read method implementation look like?

I really appreciate the help from all you guys.

On Wed, Feb 2, 2011 at 12:52 PM, David Sinclair <dsincl...@chariotsolutions.com> wrote:

So create your writable as normal, and Hadoop takes care of the serialization/deserialization between mappers and reducers.

For example, MyWritable is the same as you had previously; then in your mapper, output that writable:

class MyMapper extends Mapper<LongWritable, Text, LongWritable, MyWritable> {

    private MyWritable writable = new MyWritable();

    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // parse text
        writable.setCounter(parseddata);
        writable.setTimestamp(parseddata);

        // don't know what your key is
        context.write(key, writable);
    }
}

and make sure you set the key/value output classes:

job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(MyWritable.class);

dave
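For context, those two set*Class calls belong in the job driver. A minimal driver sketch follows; MyJobDriver, MyReducer, the paths, and the final output types are hypothetical placeholders, and new Job(conf, name) is the constructor that was current at the time:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "log parsing");

        job.setJarByClass(MyJobDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class); // hypothetical reducer

        // Intermediate (map output) types: the custom writable goes here.
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(MyWritable.class);

        // Final (reduce output) types; adjust to whatever the reducer emits.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}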
On Wed, Feb 2, 2011 at 1:39 PM, Adeel Qureshi <adeelmahm...@gmail.com> wrote:

I'm reading text data and outputting text data, so yeah, it's all text. The reason I wanted to use custom writable classes is not for the mapper's sake. You are right: the easiest thing is to receive the LongWritable and Text input in the mapper, parse the text, and deal with it. Where I am having trouble is in passing the parsed information to the reducer. Right now I am putting a bunch of things into text and sending the same LongWritable and Text output to the reducer, but my text includes several fields separated by a delimiter. This is the part I am trying to improve: instead of sending a bunch of delimited text, I want to send an actual object to my reducer.

On Wed, Feb 2, 2011 at 12:33 PM, David Sinclair <dsincl...@chariotsolutions.com> wrote:

Are you storing your data as text or binary?

If you are storing as text, your mapper is going to get keys of type LongWritable and values of type Text. Inside your mapper you would parse out the strings and wouldn't be using your custom writable; that is, unless you wanted your mapper/reducer to produce these.

If you are storing as binary, e.g. SequenceFiles, you use the SequenceFileInputFormat, and the sequence file reader will create the writables your mapper expects.

dave
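To illustrate the binary route: one job can write the custom writables to a SequenceFile, and a downstream job can read them back with the framework handling the (de)serialization. A rough sketch, assuming the hypothetical LogEntryWritable from earlier:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SequenceFileWiring {

    // The producing job writes the parsed objects out in binary form.
    static void configureProducer(Job job) {
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LogEntryWritable.class);
    }

    // The consuming job reads them back: its mapper then receives
    // LogEntryWritable values directly, already deserialized.
    static void configureConsumer(Job job) {
        job.setInputFormatClass(SequenceFileInputFormat.class);
    }
}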
On Wed, Feb 2, 2011 at 1:16 PM, Adeel Qureshi <adeelmahm...@gmail.com> wrote:

Okay, so then the main question is how do I get the input line so that I can parse it. I am assuming it will be passed to me via the data input stream. So in my readFields function, I am assuming I will get the whole line; then I can parse it out and set my params, something like this:

readFields() {
    String line = in.readLine(); // read the whole line

    // now apply the regular expression to parse it out
    date = pattern.group(1);
    time = pattern.group(2);
    user = pattern.group(3);
}

Is that right?

On Wed, Feb 2, 2011 at 12:11 PM, Vijay <tec...@gmail.com> wrote:

Hadoop is not going to parse the line for you. Your mapper will take the line, parse it, and then turn it into your Writable so the next phase can just work with your object.

Thanks,
Vijay

On Feb 2, 2011 9:51 AM, "Adeel Qureshi" <adeelmahm...@gmail.com> wrote:

Thanks for your reply. So let's say my input files are formatted like this, with each line looking like:

DATE TIME SERVER USER URL QUERY PORT ...

So to read this I would create a writable mapper:

public class MyMapper implements Writable {
    Date date;
    long time;
    String server;
    String user;
    String url;
    String query;
    int port;

    readFields() {
        date = readDate(in); // not concerned with the actual date-reading function
        time = readLong(in);
        server = readText(in);
        ...
    }
}

But I still don't understand how Hadoop is going to know to parse my line into these tokens, instead of the map using the whole line as one token.

On Wed, Feb 2, 2011 at 11:42 AM, Harsh J <qwertyman...@gmail.com> wrote:

See it this way: readFields(...) provides a DataInput stream that reads bytes from a binary stream, and write(...) provides a DataOutput stream that writes bytes to a binary stream.

Now your data structure may be a complex one, perhaps an array of items, a mapping of some sort, or just a set of different types of objects. All you need to do is think about how you would _serialize_ your data structure into a binary stream, so that you may _de-serialize_ it back from the same stream when required.

About what goes where, I think looking up the definition of 'serialization' will help. It is all in the ordering. If you wrote A before B, you read A before B - simple as that.

This, or you could use a neat serialization library like Apache Avro (http://avro.apache.org) and solve it in a simpler way with a schema. I'd recommend learning/using Avro for all serialization/de-serialization needs, especially for Hadoop use cases.

--
Harsh J
www.harshj.com

On Wed, Feb 2, 2011 at 10:51 PM, Adeel Qureshi <adeelmahm...@gmail.com> wrote:

I have been trying to understand how to write a simple custom Writable class, and I find the available documentation very vague and unclear about certain things. Okay, so here is the sample Writable implementation in the javadoc of the Writable interface:

public class MyWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public static MyWritable read(DataInput in) throws IOException {
        MyWritable w = new MyWritable();
        w.readFields(in);
        return w;
    }
}

So in the readFields function we are simply saying: read an int from the data input and put it in counter, then read a long and put it in the timestamp variable. What doesn't make sense to me is the format of the DataInput here. What if there are multiple ints and multiple longs? How is the correct int going to end up in counter? What if the data I am reading in my mapper is a string line and I am using a regular expression to parse the tokens? How do I specify which field goes where? Simply saying readInt or readText, how does that get connected to the right stuff?

So in my case, like I said, I am reading from IIS log files where my mapper input is a log line containing the usual information: date, time, user, server, url, qry, responseTime, etc. I want to parse these into an object that can be passed to the reducer instead of dumping all that information as text.

I would appreciate any help.
Thanks
Adeel
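For completeness, nothing in the thread shows the consuming side. A reducer receiving the custom objects might look like this minimal sketch, again assuming the hypothetical LogEntryWritable and its getUser() accessor:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<LongWritable, LogEntryWritable, Text, IntWritable> {

    private final IntWritable count = new IntWritable();

    protected void reduce(LongWritable key, Iterable<LogEntryWritable> values, Context context)
            throws IOException, InterruptedException {
        // Hadoop has already called readFields() on each value; the reducer
        // just works with fully populated objects.
        int requests = 0;
        String lastUser = "";
        for (LogEntryWritable entry : values) {
            // The framework reuses one instance across the iteration, so
            // copy out anything that must outlive the loop body.
            lastUser = entry.getUser();
            requests++;
        }
        count.set(requests);
        context.write(new Text(lastUser), count);
    }
}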