Is the tab the delimiter between records, or between keys and values, in the input?
In other words, does the input file look like this:

  a\tb b\tc c\ta

or does it look like this:

  a b\tb c\tc a\t

?

Jeff

On Thu, Jul 15, 2010 at 6:18 PM, Nikolay Korovaiko <korovai...@gmail.com> wrote:

> Hi everyone,
>
> I hope this is the right place for my question. If not, please feel free to
> ignore it ;) and I'm sorry for any inconvenience :(
>
> I'm writing a simple program for enumerating triangles in directed graphs
> for my project. First, for each input arc (e.g. a b, b c, c a; note: a tab
> symbol serves as the delimiter) I want my map function to output the
> following pairs ([a, to_b], [b, from_a], [a_b, -1]):
>
>     public void map(LongWritable key, Text value,
>                     OutputCollector<Text, Text> output,
>                     Reporter reporter) throws IOException {
>         String line = value.toString();
>         String[] tokens = line.split(" ");
>         output.collect(new Text(tokens[0]), new Text("to_" + tokens[1]));
>         output.collect(new Text(tokens[1]), new Text("from_" + tokens[0]));
>         output.collect(new Text(tokens[0] + "_" + tokens[1]), new Text("-1"));
>     }
>
> Now my reduce function is supposed to cross join all pairs that have both
> to_'s and from_'s and to simply propagate any other pairs whose keys contain
> "_".
>
>     public void reduce(Text key, Iterator<Text> values,
>                        OutputCollector<Text, Text> output,
>                        Reporter reporter) throws IOException {
>         String key_s = key.toString();
>         if (key_s.indexOf("_") > 0)
>             output.collect(key, new Text("completed"));
>         else {
>             HashMap<String, ArrayList<String>> lists =
>                 new HashMap<String, ArrayList<String>>();
>             while (values.hasNext()) {
>                 String line = values.next().toString();
>                 String[] tokens = line.split("_");
>                 if (!lists.containsKey(tokens[0])) {
>                     lists.put(tokens[0], new ArrayList<String>());
>                 }
>                 lists.get(tokens[0]).add(tokens[1]);
>             }
>             for (String t : lists.get("to"))
>                 for (String f : lists.get("from"))
>                     output.collect(new Text(t + "_" + f), key);
>         }
>     }
>
> And this is where the most exciting stuff happens. tokens[1] yields an
> ArrayOutOfBounds exception. If you scroll up, you can see that by this point
> the iterator should give values like "to_a", "from_b", "to_b", etc. When I
> just output these values, everything looks OK and I have "to_a", "from_b".
> But split() doesn't work at all; moreover, line.length() is always 1 and
> indexOf("_") returns -1! The very same indexOf WORKS PERFECTLY for keys,
> where we have pairs whose keys contain "_" and look like "a_b", "b_c".
>
> I'm really puzzled by all this. MapReduce is supposed to save lives by
> making everything simple. Instead I spent several hours just to spot this...
>
> I'd really appreciate your help, guys!!! Thanks in advance!
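
One way to narrow this down, whichever way the delimiter question turns out, is to make both split steps fail loudly on input they do not expect instead of indexing into the token array blindly. Below is a minimal sketch in plain Java (no Hadoop types, so it can be run on its own). The class and method names (ArcParsing, splitArc, splitTagged) are made up for illustration and are not part of the quoted code, and the sketch assumes exactly one tab between the two endpoints of an arc. It does not explain why the reducer sees one-character values; it only makes such values visible without an exception.

```java
import java.util.Arrays;

public class ArcParsing {

    // Splits one input line into {source, target}.
    // Assumption: the two endpoints are separated by exactly one tab.
    // Returns null when the line does not have two non-empty fields.
    public static String[] splitArc(String line) {
        String[] tokens = line.split("\t", -1);
        if (tokens.length != 2 || tokens[0].isEmpty() || tokens[1].isEmpty()) {
            return null;
        }
        return tokens;
    }

    // Splits a tagged value such as "to_b" into {"to", "b"} at the first "_".
    // Returns null when there is no "_", or nothing before or after it,
    // so the caller can log the offending value instead of hitting an
    // ArrayIndexOutOfBoundsException on tokens[1].
    public static String[] splitTagged(String value) {
        int i = value.indexOf('_');
        if (i <= 0 || i == value.length() - 1) {
            return null;
        }
        return new String[] { value.substring(0, i), value.substring(i + 1) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitArc("a\tb")));     // [a, b]
        System.out.println(Arrays.toString(splitArc("a b")));      // null
        System.out.println(Arrays.toString(splitTagged("to_b")));  // [to, b]
        System.out.println(Arrays.toString(splitTagged("a")));     // null
    }
}
```

In the reducer, a null result from splitTagged could be written out together with the key (or counted via the Reporter) so you can see exactly which values arrive without a "to_" or "from_" prefix.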