Right now data is received in parallel and written to a queue; a single thread then reads the queue and writes those messages to an FSDataOutputStream, which is kept open, but the messages never get flushed. I've tried flush() and sync() with no joy.

1. outputStream.writeBytes(rawMessage.toString());
2. log.debug("Flushing stream, size = " + s.getOutputStream().size());
   s.getOutputStream().sync();
   log.debug("Flushed stream, size = " + s.getOutputStream().size());

or

   log.debug("Flushing stream, size = " + s.getOutputStream().size());
   s.getOutputStream().flush();
   log.debug("Flushed stream, size = " + s.getOutputStream().size());

Either way, size() just remains the same after performing the action. This is using hadoop-0.20.0.

-sd

On Sun, May 10, 2009 at 4:45 PM, Stefan Podkowinski <spo...@gmail.com> wrote:
> You just can't have many distributed jobs write into the same file
> without locking/synchronizing these writes, even with append(). It's
> no different than using a regular file from multiple processes in
> this respect.
> Maybe you need to collect your data up front before processing it in
> Hadoop? Have a look at Chukwa: http://wiki.apache.org/hadoop/Chukwa
>
>
> On Sat, May 9, 2009 at 9:44 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
> > Would WritableFactories not allow me to open one output stream and
> > continue to write() and sync()?
> >
> > Maybe I'm reading into that wrong. Although UUID would be nice, it
> > would still leave me with the problem of having lots of little files
> > instead of a few large files.
> >
> > -sd
> >
> > On Sat, May 9, 2009 at 8:37 AM, jason hadoop <jason.had...@gmail.com> wrote:
> >
> >> You must create unique file names; I don't believe (but I do not
> >> know) that the append code will allow multiple writers.
> >>
> >> Are you writing from within a task, or as an external application
> >> writing into Hadoop?
> >>
> >> You may try using UUID,
> >> http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUID.html, as part
> >> of your filename.
> >> Without knowing more about your goals, environment and constraints
> >> it is hard to offer any more detailed suggestions.
> >> You could also have an application aggregate the streams and write
> >> out chunks, with one or more writers, one per output file.
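The parallel-receive / single-writer pattern described at the top of the thread can be sketched in plain Java. This is a minimal illustration, not the poster's code: the class name SingleWriter, the poison-pill sentinel, and the use of a generic OutputStream (a ByteArrayOutputStream here, standing in for FSDataOutputStream so the sketch runs without Hadoop) are all assumptions. The key design point it shows is that many receiver threads only touch the BlockingQueue, while exactly one thread owns the stream and performs write()/flush() calls.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Many producers enqueue raw messages; a single consumer thread owns the
 * output stream, so no locking is needed around the stream itself.
 */
public class SingleWriter {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
    private static final String POISON = "\u0000EOF"; // illustrative shutdown sentinel

    /** Called from any number of receiver threads. */
    public void enqueue(String rawMessage) throws InterruptedException {
        queue.put(rawMessage);
    }

    /** Start the single writer thread; only this thread touches 'out'. */
    public Thread startWriter(OutputStream out) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take();      // blocks until a message arrives
                    if (POISON.equals(msg)) break;  // drain complete, exit loop
                    out.write(msg.getBytes("UTF-8"));
                    out.flush();                    // against HDFS this is where sync() would go
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        t.start();
        return t;
    }

    /** Tell the writer thread to finish after draining the queue. */
    public void shutdown() throws InterruptedException {
        queue.put(POISON);
    }

    public static void main(String[] args) throws Exception {
        SingleWriter w = new SingleWriter();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Thread writer = w.startWriter(sink);
        w.enqueue("a");
        w.enqueue("b");
        w.shutdown();
        writer.join();
        System.out.println(sink.toString("UTF-8")); // prints "ab"
    }
}
```

Whether flush()/sync() makes the bytes visible to other readers is then a separate question about the filesystem behind the stream; the append/sync behavior in hadoop-0.20.0 that the thread is wrestling with is exactly that second question.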