The garbled text was my mistake. After switching to SequenceFileInputFormat, the file reads correctly. But the metadata in the value is run together with my log content; it would be better to add a \n after the metadata.
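A minimal Java sketch of the "\n after the metadata" idea, assuming you already have the metadata and the log body as two separate strings (how to pull them out of the Chukwa value is not shown here). RecordDump and formatRecord are hypothetical names, not part of the Chukwa API.

```java
// Hypothetical helper: RecordDump and formatRecord are illustrative names, not Chukwa API.
public class RecordDump {
    // Put the metadata on its own line, followed by the log body,
    // so the two are no longer run together in the dump.
    static String formatRecord(String metadata, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append(metadata).append('\n');
        sb.append(body);
        // End every record with exactly one newline, whether or not the body already has one.
        if (!body.endsWith("\n")) {
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the metadata line, then the log line.
        System.out.print(formatRecord("host=node1 type=SysLog", "log line one"));
    }
}
```

Note that a body containing its own line breaks (a stack trace, for example) still spans several lines with this layout.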
On Sat, Nov 20, 2010 at 2:24 AM, Jerome Boulon <[email protected]> wrote:
> Just a warning: if you are using the text output format, you will have a
> hard time with "\n" inside your logs, stack traces for example.
> Also, a text file will be either non-compressed or non-splittable.
>
> /Jerome.
>
>
> On 11/19/10 9:30 AM, "Eric Yang" <[email protected]> wrote:
>
>
> On 11/19/10 12:37 AM, "Ying Tang" <[email protected]> wrote:
>
> Hi all,
> 1. I have installed a 2-node Chukwa setup for testing, one agent and one
> collector. I also have HDFS, but I found that the log files written by the
> collector to HDFS are named
> time+logsourcehost+java.rmi.server.UID()
> and the time format is yyyyddHHmmssSSS. There is no month? And this
> is hard-coded.
> I need the month, so must I change the code and recompile it?
> 2. Another question: in the log file (in HDFS), the metadata is
> garbled, while the log content from the agent is fine.
> My adaptor is UTF8; how do I solve this?
>
>
> 1. Looks like a mistake in the temp filename. Please open a JIRA and
> we will fix it.
> 2. The data is recorded in sequence file format to make the data easier
> to process with MapReduce. If you are expecting plain text of the log
> content, you will need to write a map/reduce job that uses the text
> output format, and channel the log file types accordingly.
>
>
> Regards,
> Eric
>

--
Best regards,
Ivy Tang
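Regarding the missing month in the collector's file names: here is a minimal Java sketch, assuming the timestamp is produced with java.text.SimpleDateFormat. The thread only quotes the pattern string, not the collector code, so SinkFileName and both constant names are hypothetical. It shows why the reported pattern is a problem: yyyyddHHmmssSSS gives the same prefix for 19 October and 19 November, while yyyyMMddHHmmssSSS keeps them apart.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical illustration; not the actual Chukwa collector code.
public class SinkFileName {
    // Pattern as quoted in the thread: year, day, hour, minute, second, millis, but no month.
    static final String PATTERN_AS_REPORTED = "yyyyddHHmmssSSS";
    // Assumed fix: insert MM (month) between year and day.
    static final String PATTERN_WITH_MONTH = "yyyyMMddHHmmssSSS";

    // Format a timestamp with the given pattern, in UTC so the output is deterministic.
    static String format(String pattern, Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(when);
    }

    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.clear(); // zero out all fields, including milliseconds
        cal.set(2010, Calendar.NOVEMBER, 19, 9, 30, 0);
        Date when = cal.getTime();
        System.out.println(format(PATTERN_AS_REPORTED, when)); // 201019093000000
        System.out.println(format(PATTERN_WITH_MONTH, when));  // 20101119093000000
    }
}
```

If the real collector builds the name this way, changing the pattern would mean rebuilding the collector, which matches Eric's suggestion to track it as a bug in JIRA.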
