Apologies, you are correct. HBase is fine.

column=f1:response_bytes, timestamp=1317954678177, value=0

Cheers,
AD

On Thu, Oct 6, 2011 at 2:46 AM, Mingjie Lai <[email protected]> wrote:

> AD.
>
> IMO, the issue only occurs for console (maybe also text, but I haven't
> tried) sinks, which call the Attributes.toString() method.
>
> The hbase sink should be fine. Have you verified writing to hbase? I don't
> think I had the problem before.
>
> Thanks,
> Mingjie
>
> On 10/05/2011 06:23 AM, AD wrote:
>
>> I am pumping the results into HBase and the value is showing up as 48
>> and not 0, which is a bit of an issue.
>>
>> On Tue, Oct 4, 2011 at 11:41 PM, NerdyNick <[email protected]> wrote:
>>
>> If you plan on using the attributes you extract in any of the
>> escaped/formatted output paths or strings, they will be fine, as those
>> decorators/sinks/sources actually convert the byte array. The fact that
>> console doesn't makes me think it should be flagged as a bug and fixed
>> to reduce confusion. However, I do see it as beneficial for developers
>> to have the raw bit values. So maybe we should also log a DEBUG-level
>> message for that version of the output.
>>
>> On Tue, Oct 4, 2011 at 7:19 PM, AD <[email protected]> wrote:
>> > Thanks, so is this a bug? My issue is that I am storing the number of
>> > "bytes" served from my Apache log, and when it's 0, I will end up
>> > storing 48 and skewing the reports.
>> > Any thoughts?
>> >
>> > Thanks for the find.
>> > _AD
>> >
>> > On Tue, Oct 4, 2011 at 2:56 PM, Mingjie Lai <[email protected]> wrote:
>> >>
>> >> AD.
>> >>
>> >> I noticed the issue before. It's actually not a regex problem, but the
>> >> way Flume prints a byte array as a string on the collector side.
>> >>
>> >> You can also reproduce it with:
>> >> # bin/flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>> >> value("bb", "b") => console};'
>> >>
>> >> Below is the piece of code (Attributes.java). It takes a byte array
>> >> whose length is 1, 4, or 8 and prints it as an int or long. In the case
>> >> of length 1, it only prints the byte value.
>> >>
>> >> ---------------
>> >>     // this is a hack that prints in int, string and double format when
>> >>     // there are 8 bytes.
>> >>     // TODO (jon) this gets grosser and grosser. make a final decision on
>> >>     // how these attributes are going to be
>> >>     if (bytes.length == 8) {
>> >>       return "(long)" + readLong(e, attr).toString() + " (string) '"
>> >>           + readString(e, attr) + "'" + " (double)"
>> >>           + readDouble(e, attr).toString();
>> >>     }
>> >>
>> >>     // this is a similar hack that prints in int and string format when
>> >>     // there are 4 bytes.
>> >>     if (bytes.length == 4) {
>> >>       return readInt(e, attr).toString() + " '" + readString(e, attr) + "'";
>> >>     }
>> >>
>> >>     if (bytes.length == 1) {
>> >>       return "" + (((int) bytes[0]) & 0xff);
>> >>     }
>> >> ---------------
>> >>
>> >> -mingjie
>> >>
>> >> On 10/03/2011 07:40 PM, AD wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I noticed when trying to use a regex to parse an integer from a file
>> >>> that a value of 0 was showing up as the number 48 in the output on
>> >>> the flume command line instead. Has anyone come across this before?
>> >>> Example below:
>> >>>
>> >>> bash-3.2# cat /tmp/integer
>> >>> 0
>> >>>
>> >>> bash-3.2# cat parse.int
>> >>> ./flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>> >>> regexAll("^(\\d+)","mynum") => console }; '
>> >>>
>> >>> bash-3.2# ./parse.int 2>&1 | grep mynum
>> >>>
>> >>> 2011-10-03 22:37:49,526 [main] INFO agent.FlumeNode: System property
>> >>> sun.java.command=com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c
>> >>> dump: tail("/tmp/integer") | { regexAll("^(\\d+)","mynum") => console };
>> >>> 2011-10-03 22:37:49,966 [main] INFO agent.FlumeNode: Loading spec from
>> >>> command line: 'dump: tail("/tmp/integer") | {
>> >>> regexAll("^(\\d+)","mynum") => console }; '
>> >>> lilmac.home [INFO Mon Oct 03 22:37:50 EDT 2011] { *mynum : 48* } {
>> >>> tailSrcFile : integer } 0
>> >>>
>> >>> Cheers,
>> >>> AD
>>
>> --
>> Nick Verbeck - NerdyNick
>> ----------------------------------------------------
>> NerdyNick.com
>> Coloco.ubuntu-rocks.org
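
For anyone finding this thread later, here is a minimal standalone sketch of why the console shows 48 while the stored value is 0. It is not Flume code: the class name ByteAttrDemo and both helper methods are made up for illustration, and only the length-1 branch is copied from the Attributes.java excerpt quoted in the thread. The fallback for other lengths is a simplification and does not reproduce the 4-byte and 8-byte branches.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Flume.
class ByteAttrDemo {

    // Mimics only the length-1 branch quoted from Attributes.java:
    // a single byte is rendered as its unsigned numeric value.
    // Assumption: every other length is simply decoded as text here.
    static String consoleStyle(byte[] bytes) {
        if (bytes.length == 1) {
            return "" + (((int) bytes[0]) & 0xff);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the attribute bytes as text, which is what a reader of the
    // stored value expects to see.
    static String asText(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The regex captured the single character "0" from /tmp/integer.
        byte[] attr = "0".getBytes(StandardCharsets.UTF_8);

        System.out.println(consoleStyle(attr)); // prints 48 (ASCII code of '0')
        System.out.println(asText(attr));       // prints 0
    }
}
```

The attribute holds the character '0', whose byte value is 48, so the bytes themselves are intact and only the console rendering of a one-byte attribute is misleading. That matches the HBase scan showing value=0.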
