AD.
IMO, the issue only occurs for console sinks (maybe also text, but I haven't
tried) that call the Attributes.toString() method.
The HBase sink should be fine. Have you actually tried writing to HBase? I
don't think I had the problem before.
Thanks,
Mingjie
On 10/05/2011 06:23 AM, AD wrote:
I am pumping the results into HBase, and the value is showing up as 48
and not 0, which is a bit of an issue.
On Tue, Oct 4, 2011 at 11:41 PM, NerdyNick wrote:
If you plan on using the attributes you extract in any of the
escaped/formatted output paths or strings, they will be fine, as those
decorators/sinks/sources actually convert the byte array. The fact that
console doesn't makes me think it should be flagged as a bug and
fixed to reduce confusion. However, I do see it as beneficial for
developers to have the raw byte values. So maybe we should also be
logging a DEBUG level message for that version of the output.
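
To illustrate the difference between the two renderings, here is a minimal
Java sketch. The class AttrDecodeSketch and the helper parseDecimal are
hypothetical names, not part of Flume; the sketch only assumes that the
attribute holds the ASCII text captured from the log line.

```java
import java.nio.charset.StandardCharsets;

class AttrDecodeSketch {
  // Hypothetical helper (not a Flume API): treat the attribute value as
  // ASCII decimal digits and parse it into a number.
  static long parseDecimal(byte[] value) {
    return Long.parseLong(new String(value, StandardCharsets.US_ASCII).trim());
  }

  public static void main(String[] args) {
    byte[] stored = {48}; // the single byte captured for the text "0"

    // Printing the byte as a number shows its character code.
    System.out.println(stored[0] & 0xff);     // 48

    // Decoding the byte as text first recovers the intended value.
    System.out.println(parseDecimal(stored)); // 0
  }
}
```

A consumer that decodes the bytes as text gets 0 back, while one that prints
the single byte as a number gets 48.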
On Tue, Oct 4, 2011 at 7:19 PM, AD wrote:
> Thanks, so is this a bug? My issue is that I am storing the number of
> "bytes" served from my Apache log, and when it's 0, I will end up storing 48
> and skewing the reports.
> Any thoughts?
>
> Thanks for the find.
> _AD
>
> On Tue, Oct 4, 2011 at 2:56 PM, Mingjie Lai wrote:
>>
>> AD.
>>
>> I noticed the issue before. It's actually not a regex problem, but the way
>> Flume prints a byte array as a string on the collector side.
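
A small Java sketch of why the regex itself is not at fault: the pattern from
the original report captures the text "0", and that text is stored as the
single byte 48. RegexCaptureSketch and captureBytes are hypothetical names
used only for illustration; this is not how Flume's regexAll is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCaptureSketch {
  // Hypothetical helper: bytes of the first capture group, or null if the
  // pattern does not match the line.
  static byte[] captureBytes(String regex, String line) {
    Matcher m = Pattern.compile(regex).matcher(line);
    return m.find() ? m.group(1).getBytes(StandardCharsets.UTF_8) : null;
  }

  public static void main(String[] args) {
    byte[] captured = captureBytes("^(\\d+)", "0");

    // The capture is the character '0', whose byte value is 48.
    System.out.println(Arrays.toString(captured));                    // [48]

    // Read back as text, it is still the string "0".
    System.out.println(new String(captured, StandardCharsets.UTF_8)); // 0
  }
}
```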
>>
>> You can also reproduce it by:
>> # bin/flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>> value("bb", "b") => console};'
>>
>> Below is the piece of code (Attributes.java). It takes a byte array whose
>> length is 1, 4, or 8 and prints it as an int or long. For length 1, it
>> prints only the numeric byte value.
>>
>> ---------------
>>     // this is a hack that prints in int, string and double format when there
>>     // are 8 bytes.
>>     // TODO (jon) this gets grosser and grosser. make a final decision on how
>>     // these attributes are going to be
>>     if (bytes.length == 8) {
>>       return "(long)" + readLong(e, attr).toString() + " (string) '"
>>           + readString(e, attr) + "'" + " (double)"
>>           + readDouble(e, attr).toString();
>>     }
>>
>>     // this is a similar hack that prints in int and string format when there
>>     // are 4 bytes.
>>     if (bytes.length == 4) {
>>       return readInt(e, attr).toString() + " '" + readString(e, attr) + "'";
>>     }
>>
>> if (bytes.length == 1) {
>> return "" + (((int) bytes[0]) & 0xff);
>> }
>>
>> ---------------
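
The length-based formatting quoted above can be approximated with a small
standalone Java sketch. AttrToStringSketch and render are hypothetical
stand-ins, not the Flume Attributes class; big-endian byte order for the
4-byte case and plain text output for other lengths are assumptions, and the
8-byte branch is omitted for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class AttrToStringSketch {
  // Hypothetical stand-in for the length-based formatting: the output format
  // depends only on how many bytes the value has, not on what it means.
  static String render(byte[] bytes) {
    String text = new String(bytes, StandardCharsets.UTF_8);
    if (bytes.length == 4) {
      // Assumption: big-endian int, which is ByteBuffer's default order.
      return ByteBuffer.wrap(bytes).getInt() + " '" + text + "'";
    }
    if (bytes.length == 1) {
      // A single byte is printed as its unsigned numeric value.
      return "" + (bytes[0] & 0xff);
    }
    // Assumption: any other length is shown as plain text.
    return text;
  }

  public static void main(String[] args) {
    System.out.println(render("0".getBytes(StandardCharsets.UTF_8)));    // 48
    System.out.println(render("10".getBytes(StandardCharsets.UTF_8)));   // 10
    System.out.println(render("1234".getBytes(StandardCharsets.UTF_8))); // 825373492 '1234'
  }
}
```

Under these assumptions only one-character values such as "0" lose their
textual form: the byte for '0' is 0x30, which prints as 48.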
>>
>> -mingjie
>>
>> On 10/03/2011 07:40 PM, AD wrote:
>>>
>>> Hello,
>>>
>>> I noticed that when using a regex to parse an integer from a file, a
>>> value of 0 shows up as 48 in the output on the flume command line
>>> instead. Has anyone come across this before? Example below:
>>>
>>> bash-3.2# cat /tmp/integer
>>> 0
>>>
>>> bash-3.2# cat parse.int
>>> ./flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>>> regexAll("^(\\d+)","mynum") => console }; '
>>>
>>> bash-3.2# ./parse.int 2>&1 | grep mynum
>>>
>>> 2011-10-03 22:37:49,526 [main] INFO agent.FlumeNode: System property
>>> sun.java.command=com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c
>>> dump: tail("/tmp/integer") | { regexAll("^(\\d+)","mynum") => console };
>>> 2011-10-03 22:37:49,966 [main] INFO agent.FlumeNode: Loading spec from
>>> command line: 'dump: tail("/tmp/integer") | {
>>> regexAll("^(\\d+)","mynum") => console }; '
>>> lilmac.home [INFO Mon Oct 03 22:37:50 EDT 2011] { *mynum : 48* } {
>>> tailSrcFile : integer } 0
>>>
>>> Cheers,
>>> AD
>
>
--
Nick Verbeck - NerdyNick
----------------------------------------------------
NerdyNick.com
Coloco.ubuntu-rocks.org