First off, Parth, this is a really exciting project, and I'm glad you're taking 
it on.

As a SIEM refugee, I have a few questions about the proposal and a few general 
thoughts about syslog that may help you work out how to choose data types and 
how to structure data in Riak.

As far as I understand, you're talking about a mapping from keys to sets, but 
I'm unclear on a few things.  What keys are you thinking about?  Timestamps?  
If so, are these the timestamps of the syslog events themselves?  A word of 
warning if they are: you will find a lot of variation in timestamp formats and 
granularity.  Perhaps you can get something reliable out of syslog-ng, but 
that won't help when syslog-ng is functioning as a syslog relay and you want 
to preserve the timestamp of the originator, which you should if you want to 
preserve the integrity of the logs (e.g., for compliance).  Or are you talking 
about the key being a coarse-grained timestamp, say an integral value in UTC 
seconds, with the value being all logs in that interval?  Is that your 
motivation for sets?
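
To make the coarse-grained key idea concrete, here is a minimal Python sketch.  
It assumes the event timestamp is already in the RFC 3339 form that RFC-5424 
uses and carries an explicit UTC offset; the function name bucket_key and the 
60-second width are my own illustrative choices, not anything from the 
proposal.

```python
from datetime import datetime


def bucket_key(timestamp: str, width_seconds: int = 60) -> int:
    """Return the start of the UTC bucket (in epoch seconds) for a timestamp.

    Hypothetical helper: every event whose timestamp falls inside the same
    `width_seconds` window maps to the same integer key.
    """
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit +00:00 offset first.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    dt = datetime.fromisoformat(timestamp)
    if dt.tzinfo is None:
        # RFC 3164 style timestamps carry no offset (or year); refuse to guess.
        raise ValueError("timestamp has no UTC offset: %r" % timestamp)
    epoch = int(dt.timestamp())
    return epoch - (epoch % width_seconds)


# Two events in the same minute, reported with different offsets,
# land on the same key.
key_a = bucket_key("2015-05-05T08:11:42Z")
key_b = bucket_key("2015-05-05T10:11:59+02:00")
```

Note that this only works once the originator's timestamp has been normalised; 
the offset-less RFC-3164 form is rejected rather than guessed at.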

How much of the syslog payload are you planning to parse?  RFC-3164 and 
RFC-5424 provide enough BNF to allow standard syslog producers/consumers to 
provide pretty elaborately structured data in a syslog "datagram" (be it sent 
via UDP or TCP, or what have you).  RFC-5424, in particular, has support for 
arbitrarily structured data in the STRUCTURED-DATA element that follows the 
header, which is pretty nice.  However, I personally have run into a few 
issues with this RFC.
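
As a rough illustration of what parsing just the standard RFC-5424 header 
involves, here is a Python sketch.  The Header type and parse_header function 
are names I made up for this example; the field layout (PRI, VERSION, 
TIMESTAMP, HOSTNAME, APP-NAME, PROCID, MSGID) and the facility * 8 + severity 
encoding of PRI come from the RFC.  It makes no attempt to handle RFC-3164 
messages.

```python
import re
from typing import NamedTuple, Optional

# <PRI>VERSION SP TIMESTAMP SP HOSTNAME SP APP-NAME SP PROCID SP MSGID SP rest
HEADER_RE = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>[1-9]\d?)"
    r" (?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+)"
    r" (?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)",
    re.DOTALL,
)


class Header(NamedTuple):
    facility: int
    severity: int
    version: int
    timestamp: Optional[str]
    hostname: Optional[str]
    app_name: Optional[str]
    procid: Optional[str]
    msgid: Optional[str]
    rest: str  # STRUCTURED-DATA plus optional MSG, left unparsed here


def _nil(value: str) -> Optional[str]:
    """RFC 5424 uses "-" as the NILVALUE for an absent field."""
    return None if value == "-" else value


def parse_header(line: str) -> Header:
    m = HEADER_RE.match(line)
    if m is None:
        raise ValueError("not an RFC 5424 formatted message")
    pri = int(m["pri"])
    if pri > 191:  # 23 facilities * 8 + severity 7
        raise ValueError("PRI out of range: %d" % pri)
    facility, severity = divmod(pri, 8)
    return Header(
        facility, severity, int(m["version"]),
        _nil(m["timestamp"]), _nil(m["hostname"]), _nil(m["app_name"]),
        _nil(m["procid"]), _nil(m["msgid"]), m["rest"],
    )


header = parse_header(
    "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - "
    "'su root' failed for lonvick on /dev/pts/8"
)
```

Each of the named fields is a natural candidate for its own index entry, with 
the remainder kept as free text.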

First, very few, if any, syslog generators support this RFC.  Certainly the 
"legacy" enterprise log sources (operating systems, firewall vendors, etc.) 
don't, and even the syslog API [1] doesn't provide enough parameters to make 
structured logs a possibility.  I think there may be some work to improve APIs 
in the community, but to my knowledge nothing has been standardized and no 
vendors are taking up the work.  Besides, most of the syslog implementations 
out there obey the "non-normative" behavior of RFC-3164, so you can get some 
pretty quirky logs in the wild.

Another interesting problem is that the STRUCTURED-DATA element of 5424 uses 
SD-IDs to discriminate the different data types that are encoded in it.  Any 
SD-ID not registered with IANA takes the form name@<private enterprise 
number>, and while those enterprise numbers (an arc of the OID tree) give you 
a kind of loosely coupled authority, there is no infrastructure for 
determining a parsing strategy for these fields.  They could really be 
anything, in the worst case.
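
To show where the difficulty lies, here is a Python sketch of a generic 
STRUCTURED-DATA parser.  The function names are hypothetical; the grammar 
(bracketed elements, name="value" parameters, backslash escapes for ", \ and 
]) follows RFC-5424.  The syntax is easy to take apart, but every parameter 
value comes out as an opaque string, and nothing in the message tells you how 
to interpret it.

```python
import re
from typing import Dict, Optional, Tuple

_NAME = r'[^ =\]"]+'
_VALUE = r'(?:[^"\\\]]|\\.)*'
SD_ELEMENT_RE = re.compile(
    r'\[(?P<id>' + _NAME + r')(?P<params>(?: ' + _NAME + r'="' + _VALUE + r'")*)\]'
)
SD_PARAM_RE = re.compile(r' (?P<name>' + _NAME + r')="(?P<value>' + _VALUE + r')"')
UNESCAPE_RE = re.compile(r'\\(["\\\]])')


def parse_structured_data(sd: str) -> Dict[str, Dict[str, str]]:
    """Parse a STRUCTURED-DATA string into {SD-ID: {param name: value}}.

    Simplification: a parameter name repeated within one element keeps
    only its last value, although the RFC permits repeats.
    """
    if sd == "-":  # NILVALUE: no structured data present
        return {}
    elements: Dict[str, Dict[str, str]] = {}
    pos = 0
    while pos < len(sd):
        m = SD_ELEMENT_RE.match(sd, pos)
        if m is None:
            raise ValueError("malformed STRUCTURED-DATA at offset %d" % pos)
        elements[m["id"]] = {
            p["name"]: UNESCAPE_RE.sub(r"\1", p["value"])
            for p in SD_PARAM_RE.finditer(m["params"])
        }
        pos = m.end()
    return elements


def split_sd_id(sd_id: str) -> Tuple[str, Optional[str]]:
    """Split "name@enterprise-number"; registered SD-IDs have no "@" part."""
    name, sep, number = sd_id.partition("@")
    return name, (number if sep else None)


parsed = parse_structured_data(
    '[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"]'
    '[examplePriority@32473 class="high"]'
)
```

Whether eventID above is an integer, a string, or something else entirely is 
known only to whoever owns enterprise number 32473.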

But regardless of the deeply structured data, you could get some very 
interesting traction by just taking the standard headers and indexing them 
through Yokozuna.  Certainly, indexing the body of a syslog message is a great 
idea, as these messages are generally unstructured and fodder for Lucene.  
This is something that Logstash/Elasticsearch can do pretty effectively today, 
and it would be cool to see the same in Riak + some syslog provider.

Finally, it would be really nice if you could structure your plugin in such a 
way that it could eventually be ported to rsyslog [2].  The rsyslogd daemon 
is deployed by default on certain Linux flavors and enjoys fairly widespread 
distribution.  You might be able to get it supported in that community, as well.

Best of luck,

-Fred

[1] See http://linux.die.net/man/3/syslog for example.
[2] http://www.rsyslog.com/doc/v8-stable/

> On May 5, 2015, at 8:11 AM, Christopher Meiklejohn <cmeiklej...@basho.com> 
> wrote:
> 
> 
>> On May 5, 2015, at 1:01 PM, Gergely Nagy <alger...@madhouse-project.org> 
>> wrote:
>> 
>>>>>>> "Christopher" == Christopher Meiklejohn <cmeiklej...@basho.com> writes:
>> 
>>   Christopher> I’m a bit concerned with your use of the set embedded in the
>>   Christopher> map.
>> 
>> The original idea was to use a Set directly. The Set-in-Map thing was
>> just a thought experiment (Map-in-Set would make more sense).
>> 
>>   Christopher> Large objects have traditionally been a big problem in Riak
>>   Christopher> due to the use of distributed Erlang and head of line
>>   Christopher> blocking. I’m curious if you could elaborate on what type of
>>   Christopher> data you will be storing in the set: how big you expect each
>>   Christopher> item to be, how big you expect the map to be, and the overall
>>   Christopher> layout of data inside of the data structure.
>> 
>> The intention is to store log messages in each element of the set:
>> either as a string (syslog or json, or whatever else the user sees fit),
>> or as a map of key-value pairs (where values themselves can be maps
>> too).
>> 
>> On average, the log messages are a few kilobytes in size. There may be
>> exceptions, but >1mb ones are fairly rare. How much data the set would
>> hold... now that's a question that can't really be answered. It is
>> really up to the syslog-ng user to configure that.
> 
> I’m referring to the size of the entire set, not the objects that will be
> members of the set. Therefore, the performance penalty seen when using large
> objects would be observed as soon as the size of the entire set (or map) has
> reached ~1 MB. Given that restriction, I’d imagine you would only be able to
> store a few messages in each set.  That granularity seems like you are no
> longer getting the benefits of the set.
> 
> Additionally, the primary benefit of the data types in Riak is that they
> converge deterministically when dealing with concurrent operations.  I’m
> curious if the set is the right choice here; could you just use a custom set
> format inside of a normal Riak object (or store one message per Riak object,
> given the write will be an immutable log entry?)
> 
> Thanks,
> - Chris
> 
> Christopher Meiklejohn
> Senior Software Engineer
> Basho Technologies, Inc.
> cmeiklej...@basho.com
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


