Re: [Dev] How to do secondary indexing with bam-data-publisher for Stratos-Logging

Amani Soysa Wed, 04 Jul 2012 00:11:48 -0700

On Wed, Jul 4, 2012 at 12:30 PM, Srinath Perera <srin...@wso2.com> wrote:


> Hi All,
>
> Sorry for the late reply.
>
> As I see it, secondary indexes are not a part of the agent API, which is
> about collecting the data.
>
> Secondary indexes are about analyzing / presenting the data. So if users
> need secondary indexes, they will have to go to Cassandra directly and
> define them. I think we can do the same in the logging impl.
>
Yes, after last Friday's stratos-logging review, we decided to serve real
time logs from memory and to view older logs using BAM analytics.
So we don't have this requirement anymore. We are writing a separate log
tool kit for BAM so that users can download long running logs separately.
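
For anyone who still needs secondary indexes, Srinath's suggestion of
defining them in Cassandra directly could be sketched roughly as below.
This is only an illustration: the class name, the column family name "logs"
and the index naming scheme are my assumptions, and it only builds CQL
CREATE INDEX statements as strings rather than talking to a cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of BAM or the agent API): builds one
// CQL "CREATE INDEX" statement per column that the log viewer filters on.
final class LogIndexStatements {
    private LogIndexStatements() {}

    static List<String> createIndexStatements(String columnFamily, List<String> columns) {
        List<String> statements = new ArrayList<>();
        for (String column : columns) {
            // Index name "<columnFamily>_<column>_idx" is an assumed convention.
            statements.add("CREATE INDEX " + columnFamily + "_" + column + "_idx ON "
                    + columnFamily + " (" + column + ");");
        }
        return statements;
    }
}
```

Calling createIndexStatements("logs", Arrays.asList("priority", "logger"))
gives two statements that can be run against the keyspace by hand.
Identifier quoting rules differ between CQL versions, so camel-case column
names such as appName need checking before use.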



> --Srinath
>
>
> On Fri, Jun 22, 2012 at 12:34 AM, Suhothayan Sriskandarajah <s...@wso2.com> wrote:
>
>>
>>
>> On Fri, Jun 22, 2012 at 2:36 PM, Amani Soysa <am...@wso2.com> wrote:
>>
>>>
>>>
>>> On Fri, Jun 22, 2012 at 2:29 PM, Tharindu Mathew <thari...@wso2.com> wrote:
>>>
>>>> This can be a useful feature for realtime requirements. But we need to
>>>> follow some proper convention, as this changes the event stream definition.
>>>> That makes it an extensive change, and we are close to the feature freeze.
>>>>
>>>> On Fri, Jun 22, 2012 at 2:15 PM, Amani Soysa <am...@wso2.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 22, 2012 at 2:02 PM, Deependra Ariyadewa <d...@wso2.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 22, 2012 at 1:45 PM, Amani Soysa <am...@wso2.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 22, 2012 at 1:24 PM, Tharindu Mathew <thari...@wso2.com> wrote:
>>>>>>>
>>>>>>>> This will be useful for folks who want real time data access, but
>>>>>>>> BAM is not designed to be real time. I don't want the Agent API to be
>>>>>>>> specific to Cassandra, either.
>>>>>>>>
>>>>>>>> There should be a clean way to do this. How did you decide to do it
>>>>>>>> this way? Was there a discussion?
>>>>>>>>
>>>>>>> Yes, there was a discussion on this some time back on Architecture
>>>>>>> ("RFC: Architecture for Stratos Log Processing"), where we decided
>>>>>>> to push logs to the BAM event receiver through the publisher and
>>>>>>> view logs using the Hector API.
>>>>>>>
>>>>>>
>>>>>> Initially we tried to use Flume as the Stratos log collector/manager,
>>>>>> but we stopped the Flume evaluation because BAM can cover the same
>>>>>> use case.
>>>>>>
>>>>>> There are several workarounds, such as creating the relevant keyspaces
>>>>>> at tenant creation or creating an extended event receiver only for
>>>>>> logging.
>>>>>>
>>>>> If we do it at tenant creation time, there will be around 10
>>>>> keyspaces for each tenant (given that we create a keyspace for each
>>>>> server, as I explained earlier). So even if a user doesn't use a
>>>>> particular server, there will be a keyspace for it. With 1000 tenants
>>>>> there will be 10,000 keyspaces, even if some of them are not used at
>>>>> all. So I think it's better to create a keyspace only when logs are
>>>>> published to that particular keyspace.
>>>>>
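
The arithmetic above is 1000 tenants x 10 servers = 10,000 keyspaces if
they are all created up front. Creating them on first publish instead could
be sketched as below. This is a hypothetical in-memory registry, not
existing BAM code: the class and method names are made up, and the actual
Cassandra call is left as a comment. Only the keyspace naming pattern comes
from the examples in the original mail.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remembers which keyspaces already exist so that a
// keyspace is only created the first time logs are published to it.
final class LazyKeyspaceRegistry {
    private final Set<String> created = ConcurrentHashMap.newKeySet();

    // Naming follows the examples in the thread, e.g.
    // org_wso2_logging_tenant1_application_server.
    static String keyspaceName(String tenantId, String serverName) {
        return "org_wso2_logging_" + tenantId + "_" + serverName;
    }

    // Returns true only on the first publish for this tenant/server pair.
    boolean ensureKeyspace(String tenantId, String serverName) {
        boolean firstUse = created.add(keyspaceName(tenantId, serverName));
        if (firstUse) {
            // A real implementation would create the keyspace in Cassandra here.
        }
        return firstUse;
    }

    int createdCount() {
        return created.size();
    }
}
```

With this approach a tenant that only ever uses the application server
ends up with one keyspace instead of ten.
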
>>>> 10 keyspaces per tenant?? Are you sure that's right...
>>>>
>>> Yes, one for each server (maybe more than 10 :) ),
>>> i.e. data services, appserver, esb, mb, cep, bps, brs etc., all the
>>> products we offer for Stratos deployment, because we store the logs for
>>> each server in a different keyspace. (If not, we have to keep everything
>>> in a single keyspace per tenant, and filtering the logs at the server
>>> level becomes an expensive search, because we give users server-specific
>>> logs.)
>>> In our earlier syslog implementation we divided logs per server for
>>> fast results (as logs are generated). That's why it's very important not
>>> to create keyspaces if users are not using a particular product.
>>>
>>
>> I also agree with Tharindu that we should not make the stream definition
>> Cassandra specific, since it can also be used for a JDBC data-store or an
>> InMemory data-store for CEP.
>>
>> Since this is a Cassandra specific issue and since we know the stream
>> definition in advance, I believe it's appropriate to have a Cassandra data
>> store configuration which maps the stream definition to the appropriate
>> Cassandra create key store query. With this, when the client sends a
>> defineEventStream request, the Cassandra data-store can first check
>> whether it matches one of the entries in the configuration. If so, it runs
>> the create key store query given in the configuration to create the key
>> store with indexes; else it creates the key store in the normal way.
>> This way we can also restrict the number of unnecessary key store
>> creations.
>>
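
A minimal sketch of the lookup Suho describes, to make the idea concrete.
Everything here is hypothetical: the class name, matching by stream name
prefix, and holding the queries as plain strings are assumptions for
illustration, not the BAM data-store API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Cassandra data-store configuration: maps a stream name
// prefix to the create query that should be used for matching streams.
final class CassandraStoreConfig {
    private final Map<String, String> createQueryByPrefix = new LinkedHashMap<>();
    private final String defaultCreateQuery;

    CassandraStoreConfig(String defaultCreateQuery) {
        this.defaultCreateQuery = defaultCreateQuery;
    }

    void register(String streamNamePrefix, String createQuery) {
        createQueryByPrefix.put(streamNamePrefix, createQuery);
    }

    // Called when defineEventStream arrives: use the configured query if
    // the stream matches an entry, else fall back to the normal creation.
    String createQueryFor(String streamName) {
        for (Map.Entry<String, String> entry : createQueryByPrefix.entrySet()) {
            if (streamName.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return defaultCreateQuery;
    }
}
```

Registering the prefix "org.wso2.carbon.logging." with an indexed create
query would cover every tenant and server log stream, while all other
streams keep the default behaviour and the stream definition itself stays
store-neutral.
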
>> Regards
>> Suho
>>
>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Deependra.
>>>>>>
>>>>>>>
>>>>>>>> On Fri, Jun 22, 2012 at 8:45 AM, Amani Soysa <am...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Currently we are sending LogEvent data through the BAM data publisher
>>>>>>>>> to the BAM event receiver using a custom log4j appender, and we
>>>>>>>>> retrieve logs using the Hector API for the Carbon log viewer.
>>>>>>>>> However, we need secondary indexes on several columns, defined when
>>>>>>>>> creating the data publisher (keyspace), so that we can filter log
>>>>>>>>> information by a given column (such as date, applicationName,
>>>>>>>>> priority, logger etc.). With the current BAM data publisher
>>>>>>>>> implementation we cannot do secondary indexing; all we can do is
>>>>>>>>> define the name and data type of each column, and the receiver
>>>>>>>>> creates the keyspaces for the given columns with their data types.
>>>>>>>>>
>>>>>>>>> streamId = dataPublisher.defineEventStream("{"
>>>>>>>>>         + "  'name':'org.wso2.carbon.logging.$tenantId.$serverName',"
>>>>>>>>>         + "  'version':'1.0.0',"
>>>>>>>>>         + "  'nickName': 'Logs',"
>>>>>>>>>         + "  'description': 'Logging Event',"
>>>>>>>>>         + "  'metaData':["
>>>>>>>>>         + "   {'name':'clientType','type':'STRING'}"
>>>>>>>>>         + "  ],"
>>>>>>>>>         + "  'payloadData':["
>>>>>>>>>         + "          {'name':'tenantID','type':'STRING'},"
>>>>>>>>>         + "          {'name':'serverName','type':'STRING'},"
>>>>>>>>>         + "          {'name':'appName','type':'STRING'},"
>>>>>>>>>         + "          {'name':'logTime','type':'LONG'},"
>>>>>>>>>         + "          {'name':'logger','type':'STRING'},"
>>>>>>>>>         + "          {'name':'priority','type':'STRING'},"
>>>>>>>>>         + "          {'name':'message','type':'STRING'},"
>>>>>>>>>         + "          {'name':'ip','type':'STRING'},"
>>>>>>>>>         + "          {'name':'stacktrace','type':'STRING'},"
>>>>>>>>>         + "          {'name':'instance','type':'STRING'}"
>>>>>>>>>         + "  ]"
>>>>>>>>>         + "}");
>>>>>>>>>
>>>>>>>>> Is it possible to have a Cassandra specific event receiver (for
>>>>>>>>> logging purposes) so that we can create keyspaces with secondary
>>>>>>>>> indexes [1], and have it create keyspaces whenever logs are
>>>>>>>>> published? Or do we need to create keyspaces at tenant creation
>>>>>>>>> time? For a given tenant we need to create several keyspaces,
>>>>>>>>> depending on the server (and if possible for applications as
>>>>>>>>> well, so we can have better performance when viewing logs), e.g.
>>>>>>>>>   keyspace1 - org_wso2_logging_tenant1_application_server
>>>>>>>>>               (stores AS specific logs)
>>>>>>>>>   keyspace2 - org_wso2_logging_tenant1_data_services_server
>>>>>>>>>               (stores DSS specific logs)
>>>>>>>>>
>>>>>>>>> Please note that we cannot use BAM analytics to view logs because
>>>>>>>>> we need a real time log viewer.
>>>>>>>>>
>>>>>>>>> [1] - https://wso2.org/jira/browse/CARBON-13468
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Amani
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Tharindu
>>>>>>>>
>>>>>>>> blog: http://mackiemathew.com/
>>>>>>>> M: +94777759908
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Deependra Ariyadewa
>>>>>> WSO2, Inc. http://wso2.com/ http://wso2.org
>>>>>>
>>>>>> email d...@wso2.com; cell +94 71 403 5996 ;
>>>>>> Blog http://risenfall.wordpress.com/
>>>>>> PGP info: KeyID: 'DC627E6F'
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Tharindu
>>>>
>>>> blog: http://mackiemathew.com/
>>>> M: +94777759908
>>>>
>>>>
>>>
>>
>>
>> --
>> S. Suhothayan
>> Software Engineer,
>> Data Technologies Team,
>> WSO2, Inc. http://wso2.com
>> lean.enterprise.middleware.
>>
>> email: s...@wso2.com cell: (+94) 779 756 757
>> blog: http://suhothayan.blogspot.com/
>> twitter: http://twitter.com/suhothayan
>> linked-in: http://lk.linkedin.com/in/suhothayan
>>
>>
>
>
> --
> ============================
> Srinath Perera, Ph.D.
>   Senior Software Architect, WSO2 Inc.
>   Visiting Faculty, University of Moratuwa
>   Member, Apache Software Foundation
>   Research Scientist, Lanka Software Foundation
>   Blog: http://srinathsview.blogspot.com/
>   Photos: http://www.flickr.com/photos/hemapani/
>  Phone: 0772360902
>

_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
