Re: Modeling Audit Trail on Cassandra
> > Is text the most appropriate data type to store JSON that contain couple > of dozen lines ? > It sure is the simplest way to store JSON. The query requirement is "where executedby = ?”. > Since executedby is a timeuuid, I guess you don't want to query a single record, since that would require you to know the exact timeuuid. Do you mean that you would like to query all changes in a certain time frame, e.g. today? In that case, you would have to group your rows in time buckets, e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any other period that suits your situation. Retrieving all changes in a specific time frame is done by retrieving all relevant periods. Tom
Re: Modeling Audit Trail on Cassandra
executedby is the ID assigned to an employee. I'm presuming that JSON is to be used for objectbefore/after. This suggests no ability to query by individual object fields. I didn't sense any other columns that would be JSON. -- Jack Krupansky On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge wrote: > Is text the most appropriate data type to store JSON that contain couple >> of dozen lines ? >> > > It sure is the simplest way to store JSON. > > The query requirement is "where executedby = ?”. >> > > Since executedby is a timeuuid, I guess you don't want to query a single > record, since that would require you to know the exact timeuuid. Do you > mean that you would like to query all changes in a certain time frame, e.g. > today? In that case, you would have to group your rows in time buckets, > e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any > other period that suits your situation. Retrieving all changes in a > specific time frame is done by retrieving all relevant periods. > > Tom >
Re: Modeling Audit Trail on Cassandra
I would arrange your primary key by how You intend to query. Primary key ((executedby), auditid) This allows you to query for who did it, and optionally on a time range for when it occurred. Retrieving in chronological order. You could do it with your proposed schema and Lucene but for what you have stated your requirements are lucene is not necessary. This proposed structure could result in wide partitions, depending on how busy individuals are, and if so introducing a granular bucket to the primary key can combat this. The overhead of doing so is relatively minor. You can of course use Lucene with this model as well for filtering on other fields. (Or add more fields to your index as appropriate) Clint On Mar 16, 2016 4:52 PM, "Jack Krupansky" wrote: > Stratio (or DSE Search) should be good for ad hoc or complex queries, but > if there are some fixed/common query patterns you might be better off > implementing query tables or using materialized views. The latter allows > you to include a non-PK data column in the PK of the MV so that you can > directly access the indexed row without the complexity of Lucene/DSE. This > also lets you effectively cluster data that will be commonly accessed > together on a single node/partition, and to do it automatically without any > application logic to manually duplicate/update data. > > (3.x still has the restriction that an MV PK can only include one non-PK > data column - CASSANDRA-9928 > <https://issues.apache.org/jira/browse/CASSANDRA-9928>.) > > -- Jack Krupansky > > On Wed, Mar 16, 2016 at 4:40 PM, I PVP wrote: > >> Jack/Tom >> Thanks for answering. >> >> Here is the table definition so far: >> >> CREATE TABLE audit_trail ( >> auditid timeuuid, >> actiontype text, >> objecttype text, >> executedby uuid ( or timeuuid?), >> executedat timestamp, >> objectbefore text, >> objectafter text, >> clientipaddr text, >> serveripaddr text, >> servername text, >> channel text, >> PRIMARY KEY (auditid) >> ); >> >> objectbefore/after are the only ones that will have JSON content. quering >> based on the contents of these two columns are not a requirement. >> >> At this moment the queries are going to be mainly on executedby ( the >> employee id). >> Stratio’s Cassandra Lucene Index will be used to allow querying/filtering >> on executedat (timestamp) ,objecttype(order, customer, ticket, >> message,account, paymenttransaction,refund etc.) and actiontype(create, >> retrieve, update, delete, approve, activate, unlock, lock etc.) . >> >> I am considering to count exclusively on Stratio’s Cassandra Lucene >> filtering and avoid to add “period” columns like month(int), year(int), >> day (int). >> >> Thanks >> >> -- >> IPVP >> >> >> From: Jack Krupansky >> >> Reply: user@cassandra.apache.org > >> >> Date: March 16, 2016 at 5:22:36 PM >> To: user@cassandra.apache.org > >> >> Subject: Re: Modeling Audit Trail on Cassandra >> >> executedby is the ID assigned to an employee. >> >> I'm presuming that JSON is to be used for objectbefore/after. This >> suggests no ability to query by individual object fields. I didn't sense >> any other columns that would be JSON. >> >> >> >> -- Jack Krupansky >> >> On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge >> wrote: >> >>> Is text the most appropriate data type to store JSON that contain couple >>>> of dozen lines ? >>>> >>> >>> It sure is the simplest way to store JSON. >>> >>> The query requirement is "where executedby = ?”. >>>> >>> >>> Since executedby is a timeuuid, I guess you don't want to query a single >>> record, since that would require you to know the exact timeuuid. Do you >>> mean that you would like to query all changes in a certain time frame, e.g. >>> today? In that case, you would have to group your rows in time buckets, >>> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any >>> other period that suits your situation. Retrieving all changes in a >>> specific time frame is done by retrieving all relevant periods. >>> >>> Tom >>> >> >> >
Re: Modeling Audit Trail on Cassandra
Stratio (or DSE Search) should be good for ad hoc or complex queries, but if there are some fixed/common query patterns you might be better off implementing query tables or using materialized views. The latter allows you to include a non-PK data column in the PK of the MV so that you can directly access the indexed row without the complexity of Lucene/DSE. This also lets you effectively cluster data that will be commonly accessed together on a single node/partition, and to do it automatically without any application logic to manually duplicate/update data. (3.x still has the restriction that an MV PK can only include one non-PK data column - CASSANDRA-9928 <https://issues.apache.org/jira/browse/CASSANDRA-9928>.) -- Jack Krupansky On Wed, Mar 16, 2016 at 4:40 PM, I PVP wrote: > Jack/Tom > Thanks for answering. > > Here is the table definition so far: > > CREATE TABLE audit_trail ( > auditid timeuuid, > actiontype text, > objecttype text, > executedby uuid ( or timeuuid?), > executedat timestamp, > objectbefore text, > objectafter text, > clientipaddr text, > serveripaddr text, > servername text, > channel text, > PRIMARY KEY (auditid) > ); > > objectbefore/after are the only ones that will have JSON content. quering > based on the contents of these two columns are not a requirement. > > At this moment the queries are going to be mainly on executedby ( the > employee id). > Stratio’s Cassandra Lucene Index will be used to allow querying/filtering > on executedat (timestamp) ,objecttype(order, customer, ticket, > message,account, paymenttransaction,refund etc.) and actiontype(create, > retrieve, update, delete, approve, activate, unlock, lock etc.) . > > I am considering to count exclusively on Stratio’s Cassandra Lucene > filtering and avoid to add “period” columns like month(int), year(int), > day (int). > > Thanks > > -- > IPVP > > > From: Jack Krupansky > Reply: user@cassandra.apache.org > > > Date: March 16, 2016 at 5:22:36 PM > To: user@cassandra.apache.org > > > Subject: Re: Modeling Audit Trail on Cassandra > > executedby is the ID assigned to an employee. > > I'm presuming that JSON is to be used for objectbefore/after. This > suggests no ability to query by individual object fields. I didn't sense > any other columns that would be JSON. > > > > -- Jack Krupansky > > On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge > wrote: > >> Is text the most appropriate data type to store JSON that contain couple >>> of dozen lines ? >>> >> >> It sure is the simplest way to store JSON. >> >> The query requirement is "where executedby = ?”. >>> >> >> Since executedby is a timeuuid, I guess you don't want to query a single >> record, since that would require you to know the exact timeuuid. Do you >> mean that you would like to query all changes in a certain time frame, e.g. >> today? In that case, you would have to group your rows in time buckets, >> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any >> other period that suits your situation. Retrieving all changes in a >> specific time frame is done by retrieving all relevant periods. >> >> Tom >> > >
Re: Modeling Audit Trail on Cassandra
Jack/Tom Thanks for answering. Here is the table definition so far: CREATE TABLE audit_trail ( auditid timeuuid, actiontype text, objecttype text, executedby uuid ( or timeuuid?), executedat timestamp, objectbefore text, objectafter text, clientipaddr text, serveripaddr text, servername text, channel text, PRIMARY KEY (auditid) ); objectbefore/after are the only ones that will have JSON content. quering based on the contents of these two columns are not a requirement. At this moment the queries are going to be mainly on executedby ( the employee id). Stratio’s Cassandra Lucene Index will be used to allow querying/filtering on executedat (timestamp) ,objecttype(order, customer, ticket, message,account, paymenttransaction,refund etc.) and actiontype(create, retrieve, update, delete, approve, activate, unlock, lock etc.) . I am considering to count exclusively on Stratio’s Cassandra Lucene filtering and avoid to add “period” columns like month(int), year(int), day (int). Thanks -- IPVP From: Jack Krupansky <mailto:jack.krupan...@gmail.com> Reply: user@cassandra.apache.org ><mailto:user@cassandra.apache.org> Date: March 16, 2016 at 5:22:36 PM To: user@cassandra.apache.org ><mailto:user@cassandra.apache.org> Subject: Re: Modeling Audit Trail on Cassandra executedby is the ID assigned to an employee. I'm presuming that JSON is to be used for objectbefore/after. This suggests no ability to query by individual object fields. I didn't sense any other columns that would be JSON. -- Jack Krupansky On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge mailto:t...@drillster.com>> wrote: Is text the most appropriate data type to store JSON that contain couple of dozen lines ? It sure is the simplest way to store JSON. The query requirement is "where executedby = ?”. Since executedby is a timeuuid, I guess you don't want to query a single record, since that would require you to know the exact timeuuid. Do you mean that you would like to query all changes in a certain time frame, e.g. today? In that case, you would have to group your rows in time buckets, e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any other period that suits your situation. Retrieving all changes in a specific time frame is done by retrieving all relevant periods. Tom
Modeling Audit Trail on Cassandra
Hi everyone, I am looking for your feedback or advice on modeling an audit trail log table on Cassandra that stores information from tracking everything an employee changes within the application. The existing application is being migrated from mysql to Cassandra. Is text the most appropriate data type to store JSON that contain couple of dozen lines ? CREATE TABLE audit_trail ( auditid timeuuid, actiontype text, objecttype text, executedby timeuuid, executedat text, objectbefore text, objectafter text, clientipaddr text, serveripaddr text, servername text, channel text, PRIMARY KEY (auditid) ); auditid // an UUID of the audit log actiontype // create, retrieve, update, delete, approve, activate, unlock, lock etc. objecttype // order, customer, ticket, message,account, paymenttransaction,refund executedby // the UUID of the employee executedat // timestamp when the action happened objectbefore // the json of the object before the change objectafter // the json of the object after the change clientipaddr // the ip address of the client serveripaddr // the server ip address that handled the request servername // the server name that handled the request channel //web, mobile, call center The query requirement is "where executedby = ?”. Will be using Stratio’s Cassandra Lucene Index to support querying/filtering. Thanks! -- IPVP