Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Tom van den Berge
>
> Is text the most appropriate data type to store JSON that contain couple
> of dozen lines ?
>

It sure is the simplest way to store JSON.

The query requirement  is  "where executedby = ?”.
>

Since executedby is a timeuuid, I guess you don't want to query a single
record, since that would require you to know the exact timeuuid. Do you
mean that you would like to query all changes in a certain time frame, e.g.
today? In that case, you would have to group your rows in time buckets,
e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
other period that suits your situation. Retrieving all changes in a
specific time frame is done by retrieving all relevant periods.

Tom


Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Jack Krupansky
executedby is the ID assigned to an employee.

I'm presuming that JSON is to be used for objectbefore/after. This suggests
no ability to query by individual object fields. I didn't sense any other
columns that would be JSON.



-- Jack Krupansky

On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge 
wrote:

> Is text the most appropriate data type to store JSON that contain couple
>> of dozen lines ?
>>
>
> It sure is the simplest way to store JSON.
>
> The query requirement  is  "where executedby = ?”.
>>
>
> Since executedby is a timeuuid, I guess you don't want to query a single
> record, since that would require you to know the exact timeuuid. Do you
> mean that you would like to query all changes in a certain time frame, e.g.
> today? In that case, you would have to group your rows in time buckets,
> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
> other period that suits your situation. Retrieving all changes in a
> specific time frame is done by retrieving all relevant periods.
>
> Tom
>


Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Clint Martin
I would arrange your primary key by how You intend to query.

Primary key ((executedby), auditid)

This allows you to query for who did it, and optionally on a time range for
when it occurred.  Retrieving in chronological order.

You could do it with your proposed schema and Lucene but for what you have
stated your requirements are lucene is not necessary.

This proposed structure could result in wide partitions, depending on how
busy individuals are, and if so introducing a granular bucket to the
primary key can combat this. The overhead of doing so is relatively minor.

You can of course use Lucene with this model as well for filtering on other
fields. (Or add more fields to your index as appropriate)

Clint
On Mar 16, 2016 4:52 PM, "Jack Krupansky"  wrote:

> Stratio (or DSE Search) should be good for ad hoc or complex queries, but
> if there are some fixed/common query patterns you might be better off
> implementing query tables or using materialized views. The latter allows
> you to include a non-PK data column in the PK of the MV so that you can
> directly access the indexed row without the complexity of Lucene/DSE. This
> also lets you effectively cluster data that will be commonly accessed
> together on a single node/partition, and to do it automatically without any
> application logic to manually duplicate/update data.
>
> (3.x still has the restriction that an MV PK can only include one non-PK
> data column - CASSANDRA-9928
> <https://issues.apache.org/jira/browse/CASSANDRA-9928>.)
>
> -- Jack Krupansky
>
> On Wed, Mar 16, 2016 at 4:40 PM, I PVP  wrote:
>
>> Jack/Tom
>> Thanks for answering.
>>
>> Here is the table definition so far:
>>
>> CREATE TABLE audit_trail (
>> auditid timeuuid,
>> actiontype text,
>> objecttype text,
>> executedby uuid ( or timeuuid?),
>> executedat timestamp,
>> objectbefore text,
>> objectafter text,
>> clientipaddr text,
>> serveripaddr text,
>> servername text,
>> channel text,
>> PRIMARY KEY (auditid)
>> );
>>
>> objectbefore/after are the only ones that will have JSON content. quering
>> based on the contents of these two  columns are not a requirement.
>>
>> At this moment the queries are going to be mainly on executedby ( the
>> employee id).
>> Stratio’s Cassandra Lucene Index will be used to allow querying/filtering
>> on executedat (timestamp) ,objecttype(order, customer, ticket,
>> message,account, paymenttransaction,refund etc.)  and actiontype(create,
>> retrieve, update, delete, approve, activate, unlock, lock etc.) .
>>
>> I am considering to count exclusively on Stratio’s Cassandra Lucene
>>  filtering and avoid to add  “period” columns like month(int), year(int),
>> day (int).
>>
>> Thanks
>>
>> --
>> IPVP
>>
>>
>> From: Jack Krupansky 
>> 
>> Reply: user@cassandra.apache.org >
>> 
>> Date: March 16, 2016 at 5:22:36 PM
>> To: user@cassandra.apache.org >
>> 
>> Subject:  Re: Modeling Audit Trail on Cassandra
>>
>> executedby is the ID assigned to an employee.
>>
>> I'm presuming that JSON is to be used for objectbefore/after. This
>> suggests no ability to query by individual object fields. I didn't sense
>> any other columns that would be JSON.
>>
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge 
>> wrote:
>>
>>> Is text the most appropriate data type to store JSON that contain couple
>>>> of dozen lines ?
>>>>
>>>
>>> It sure is the simplest way to store JSON.
>>>
>>> The query requirement  is  "where executedby = ?”.
>>>>
>>>
>>> Since executedby is a timeuuid, I guess you don't want to query a single
>>> record, since that would require you to know the exact timeuuid. Do you
>>> mean that you would like to query all changes in a certain time frame, e.g.
>>> today? In that case, you would have to group your rows in time buckets,
>>> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
>>> other period that suits your situation. Retrieving all changes in a
>>> specific time frame is done by retrieving all relevant periods.
>>>
>>> Tom
>>>
>>
>>
>


Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Jack Krupansky
Stratio (or DSE Search) should be good for ad hoc or complex queries, but
if there are some fixed/common query patterns you might be better off
implementing query tables or using materialized views. The latter allows
you to include a non-PK data column in the PK of the MV so that you can
directly access the indexed row without the complexity of Lucene/DSE. This
also lets you effectively cluster data that will be commonly accessed
together on a single node/partition, and to do it automatically without any
application logic to manually duplicate/update data.

(3.x still has the restriction that an MV PK can only include one non-PK
data column - CASSANDRA-9928
<https://issues.apache.org/jira/browse/CASSANDRA-9928>.)

-- Jack Krupansky

On Wed, Mar 16, 2016 at 4:40 PM, I PVP  wrote:

> Jack/Tom
> Thanks for answering.
>
> Here is the table definition so far:
>
> CREATE TABLE audit_trail (
> auditid timeuuid,
> actiontype text,
> objecttype text,
> executedby uuid ( or timeuuid?),
> executedat timestamp,
> objectbefore text,
> objectafter text,
> clientipaddr text,
> serveripaddr text,
> servername text,
> channel text,
> PRIMARY KEY (auditid)
> );
>
> objectbefore/after are the only ones that will have JSON content. quering
> based on the contents of these two  columns are not a requirement.
>
> At this moment the queries are going to be mainly on executedby ( the
> employee id).
> Stratio’s Cassandra Lucene Index will be used to allow querying/filtering
> on executedat (timestamp) ,objecttype(order, customer, ticket,
> message,account, paymenttransaction,refund etc.)  and actiontype(create,
> retrieve, update, delete, approve, activate, unlock, lock etc.) .
>
> I am considering to count exclusively on Stratio’s Cassandra Lucene
>  filtering and avoid to add  “period” columns like month(int), year(int),
> day (int).
>
> Thanks
>
> --
> IPVP
>
>
> From: Jack Krupansky  
> Reply: user@cassandra.apache.org >
> 
> Date: March 16, 2016 at 5:22:36 PM
> To: user@cassandra.apache.org >
> 
> Subject:  Re: Modeling Audit Trail on Cassandra
>
> executedby is the ID assigned to an employee.
>
> I'm presuming that JSON is to be used for objectbefore/after. This
> suggests no ability to query by individual object fields. I didn't sense
> any other columns that would be JSON.
>
>
>
> -- Jack Krupansky
>
> On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge 
> wrote:
>
>> Is text the most appropriate data type to store JSON that contain couple
>>> of dozen lines ?
>>>
>>
>> It sure is the simplest way to store JSON.
>>
>> The query requirement  is  "where executedby = ?”.
>>>
>>
>> Since executedby is a timeuuid, I guess you don't want to query a single
>> record, since that would require you to know the exact timeuuid. Do you
>> mean that you would like to query all changes in a certain time frame, e.g.
>> today? In that case, you would have to group your rows in time buckets,
>> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
>> other period that suits your situation. Retrieving all changes in a
>> specific time frame is done by retrieving all relevant periods.
>>
>> Tom
>>
>
>


Re: Modeling Audit Trail on Cassandra

2016-03-18 Thread I PVP
Jack/Tom
Thanks for answering.

Here is the table definition so far:

CREATE TABLE audit_trail (
auditid timeuuid,
actiontype text,
objecttype text,
executedby uuid ( or timeuuid?),
executedat timestamp,
objectbefore text,
objectafter text,
clientipaddr text,
serveripaddr text,
servername text,
channel text,
PRIMARY KEY (auditid)
);

objectbefore/after are the only ones that will have JSON content. quering based 
on the contents of these two  columns are not a requirement.

At this moment the queries are going to be mainly on executedby ( the employee 
id).
Stratio’s Cassandra Lucene Index will be used to allow querying/filtering on 
executedat (timestamp) ,objecttype(order, customer, ticket, message,account, 
paymenttransaction,refund etc.)  and actiontype(create, retrieve, update, 
delete, approve, activate, unlock, lock etc.) .

I am considering to count exclusively on Stratio’s Cassandra Lucene  filtering 
and avoid to add  “period” columns like month(int), year(int), day (int).

Thanks

--
IPVP


From: Jack Krupansky <mailto:jack.krupan...@gmail.com>
Reply: user@cassandra.apache.org 
><mailto:user@cassandra.apache.org>
Date: March 16, 2016 at 5:22:36 PM
To: user@cassandra.apache.org 
><mailto:user@cassandra.apache.org>
Subject:  Re: Modeling Audit Trail on Cassandra

executedby is the ID assigned to an employee.

I'm presuming that JSON is to be used for objectbefore/after. This suggests no 
ability to query by individual object fields. I didn't sense any other columns 
that would be JSON.



-- Jack Krupansky

On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge 
mailto:t...@drillster.com>> wrote:
Is text the most appropriate data type to store JSON that contain couple of 
dozen lines ?

It sure is the simplest way to store JSON.

The query requirement  is  "where executedby = ?”.

Since executedby is a timeuuid, I guess you don't want to query a single 
record, since that would require you to know the exact timeuuid. Do you mean 
that you would like to query all changes in a certain time frame, e.g. today? 
In that case, you would have to group your rows in time buckets, e.g. PRIMARY 
KEY ((period), auditid). Period can be a day, month, or any other period that 
suits your situation. Retrieving all changes in a specific time frame is done 
by retrieving all relevant periods.

Tom



Modeling Audit Trail on Cassandra

2016-03-15 Thread I PVP
Hi everyone,

I am looking for your  feedback or advice on modeling an audit trail log table 
on Cassandra that stores information from tracking everything an employee 
changes within the application.

The existing application is being migrated from mysql to Cassandra.

Is text the most appropriate data type to store JSON that contain couple of 
dozen lines ?

CREATE TABLE audit_trail (
auditid timeuuid,
actiontype text,
objecttype text,
executedby timeuuid,
executedat text,
objectbefore text,
objectafter text,
clientipaddr text,
serveripaddr text,
servername text,
channel text,
PRIMARY KEY (auditid)
);

auditid // an UUID of the audit log
actiontype // create, retrieve, update, delete, approve, activate, unlock, lock 
etc.
objecttype // order, customer, ticket, message,account, 
paymenttransaction,refund
executedby // the UUID of the employee
executedat // timestamp when the action happened
objectbefore // the json of the object before the change
objectafter  // the json of the object after the change
clientipaddr // the ip address of the client
serveripaddr // the server ip address that handled the request
servername // the server name that handled the request
channel  //web, mobile, call center

The query requirement  is  "where executedby = ?”.
Will be using Stratio’s Cassandra Lucene Index to support querying/filtering.


Thanks!
--
IPVP