Re: data modeling appointment scheduling

2018-11-04 Thread I PVP
For people (invitees), you are correct: they will not have millions of
appointments. But the organizer is a business, a chain of businesses
(franchisor and franchisees) that together, across the country, have tens of
thousands of appointments per day.

Do you suggest removing the bucket, making startdatetime the clustering key,
and querying against startdatetime with > and < ?

On Sun, Nov 4, 2018, Jon Haddad (j...@jonhaddad.com) wrote:

Maybe I’m missing something, but it seems to me that the bucket might be a 
little overkill for a scheduling system. Do you expect people to have millions 
of appointments?

On Sun, Nov 4, 2018 at 12:46 PM I PVP (i...@hotmail.com) wrote:
Could you please provide advice on the modeling approach for the following   
appointment scheduling scenario?

I am struggling to model this in a way that satisfies the requirement of being
able to update an appointment, especially to change the start datetime and,
consequently, the bucket.

Queries/requirements:
1) The ability to select all appointments by invitee and by date range on the
start date.

2) The ability to select all appointments by organizer and by date range on the
start date.

3) The ability to update (date, location, status) of a specific appointment.

4) The ability to delete a specific appointment.

Note: The bucket column is intended to allow date querying and to help spread
data evenly around the cluster. The bucket value is composed of year+month+day
(sample bucket value: 20181104).
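Since the bucket is just year+month+day, a date-range query has to fan out over one bucket (partition) per day. A small sketch of that computation (a hypothetical helper, not part of the original post):

```python
from datetime import date, timedelta

def day_buckets(start: date, end: date) -> list[int]:
    """Return the yyyymmdd bucket values covering [start, end] inclusive."""
    buckets = []
    d = start
    while d <= end:
        buckets.append(d.year * 10000 + d.month * 100 + d.day)
        d += timedelta(days=1)
    return buckets

# Each bucket becomes one partition to query:
#   SELECT ... FROM appointment_by_invitee WHERE inviteeid = ? AND bucket = ?
print(day_buckets(date(2018, 11, 3), date(2018, 11, 5)))
# → [20181103, 20181104, 20181105]
```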


CREATE TABLE appointment_by_invitee(
objectid timeuuid,
organizerid timeuuid,
inviteeid timeuuid,
bucket bigint,
status text,
location text,
startdatetime timestamp,
enddatetime timestamp,
PRIMARY KEY ((inviteeid, bucket), objectid)
);

CREATE TABLE appointment_by_organizer(
objectid timeuuid,
organizerid timeuuid,
inviteeid timeuuid,
bucket bigint,
status text,
location text,
startdatetime timestamp,
enddatetime timestamp,
PRIMARY KEY ((organizerid, bucket), objectid)
);
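For comparison, a sketch of the bucket-free variant being asked about in the reply above (hypothetical table name; whether one partition per invitee or organizer stays a manageable size depends on the volumes discussed in this thread):

```sql
-- Hypothetical variant: drop the bucket, cluster on start time.
-- Safe only if one invitee's total appointments fit comfortably
-- in a single partition; the reply above questions this for organizers.
CREATE TABLE appointment_by_invitee_v2 (
    inviteeid timeuuid,
    startdatetime timestamp,
    objectid timeuuid,
    organizerid timeuuid,
    status text,
    location text,
    enddatetime timestamp,
    PRIMARY KEY (inviteeid, startdatetime, objectid)
);

-- Date-range query without a bucket:
SELECT * FROM appointment_by_invitee_v2
WHERE inviteeid = ?
  AND startdatetime >= '2018-11-01'
  AND startdatetime < '2018-11-08';
```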


Any help will be appreciated.

Thanks

IPVP


--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade



Re: Working With Prepared Statements

2017-08-29 Thread I PVP
Hi Shalom,

If that helps, from the documentation: "You should prepare only once, and cache
the PreparedStatement in your application (it is thread-safe). If you call
prepare multiple times with the same query string, the driver will log a
warning."
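That "prepare once, cache it" advice can be illustrated with a driver-agnostic sketch. The Session class below is a stub standing in for a real driver session; the names are hypothetical, not the DataStax driver API:

```python
import threading

class Session:
    """Stub standing in for a real driver session."""
    def __init__(self):
        self.prepare_calls = 0
    def prepare(self, query: str):
        self.prepare_calls += 1  # an expensive server round-trip in a real driver
        return ("prepared", query)

class StatementCache:
    """Prepare each query string exactly once; safe for concurrent use."""
    def __init__(self, session):
        self._session = session
        self._cache = {}
        self._lock = threading.Lock()
    def get(self, query: str):
        with self._lock:
            if query not in self._cache:
                self._cache[query] = self._session.prepare(query)
            return self._cache[query]

session = Session()
cache = StatementCache(session)
for _ in range(100):
    cache.get("SELECT * FROM users WHERE id = ?")
print(session.prepare_calls)  # → 1
```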



On August 29, 2017 at 12:04:34 PM, Shalom Sagges 
(shal...@liveperson.com) wrote:

That's a good-to-know post.

Thanks for the info Nicolas!


Shalom Sagges
DBA
T: +972-74-700-4035

We Create Meaningful Connections


On Tue, Aug 29, 2017 at 3:04 PM, Nicolas Guyomar wrote:
I would suggest reading this post by The Last Pickle:
http://thelastpickle.com/blog/2016/09/15/Null-bindings-on-prepared-statements-and-undesired-tombstone-creation.html
and making sure you are not affected by the behavior it describes, because some
people still choose to use C* v2.x due to some bugs in the v3 branch.

Have fun !

On 29 August 2017 at 13:43, Shalom Sagges wrote:
Sounds great then.

Thanks a lot guys! 







On Tue, Aug 29, 2017 at 2:41 PM, Nicolas Guyomar wrote:
Hi Shalom,

AFAIK, you are completely safe with prepared statements; there are no caveats
to using them, and you will have better performance.

Make sure to only prepare them once ;)

On 29 August 2017 at 13:41, Matija Gobec wrote:
I don't see any disadvantages or warning signs. You will see a performance
increase at moderate request rates.

On Tue, Aug 29, 2017 at 1:28 PM, Shalom Sagges wrote:
Hi Matija,

I just wish to know if there are any disadvantages when using prepared
statements, or any warning signs I should look for. Queries will run multiple
times, so it fits the use case.

Thanks!







On Tue, Aug 29, 2017 at 2:18 PM, Matija Gobec wrote:
Do you have any concrete questions re prepared statements?

They are faster to execute since the statement is already parsed and stored in
C*, and you just pass the parameters. No additional statement processing is
needed.

Matija

On Tue, Aug 29, 2017 at 12:33 PM, Shalom Sagges wrote:
Insights, anyone?







On Mon, Aug 28, 2017 at 10:43 AM, Shalom Sagges wrote:
Hi Everyone,

I want to start working with Prepared Statements.

I've read 
https://docs.datastax.com/en/developer/java-driver/3.1/manual/statements/prepared/
 and just wanted to know if there are any other considerations I need to take 
into account when deciding to use Prepared Statements.

Thanks!









Cassandra + Stratio’s Cassandra Lucene Index for online querying

2017-07-28 Thread I PVP
Considering that Stratio’s Cassandra Lucene Index "...is a plugin for Apache
Cassandra that extends its index functionality to provide near-real-time search
such as ElasticSearch or Solr...":

Can I use Stratio’s Cassandra Lucene Index to build online reporting
functionality within my application?

Is anyone doing something like that?

thanks

IPVP





Re: /etc/init.d/cassandra for RHEL

2017-03-29 Thread I PVP
This one worked fine
https://blog.tomas.cat/sites/default/files/cassandra.initd





On March 29, 2017 at 2:17:17 AM, I PVP (i...@hotmail.com) wrote:

I tried using
https://github.com/apache/cassandra/blob/trunk/redhat/cassandra
but it won’t start with my own values, and it is not generating logs at all.

Have you ever seen this behavior before?

JAVA_HOME is properly set.
The user and group cassandra exist.
cassandra:cassandra is the owner of /opt/cassandra/*.

….
export CASSANDRA_HOME=/opt/cassandra
export CASSANDRA_CONF=/opt/cassandra/conf
export CASSANDRA_INCLUDE=/opt/cassandra/bin/cassandra.in.sh
export CASSANDRA_OWNR=cassandra
NAME="cassandra"
log_file=/opt/cassandra/log/cassandra.log
pid_file=/opt/cassandra/cassandra.pid
lock_file=/opt/cassandra/cassandra.lock
CASSANDRA_PROG=/opt/cassandra/bin/cassandra
….

It starts only by running "sudo /opt/cassandra/bin/cassandra -fR".


thanks for your help

IPVP


On March 29, 2017 at 12:57:50 AM, Michael Shuler (mich...@pbandjelly.org) wrote:

On 03/28/2017 10:53 PM, Michael Shuler wrote:
> On 03/28/2017 09:36 PM, I PVP wrote:
>> any recommend /etc/init.d/cassandra for RHEL when installing it from
>> apache-cassandra-3.10-bin.tar.gz ?
>>
>> i goggled it but could not find anything that seems to be "proven"
>>
>
> https://github.com/apache/cassandra/blob/trunk/redhat/cassandra
>

That was probably a little too brief. The paths will need to be modified if you
are running from a tarball. It would be easier to build the rpm packages and
install them; see the README.md in the redhat directory:

https://github.com/apache/cassandra/tree/trunk/redhat

Check out the release tag you want, `git checkout cassandra-3.10` for
example, and build the rpms. This will install the init script for you and put
all the pieces in the expected places.

--
Kind regards,
Michael




/etc/init.d/cassandra for RHEL

2017-03-28 Thread I PVP
Any recommended /etc/init.d/cassandra script for RHEL when installing from
apache-cassandra-3.10-bin.tar.gz?

I googled but could not find anything that seems to be "proven".



Re: Modeling Audit Trail on Cassandra

2016-03-18 Thread I PVP
Jack/Tom
Thanks for answering.

Here is the table definition so far:

CREATE TABLE audit_trail (
auditid timeuuid,
actiontype text,
objecttype text,
executedby uuid, // or timeuuid?
executedat timestamp,
objectbefore text,
objectafter text,
clientipaddr text,
serveripaddr text,
servername text,
channel text,
PRIMARY KEY (auditid)
);

objectbefore/objectafter are the only columns that will hold JSON content.
Querying based on the contents of these two columns is not a requirement.

At this moment the queries are going to be mainly on executedby (the employee
id).
Stratio’s Cassandra Lucene Index will be used to allow querying/filtering on
executedat (timestamp), objecttype (order, customer, ticket, message, account,
paymenttransaction, refund, etc.) and actiontype (create, retrieve, update,
delete, approve, activate, unlock, lock, etc.).

I am considering relying exclusively on Stratio’s Cassandra Lucene filtering
and avoiding extra "period" columns like month (int), year (int), day (int).
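For reference, a Stratio Lucene index over those columns might be declared roughly as follows. This is a sketch only; the index name is made up, and the exact schema options should be checked against the plugin's documentation for your Cassandra version:

```sql
-- Hypothetical Stratio Lucene index covering the filtering columns.
CREATE CUSTOM INDEX audit_trail_idx ON audit_trail ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
    'refresh_seconds': '1',
    'schema': '{
        fields: {
            executedby: {type: "uuid"},
            executedat: {type: "date", pattern: "yyyy/MM/dd HH:mm:ss"},
            objecttype: {type: "string"},
            actiontype: {type: "string"}
        }
    }'
};
```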

Thanks

--
IPVP


From: Jack Krupansky
Reply: user@cassandra.apache.org
Date: March 16, 2016 at 5:22:36 PM
To: user@cassandra.apache.org
Subject: Re: Modeling Audit Trail on Cassandra

executedby is the ID assigned to an employee.

I'm presuming that JSON is to be used for objectbefore/after. This suggests no 
ability to query by individual object fields. I didn't sense any other columns 
that would be JSON.



-- Jack Krupansky

On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge wrote:
Is text the most appropriate data type to store JSON that contains a couple of
dozen lines?

It sure is the simplest way to store JSON.

The query requirement is "where executedby = ?".

Since executedby is a timeuuid, I guess you don't want to query a single 
record, since that would require you to know the exact timeuuid. Do you mean 
that you would like to query all changes in a certain time frame, e.g. today? 
In that case, you would have to group your rows in time buckets, e.g. PRIMARY 
KEY ((period), auditid). Period can be a day, month, or any other period that 
suits your situation. Retrieving all changes in a specific time frame is done 
by retrieving all relevant periods.

Tom
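Tom's suggestion above, written out as a sketch (hypothetical table name, with a day-sized period shown for illustration):

```sql
-- Hypothetical query table grouping audit rows into day-sized buckets,
-- so "all changes today" becomes a single-partition read.
CREATE TABLE audit_trail_by_period (
    period int,        -- e.g. 20160316 (yyyymmdd)
    auditid timeuuid,
    executedby timeuuid,
    actiontype text,
    objecttype text,
    PRIMARY KEY ((period), auditid)
);

-- All changes on a given day:
SELECT * FROM audit_trail_by_period WHERE period = 20160316;
```

A longer range is retrieved by querying each relevant period, as Tom describes.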



Modeling Audit Trail on Cassandra

2016-03-15 Thread I PVP
Hi everyone,

I am looking for your feedback or advice on modeling an audit trail log table
in Cassandra that stores information tracking everything an employee changes
within the application.

The existing application is being migrated from mysql to Cassandra.

Is text the most appropriate data type to store JSON that contains a couple of
dozen lines?

CREATE TABLE audit_trail (
auditid timeuuid,
actiontype text,
objecttype text,
executedby timeuuid,
executedat timestamp,
objectbefore text,
objectafter text,
clientipaddr text,
serveripaddr text,
servername text,
channel text,
PRIMARY KEY (auditid)
);

auditid // a UUID for the audit log entry
actiontype // create, retrieve, update, delete, approve, activate, unlock, lock, etc.
objecttype // order, customer, ticket, message, account, paymenttransaction, refund
executedby // the UUID of the employee
executedat // timestamp when the action happened
objectbefore // the JSON of the object before the change
objectafter // the JSON of the object after the change
clientipaddr // the IP address of the client
serveripaddr // the server IP address that handled the request
servername // the server name that handled the request
channel // web, mobile, call center

The query requirement is "where executedby = ?".
Will be using Stratio’s Cassandra Lucene Index to support querying/filtering.


Thanks!
--
IPVP



RE: Modeling transactional messages

2016-03-04 Thread I PVP
Thanks for answering.

Yes, it is mainly a queue, but it also has some functionality to allow
resending messages.

Does anyone have experience handling this kind of scenario, within (or without)
Cassandra?

Thanks

--
IPVP


From: sean_r_dur...@homedepot.com
Reply: user@cassandra.apache.org
Date: March 4, 2016 at 11:48:56 AM
To: user@cassandra.apache.org
Subject: RE: Modeling transactional messages

As you have it, this is not a good model for Cassandra. Your partition key has
only 2 possible values. You would end up with only 2 partitions (perhaps owned
by just 2 nodes) that would quickly get huge (and slow). Also, secondary
indexes are generally a bad idea. You would either want to create new tables to
support additional queries or look at the materialized views in the 3.x
versions.

You are setting up something like a queue, which is typically an anti-pattern 
for Cassandra.

However, I will at least toss out an idea for the rest of the community to 
improve (or utterly reject):

You could have an unsent mail table and a sent mail table.
For unsent mail, just use the objectID as the partition key. The drivers can 
page through results, though if it gets very large, you might see problems. 
Delete the row from unsent mail once it is sent. Try leveled compaction with a 
short gc_grace. There would be a lot of churn on this table, so it may still be 
less than ideal.

Then you could do the sent email table with objectID and all the email details. 
Add separate lookup tables for:
- (emailaddr), object ID (if this is going to be large/wide, perhaps add a time 
bucket to the partition key, like mm)
- (domain, time bucket), objectID

Set TTL on these rows (either default or with the insert) to get the purge to 
be automatic.
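Sean's idea above, sketched as DDL. The table names are hypothetical, and the 30-day TTL and month-sized bucket are illustrative choices, not part of his message:

```sql
-- Unsent mail: one row per pending message; the row is deleted once sent.
CREATE TABLE unsent_email (
    objectid timeuuid,
    toaddr text,
    domain text,
    subject text,
    content text,
    PRIMARY KEY (objectid)
) WITH compaction = {'class': 'LeveledCompactionStrategy'}
  AND gc_grace_seconds = 3600;   -- short gc_grace, per the suggestion

-- Sent-mail lookup tables, keyed the way the queries need:
CREATE TABLE sent_email_by_addr (
    toaddr text,
    objectid timeuuid,
    PRIMARY KEY ((toaddr), objectid)
) WITH default_time_to_live = 2592000;   -- 30 days; purge is automatic

CREATE TABLE sent_email_by_domain (
    domain text,
    bucket int,        -- time bucket, e.g. 201603
    objectid timeuuid,
    PRIMARY KEY ((domain, bucket), objectid)
) WITH default_time_to_live = 2592000;

-- TTL can also be set per insert instead of as a table default:
INSERT INTO sent_email_by_addr (toaddr, objectid)
VALUES ('user@example.com', now()) USING TTL 2592000;
```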


Sean Durity

From: I PVP [mailto:i...@hotmail.com]
Sent: Thursday, March 03, 2016 7:51 PM
To: user@cassandra.apache.org
Subject: Modeling transactional messages

Hi everyone,

Can anyone please let me know if I am heading into an anti-pattern or something
else bad?

How would you model the following?

I am migrating from MySQL to Cassandra. I have a scenario in which I need to
store the content of "to be sent" transactional email messages that the
customer will receive on events like: an order was created, an order was
updated, an order was canceled, an order was shipped, an account was created,
an account was confirmed, an account was locked, and so on.

In MySQL there is a table per email message "type": a table to store
"order-created" messages, a table to store "order-updated" messages, and so on.

The messages are sent by a non-parallelized Java worker, scheduled to run every
X seconds, that pushes the messages to a service like Sendgrid/Mandrill/Mailjet.

For better performance, easier purging, and overall code maintenance, I am
looking to have all message "types" in a single table/column family, as follows:

CREATE TABLE communication.transactional_email (
objectid timeuuid,
subject text,
content text,
fromname text,
fromaddr text,
toname text,
toaddr text,
wassent boolean,
createdate timestamp,
sentdate timestamp,
type text, // example: order_created, order_canceled
domain text, // example: hotmail.com, in case we need to stop sending to a
specific domain
PRIMARY KEY (wassent, objectid)
);

create index on toaddr
create index on sentdate
create index on domain
create index on type


The requirements are :

1) select * from transactional_email where was_sent = false and objectid < 
minTimeuuid(current timestamp) limit 

(to get the messages that need to be sent)

2) update transactional_email set was_sent = true where objectid = 

(to update the message  right after it was sent)

3) select * from transactional_email where toaddr = 

(to get all messages that were sent to a specific emailaddr)

4) select * from transactional_email where domain = 

(to get all messages that were sent to a specific domain)

5) delete from transactional_email where was_sent = true and objectid <
minTimeuuid(a timestamp)

(to purge: delete all messages sent before the last X days)

6) delete from transactional_email where toaddr = 

(to be able to delete all messages when a user account is closed)


Thanks

IPVP






Modeling approach to widely used textual information

2016-01-14 Thread I PVP
Hi everyone,

I am new to Cassandra and am moving an existing MySQL application to Cassandra.

As a general rule, what is the recommended approach for keeping textual
information (like a user_nickname, a company_name, or a product_title) that
will potentially be updated at some point and is routinely and repeatedly
displayed in many use cases across the application, for example when the end
user sees an employee list, sees a contact list, sends/receives chat messages,
sees RFQs, sees an order, sees shipping provider information/tracking, sees
ratings and reviews, sees invites, and so on?

These are situations that MVs alone cannot solve, because they would involve
multiple tables.

Options:
-

A) The text information should be copied and updated in all CFs/tables that
were modeled to answer the many queries across the application, every time the
information changes in the "source" CF/table (like the user, product, or
company table)?

OR

B) Should only the ids (person_id/company_id, product_id) be stored across the
column families/tables, with the front end querying the "source" column
family/table to retrieve the specific text field (the person_name/company_name)
and display it? (Potentially leveraging REST HTTP caching.)

OR

C) Other approaches ?

-

I understand that proper modeling is crucial and that "writes are cheap", but
new tables will come sooner or later, and changing previously written business
logic code every time a new CF/table is created is not cheap.

At this moment option B is the most likely choice. Some use cases would allow
data like a user's contact list ids/names (only), or the ids/names (only) of
the companies the user is doing business with, to be downloaded to the front
end at once and used for a couple of seconds/minutes while executing a task.
Other use cases would have specific REST services (/company/id/name,
/product/id/title) to provide these widely used fields, potentially leveraging
HTTP caching for some time to provide the text data across the application.

Any advice and guidance will be appreciated.

Thanks for your help.

--
IPVP




Re: Modeling contact list, plain table or List

2016-01-12 Thread I PVP
--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
contactname= ?
Answer: Because a contact name can be duplicated. Or should I force unique
contact names?
contact names?

Overall, the challenge seems to be addressed, with some trade-off on the
"ordering by contact name".

If, at the base table, the clustering column is the objectid (timeuuid) instead
of the contactname, the DELETE will be based on userid = ? and objectid = ?.
This works fine. Generic SELECTs will also work fine on the base table.

The MV will serve SELECTs  targeting/filtering the favorite contacts.

Like this:

CREATE TABLE communication.user_contact (
userid int,
objectid timeuuid,
contactid int,
contactname text,
createdat timeuuid,
favoriteat timestamp,
isfavorite boolean,
PRIMARY KEY (userid, objectid)
);


CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
SELECT userid, isfavorite, objectid, contactid, contactname, createdat, 
favoriteat
FROM user_contact
WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND objectid IS NOT NULL
PRIMARY KEY ( ( userid, isfavorite ), objectid )
WITH CLUSTERING ORDER BY ( objectid DESC ) ;


Unfortunately this approach forces the model to cluster by objectid (timeuuid)
just to satisfy the need to DELETE a specific contact row, and by doing that it
wastes an opportunity on the MV, because all the PKs from the base table need
to be in the MV, and it is not possible to give the MV more than one non-PK
column from the base table in its PK. But it is still working fine.


This is my first Cassandra use case, and the guidance provided by you guys is
pretty important.

Thanks very much for the answers, questions and suggestions.


--
IPVP


From: DuyHai Doan (doanduy...@gmail.com)
Reply: user@cassandra.apache.org
Date: January 12, 2016 at 10:27:45 AM
To: user@cassandra.apache.org
Cc: Jack Krupansky (jack.krupan...@gmail.com)
Subject: Re: Modeling contact list, plain table or List

1) SELECT all rows from user_contact excluding the one that the user wants to
get rid of.
2) DELETE all the user_contact rows for that particular user.
3) INSERT the result of 1).

--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND 
contactname= ?

The Materialized View will be automagically updated by Cassandra with a query 
similar to DELETE FROM user_contact_by_favorite WHERE userid=xxx AND 
is_favorite=zzz AND contactname=yyy

On Mon, Jan 11, 2016 at 10:40 PM, Jonathan Haddad (j...@jonhaddad.com) wrote:
In general I advise people avoid lists and use Maps or Sets instead.

Using this data model, for instance, it's easy to remove a specific Address 
from a user:

CREATE TYPE address (
  street text,
  city text,
  zip_code int
);

CREATE TABLE user (
  user_id int primary key,
  addresses map<text, frozen<address>>
);

When I want to remove one of the addresses from a user, I can do this:

cqlsh:test> delete addresses['home'] from user where user_id =  1;


Hope that helps,
Jon


On Mon, Jan 11, 2016 at 1:20 PM, I PVP (i...@hotmail.com) wrote:
Well… the way it is now, it is not possible to delete a specific contact row
from the base table at all, because a DELETE statement only works with PK
columns in the WHERE clause; non-PK columns cannot appear in a DELETE's WHERE
clause.
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html

The way it is now, it is only possible to delete the entire contact list for
that specific user.
Looks like I would need to:
1) SELECT all rows from user_contact excluding the one that the user wants to
get rid of.
2) DELETE all the user_contact rows for that particular user.
3) INSERT the result of 1).

Is that the proper way to achieve it, or am I missing some point in the
modeling that would allow deleting a specific contact row while still
complying with the select requirements?

Thanks
--
IPVP


From: Jack Krupansky (jack.krupan...@gmail.com)
Reply: user@cassandra.apache.org
Date: January 11, 2016 at 7:00:04 PM
To: user@cassandra.apache.org
Subject: Re: Modeling contact list, plain table or List

That's the beauty of MV - Cassandra automatically updates the MVs when the base 
table changes, including deletions, which is why all of the PK columns from the 
base table needed to be in the MV PK.

-- Jack Krupansky

On Mon, Jan 11, 2016 at 3:41 PM, I PVP (i...@hotmail.com) wrote:

Re: Modeling contact list, plain table or List

2016-01-11 Thread I PVP
The table and materialized view below will solve the SELECT requirements of my
current application.
The challenge now is when the user decides to DELETE one specific contact from
his contact list. I could add the objectid to a composite partition key
together with the userid, but that would make the SELECTs unviable.

Any ideas/suggestions?


CREATE TABLE communication.user_contact (
userid int,
contactname text,
contactid int,
createdat timeuuid,
favoriteat timestamp,
isfavorite boolean,
objectid timeuuid,
PRIMARY KEY (userid, contactname)
) WITH CLUSTERING ORDER BY ( contactname DESC )


CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
SELECT userid, isfavorite, contactname, contactid, createdat, favoriteat, 
objectid
FROM user_contact
WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS NOT NULL
PRIMARY KEY ( ( userid, isfavorite ), contactname )
WITH CLUSTERING ORDER BY ( contactname DESC )

Thanks

--
IPVP


From: DuyHai Doan
Reply: user@cassandra.apache.org
Date: January 11, 2016 at 11:14:10 AM
To: user@cassandra.apache.org
Subject: Re: Modeling contact list, plain table or List

In the current iteration of materialized views, it is still not possible to
have a WHERE clause other than IS NOT NULL, so is_favourite IS TRUE won't work.

Still there is a JIRA created to support this feature : 
https://issues.apache.org/jira/browse/CASSANDRA-10368

About the cardinality of favorites vs. non-favorites, it doesn't matter in this
case because the OP said "Less then one hundred contacts by user is the
normal."

So even if all contacts are stuck in one unique favorite state, the 
materialized view partition for one user is at most 100. Even for extreme edge 
case with users having 10 000 contacts, it's still a manageable partition size 
for C*.

But I agree it is important to know before-hand the favorite/non-favorite 
update frequency since it will impact the write throughput on the MV.

For more details on materialized view impl and performance: 
http://www.doanduyhai.com/blog/?p=1930

On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky wrote:
The new Materialized View feature is just an automated way of creating and
maintaining what people used to call a "query table", which is the traditional
Cassandra data modeling technique for performing queries on something other
than the primary key of a table: you store the same columns in different
tables, using different columns for the primary key.

One also needs to be careful to include all columns of the original primary key 
in each MV primary key - in addition to whatever column(s) are to be used for 
indexing in each MV (so that Cassandra can find the old row when it needs to 
update the MV when the base table row changes, such as on a deletion.)

But before creating MVs, you first need to answer questions about how the app 
needs to query the data. Even with MV, conceptualizing queries needs to precede 
data modeling.

For example: what is the cardinality of favorites vs. non-favorites; does the
app even need to query by favorites, as opposed to querying all contacts and
retrieving is_favorite as simply a non-key column value; do favorites need to
be retrieved separately from non-favorites; what are the frequency and latency
requirements for queries by favorite status; etc. Once these questions are
answered, decisions can be made about data modeling.

-- Jack Krupansky

On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso wrote:
I have never used Materialized Views, so maybe this suggestion is not possible,
but in this case wouldn't it make sense to define the materialized view with

is_favourite IS TRUE
instead of
is_favourite IS NOT NULL?

Carlos Alonso | Software Engineer | @calonso

On 10 January 2016 at 09:59, DuyHai Doan wrote:
Try this

CREATE TABLE communication.user_contact_list (
  user_id uuid,
  contact_id uuid,
  contact_name text,
  created_at timeuuid,
  is_favorite boolean,
  favorite_at timestamp,
  PRIMARY KEY (user_id, contact_name, contact_id)
);

CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
AS SELECT * FROM communication.user_contact_list
WHERE user_id IS NOT NULL AND contact_name IS NOT NULL
AND contact_id IS NOT NULL AND is_favorite IS NOT NULL
PRIMARY KEY(user_id, is_favorite, contact_name, contact_id)

If the is_favorite flag is not updated very often, the write perf hit due to
the materialized view is acceptable.

On Sat, Jan 9, 2016 at 11:57 PM, Isaac P. wrote:
Jack/ Michael,

Thanks for answering.

How big?: Less then one