Nodejs connector high latency

2018-11-04 Thread Tarun Chabarwal
Hi

I'm using the Cassandra driver provided by DataStax (3.5.0) in nodejs.
I have a 5-node cluster and I'm writing to a table at QUORUM.

I observed some spikes in the writes. Out of ~20 writes, 2-5 take
longer (~200ms). I debugged one of the node processes with strace and
found that the slower writes are batched and use the same fd to connect
to Cassandra. This may be the multiplexing.

Why does it take that long?
Where should I look to resolve it?

Regards
Tarun Chabarwal


Re: Nodejs connector high latency

2018-11-04 Thread Andy Tolbert
Hi Tarun,

There are a ton of factors that can impact query performance.

The Cassandra native protocol supports multiple simultaneous requests per
connection.  Most drivers by default only create one connection to each C*
host in the local data center.  That being said, this shouldn't be a
problem, particularly if you are only executing 20 concurrent requests;
this is something both driver clients and C* handle well.  The driver does
do some write batching to reduce the number of system calls, but I'm
reasonably confident this is not the issue.

It may be worth enabling client logging to see if that shines any light.
You can also enable tracing on your requests by specifying traceQuery as a
query option, to see if the delay is caused by C*-side processing.
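
A minimal sketch of both (the keyspace, table and values here are
placeholders, and the trace event fields are from memory, so
double-check them against the driver docs):

const cassandra = require('cassandra-driver');
const client = new cassandra.Client({ contactPoints: ['10.0.0.1'] });

// Client logging: the driver emits 'log' events with a severity level.
client.on('log', (level, className, message) => {
  console.log('%s | %s: %s', level, className, message);
});

// Request tracing: traceQuery asks C* to trace this execution; the
// trace can then be fetched through the cluster metadata.
client.execute('INSERT INTO ks.t (id, val) VALUES (?, ?)', [1, 'x'],
  { prepare: true, traceQuery: true }, (err, result) => {
    if (err) return console.error(err);
    client.metadata.getTrace(result.info.traceId, (err2, trace) => {
      if (err2) return console.error(err2);
      trace.events.forEach((e) => console.log(e.elapsed, e.activity));
    });
  });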

Also keep in mind that all user code in node.js runs on a single thread.
If you have callbacks tied to your responses that do non-trivial work,
that can delay subsequent requests from being processed, which may give
the impression that some queries are slow.
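
To illustrate (a contrived sketch continuing the example above, where
query/params stand in for any statement): if one response callback burns
150ms of CPU, every other write that completes during that window
appears ~150ms slower, even though C* answered promptly.

client.execute(query, params, { prepare: true }, (err, result) => {
  const t0 = Date.now();
  while (Date.now() - t0 < 150) {} // simulate 150ms of CPU-bound work
  // every other pending response callback now waits behind this loop
});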

Thanks,
Andy



data modeling appointment scheduling

2018-11-04 Thread I PVP
Could you please provide advice on the modeling approach for the following
appointment scheduling scenario?

I am struggling to model this in a way that satisfies the requirement to
update an appointment, especially to change the start datetime and,
consequently, the bucket.

Queries/requirements:
1) The ability to select all appointments by invitee and by date range on the
start date

2) The ability to select all appointments by organizer and by date range on the
start date

3) The ability to update (date, location, status) of a specific appointment

4) The ability to delete a specific appointment

Note: The bucket column is intended to allow date querying and to help spread
data evenly around the cluster. The bucket value is composed of year+month+day
(sample bucket value: 20181104). A sample range query is shown after the table
definitions below.


CREATE TABLE appointment_by_invitee(
objectid timeuuid,
organizerid timeuuid,
inviteeid timeuuid,
bucket bigint,
status text,
location text,
startdatetime timestamp,
enddatetime timestamp,
PRIMARY KEY ((inviteeid, bucket), objectid)
);

CREATE TABLE appointment_by_organizer(
objectid timeuuid,
organizerid timeuuid,
inviteeid timeuuid,
bucket bigint,
status text,
location text,
startdatetime timestamp,
enddatetime timestamp,
PRIMARY KEY ((organizerid, bucket), objectid)
);
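
Note also that with the day-granularity bucket in the partition key, the
date-range queries in 1) and 2) have to enumerate the day buckets
explicitly, e.g. (a sketch against the table above):

SELECT * FROM appointment_by_invitee
WHERE inviteeid = ?
AND bucket IN (20181104, 20181105, 20181106);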


Any help will be appreciated.

Thanks

IPVP




Re: data modeling appointment scheduling

2018-11-04 Thread Jonathan Haddad
Maybe I’m missing something, but it seems to me that the bucket might be a
little overkill for a scheduling system. Do you expect people to have
millions of appointments?

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: data modeling appointment scheduling

2018-11-04 Thread I PVP
For people (invitees), you are correct: they will not have millions of
appointments. But the organizer is a business, a chain of businesses
(Franchisor and Franchisees) that together, across the country, have tens of
thousands of appointments per day.

Do you suggest removing the bucket, making startdatetime the clustering key
and querying against startdatetime with > and < ?
Wouldn't I still have the issue of not being able to update startdatetime
when an appointment gets rescheduled?

Thanks.

IPVP



Re: data modeling appointment scheduling

2018-11-04 Thread Jonathan Haddad
Well, generally speaking I like to understand the problem before trying to
fit a solution.  If you're looking to set up millions of appointments for a
business, that might qualify for some amount of partitioning / bucketing.
That said, you might be better off using time-based buckets, say monthly or
yearly, and as part of the process consider the worst-case scenario for
data size.  Is there a chance that in a given month there will be more than
50MB of data associated with a single account / entity?

If you design the table using startdatetime as the clustering key,
you'll get your events back in the order they are scheduled, which has
obvious advantages but does come at the cost of increased complexity when
updating the start time.  The short answer is: you can't update it, you
have to delete the record and re-insert it with the updated data (you can't
update a clustering key).
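
As a sketch of what that could look like with a monthly bucket (yyyymm;
column names follow your schema, the window and literal values are
placeholders):

CREATE TABLE appointment_by_organizer (
organizerid timeuuid,
bucket bigint,
startdatetime timestamp,
objectid timeuuid,
inviteeid timeuuid,
status text,
location text,
enddatetime timestamp,
PRIMARY KEY ((organizerid, bucket), startdatetime, objectid)
);

-- date range within one monthly bucket
SELECT * FROM appointment_by_organizer
WHERE organizerid = ? AND bucket = 201811
AND startdatetime >= '2018-11-01' AND startdatetime < '2018-11-08';

-- rescheduling: delete + re-insert, because startdatetime is now part
-- of the clustering key (a move across months also changes the bucket,
-- i.e. it touches two partitions)
BEGIN BATCH
DELETE FROM appointment_by_organizer
WHERE organizerid = ? AND bucket = 201811
AND startdatetime = '2018-11-04 10:00:00' AND objectid = ?;
INSERT INTO appointment_by_organizer
(organizerid, bucket, startdatetime, objectid, inviteeid, status,
location, enddatetime)
VALUES (?, 201811, '2018-11-06 14:00:00', ?, ?, 'rescheduled', ?, ?);
APPLY BATCH;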

Hope this helps.
Jon




-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: A quick question on unlogged batch

2018-11-04 Thread wxn...@zjqunshuo.com
Hi Onmstester,

Thank you all. Now I understand that whether to use batch or asynchronous
writes really depends on the use case. So far batch writes have worked for me
in an 8-node cluster with over 500 million requests per day.

> Did you compare the cluster performance, including blocked natives, dropped
> mutations, 95th percentiles, cluster CPU usage, etc., in the two scenarios
> (batch vs single)?
> Although 500M per day is not much for an 8-node cluster (if the node spec is
> compliant with DataStax recommendations) and async single statements could
> handle it (it just demands high CPU on the client machine), the impact of
> such things (non-compliant batch statements annoying the cluster) would show
> up after some weeks, when suddenly a lot of cluster tasks need to run
> simultaneously: one or two big compactions running on most of the nodes,
> plus some hinted hand-offs, and the cluster can't keep up and starts to
> become slower and slower.
> The way to prevent that sooner is to keep the error counters as low as
> possible: things like blocked NTRs, dropped mutations, errors, hinted
> hand-offs, latencies, etc.

I checked the error counters; so far so good: no blocked natives and no
dropped mutations. I use TWCS to avoid big compactions. My nodes have 4 cores
and 8G memory. In fact, when I started using Cassandra I didn't know the
difference between batch and asynchronous writes, and my choice of batch
writes came from the experience of using Redis batching to avoid network
overhead. I got your point and will keep an eye on the error counters.
Thank you.
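
For reference, the two approaches look roughly like this in the nodejs
driver (a sketch; the table and rows are made up):

const cassandra = require('cassandra-driver');
const client = new cassandra.Client({ contactPoints: ['127.0.0.1'] });

const query = 'INSERT INTO ks.events (id, payload) VALUES (?, ?)';
const rows = [{ id: 1, payload: 'a' }, { id: 2, payload: 'b' }];

// Unlogged batch: one round trip from the client, but the coordinator
// must forward each statement to its replicas, which adds coordinator
// load when the rows belong to different partitions.
client.batch(
  rows.map((r) => ({ query, params: [r.id, r.payload] })),
  { prepare: true, logged: false },
  (err) => { if (err) console.error(err); }
);

// Individual async writes: more round trips, but each statement can be
// routed token-aware directly to a replica.
Promise.all(
  rows.map((r) => client.execute(query, [r.id, r.payload], { prepare: true }))
).catch(console.error);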

Cheers,
-Simon





Re: upgrading from 2.x TWCS to 3.x TWCS

2018-11-04 Thread Oleksandr Shulgin
On Sat, Nov 3, 2018 at 1:13 AM Brian Spindler wrote:

> That wasn't horrible at all.  After testing, provided all goes well I can
> submit this back to the main TWCS repo if you think it's worth it.
>
> Either way do you mind just reviewing briefly for obvious mistakes?
>
>
> https://github.com/bspindler/twcs/commit/7ba388dbf41b1c9dc1b70661ad69273b258139da
>

Almost a year ago we were migrating from 2.1 to 3.0 and figured out that
Jeff's master branch didn't compile with 3.0, but the change to get it
running was really minimal:
https://github.com/a1exsh/twcs/commit/10ee91c6f409aa249c8d439f7670d8b997ab0869

So we built that jar, added it to the packaged 3.0 and we were good to go.
You might want to consider migrating in two steps: 2.1 -> 3.0, ALTER
TABLE, upgradesstables, 3.0 -> 3.1.
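
The ALTER TABLE step switches each table from the custom class to the
built-in strategy once you're on 3.0, roughly like this (keyspace/table
and window settings are placeholders):

ALTER TABLE ks.t WITH compaction = {
'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'DAYS',
'compaction_window_size': 1
};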

And huge thanks to Jeff for coming up with TWCS in the first place! :-)

Cheers,
--
Alex