Re: MemtableReclaimMemory pending building up

2016-03-08 Thread Dan Kinder
Quick follow-up here, so far I've had these nodes stable for about 2 days
now with the following (still mysterious) solution: *increase*
memtable_heap_space_in_mb
to 20GB. The node was having issues at the default value of 1/4 of the heap
(12GB in my case; I misspoke earlier and said 16GB). Upping it to 20GB seems
to have made the issue go away so far.

Best guess now is that it simply was memtable flush throughput. Playing
with memtable_cleanup_threshold further may have also helped but I didn't
want to create small SSTables.
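
For reference, the relevant cassandra.yaml lines now look roughly like this (a
sketch of my setup; the cleanup-threshold default formula is the one from the
yaml comments):

memtable_heap_space_in_mb: 20480   # up from the computed default of 1/4 heap (12GB here)
memtable_allocation_type: heap_buffers
memtable_flush_writers: 12
# memtable_cleanup_threshold left at its default of
# 1 / (memtable_flush_writers + 1), i.e. 1/13 here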

Thanks again for the input @Alain.

On Fri, Mar 4, 2016 at 4:53 PM, Dan Kinder  wrote:

> Hi, thanks for responding, Alain. Going to provide more info inline.
>
> However, a small update that is probably relevant: while the node was in
> this state (MemtableReclaimMemory building up), since this cluster is not
> serving live traffic I temporarily turned off ALL client traffic, and the
> node still never recovered, MemtableReclaimMemory never went down. Seems
> like there is one thread doing this reclaiming and it has gotten stuck
> somehow.
>
> Will let you know when I have more results from experimenting... but
> again, merci
>
> On Thu, Mar 3, 2016 at 2:32 AM, Alain RODRIGUEZ 
> wrote:
>
>> Hi Dan,
>>
>> I'll try to go through all the elements:
>>
>> seeing this odd behavior happen, seemingly to single nodes at a time
>>
>>
>> Is that one node at a time or always the same node? Do you consider
>> your data model to be fairly evenly distributed?
>>
>
> Of the 6 nodes, 2 of them seem to be the recurring culprits. Could be
> related to a particular data partition.
>
>
>>
>> The node starts to take more and more memory (instance has 48GB memory on
>>> G1GC)
>>
>>
>> Do you use a 48 GB heap size, or is that the total amount of memory in the
>> node? Could we have your JVM settings (GC and heap sizes), plus your
>> memtable size and type (off-heap?) and the amount of available memory?
>>
>
> Machine spec: 24 virtual cores, 64GB memory, 12-HDD JBOD (yes, an absurd
> number of disks, not my choice)
>
> memtable_heap_space_in_mb: 10240 # 10GB (previously left as default which
> was 16GB and caused the issue more frequently)
> memtable_allocation_type: heap_buffers
> memtable_flush_writers: 12
>
> MAX_HEAP_SIZE="48G"
> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
> JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>
>>
>> Note that there is a decent number of compactions going on as well but
>>> that is expected on these nodes and this particular one is catching up from
>>> a high volume of writes
>>>
>>
>> Are the *concurrent_compactors* correctly throttled (about 8 on good
>> machines), and is *compaction_throughput_mb_per_sec* high enough to cope
>> with what is thrown at the node? Using SSDs I often see the latter
>> unthrottled (set to 0), but I would try small increments first.
>>
> concurrent_compactors: 12
> compaction_throughput_mb_per_sec: 0
>
>>
>> Also interestingly, neither CPU nor disk utilization are pegged while
>>> this is going on
>>>
>>
>> First thing is making sure your memory management is fine. Having
>> information about the JVM and memory usage globally would help. Then, if
>> you are not fully using the resources, you might want to try increasing
>> *concurrent_writes* to a higher value (probably much higher, given the
>> pending requests, but go safely and incrementally, first on a canary
>> node) and monitor tpstats + resources. Hopefully this will bring the
>> pending Mutations down. My guess is that the pending requests are messing
>> with the JVM, but it could be the exact contrary as well.
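>>
>> For instance, on the canary node something like this (a sketch; the grep
>> pattern is illustrative) makes it easy to check the relevant pools:
>>
>> nodetool tpstats | grep -e 'Pool' -e 'MutationStage' -e 'Native-Transport'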
>>
> concurrent_writes: 192
> It may be worth noting that the main reads going on are large batch reads
> (akin to analytics jobs) that run while these writes are happening.
>
> I'm going to look into JVM use a bit more but otherwise it seems like
> normal Young generation GCs are happening even as this problem surfaces.
>
>
>>
>>> Native-Transport-Requests        25         0      547935519         0       2586907
>>
>>
>> About Native requests being blocked, you can probably mitigate things by
>> increasing native_transport_max_threads: 128 (try doubling it and continue
>> tuning incrementally). Also, an up-to-date client using Native protocol V3
>> handles connections / threads from clients a lot better. With a heavy
>> throughput like yours, you might want to give this a try.
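>>
>> For example, forcing protocol V3 with a recent DataStax Java driver looks
>> roughly like this (a sketch; the contact point is illustrative, and V3
>> requires Cassandra 2.1+ on the server side):
>>
>> import com.datastax.driver.core.Cluster;
>> import com.datastax.driver.core.ProtocolVersion;
>>
>> Cluster cluster = Cluster.builder()
>>     .addContactPoint("10.0.0.1")             // illustrative address
>>     .withProtocolVersion(ProtocolVersion.V3) // pin V3 explicitly
>>     .build();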
>>
>
> This one is a good idea and I'll probably try increasing it, but I don't
> really see these backing up, so I'm not sure it's the main issue.
>
>
>>
>> What is your current client?
>> What does "netstat -an | grep -e 9042 -e 9160 | grep ESTABLISHED | wc -l"
>> output? This is the number of clients connected to the node.
>> Do you have other significant errors or warnings in the logs (other than
>> dropped mutations)? "grep -i -e "ERROR" 

Re: nulls in prepared statement & tombstones?

2016-03-08 Thread Henry M
Thank you. It's probably not specific to prepared statements then, and is
just a more general statement. That makes sense.


On Tue, Mar 8, 2016 at 10:06 AM Steve Robenalt 
wrote:

> Hi Henry,
>
> I would suspect that the tombstones are necessary to overwrite any
> previous values in the null'd columns. Since Cassandra avoids
> read-before-write, there's no way to be sure that the nulls were not
> intended to remove any such previous values, so the tombstones ensure that
> they don't re-appear.
>
> Steve
>
>
>
> On Tue, Mar 8, 2016 at 9:36 AM, Henry Manasseh 
> wrote:
>
>> The following article makes the following statement which I am trying to
>> understand:
>>
>> *"Cassandra’s storage engine is optimized to avoid storing unnecessary
>> empty columns, but when using prepared statements those parameters that are
>> not provided result in null values being passed to Cassandra (and thus
>> tombstones being stored)." *
>> http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra
>>
>> I was wondering if someone could help explain why sending nulls as part
>> of a prepared statement update would result in tombstones.
>>
>> Thank you,
>> - Henry
>>
>
>
>
> --
> Steve Robenalt
> Software Architect
> sroben...@highwire.org 
> (office/cell): 916-505-1785
>
> HighWire Press, Inc.
> 425 Broadway St, Redwood City, CA 94063
> www.highwire.org
>
> Technology for Scholarly Communication
>


Re: nulls in prepared statement & tombstones?

2016-03-08 Thread Steve Robenalt
Hi Henry,

I would suspect that the tombstones are necessary to overwrite any previous
values in the null'd columns. Since Cassandra avoids read-before-write,
there's no way to be sure that the nulls were not intended to remove any
such previous values, so the tombstones ensure that they don't re-appear.
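
For example (a sketch against the DataStax Java driver; the table, columns and
values are made up), binding an explicit null writes a tombstone for that
column, while preparing a variant that omits the column does not:

// given an existing Session 'session' and a UUID 'userId'
PreparedStatement ps = session.prepare(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)");
// explicit null: a tombstone is written for 'email'
session.execute(ps.bind(userId, "Henry", null));

// omitting the column avoids the tombstone entirely
PreparedStatement psNoEmail = session.prepare(
    "INSERT INTO users (id, name) VALUES (?, ?)");
session.execute(psNoEmail.bind(userId, "Henry"));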

Steve



On Tue, Mar 8, 2016 at 9:36 AM, Henry Manasseh 
wrote:

> The following article makes the following statement which I am trying to
> understand:
>
> *"Cassandra’s storage engine is optimized to avoid storing unnecessary
> empty columns, but when using prepared statements those parameters that are
> not provided result in null values being passed to Cassandra (and thus
> tombstones being stored)." *
> http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra
>
> I was wondering if someone could help explain why sending nulls as part of
> a prepared statement update would result in tombstones.
>
> Thank you,
> - Henry
>



-- 
Steve Robenalt
Software Architect
sroben...@highwire.org 
(office/cell): 916-505-1785

HighWire Press, Inc.
425 Broadway St, Redwood City, CA 94063
www.highwire.org

Technology for Scholarly Communication


nulls in prepared statement & tombstones?

2016-03-08 Thread Henry Manasseh
The following article makes the following statement which I am trying to
understand:

*"Cassandra’s storage engine is optimized to avoid storing unnecessary
empty columns, but when using prepared statements those parameters that are
not provided result in null values being passed to Cassandra (and thus
tombstones being stored)." *
http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra

I was wondering if someone could help explain why sending nulls as part of
a prepared statement update would result in tombstones.

Thank you,
- Henry


Re: Dynamic TTLs / limits still not working in 2.2 ?

2016-03-08 Thread horschi
Ok, I just realized the parameter should not be called ":limit" :-)

Also I upgraded my Java Driver from 2.1.6 to 2.1.9.

Both TTL and limit work fine now. Sorry again for the confusion.
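
In other words (a minimal sketch; "maxrows" is an arbitrary replacement name,
the problem presumably being that "limit" is a reserved CQL word):

-- fails to parse:
select data from mytable where randkey=:randkey limit :limit
-- works once the marker is renamed:
select data from mytable where randkey=:randkey limit :maxrows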

cheers,
Christian


On Tue, Mar 8, 2016 at 3:19 PM, horschi  wrote:

> Oh, I just realized I made a mistake with the TTL query:
>
> The TTL has to be specified before the set. Like this:
> update mytable using ttl :timetolive set data=:data where ts=:ts and
> randkey=:randkey
>
> And this of course works nicely. Sorry for the confusion.
>
>
> Nevertheless, I don't think this is the issue with my "select ... limit"
> queries. But I will verify this and also try the workaround.
>
>
>
> On Tue, Mar 8, 2016 at 3:08 PM, horschi  wrote:
>
>> Hi Nick,
>>
>> I will try your workaround. Thanks a lot.
>>
>> I was not expecting the Java-Driver to have a bug, because in the Jira
>> Ticket (JAVA-54) it says "not a problem". So I assumed there was nothing to
>> do to support it :-)
>>
>> kind regards,
>> Christian
>>
>> On Tue, Mar 8, 2016 at 2:56 PM, Nicholas Wilson <
>> nicholas.wil...@realvnc.com> wrote:
>>
>>> Hi Christian,
>>>
>>>
>>> I ran into this problem last month; after some chasing I thought it was
>>> possibly a bug in the Datastax driver, which I'm also using. The CQL
>>> protocol itself supports dynamic TTLs fine.
>>>
>>>
>>> One workaround that seems to work is to use an unnamed bind marker for
>>> the TTL ('?') and then set it using the "[ttl]" reserved name as the bind
>>> marker name ('setLong("[ttl]", myTtl)'), which will set the correct field
>>> in the bound statement.
>>>
>>>
>>> Best,
>>>
>>> Nick
>>>
>>>
>>> --
>>> *From:* horschi 
>>> *Sent:* 08 March 2016 13:52
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Dynamic TTLs / limits still not working in 2.2 ?
>>>
>>> Hi,
>>>
>>> according to CASSANDRA-4450
>>>  it should be
>>> fixed, but I still can't use dynamic TTLs or limits in my CQL queries.
>>>
>>> Query:
>>> update mytable set data=:data where ts=:ts and randkey=:randkey using
>>> ttl :timetolive
>>>
>>> Exception:
>>> Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:138
>>> missing EOF at 'using' (...:ts and randkey=:randkey [using] ttl...)
>>> at
>>> com.datastax.driver.core.Responses$Error.asException(Responses.java:100)
>>>
>>> I am using Cassandra 2.2 (using Datastax java driver 2.1.9) and I still
>>> see this, even though the Jira ticket states fixVersion 2.0.
>>>
>>> Has anyone used this successfully? Am I doing something wrong or is
>>> there still a bug?
>>>
>>> kind regards,
>>> Christian
>>>
>>>
>>> Tickets:
>>> https://datastax-oss.atlassian.net/browse/JAVA-54
>>> https://issues.apache.org/jira/browse/CASSANDRA-4450
>>>
>>>
>>>
>>
>


[RELEASE] Apache Cassandra 3.4 released

2016-03-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.4.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a feature release[1] in the 3.x series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/l61Mvd (CHANGES.txt)
[2]: http://goo.gl/hIamQh (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.4 released

2016-03-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.4.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] in the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/i27IR3 (CHANGES.txt)
[2]: http://goo.gl/8Fy3pe (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Dynamic TTLs / limits still not working in 2.2 ?

2016-03-08 Thread horschi
Oh, I just realized I made a mistake with the TTL query:

The TTL has to be specified before the set. Like this:
update mytable using ttl :timetolive set data=:data where ts=:ts and
randkey=:randkey

And this of course works nicely. Sorry for the confusion.


Nevertheless, I don't think this is the issue with my "select ... limit"
queries. But I will verify this and also try the workaround.



On Tue, Mar 8, 2016 at 3:08 PM, horschi  wrote:

> Hi Nick,
>
> I will try your workaround. Thanks a lot.
>
> I was not expecting the Java-Driver to have a bug, because in the Jira
> Ticket (JAVA-54) it says "not a problem". So I assumed there was nothing to
> do to support it :-)
>
> kind regards,
> Christian
>
> On Tue, Mar 8, 2016 at 2:56 PM, Nicholas Wilson <
> nicholas.wil...@realvnc.com> wrote:
>
>> Hi Christian,
>>
>>
>> I ran into this problem last month; after some chasing I thought it was
>> possibly a bug in the Datastax driver, which I'm also using. The CQL
>> protocol itself supports dynamic TTLs fine.
>>
>>
>> One workaround that seems to work is to use an unnamed bind marker for
>> the TTL ('?') and then set it using the "[ttl]" reserved name as the bind
>> marker name ('setLong("[ttl]", myTtl)'), which will set the correct field
>> in the bound statement.
>>
>>
>> Best,
>>
>> Nick
>>
>>
>> --
>> *From:* horschi 
>> *Sent:* 08 March 2016 13:52
>> *To:* user@cassandra.apache.org
>> *Subject:* Dynamic TTLs / limits still not working in 2.2 ?
>>
>> Hi,
>>
>> according to CASSANDRA-4450
>>  it should be
>> fixed, but I still can't use dynamic TTLs or limits in my CQL queries.
>>
>> Query:
>> update mytable set data=:data where ts=:ts and randkey=:randkey using ttl
>> :timetolive
>>
>> Exception:
>> Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:138
>> missing EOF at 'using' (...:ts and randkey=:randkey [using] ttl...)
>> at
>> com.datastax.driver.core.Responses$Error.asException(Responses.java:100)
>>
>> I am using Cassandra 2.2 (using Datastax java driver 2.1.9) and I still
>> see this, even though the Jira ticket states fixVersion 2.0.
>>
>> Has anyone used this successfully? Am I doing something wrong or is there
>> still a bug?
>>
>> kind regards,
>> Christian
>>
>>
>> Tickets:
>> https://datastax-oss.atlassian.net/browse/JAVA-54
>> https://issues.apache.org/jira/browse/CASSANDRA-4450
>>
>>
>>
>


Re: Dynamic TTLs / limits still not working in 2.2 ?

2016-03-08 Thread horschi
Hi Nick,

I will try your workaround. Thanks a lot.

I was not expecting the Java-Driver to have a bug, because in the Jira
Ticket (JAVA-54) it says "not a problem". So I assumed there was nothing to
do to support it :-)

kind regards,
Christian

On Tue, Mar 8, 2016 at 2:56 PM, Nicholas Wilson  wrote:

> Hi Christian,
>
>
> I ran into this problem last month; after some chasing I thought it was
> possibly a bug in the Datastax driver, which I'm also using. The CQL
> protocol itself supports dynamic TTLs fine.
>
>
> One workaround that seems to work is to use an unnamed bind marker for the
> TTL ('?') and then set it using the "[ttl]" reserved name as the bind
> marker name ('setLong("[ttl]", myTtl)'), which will set the correct field
> in the bound statement.
>
>
> Best,
>
> Nick
>
>
> --
> *From:* horschi 
> *Sent:* 08 March 2016 13:52
> *To:* user@cassandra.apache.org
> *Subject:* Dynamic TTLs / limits still not working in 2.2 ?
>
> Hi,
>
> according to CASSANDRA-4450
>  it should be
> fixed, but I still can't use dynamic TTLs or limits in my CQL queries.
>
> Query:
> update mytable set data=:data where ts=:ts and randkey=:randkey using ttl
> :timetolive
>
> Exception:
> Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:138
> missing EOF at 'using' (...:ts and randkey=:randkey [using] ttl...)
> at com.datastax.driver.core.Responses$Error.asException(Responses.java:100)
>
> I am using Cassandra 2.2 (using Datastax java driver 2.1.9) and I still
> see this, even though the Jira ticket states fixVersion 2.0.
>
> Has anyone used this successfully? Am I doing something wrong or is there
> still a bug?
>
> kind regards,
> Christian
>
>
> Tickets:
> https://datastax-oss.atlassian.net/browse/JAVA-54
> https://issues.apache.org/jira/browse/CASSANDRA-4450
>
>
>


Re: Dynamic TTLs / limits still not working in 2.2 ?

2016-03-08 Thread Nicholas Wilson
Hi Christian,


I ran into this problem last month; after some chasing I thought it was 
possibly a bug in the Datastax driver, which I'm also using. The CQL protocol 
itself supports dynamic TTLs fine.


One workaround that seems to work is to use an unnamed bind marker for the TTL 
('?') and then set it using the "[ttl]" reserved name as the bind marker name 
('setLong("[ttl]", myTtl)'), which will set the correct field in the bound 
statement.
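
In code the workaround looks roughly like this (a sketch: the query is
Christian's, the column setters assume text/bigint types, and addressing the
unnamed TTL marker via setLong("[ttl]", ...) follows the description above):

// given an existing Session 'session' and values data, ts, randkey, myTtl
PreparedStatement ps = session.prepare(
    "UPDATE mytable USING TTL ? SET data=:data WHERE ts=:ts AND randkey=:randkey");
BoundStatement bs = ps.bind();
bs.setString("data", data);
bs.setLong("ts", ts);
bs.setString("randkey", randkey);
bs.setLong("[ttl]", myTtl);  // the '?' marker is reachable by its reserved name
session.execute(bs);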


Best,

Nick



From: horschi 
Sent: 08 March 2016 13:52
To: user@cassandra.apache.org
Subject: Dynamic TTLs / limits still not working in 2.2 ?

Hi,

according to 
CASSANDRA-4450 it should 
be fixed, but I still can't use dynamic TTLs or limits in my CQL queries.

Query:
update mytable set data=:data where ts=:ts and randkey=:randkey using ttl 
:timetolive

Exception:
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:138 missing 
EOF at 'using' (...:ts and randkey=:randkey [using] ttl...)
at com.datastax.driver.core.Responses$Error.asException(Responses.java:100)

I am using Cassandra 2.2 (using Datastax java driver 2.1.9) and I still see 
this, even though the Jira ticket states fixVersion 2.0.

Has anyone used this successfully? Am I doing something wrong or is there still 
a bug?

kind regards,
Christian


Tickets:
https://datastax-oss.atlassian.net/browse/JAVA-54
https://issues.apache.org/jira/browse/CASSANDRA-4450




Dynamic TTLs / limits still not working in 2.2 ?

2016-03-08 Thread horschi
Hi,

according to CASSANDRA-4450
 it should be fixed,
but I still can't use dynamic TTLs or limits in my CQL queries.

Query:
update mytable set data=:data where ts=:ts and randkey=:randkey using ttl
:timetolive

Exception:
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:138
missing EOF at 'using' (...:ts and randkey=:randkey [using] ttl...)
at com.datastax.driver.core.Responses$Error.asException(Responses.java:100)

I am using Cassandra 2.2 (using Datastax java driver 2.1.9) and I still see
this, even though the Jira ticket states fixVersion 2.0.

Has anyone used this successfully? Am I doing something wrong or is there
still a bug?

kind regards,
Christian


Tickets:
https://datastax-oss.atlassian.net/browse/JAVA-54
https://issues.apache.org/jira/browse/CASSANDRA-4450


Cassandra-stress output

2016-03-08 Thread Jean Carlo
Hi guys,

I use cassandra stress to populate the next table

CREATE TABLE cf1 (
    kvalue text,
    ktype text,
    prov text,
    dname text,
    dattrib blob,
    dvalue text,
    PRIMARY KEY (kvalue, ktype, prov, dname)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"60"}'
    AND comment = ''
    AND compaction = {'class':
        'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression':
        'org.apache.cassandra.io.compress.SnappyCompressor'}
    AND dclocal_read_repair_chance = 0.02
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.01
    AND speculative_retry = '99.0PERCENTILE';

And cassandra-stress generates strings like the following for the kvalue
field of type text:

"P*d,xY\x03m\x1b\x10\x0b$\x04pt-G\x08\n`7\x1fs\x15kH\x02i1\x16jf%YM"

What bothers me is that kvalue contains control characters like \x03. Do you
guys know of any way to avoid generating these kinds of characters when using
cassandra-stress?



Thank you very much

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


If a clustering column has the same value over many rows, is the value duplicated in memory?

2016-03-08 Thread X. F. Li

Hi,

If a clustering column has the same value over many rows, is the value
duplicated in memory?


Suppose we have

  create table t (
c1 int,
c2 text,
c3 int,
primary key(c1,c2,c3)
  );

and we have N rows with the same value for (c1,c2) and different values
for c3. When the rows are loaded into the memory cache, is the value of c2
duplicated N times in memory? Or does Cassandra treat (c1,c2,c3) like a
multi-level map and keep only one copy of the value?
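
For concreteness, a sketch of the scenario:

  insert into t (c1, c2, c3) values (1, 'some long shared text', 1);
  insert into t (c1, c2, c3) values (1, 'some long shared text', 2);
  insert into t (c1, c2, c3) values (1, 'some long shared text', 3);

i.e. all N rows live in the same partition (c1 = 1) and share the same c2
text.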


Thanks.

X. F. Li