Re: Removed node is not completely removed

2015-10-15 Thread Tom van den Berge
Thanks Sebastian, a restart solved the problem!


On Wed, Oct 14, 2015 at 3:46 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> We still keep endpoints in memory. Not sure how you got into this state, but
> try a rolling restart.
> On Oct 14, 2015 9:43 AM, "Tom van den Berge" 
> wrote:
>
>> Thanks for that Michael, I did not know that. However, the node is not
>> listed in the system.peers table on any node, so it seems that the problem
>> is not in this table.
>>
>>
>>
>> On Wed, Oct 14, 2015 at 3:30 PM, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>>> Remember that the system keyspace uses LocalStrategy: each node has its
>>> own set of system tables. -ml
>>>
>>> On Wed, Oct 14, 2015 at 9:17 AM, Tom van den Berge <
>>> tom.vandenbe...@gmail.com> wrote:
>>>
 Hi Carlos,

 I'm using 2.1.6. The mysterious node is not in the peers table. Any
 other ideas?
 One of my existing nodes is not present in the system.peers table,
 though. Should I be worried?

 Regards,
 Tom

 On Wed, Oct 14, 2015 at 2:27 PM, Carlos Rolo  wrote:

> Check system.peers table to see if the IP is still there. If so edit
> the table and remove the offending IP.
>
> You are probably running into this:
> https://issues.apache.org/jira/browse/CASSANDRA-6053
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Wed, Oct 14, 2015 at 12:26 PM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> I have removed a node with nodetool removenode, which completed ok.
>> Nodetool status does not list the node anymore.
>>
>> But since then, I'm seeing messages in my other nodes' log files
>> referring to the removed node:
>>
>>  INFO [GossipStage:38] 2015-10-14 11:18:26,322 Gossiper.java (line
>> 968) InetAddress /10.68.56.200 is now DOWN
>>  INFO [GossipStage:38] 2015-10-14 11:18:26,324 StorageService.java
>> (line 1891) Removing tokens [85070591730234615865843651857942052863]
>> for /10.68.56.200
>>
>>
>> These two messages appear every minute.
>> I've tried nodetool removenode again (Host ID not found) and
>> removenode force (no token removals in process).
>> The jmx unsafeAssassinateEndpoint gives a NullPointerException.
>>
>> What can I do to remove the node entirely?
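For reference, the system.peers check suggested above can be done from cqlsh. A minimal sketch, assuming the ghost IP from the log excerpt; since system.peers uses LocalStrategy, it has to be run on every node:

SELECT peer, host_id FROM system.peers;

-- Only if the removed node's IP is still listed on this node:
DELETE FROM system.peers WHERE peer = '10.68.56.200';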


Re: Accessing dynamic columns via cqlsh

2015-10-15 Thread Onur Yalazı

Thank you Eric,

I think we have a limited number of dynamically named columns, but I'm
not inclined to add them to the schema.


I have just managed to do what I want with the schema below, but it cost
me my secondary index on eventId: as part of the clustering key, it is not
yet supported.
(Bad Request: Secondary index on CLUSTERING_KEY column name is not yet
supported for compact table)


Of course, I'm not sure how this will affect our archaic application, so
I have to test hard before applying this change.





CREATE TABLE "EventKeys" (
  key ascii,
  name ascii,
  value ascii,
  PRIMARY KEY (key,"name")
)

SELECT * FROM "EventKeys" WHERE key='b5f0d4be-c0fc-4dc4-8e38-0a00e4552866';

 key                                  | name     | value
--------------------------------------+----------+--------------------------------------
 b5f0d4be-c0fc-4dc4-8e38-0a00e4552866 | actionId | 080abda2-3623-4a98-a84a-d33b6aecbe99
 b5f0d4be-c0fc-4dc4-8e38-0a00e4552866 | code     | var x = .\n
 b5f0d4be-c0fc-4dc4-8e38-0a00e4552866 | eventId  | ce3b0c03-dcce-4522-a35a-864909cb024f
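For reference, a fully dynamic Thrift column family corresponds to a compact CQL3 table shaped like the sketch below; the table name is illustrative, the ascii types are assumed from the comparator shown later in the thread, and, as the error quoted above shows, such a compact table cannot take a secondary index on its clustering column:

CREATE TABLE "EventKeysDynamic" (
  key ascii,
  name ascii,   -- dynamic column name
  value ascii,  -- dynamic column value
  PRIMARY KEY (key, name)
) WITH COMPACT STORAGE;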




On 10/15/2015 06:09 PM, Eric Stevens wrote:
If the columns are not dynamically named (as in "actionId" and "code") 
you should be able to add that to your CQL table definition with ALTER 
TABLE, and those columns should be available in the query results.


If the columns *are* dynamically named, and you can't reasonably add
every option to the CQL definition, your job gets a lot harder. If 
you're using composite names, there might be some hope if you happen 
to conform to the same standard as CQL collections (not impossible, 
but probably not super likely).  You can create a test table with one 
of each collection type, insert a record, then look at the Thrift to 
see how those map.


If your dynamically named columns are string concatenation or some 
other custom serialization format, then your only hope is basically a 
data migration from your thrift format to your CQL format.  You should 
be able to accomplish all the same business functionality using CQL, 
but you might not be able to create a CQL schema that maps exactly to 
the data at rest for your historic schema.


On Thu, Oct 15, 2015 at 8:54 AM Onur Yalazı wrote:


Hello,

I have a cassandra cluster from pre-cql era and I am having problems
accessing data via cqlsh.
As you can see below, I can not reach dynamic columns via cqlsh
but they
are accessible via cassandra-cli.

How can I make the data shown on cqlsh?


cqlsh:automation> select * from "EventKeys" where
key='b5f0d4be-c0fc-4dc4-8e38-0a00e4552866' ;

 key                                  | eventId
--------------------------------------+--------------------------------------
 b5f0d4be-c0fc-4dc4-8e38-0a00e4552866 | ce3b0c03-dcce-4522-a35a-864909cb024f

(1 rows)


[default@keyspace] get
EventKeys['b5f0d4be-c0fc-4dc4-8e38-0a00e4552866'];
=> (name=actionId, value=3038...64623661656362653939,
timestamp=1431608711629002)
=> (name=code, value=b0a..0a0a, timestamp=1431608711629003)
=> (name=eventId, value=ce3b0c03-dcce-4522-a35a-864909cb024f,
timestamp=1431608711629000)
Returned 3 results.


ColumnFamily Description:
 ColumnFamily: EventKeys
   Key Validation Class: org.apache.cassandra.db.marshal.AsciiType
   Default column value validator:
org.apache.cassandra.db.marshal.BytesType
   Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 0.1
   DC Local Read repair chance: 0.0
   Populate IO Cache on flush: false
   Replicate on write: true
   Caching: KEYS_ONLY
   Default time to live: 0
   Bloom Filter FP chance: 0.01
   Index interval: 128
   Speculative Retry: 99.0PERCENTILE
   Built indexes: [EventKeys.eventkeys_eventid_idx]
   Column Metadata:
 Column Name: scenarioId
   Validation Class: org.apache.cassandra.db.marshal.AsciiType
   Index Name: eventkeys_eventid_idx
   Index Type: KEYS
   Index Options: {}
   Compaction Strategy:
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
   Compaction Strategy Options:
 min_threshold: 4
 max_threshold: 32
   Compression Options:
 sstable_compression:
org.apache.cassandra.io.compress.SnappyCompressor


CQL Desc of the table:

CREATE TABLE "EventKeys" (
   key ascii,
   "eventId" ascii,
   PRIMARY KEY (key)
) WITH COMPACT STORAGE AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'min_threshold': '4', 'class':
'SizeTieredCompactionStrategy', 'max_threshold': '32'} AND
   compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX eventkeys_eventid_idx ON "EventKeys" ("eventId");

Re: unchecked_tombstone_compaction - query

2015-10-15 Thread Paulo Motta
Hello Deepak,

The dev@cassandra list is reserved for development announcements and
discussions, so I will reply to users@cassandra, as someone else might have
a similar question.

Basically, there is a pre-check that defines which sstables are eligible for
single-sstable tombstone compaction, and an actual check that determines
whether a key is present in only a single sstable before performing the
tombstone removal (otherwise it does nothing).

The pre-check is cheap, and only considers an sstable eligible for tombstone
compaction if the sstable key range does not overlap with any other sstable
key range. The actual check might need to read from disk (= more seeks on
spindles) to confirm whether the sstables with overlapping ranges actually
contain the tombstoned key, in order to determine whether it is safe to
drop the tombstone.

In the case of size-tiered compaction, it's common for many sstables to
have overlapping ranges, so tombstone compaction is almost never triggered
and you have to wait until compactions organically remove tombstones. The
unchecked_tombstone_compaction option removes the pre-check for overlapping
ranges, but still performs the check to determine whether it's safe to drop
a tombstone. So your I/O may increase if you enable this property, since
more data will need to be read to perform the actual checks, but it is
otherwise safe to use. A good way to tell whether the setting is useful is
to watch your droppable tombstone ratio metrics after enabling it.
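A minimal sketch of enabling the option, assuming size-tiered compaction and a hypothetical ks.events table; tombstone_threshold is shown at its default of 0.2:

ALTER TABLE ks.events WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'unchecked_tombstone_compaction': 'true',
  'tombstone_threshold': '0.2'
};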

Cheers,

Paulo

2015-10-14 23:27 GMT-07:00 Deepak Nagaraj :

> Hi Paulo, C* devs,
>
> I have a question on "unchecked_tombstone_compaction" option.  I
> understand that setting this to true prevents a heuristic check on keys
> that span multiple sstables.
>
> But I also read that the heuristic was introduced because not having it
> can cause resurrections (i.e. sstable1 may have data, sstable2 may have
> tombstone, and when we delete the tombstone, deleted data suddenly shows
> up).
>
> So - isn't setting unchecked_tombstone_compaction to "true" a dangerous
> setting?  Won't it cause resurrections?  What is the use case for this
> knob, and when do I know I can set it to true safely?
>
> I've read the source code, Jira 6563, and relevant e-mail threads many
> times but I still don't have a clear understanding.
>
> Thanks in advance,
> -deepak
>
>


RE: Accessing dynamic columns via cqlsh

2015-10-15 Thread Akbar Pirani
I do not think that cqlsh provides a way to get internal data. I hope I am 
wrong...

-Original Message-
From: Onur Yalazı [mailto:onur.yal...@8digits.com]
Sent: Thursday, October 15, 2015 10:54 AM
To: user@cassandra.apache.org
Subject: Accessing dynamic columns via cqlsh

Hello,

I have a cassandra cluster from pre-cql era and I am having problems accessing 
data via cqlsh.
As you can see below, I can not reach dynamic columns via cqlsh but they are 
accessible via cassandra-cli.

How can I make the data shown on cqlsh?


cqlsh:automation> select * from "EventKeys" where 
key='b5f0d4be-c0fc-4dc4-8e38-0a00e4552866' ;

  key                                  | eventId
 --------------------------------------+--------------------------------------
  b5f0d4be-c0fc-4dc4-8e38-0a00e4552866 | ce3b0c03-dcce-4522-a35a-864909cb024f

(1 rows)


[default@keyspace] get EventKeys['b5f0d4be-c0fc-4dc4-8e38-0a00e4552866'];
=> (name=actionId, value=3038...64623661656362653939, timestamp=1431608711629002)
=> (name=code, value=b0a..0a0a, timestamp=1431608711629003)
=> (name=eventId, value=ce3b0c03-dcce-4522-a35a-864909cb024f, timestamp=1431608711629000)
Returned 3 results.


ColumnFamily Description:
 ColumnFamily: EventKeys
   Key Validation Class: org.apache.cassandra.db.marshal.AsciiType
   Default column value validator:
org.apache.cassandra.db.marshal.BytesType
   Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 0.1
   DC Local Read repair chance: 0.0
   Populate IO Cache on flush: false
   Replicate on write: true
   Caching: KEYS_ONLY
   Default time to live: 0
   Bloom Filter FP chance: 0.01
   Index interval: 128
   Speculative Retry: 99.0PERCENTILE
   Built indexes: [EventKeys.eventkeys_eventid_idx]
   Column Metadata:
 Column Name: scenarioId
   Validation Class: org.apache.cassandra.db.marshal.AsciiType
   Index Name: eventkeys_eventid_idx
   Index Type: KEYS
   Index Options: {}
   Compaction Strategy:
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
   Compaction Strategy Options:
 min_threshold: 4
 max_threshold: 32
   Compression Options:
 sstable_compression:
org.apache.cassandra.io.compress.SnappyCompressor


CQL Desc of the table:

CREATE TABLE "EventKeys" (
   key ascii,
   "eventId" ascii,
   PRIMARY KEY (key)
) WITH COMPACT STORAGE AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'min_threshold': '4', 'class':
'SizeTieredCompactionStrategy', 'max_threshold': '32'} AND
   compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX eventkeys_eventid_idx ON "EventKeys" ("eventId");



Re: Accessing dynamic columns via cqlsh

2015-10-15 Thread Eric Stevens
If the columns are not dynamically named (as in "actionId" and "code") you
should be able to add that to your CQL table definition with ALTER TABLE,
and those columns should be available in the query results.

If the columns *are* dynamically named, and you can't reasonably add every
option to the CQL definition, your job gets a lot harder. If you're using
composite names, there might be some hope if you happen to conform to the
same standard as CQL collections (not impossible, but probably not super
likely).  You can create a test table with one of each collection type,
insert a record, then look at the Thrift to see how those map.

If your dynamically named columns are string concatenation or some other
custom serialization format, then your only hope is basically a data
migration from your thrift format to your CQL format.  You should be able
to accomplish all the same business functionality using CQL, but you might
not be able to create a CQL schema that maps exactly to the data at rest
for your historic schema.
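Concretely, the first option might look like the sketch below; the column names come from the cassandra-cli output in the quoted message, the ascii type is an assumption (the default validator there is BytesType, so blob may be the safer choice), and some versions reject ALTER ... ADD on COMPACT STORAGE tables:

ALTER TABLE "EventKeys" ADD "actionId" ascii;
ALTER TABLE "EventKeys" ADD code ascii;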

On Thu, Oct 15, 2015 at 8:54 AM Onur Yalazı  wrote:

> Hello,
>
> I have a cassandra cluster from pre-cql era and I am having problems
> accessing data via cqlsh.
> As you can see below, I can not reach dynamic columns via cqlsh but they
> are accessible via cassandra-cli.
>
> How can I make the data shown on cqlsh?
>
>
> cqlsh:automation> select * from "EventKeys" where
> key='b5f0d4be-c0fc-4dc4-8e38-0a00e4552866' ;
>
>   key                                  | eventId
> --------------------------------------+--------------------------------------
>   b5f0d4be-c0fc-4dc4-8e38-0a00e4552866 | ce3b0c03-dcce-4522-a35a-864909cb024f
>
> (1 rows)
>
>
> [default@keyspace] get EventKeys['b5f0d4be-c0fc-4dc4-8e38-0a00e4552866'];
> => (name=actionId, value=3038...64623661656362653939,
> timestamp=1431608711629002)
> => (name=code, value=b0a..0a0a, timestamp=1431608711629003)
> => (name=eventId, value=ce3b0c03-dcce-4522-a35a-864909cb024f,
> timestamp=1431608711629000)
> Returned 3 results.
>
>
> ColumnFamily Description:
>  ColumnFamily: EventKeys
>Key Validation Class: org.apache.cassandra.db.marshal.AsciiType
>Default column value validator:
> org.apache.cassandra.db.marshal.BytesType
>Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
>GC grace seconds: 864000
>Compaction min/max thresholds: 4/32
>Read repair chance: 0.1
>DC Local Read repair chance: 0.0
>Populate IO Cache on flush: false
>Replicate on write: true
>Caching: KEYS_ONLY
>Default time to live: 0
>Bloom Filter FP chance: 0.01
>Index interval: 128
>Speculative Retry: 99.0PERCENTILE
>Built indexes: [EventKeys.eventkeys_eventid_idx]
>Column Metadata:
>  Column Name: scenarioId
>Validation Class: org.apache.cassandra.db.marshal.AsciiType
>Index Name: eventkeys_eventid_idx
>Index Type: KEYS
>Index Options: {}
>Compaction Strategy:
> org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>Compaction Strategy Options:
>  min_threshold: 4
>  max_threshold: 32
>Compression Options:
>  sstable_compression:
> org.apache.cassandra.io.compress.SnappyCompressor
>
>
> CQL Desc of the table:
>
> CREATE TABLE "EventKeys" (
>key ascii,
>"eventId" ascii,
>PRIMARY KEY (key)
> ) WITH COMPACT STORAGE AND
>bloom_filter_fp_chance=0.01 AND
>caching='KEYS_ONLY' AND
>comment='' AND
>dclocal_read_repair_chance=0.00 AND
>gc_grace_seconds=864000 AND
>index_interval=128 AND
>read_repair_chance=0.10 AND
>replicate_on_write='true' AND
>populate_io_cache_on_flush='false' AND
>default_time_to_live=0 AND
>speculative_retry='99.0PERCENTILE' AND
>memtable_flush_period_in_ms=0 AND
>compaction={'min_threshold': '4', 'class':
> 'SizeTieredCompactionStrategy', 'max_threshold': '32'} AND
>compression={'sstable_compression': 'SnappyCompressor'};
> CREATE INDEX eventkeys_eventid_idx ON "EventKeys" ("eventId");
>
>


Re: LOCAL_SERIAL

2015-10-15 Thread Eric Stevens
You probably could, but if I were you, I'd consider a tool built for that
purpose, such as Zookeeper.  It'd open up access to a lot of other great
cluster coordination features.

On Thu, Oct 15, 2015 at 8:47 AM Jan Algermissen 
wrote:

> Hi,
>
> suppose I have two data centers and want to coordinate a bunch of services
> in each data center (for example to load data into a per-DC system that is
> not DC-aware (Solr)).
>
> Does it make sense to use CAS functionality with explicit LOCAL_SERIAL to
> 'elect' a leader per data center to do the work?
>
> So instead of saying 'for this query, LOCAL_SERIAL is enough for me' this
> would be like saying 'I want XYZ to happen exactly once, per data center'.
> - All services would try to do XYZ, but only one instance *per datacenter*
> will actually become the leader and succeed.
>
> Makes sense?
>
> Jan
>


Re: LOCAL_SERIAL

2015-10-15 Thread Jon Haddad
ZK seems a little overkill for just 1 feature though.  LOCAL_SERIAL is fine if 
all you want to do is keep a handful of keys up to date.  

There’s a massive cost in adding something new to your infrastructure, and imo, 
very little gain in this case.

> On Oct 15, 2015, at 8:29 AM, Eric Stevens  wrote:
> 
> You probably could, but if I were you, I'd consider a tool built for that 
> purpose, such as Zookeeper.  It'd open up access to a lot of other great 
> cluster coordination features.
> 
> On Thu, Oct 15, 2015 at 8:47 AM Jan Algermissen wrote:
> Hi,
> 
> suppose I have two data centers and want to coordinate a bunch of services in 
> each data center (for example to load data into a per-DC system that is not 
> DC-aware (Solr)).
> 
> Does it make sense to use CAS functionality with explicit LOCAL_SERIAL to 
> 'elect' a leader per data center to do the work?
> 
> So instead of saying 'for this query, LOCAL_SERIAL is enough for me' this 
> would be like saying 'I want XYZ to happen exactly once, per data center'. - 
> All services would try to do XYZ, but only one instance *per datacenter* will 
> actually become the leader and succeed.
> 
> Makes sense?
> 
> Jan



LOCAL_SERIAL

2015-10-15 Thread Jan Algermissen

Hi,

suppose I have two data centers and want to coordinate a bunch of services in 
each data center (for example to load data into a per-DC system that is not 
DC-aware (Solr)).

Does it make sense to use CAS functionality with explicit LOCAL_SERIAL to 
'elect' a leader per data center to do the work?

So instead of saying 'for this query, LOCAL_SERIAL is enough for me' this would 
be like saying 'I want XYZ to happen exactly once, per data center'. - All 
services would try to do XYZ, but only one instance *per datacenter* will 
actually become the leader and succeed.

Makes sense?

Jan
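A minimal sketch of the CAS-based lease this describes, with all identifiers hypothetical. The LOCAL_SERIAL part is set as the statement's serial consistency level in the driver rather than in the CQL text, and the TTL bounds how long a crashed leader can block re-election:

CREATE TABLE coordination.dc_leader (
  task  text PRIMARY KEY,
  owner text
);

-- Every candidate in the DC attempts the insert; at most one wins per TTL window.
INSERT INTO coordination.dc_leader (task, owner)
VALUES ('solr-load', 'service-instance-17')
IF NOT EXISTS
USING TTL 60;

The instance that gets [applied] = True does the work; the others back off and retry after the TTL expires.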


Accessing dynamic columns via cqlsh

2015-10-15 Thread Onur Yalazı

Hello,

I have a cassandra cluster from pre-cql era and I am having problems 
accessing data via cqlsh.
As you can see below, I can not reach dynamic columns via cqlsh but they 
are accessible via cassandra-cli.


How can I make the data shown on cqlsh?


cqlsh:automation> select * from "EventKeys" where 
key='b5f0d4be-c0fc-4dc4-8e38-0a00e4552866' ;


 key                                  | eventId
--------------------------------------+--------------------------------------
 b5f0d4be-c0fc-4dc4-8e38-0a00e4552866 | ce3b0c03-dcce-4522-a35a-864909cb024f


(1 rows)


[default@keyspace] get EventKeys['b5f0d4be-c0fc-4dc4-8e38-0a00e4552866'];
=> (name=actionId, value=3038...64623661656362653939, 
timestamp=1431608711629002)

=> (name=code, value=b0a..0a0a, timestamp=1431608711629003)
=> (name=eventId, value=ce3b0c03-dcce-4522-a35a-864909cb024f, 
timestamp=1431608711629000)

Returned 3 results.


ColumnFamily Description:
ColumnFamily: EventKeys
  Key Validation Class: org.apache.cassandra.db.marshal.AsciiType
  Default column value validator: 
org.apache.cassandra.db.marshal.BytesType

  Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Populate IO Cache on flush: false
  Replicate on write: true
  Caching: KEYS_ONLY
  Default time to live: 0
  Bloom Filter FP chance: 0.01
  Index interval: 128
  Speculative Retry: 99.0PERCENTILE
  Built indexes: [EventKeys.eventkeys_eventid_idx]
  Column Metadata:
Column Name: scenarioId
  Validation Class: org.apache.cassandra.db.marshal.AsciiType
  Index Name: eventkeys_eventid_idx
  Index Type: KEYS
  Index Options: {}
  Compaction Strategy: 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

  Compaction Strategy Options:
min_threshold: 4
max_threshold: 32
  Compression Options:
sstable_compression: 
org.apache.cassandra.io.compress.SnappyCompressor



CQL Desc of the table:

CREATE TABLE "EventKeys" (
  key ascii,
  "eventId" ascii,
  PRIMARY KEY (key)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'min_threshold': '4', 'class': 
'SizeTieredCompactionStrategy', 'max_threshold': '32'} AND

  compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX eventkeys_eventid_idx ON "EventKeys" ("eventId");



Re: Re : Replication factor for system_auth keyspace

2015-10-15 Thread Robert Coli
On Thu, Oct 15, 2015 at 10:24 AM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

>   we are deploying a new cluster with 2 datacenters, 48 nodes in each DC.
> For the system_auth keyspace, what should be the ideal replication_factor
> set?
>
> We tried setting the replication factor equal to the number of nodes in a
> datacenter, and the repair for the system_auth keyspace took really long.
> Your suggestions would be of great help.
>

More than 1 and a lot less than 48.

=Rob
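A common choice is an RF of 3 per DC. A sketch, assuming NetworkTopologyStrategy and datacenter names DC1 and DC2, followed by a repair of system_auth on each node to apply the change:

ALTER KEYSPACE system_auth WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3',
  'DC2': '3'
};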


Cassandra 2.2.1 stuck at 100% on Windows

2015-10-15 Thread Alaa Zubaidi (PDF)
Hi,
We are running Cassandra 2.2.1 on Windows 2008R2, and we see that multiple
Nodes are stuck at 100% CPU bringing the whole VM to a halt.
We suspect that another process from IT/Windows is causing the CPU
issue, but the problem is that Cassandra does NOT recover: the CPU utilization
keeps climbing until the VM is not usable. If we restart Cassandra, things
go back to normal.
Has anyone seen this before?

Thanks
-- Alaa



Re: Cassandra 2.2.1 stuck at 100% on Windows

2015-10-15 Thread Robert Coli
On Thu, Oct 15, 2015 at 6:04 PM, Alaa Zubaidi (PDF) 
wrote:

> We are running Cassandra 2.2.1 on Windows 2008R2, and we see that multiple
> Nodes are stuck at 99% CPU bringing the whole VM to a halt.
> We suspect that another process from IT/Windows is causing the CPU
> issue, but the problem is that Cassandra does NOT recover: the CPU
> utilization keeps climbing until the VM is not usable. If we restart
> Cassandra, things go back to normal.
>

Most cases where a JVM does not recover and churns at maxed CPU are the
result of GC failure and/or OOM.

Check your logs for OOM and long GCs.

Also, FWIW, you are among a relatively small group of Windows operators.
Outside of the people working at DataStax to support Windows, there is
not a whole lot of well-understood operational best practice for Cassandra
on Windows.

=Rob


Cassandra 2.2.1 stuck at 100% on Windows

2015-10-15 Thread Alaa Zubaidi (PDF)
Hi,
We are running Cassandra 2.2.1 on Windows 2008R2, and we see that multiple
Nodes are stuck at 99% CPU bringing the whole VM to a halt.
We suspect that another process from IT/Windows is causing the CPU
issue, but the problem is that Cassandra does NOT recover: the CPU utilization
keeps climbing until the VM is not usable. If we restart Cassandra, things
go back to normal.
Anyone have seen this before?

Thanks
-- Alaa



Re: unchecked_tombstone_compaction - query

2015-10-15 Thread Robert Coli
On Thu, Oct 15, 2015 at 9:01 AM, Paulo Motta 
wrote:

> (OP says:) So - isn't setting unchecked_tombstone_compaction to "true" a
>> dangerous setting?  Won't it cause resurrections?  What is the use case for
>> this knob, and when do I know I can set it to true safely?
>>
>
To expand slightly on Paulo's great answer:

The only time to really consider using this feature is if you have a
reasonable suspicion that, because of your write patterns, you will do
less net work if you simply skip the pre-check. Like many other
performance-centric features whose use case seems difficult to grasp, it
was likely added because a single significant user was in exactly that case.

=Rob


Re: Collections (MAP) data in Column Family

2015-10-15 Thread Robert Coli
On Wed, Oct 14, 2015 at 10:03 PM, Saladi Naidu 
wrote:

> Thanks for the reply. Yes, this is indeed due to range tombstones with MAP
> data; even after the tombstones passed gc_grace_period and compactions ran
> in the cluster, there is still no change in the tombstone data in the
> SSTables. Do you or anyone in the group know how to delete this bad data
> from the cluster?
>

Other than applying the patch from the upthread tickets, you could :

1) stop node

2) for each sstable affected:
a) dump to JSON with sstable2json
b) remove all duplicate range tombstones from JSON
c) recreate sstable with same name via json2sstable

3) start node

However based on my understanding of this issue, they'll just build up
again until you are running a version with the upthread patches.

=Rob


Re : Replication factor for system_auth keyspace

2015-10-15 Thread sai krishnam raju potturi
hi;
  we are deploying a new cluster with 2 datacenters, 48 nodes in each DC.
For the system_auth keyspace, what should be the ideal replication_factor
set?

We tried setting the replication factor equal to the number of nodes in a
datacenter, and the repair for the system_auth keyspace took really long.
Your suggestions would be of great help.

thanks
Sai


RE: reiserfs - DirectoryNotEmptyException

2015-10-15 Thread Modha, Digant
It is deployed on an existing cluster but will be migrated soon to a different 
file system & Linux distribution.

-Original Message-
From: Michael Shuler [mailto:mshu...@pbandjelly.org] On Behalf Of Michael Shuler
Sent: Wednesday, October 14, 2015 6:02 PM
To: user@cassandra.apache.org
Subject: Re: reiserfs - DirectoryNotEmptyException

On 10/13/2015 01:58 PM, Modha, Digant wrote:
> I am running Cassandra 2.1.10 and noticed intermittent 
> DirectoryNotEmptyExceptions during repair.  My cassandra data drive is 
> reiserfs.

Why? I'm genuinely interested in this filesystem selection, since it is 
unmaintained, has been dropped from some mainstream linux distributions, and 
some may call it "dead". ;)

> I noticed that on reiserfs wiki site
> https://en.m.wikipedia.org/wiki/ReiserFS#Criticism, it states that 
> unlink operation is not synchronous. Is that the reason for the 
> exception below:
>
> ERROR [ValidationExecutor:137] 2015-10-13 00:46:30,759
> CassandraDaemon.java:227 - Exception in thread 
> Thread[ValidationExecutor:137,1,main]
>
> org.apache.cassandra.io.FSWriteError:
> java.nio.file.DirectoryNotEmptyException:
>
> at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135)
>~[apache-cassandra-2.1.10.jar:2.1.10]
<...>

This seems like a reasonable explanation. Using a modern filesystem like
ext4 or xfs would certainly be helpful in getting you within the realm of a 
"common" hardware setup.

https://wiki.apache.org/cassandra/CassandraHardware
https://www.safaribooksonline.com/library/view/cassandra-high-performance/9781849515122/ch04s06.html

I think Al Tobey had a slide deck on filesystem tuning for C*, but I didn't 
find it quickly.

--
Kind regards,
Michael

