cqlsh -e output - How to change the default delimiter '|' in the output

2017-08-13 Thread Harikrishnan A
Hello,
When I execute cqlsh -e "SELECT statement .."  , it gives the output with a 
pipe ('|') separator. Is there anyway I can change this default delimiter in 
the output of cqlsh -e " SELECT statement ..". 
Thanks & Regards,Hari
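
As far as I know cqlsh has no switch for changing the separator of -e output.
Two common workarounds (sketches only; ks.t is a hypothetical keyspace/table)
are to post-process the output, or to export with COPY, which does accept an
explicit delimiter:

    # post-process: swap the ' | ' separator for a comma
    cqlsh -e "SELECT * FROM ks.t;" | sed 's/ | /,/g'

    # export to CSV with an explicit delimiter instead
    cqlsh -e "COPY ks.t TO 'out.csv' WITH DELIMITER = ',' AND HEADER = true;"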

live dsc upgrade from 2.0 to 2.1 behind the scenes

2017-08-13 Thread Park Wu
Hi, folks: I am planning to upgrade our production from dsc 2.0.16 to 2.1.18
for 2 DCs (20 nodes each, 600 GB per node). A few questions:
1) What happens during a rolling upgrade? Let's say we upgrade only one node to
the new version: before its sstables are upgraded, will the data coming in stay
on that node and not be able to stream to other nodes?
2) What if I have very active writes? How much data can that node hold until it
sees other nodes on the new version so it can stream?
3) Should I upgrade sstables once all nodes in one DC are upgraded, or wait
until both DCs are upgraded?
4) Any idea or experience how long it will take to upgrade sstables for 600 GB
of data on each node?
5) What is the maximum time I can take for the rolling upgrade of each DC?
6) I was doing a test with a 3-node cluster: one node on 2.1.18, the other two
on 2.0.16. I got a warning on the node with the newer version when I tried to
create a keyspace and insert some sample data:
"Warning: schema version mismatch detected, which might be caused by DOWN
nodes; if this is not the case, check the schema versions of your nodes in
system.local and system.peers. OperationTimedOut: errors={}, last_host=xxx"
But the data was upserted successfully, even though it is not seen on the other
nodes. Any suggestion?
Great thanks for any help or comments!
- Park
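
For what it's worth, the per-node steps of a rolling upgrade are usually
sequenced roughly like the sketch below (assumptions: a Debian-style package
install, service name "cassandra", and the package pin shown is hypothetical):

    # repeat on each node, one node at a time
    nodetool drain                          # flush memtables, stop accepting writes
    sudo service cassandra stop
    sudo apt-get install cassandra=2.1.18   # or however your packages are pinned
    sudo service cassandra start
    nodetool version                        # confirm the node is on the new version

    # then, per node:
    nodetool upgradesstables    # rewrite sstables to the new format (timing relative
                                # to the rest of the cluster is question 3 above)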


Re: Dropping down replication factor

2017-08-13 Thread Brian Spindler
Thanks Kurt.

We had one corrupt sstable from a CF of ours.  I am actually running a repair
on that CF now and then plan to try to join the additional nodes as you
suggest.  I deleted the corrupt OpsCenter sstables as well but will not
bother repairing that keyspace before adding capacity.

Been keeping an eye across all nodes for corrupt exceptions - so far no new
occurrences.

Thanks again.

-B

On Sun, Aug 13, 2017 at 17:52 kurt greaves  wrote:

>
>
> On 14 Aug. 2017 00:59, "Brian Spindler"  wrote:
>
> Do you think with the setup I've described I'd be ok doing that now to
> recover this node?
>
> The node died trying to run the scrub; I've restarted it but I'm not sure
> it's going to get past a scrub/repair, this is why I deleted the other
> files as a brute force method.  I think I might have to do the same here
> and then kick off a repair if I can't just replace it?
>
> Is it just the opscenter keyspace that has corrupt sstables? If so I wouldn't
> worry about repairing too much. If you can get that third node in C to join,
> I'd say your best bet is to just do that until you have enough nodes in C.
> Dropping and increasing RF is pretty risky on a live system.
>
> It sounds to me like you stand a good chance of getting the new nodes in C
> to join, so I'd pursue that before trying anything more complicated.
>
>
> Doing the repair on the node that had the corrupt data deleted should be
> ok?
>
> Yes, as long as you also deleted corrupt SSTables on any other nodes that
> had them.
>
-- 
-Brian


Re: MemtablePostFlush pending

2017-08-13 Thread Akhil Mehra
Hi Asad,

The post flush task frees up allocated commit log segments.

Apart from commit log segment allocation, the post flush task "synchronises
custom secondary indexes and provides ordering guarantees for futures on
switchMemtable/flush etc, which expect to be able to wait until the flush (and
all prior flushes) requested have completed."

The post flush executor is a single-threaded thread pool that cannot be tuned
(https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L160-L167).

Are you using secondary indexes? Is there a high write throughput which is
resulting in frequent memtable flushes?
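
For reference, the backlog of this stage is visible from the shell; a quick
sketch for keeping an eye on it:

    # keep the header line so the Active/Pending columns stay labelled
    watch -n 5 "nodetool tpstats | grep -Ei 'Pool Name|MemtablePostFlush|MemtableFlushWriter'"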

Regards,
Akhil





> On 12/08/2017, at 7:05 AM, ZAIDI, ASAD A  wrote:
> 
> Hello Folks,
>  
> I’m using Cassandra 2.2 on a 14-node cluster.
>  
> These days I’m observing the memtablepostflush pending number going high; this
> happens intermittently. Is there a way to ‘tune’ the memtablepostflush stage?
>  
> Thanks/Asad



Re: Dropping down replication factor

2017-08-13 Thread kurt greaves
On 14 Aug. 2017 00:59, "Brian Spindler"  wrote:

Do you think with the setup I've described I'd be ok doing that now to
recover this node?

The node died trying to run the scrub; I've restarted it but I'm not sure
it's going to get past a scrub/repair, this is why I deleted the other
files as a brute force method.  I think I might have to do the same here
and then kick off a repair if I can't just replace it?

Is it just the opscenter keyspace that has corrupt sstables? If so I wouldn't
worry about repairing too much. If you can get that third node in C to join,
I'd say your best bet is to just do that until you have enough nodes in C.
Dropping and increasing RF is pretty risky on a live system.

It sounds to me like you stand a good chance of getting the new nodes in C
to join, so I'd pursue that before trying anything more complicated.


Doing the repair on the node that had the corrupt data deleted should be
ok?

Yes, as long as you also deleted corrupt SSTables on any other nodes that
had them.
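
A minimal sketch of that repair, scoped to just the affected table (the
keyspace/table names are placeholders):

    # on the node where the corrupt sstables were deleted
    nodetool repair <keyspace> <table_with_deleted_sstables>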


Re: Dropping down replication factor

2017-08-13 Thread Brian Spindler
Do you think with the setup I've described I'd be ok doing that now to
recover this node?

The node died trying to run the scrub; I've restarted it, but I'm not sure
it's going to get past a scrub/repair, which is why I deleted the other
files as a brute-force method.  I think I might have to do the same here
and then kick off a repair if I can't just replace it?

Doing the repair on the node that had the corrupt data deleted should be
ok?

On Sun, Aug 13, 2017 at 10:29 AM Jeff Jirsa  wrote:

> Running repairs when you have corrupt sstables can spread the corruption
>
> In 2.1.15, corruption is almost certainly from something like a bad disk
> or bad RAM
>
> One way to deal with corruption is to stop the node and replace it (with
> -Dcassandra.replace_address) so you restream data from neighbors. The
> challenge here is making sure you have a healthy replica for streaming
>
> Please make sure you have backups and snapshots if you have corruption
> popping up
>
> If you're using vnodes, once you get rid of the corruption you may
> consider adding another c node with fewer vnodes to try to get it joined
> faster with less data.
>
>
> --
> Jeff Jirsa
>
>
> On Aug 13, 2017, at 7:11 AM, Brian Spindler 
> wrote:
>
> Hi Jeff, I ran the scrub online and that didn't help.  I went ahead and
> stopped the node, deleted all the corrupted data files --*.db
> files and planned on running a repair when it came back online.
>
> Unrelated I believe, now another CF is corrupted!
>
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
> /ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db
> Caused by: org.apache.cassandra.io.compress.CorruptBlockException:
> (/ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db):
> corruption detected, chunk at 101500 of length 26523398.
>
> Few days ago when troubleshooting this I did change the OpsCenter keyspace
> RF == 2 from 3 since I thought that would help reduce load.  Did that cause
> this corruption?
>
> running *'nodetool scrub OpsCenter rollups300'* on that node now
>
> And now I also see this when running nodetool status:
>
> *"Note: Non-system keyspaces don't have the same replication settings,
> effective ownership information is meaningless"*
>
> What to do?
>
> I still can't stream to this new node cause of this corruption.  Disk
> space is getting low on these nodes ...
>
> On Sat, Aug 12, 2017 at 9:51 PM Brian Spindler 
> wrote:
>
>> nothing in logs on the node that it was streaming from.
>>
>> however, I think I found the issue on the other node in the C rack:
>>
>> ERROR [STREAM-IN-/10.40.17.114] 2017-08-12 16:48:53,354
>> StreamSession.java:512 - [Stream #08957970-7f7e-11e7-b2a2-a31e21b877e5]
>> Streaming error occurred
>> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
>> /ephemeral/cassandra/data/...
>>
>> I did a 'cat /var/log/cassandra/system.log|grep Corrupt'  and it seems
>> it's a single Index.db file and nothing on the other node.
>>
>> I think nodetool scrub or offline sstablescrub might be in order but with
>> the current load I'm not sure I can take it offline for very long.
>>
>> Thanks again for the help.
>>
>>
>> On Sat, Aug 12, 2017 at 9:38 PM Jeffrey Jirsa  wrote:
>>
>>> Compaction is backed up – that may be normal write load (because of the
>>> rack imbalance), or it may be a secondary index build. Hard to say for
>>> sure. ‘nodetool compactionstats’ if you’re able to provide it. The jstack
>>> probably not necessary, streaming is being marked as failed and it’s
>>> turning itself off. Not sure why streaming is marked as failing, though,
>>> anything on the sending sides?
>>>
>>>
>>>
>>>
>>>
>>> From: Brian Spindler 
>>> Reply-To: 
>>> Date: Saturday, August 12, 2017 at 6:34 PM
>>> To: 
>>> Subject: Re: Dropping down replication factor
>>>
>>> Thanks for replying Jeff.
>>>
>>> Responses below.
>>>
>>> On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa  wrote:
>>>
 Answers inline

 --
 Jeff Jirsa


 > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote:
 >
 > Hi folks, hopefully a quick one:
 >
 > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch.
 It's all in one region but spread across 3 availability zones.  It was
 nicely balanced with 4 nodes in each.
 >
 > But with a couple of failures and subsequent provisions to the wrong
 az we now have a cluster with :
 >
 > 5 nodes in az A
 > 5 nodes in az B
 > 2 nodes in az C
 >
 > Not sure why, but when adding a third node in AZ C it fails to stream
 after getting all the way to completion and no apparent error in logs.
 I've looked at a couple of bugs 

Re: Dropping down replication factor

2017-08-13 Thread Jeff Jirsa
Running repairs when you have corrupt sstables can spread the corruption

In 2.1.15, corruption is almost certainly from something like a bad disk or bad 
RAM

One way to deal with corruption is to stop the node and replace it (with
-Dcassandra.replace_address) so you restream data from neighbors. The challenge
here is making sure you have a healthy replica for streaming.

Please make sure you have backups and snapshots if you have corruption popping 
up

If you're using vnodes, once you get rid of the corruption you may consider
adding another C node with fewer vnodes to try to get it joined faster with
less data.
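
For anyone following along, the replacement flow looks roughly like the sketch
below (the JVM option is the real one; the file path assumes a package install
and the IP is a placeholder):

    # on the replacement node, before its first start
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"' \
        | sudo tee -a /etc/cassandra/cassandra-env.sh
    sudo service cassandra start    # bootstraps by streaming the dead node's ranges
    # drop the option again once the node has finished joining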


-- 
Jeff Jirsa


> On Aug 13, 2017, at 7:11 AM, Brian Spindler  wrote:
> 
> Hi Jeff, I ran the scrub online and that didn't help.  I went ahead and 
> stopped the node, deleted all the corrupted data files --*.db files 
> and planned on running a repair when it came back online.  
> 
> Unrelated I believe, now another CF is corrupted!  
> 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db
> Caused by: org.apache.cassandra.io.compress.CorruptBlockException: 
> (/ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db):
>  corruption detected, chunk at 101500 of length 26523398.
> 
> Few days ago when troubleshooting this I did change the OpsCenter keyspace RF 
> == 2 from 3 since I thought that would help reduce load.  Did that cause this 
> corruption? 
> 
> running 'nodetool scrub OpsCenter rollups300' on that node now 
> 
> And now I also see this when running nodetool status: 
> 
> "Note: Non-system keyspaces don't have the same replication settings, 
> effective ownership information is meaningless"
> 
> What to do?  
> 
> I still can't stream to this new node cause of this corruption.  Disk space 
> is getting low on these nodes ... 
> 
>> On Sat, Aug 12, 2017 at 9:51 PM Brian Spindler  
>> wrote:
>> nothing in logs on the node that it was streaming from.  
>> 
>> however, I think I found the issue on the other node in the C rack: 
>> 
>> ERROR [STREAM-IN-/10.40.17.114] 2017-08-12 16:48:53,354 
>> StreamSession.java:512 - [Stream #08957970-7f7e-11e7-b2a2-a31e21b877e5] 
>> Streaming error occurred
>> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
>> /ephemeral/cassandra/data/...
>> 
>> I did a 'cat /var/log/cassandra/system.log|grep Corrupt'  and it seems it's 
>> a single Index.db file and nothing on the other node.  
>> 
>> I think nodetool scrub or offline sstablescrub might be in order but with 
>> the current load I'm not sure I can take it offline for very long.  
>> 
>> Thanks again for the help. 
>> 
>> 
>>> On Sat, Aug 12, 2017 at 9:38 PM Jeffrey Jirsa  wrote:
>>> Compaction is backed up – that may be normal write load (because of the 
>>> rack imbalance), or it may be a secondary index build. Hard to say for 
>>> sure. ‘nodetool compactionstats’ if you’re able to provide it. The jstack 
>>> probably not necessary, streaming is being marked as failed and it’s 
>>> turning itself off. Not sure why streaming is marked as failing, though, 
>>> anything on the sending sides?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> From: Brian Spindler 
>>> Reply-To: 
>>> Date: Saturday, August 12, 2017 at 6:34 PM
>>> To: 
>>> Subject: Re: Dropping down replication factor
>>> 
>>> Thanks for replying Jeff. 
>>> 
>>> Responses below. 
>>> 
 On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa  wrote:
 Answers inline
 
 --
 Jeff Jirsa
 
 
 > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote:
 >
 > Hi folks, hopefully a quick one:
 >
 > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch.  It's 
 > all in one region but spread across 3 availability zones.  It was nicely 
 > balanced with 4 nodes in each.
 >
 > But with a couple of failures and subsequent provisions to the wrong az 
 > we now have a cluster with :
 >
 > 5 nodes in az A
 > 5 nodes in az B
 > 2 nodes in az C
 >
 > Not sure why, but when adding a third node in AZ C it fails to stream 
 > after getting all the way to completion and no apparent error in logs.  
 > I've looked at a couple of bugs referring to scrubbing and possible OOM 
 > bugs due to metadata writing at end of streaming (sorry don't have 
 > ticket handy).  I'm worried I might not be able to do much with these 
 > since the disk space usage is high and they are under a lot of load 
 > given the small number of them for this rack.
 
 You'll definitely have higher load on az C instances with rf=3 in this 
 ratio 
 
 Streaming should still work - are you 

Re: Dropping down replication factor

2017-08-13 Thread Brian Spindler
Hi Jeff, I ran the scrub online and that didn't help.  I went ahead and
stopped the node, deleted all the corrupted data files --*.db
files and planned on running a repair when it came back online.

Unrelated I believe, now another CF is corrupted!

org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
/ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db
Caused by: org.apache.cassandra.io.compress.CorruptBlockException:
(/ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db):
corruption detected, chunk at 101500 of length 26523398.

A few days ago, when troubleshooting this, I changed the OpsCenter keyspace
RF from 3 to 2 since I thought that would help reduce load.  Did that cause
this corruption?

Running 'nodetool scrub OpsCenter rollups300' on that node now.

And now I also see this when running nodetool status:

"Note: Non-system keyspaces don't have the same replication settings,
effective ownership information is meaningless"

What to do?

I still can't stream to this new node because of this corruption.  Disk space
is getting low on these nodes ...
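
In case it is useful, the offline scrub mentioned in the quoted thread below is
run with the node stopped; a rough sketch (service name and tool location
assume a package install, the keyspace/table come from the error above):

    sudo service cassandra stop
    sstablescrub OpsCenter rollups300    # offline scrub of the corrupted CF
    sudo service cassandra start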

On Sat, Aug 12, 2017 at 9:51 PM Brian Spindler 
wrote:

> nothing in logs on the node that it was streaming from.
>
> however, I think I found the issue on the other node in the C rack:
>
> ERROR [STREAM-IN-/10.40.17.114] 2017-08-12 16:48:53,354
> StreamSession.java:512 - [Stream #08957970-7f7e-11e7-b2a2-a31e21b877e5]
> Streaming error occurred
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
> /ephemeral/cassandra/data/...
>
> I did a 'cat /var/log/cassandra/system.log|grep Corrupt'  and it seems
> it's a single Index.db file and nothing on the other node.
>
> I think nodetool scrub or offline sstablescrub might be in order but with
> the current load I'm not sure I can take it offline for very long.
>
> Thanks again for the help.
>
>
> On Sat, Aug 12, 2017 at 9:38 PM Jeffrey Jirsa  wrote:
>
>> Compaction is backed up – that may be normal write load (because of the
>> rack imbalance), or it may be a secondary index build. Hard to say for
>> sure. ‘nodetool compactionstats’ if you’re able to provide it. The jstack
>> probably not necessary, streaming is being marked as failed and it’s
>> turning itself off. Not sure why streaming is marked as failing, though,
>> anything on the sending sides?
>>
>>
>>
>>
>>
>> From: Brian Spindler 
>> Reply-To: 
>> Date: Saturday, August 12, 2017 at 6:34 PM
>> To: 
>> Subject: Re: Dropping down replication factor
>>
>> Thanks for replying Jeff.
>>
>> Responses below.
>>
>> On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa  wrote:
>>
>>> Answers inline
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote:
>>> >
>>> > Hi folks, hopefully a quick one:
>>> >
>>> > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch.  It's
>>> all in one region but spread across 3 availability zones.  It was nicely
>>> balanced with 4 nodes in each.
>>> >
>>> > But with a couple of failures and subsequent provisions to the wrong
>>> az we now have a cluster with :
>>> >
>>> > 5 nodes in az A
>>> > 5 nodes in az B
>>> > 2 nodes in az C
>>> >
>>> > Not sure why, but when adding a third node in AZ C it fails to stream
>>> after getting all the way to completion and no apparent error in logs.
>>> I've looked at a couple of bugs referring to scrubbing and possible OOM
>>> bugs due to metadata writing at end of streaming (sorry don't have ticket
>>> handy).  I'm worried I might not be able to do much with these since the
>>> disk space usage is high and they are under a lot of load given the small
>>> number of them for this rack.
>>>
>>> You'll definitely have higher load on az C instances with rf=3 in this
>>> ratio
>>
>>
>>> Streaming should still work - are you sure it's not busy doing
>>> something? Like building secondary index or similar? jstack thread dump
>>> would be useful, or at least nodetool tpstats
>>>
>>> Only other thing might be a backup.  We do incrementals x1hr and
>> snapshots x24h; they are shipped to s3 then links are cleaned up.  The
>> error I get on the node I'm trying to add to rack C is:
>>
>> ERROR [main] 2017-08-12 23:54:51,546 CassandraDaemon.java:583 - Exception
>> encountered during startup
>> java.lang.RuntimeException: Error during boostrap: Stream failed
>> at
>> org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:87)
>> ~[apache-cassandra-2.1.15.jar:2.1.15]
>> at
>> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166)
>> ~[apache-cassandra-2.1.15.jar:2.1.15]
>> at
>> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944)
>> ~[apache-cassandra-2.1.15.jar:2.1.15]
>>   

Re: Large tombstones creation

2017-08-13 Thread Christophe Schmitz
Hi Vlad,

Are you by any chance inserting null values? If so, you will create
tombstones. The workaround (Cassandra >= 2.2) is to use unset on your
bound statement (see https://issues.apache.org/jira/browse/CASSANDRA-7304).
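
To illustrate the difference at the CQL level (a sketch only; ks.t is a
hypothetical table with columns id, a and b):

    # binding/inserting an explicit null writes a tombstone for that column
    cqlsh -e "INSERT INTO ks.t (id, a, b) VALUES (1, 'x', null) IF NOT EXISTS;"

    # leaving the column out of the statement writes no tombstone at all
    cqlsh -e "INSERT INTO ks.t (id, a) VALUES (2, 'x') IF NOT EXISTS;"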

Cheers,

Christophe

On 13 August 2017 at 20:48, Vlad  wrote:

> Hi,
>
> I insert about 45000 rows to empty table in Python using prepared
> statements and IF NOT EXISTS. While reading after insert I get warnings like
> *Server warning: Read 5000 live rows and 33191 tombstone cells for query
> SELECT * FROM ...  LIMIT 5000 (see tombstone_warn_threshold)*
>
> How it can happen? I have several SASI indexes for this table, can this be
> a reason?
>
> Regards, Vlad
>



-- 

Christophe Schmitz
Director of consulting EMEA
AU: +61 4 03751980 / FR: +33 7 82022899

Read our latest technical blog posts here.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Large tombstones creation

2017-08-13 Thread Vlad
Hi,
I insert about 45000 rows into an empty table in Python using prepared
statements and IF NOT EXISTS. While reading after the insert, I get warnings
like:
Server warning: Read 5000 live rows and 33191 tombstone cells for query
SELECT * FROM ...  LIMIT 5000 (see tombstone_warn_threshold)

How can it happen? I have several SASI indexes for this table; can this be a
reason?
Regards, Vlad