[jira] [Created] (CASSANDRA-15278) User's password for sstableloader tool is visible in ps command output.
Niket Vilas Bagwe created CASSANDRA-15278:

Summary: User's password for sstableloader tool is visible in ps command output.
Key: CASSANDRA-15278
URL: https://issues.apache.org/jira/browse/CASSANDRA-15278
Project: Cassandra
Issue Type: Bug
Components: Tool/bulk load
Reporter: Niket Vilas Bagwe

As of now, the password is visible in the ps auxww output to any system user whenever the sstableloader command line utility is used. This seems to be a security flaw. There should be an alternative way to supply the user's password other than as a command line argument.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
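One common pattern for the alternative the reporter asks for is reading credentials from a permissions-restricted file instead of argv, so they never appear in the process table. This is a hedged sketch of that pattern only, not an existing sstableloader feature; the class name, file name, and property keys are hypothetical:

```java
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

// Sketch: load credentials from a file (e.g. chmod 600) instead of passing
// them on the command line, where any local user could read them via
// `ps auxww`. The property keys used here are illustrative assumptions.
class CredentialsFile {
    static String[] load(String path) throws IOException {
        Properties props = new Properties();
        try (FileReader reader = new FileReader(path)) {
            props.load(reader);
        }
        return new String[] { props.getProperty("username"),
                              props.getProperty("password") };
    }
}
```

A tool taking a `--credentials-file` style option could call `CredentialsFile.load(...)` at startup, keeping the secret out of `ps` output entirely.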
[jira] [Updated] (CASSANDRA-15272) Enhance & reenable RepairTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-15272:

Reviewers: Jon Meredith (was: Dinesh Joshi, Jon Meredith)

> Enhance & reenable RepairTest
>
> Key: CASSANDRA-15272
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15272
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Repair
> Reporter: Dinesh Joshi
> Assignee: Dinesh Joshi
> Priority: Normal
>
> Currently the In-JVM RepairTest is not enabled on trunk (see CASSANDRA-13938 for more info). This patch enables the In-JVM RepairTest. It adds a new test that exercises the compression=off path for SSTables and will help catch any regressions in repair on this path. It does not fix the issue with compressed sstable streaming (CASSANDRA-13938); that should be addressed in the original ticket.
[jira] [Updated] (CASSANDRA-15272) Enhance & reenable RepairTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-15272:

Status: Ready to Commit (was: Review In Progress)
[jira] [Commented] (CASSANDRA-15272) Enhance & reenable RepairTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906691#comment-16906691 ]

Jon Meredith commented on CASSANDRA-15272:

Thanks Dinesh, that's perfect. +1 once CASSANDRA-15170 merges (and once you update the imports).
[jira] [Updated] (CASSANDRA-15080) Paxos tables should allow a configurable chunk length
[ https://issues.apache.org/jira/browse/CASSANDRA-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Haddad updated CASSANDRA-15080:

Description:
In doing some research for a client on LWTs, I found that once we start pushing a node hard enough, with enough LWTs, decompressing the data in the paxos table takes up quite a bit of time. I've attached an SVG flame graph showing that about 10% of time is spent in LZ4_decompress_fast in queries hitting the paxos table. We should allow the user to modify the compression settings on this table.

was:
In doing some research for a client on LWTs, I found that once we start pushing a node hard enough, with enough LWTs, decompressing the data in the paxos table takes up quite a bit of time. I've attached an SVG flame graph showing that about 10% of time is spent in LZ4_decompress_fast in queries hitting the paxos table. We should be able to get a nice little performance bump from changing this to a 4KB chunk length or disabling compression completely.

> Paxos tables should allow a configurable chunk length
>
> Key: CASSANDRA-15080
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15080
> Project: Cassandra
> Issue Type: Improvement
> Components: Feature/Lightweight Transactions
> Reporter: Jon Haddad
> Assignee: Jon Haddad
> Priority: Normal
> Labels: performance
> Attachments: flamegraph-bad-perf.svg
[jira] [Updated] (CASSANDRA-15080) Paxos tables should allow a configurable chunk length
[ https://issues.apache.org/jira/browse/CASSANDRA-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Haddad updated CASSANDRA-15080:

Summary: Paxos tables should allow a configurable chunk length (was: Paxos tables should use a 4KB chunk length)
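The change the ticket proposes is to make the paxos table's compression configurable. As a hedged sketch of what the tuning discussed here could look like, assuming the operator's Cassandra version permits altering this system table (versions differ on this, so treat it as illustrative only):

```cql
-- Illustrative only: reduce the compression chunk length on the paxos table.
ALTER TABLE system.paxos
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};

-- Or disable compression entirely:
ALTER TABLE system.paxos WITH compression = {'enabled': 'false'};
```

Smaller chunks mean less data must be decompressed per point read, which is the bottleneck the flame graph identifies.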
[jira] [Updated] (CASSANDRA-15272) Enhance & reenable RepairTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-15272:

Reviewers: Jon Meredith, Dinesh Joshi (was: Dinesh Joshi, Jon Meredith)
Status: Review In Progress (was: Patch Available)
Reviewers: Jon Meredith, Dinesh Joshi (was: Jon Meredith)
[jira] [Commented] (CASSANDRA-15272) Enhance & reenable RepairTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906629#comment-16906629 ]

Dinesh Joshi commented on CASSANDRA-15272:

Thanks for the review [~jmeredithco]. I have updated the branch with your review feedback. Please take a look.
[jira] [Updated] (CASSANDRA-15272) Enhance & reenable RepairTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-15272:

Reviewers: Jon Meredith
[jira] [Updated] (CASSANDRA-15277) Make it possible to resize concurrent read / write thread pools at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Meredith updated CASSANDRA-15277:

Complexity: Normal
Change Category: Operability

> Make it possible to resize concurrent read / write thread pools at runtime
>
> Key: CASSANDRA-15277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15277
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Other
> Reporter: Jon Meredith
> Assignee: Jon Meredith
> Priority: Normal
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> To better mitigate cluster overload, the executor services for the various stages should be resizable at runtime (probably as a JMX hot property). Related to CASSANDRA-5044, this would add the capability to resize the multiThreadedLowSignalStage pools based on SEPExecutor.
[jira] [Commented] (CASSANDRA-15277) Make it possible to resize concurrent read / write thread pools at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906591#comment-16906591 ]

Jon Meredith commented on CASSANDRA-15277:

Branch: https://github.com/jonmeredith/cassandra/tree/CASSANDRA-15277
Pull request: https://github.com/apache/cassandra/pull/340

Add support for resizing the SEPExecutor thread pools used by some of the work stages. This version makes the smallest change to the SEPExecutor itself: it introduces a new flag that makes the workers release and re-acquire work permits while the thread setting the size adds/discards work permits to reach the desired maximum concurrency.

There are two other design choices I could explore:
1) Convert the work permit representation to signed and have worker threads return permits while it is non-positive. This allows the resizing thread to exit immediately.
2) Avoid introducing the resizing volatile boolean by dedicating a bit in `permits` to mark when resizing is taking place - it gets checked anyway, but this would be a slightly larger change and would reduce the maximum number of work permits representable.
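To make the permit mechanics concrete, here is a simplified, self-contained sketch of the resizing idea described above. It is not the actual SEPExecutor code: the class and method names are hypothetical, and the real implementation packs permits into shared state alongside other bookkeeping. It only illustrates how adding or draining permits changes the maximum concurrency workers can reach:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a resizable work-permit pool. Workers must hold a
// permit to run a task; a resizing thread adds or removes permits to move
// the pool toward a new maximum concurrency.
class ResizablePermits {
    private final AtomicInteger permits;

    ResizablePermits(int initialMax) {
        permits = new AtomicInteger(initialMax);
    }

    // Worker: try to take a permit before picking up a task.
    boolean tryAcquire() {
        while (true) {
            int p = permits.get();
            if (p <= 0)
                return false;                       // at max concurrency
            if (permits.compareAndSet(p, p - 1))
                return true;
        }
    }

    // Worker: hand the permit back when its task completes.
    void release() {
        permits.incrementAndGet();
    }

    // Resizing thread: grow (delta > 0) or shrink (delta < 0) the pool.
    // Shrinking can drive the count negative until workers release permits,
    // which is the "signed representation" idea from design choice (1).
    void resize(int delta) {
        permits.addAndGet(delta);
    }

    int available() {
        return permits.get();
    }
}
```

Shrinking while workers hold permits simply leaves fewer (or zero) permits available, so concurrency decays to the new limit as in-flight tasks finish, rather than interrupting running work.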
[jira] [Updated] (CASSANDRA-15277) Make it possible to resize concurrent read / write thread pools at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated CASSANDRA-15277:

Labels: pull-request-available (was: )
[jira] [Created] (CASSANDRA-15277) Make it possible to resize concurrent read / write thread pools at runtime
Jon Meredith created CASSANDRA-15277:

Summary: Make it possible to resize concurrent read / write thread pools at runtime
Key: CASSANDRA-15277
URL: https://issues.apache.org/jira/browse/CASSANDRA-15277
Project: Cassandra
Issue Type: Improvement
Components: Local/Other
Reporter: Jon Meredith
Assignee: Jon Meredith
[jira] [Commented] (CASSANDRA-15272) Enhance & reenable RepairTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906419#comment-16906419 ]

Jon Meredith commented on CASSANDRA-15272:

+1 from me. It's probably worth waiting until CASSANDRA-15170 lands first, and it'll only need a minor tweak to change the GOSSIP/NETWORK import path.

Only one nit: instead of commenting out the compressed test, what do you think about using @Ignore instead?

{{@Ignore("test requires CASSANDRA-13938 to be merged")}}

That gives somebody looking at the output from test runners a better chance of noticing that it is disabled for a reason and enabling it in the future.
[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment
[ https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906354#comment-16906354 ]

Benedict commented on CASSANDRA-15274:

{{sstableexport}} / {{sstabledump}} are your friend here - pick a corrupted sstable and print its contents. I'm pretty sure that by default these tools do not verify the checksum, so if they print their entire contents successfully there's already a reasonable chance that the data is not corrupted. But to be sure, exporting data for the same partition keys from sstables on other nodes, and comparing that the same data is produced, gives high confidence that the data in the files is still valid.

This isn't quite as simple as it sounds, as there could be many records, many of which are not contained in corrupt blocks, so it would be easier to modify {{sstableexport}} to detect the specifically corrupted blocks and only print the data contained within them. There's also the problem that compaction can lead to different data on each node. But picking a large and old sstable may give you a chance of fairly similar data residing on each node in comparable sstables.

> Multiple Corrupt datafiles across entire environment
>
> Key: CASSANDRA-15274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Phil O Conduin
> Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
> * 2 datacenters.
> * 9 physical servers in each datacenter (Cisco UCS C220 M4 SFF)
> * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
> * 72 Cassandra instances across the 2 datacenters: 36 in site A, 36 in site B.
> We also have 2 Reaper nodes we use for repair, one in each datacenter, each running with its own Cassandra back end in a cluster together.
>
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
>
> Storage Layout
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root   20G  2.2G   18G  11% /
> devtmpfs                   63G     0   63G   0% /dev
> tmpfs                      63G     0   63G   0% /dev/shm
> tmpfs                      63G  4.1G   59G   7% /run
> tmpfs                      63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                  1.5T  802G  688G  54% /data/ssd4
> /dev/sda                  1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                  1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                  1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
>
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):              64
> Thread(s) per core:  2
> Core(s) per socket:  16
> Socket(s):           2
>
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log files on a lot of instances. There seems to be no pattern to the corruption. It seems that the repair job is finding all the corrupted files for us. The repair will hang on the node where the corrupted file is found. To fix this we remove/rename the datafile and bounce the Cassandra instance. Our hardware/OS team have stated there is no problem on their side. I do not believe it is the repair causing the corruption.
>
> So let me give you an example of a corrupted file and maybe someone might be able to work through it with me. When this corrupted file was reported in the log it looks like it was the repair that found it.
>
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO 21:30:33 Writing Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at
[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment
[ https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906344#comment-16906344 ]

Phil O Conduin commented on CASSANDRA-15274:

[~benedict] thanks a lot for the explanation. We have a ticket open with Cisco for help on this also. Can you explain a little more about how we validate for actual corruption - how would I go about comparing the data written to files?
[jira] [Commented] (CASSANDRA-15274) Multiple Corrupt datafiles across entire environment
[ https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906288#comment-16906288 ]

Benedict commented on CASSANDRA-15274:

This error is _very_ suggestive of actual data file corruption, independent of C*. This exception is thrown only when the raw data for a block, whose checksum was computed on write, no longer produces the same checksum. C* never modifies a file once written, so in particular if these errors are being encountered for the first time against sstables that are older than your last successful repair, we can essentially guarantee that the problem is with your system and not C*.

How certain are you that your disks are reliable?

You can try to rule out actual corruption by comparing the contents of data written to files reporting these failures with the same data as it exists on other nodes in the cluster (whether or not the files on the other nodes report these errors).
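The checksum mechanism described above can be illustrated with a tiny self-contained sketch. This is not Cassandra's actual on-disk scheme (which stores per-block checksums alongside compressed data); it only demonstrates the principle that a checksum computed on write no longer matches after even a single bit flips on disk:

```java
import java.util.zip.CRC32;

// Sketch of block checksumming: compute a CRC32 over a data block on write,
// recompute it on read, and treat any mismatch as corruption. This is a
// simplified illustration, not Cassandra's exact format.
class BlockChecksum {
    static long checksum(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        return crc.getValue();
    }
}
```

CRC32 is guaranteed to detect any single-bit error, which is why a flipped bit in a stored block reliably triggers a CorruptSSTableException-style failure on the next read of that block.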
[jira] [Comment Edited] (CASSANDRA-15260) Add `allocate_tokens_for_dc_rf` yaml option for token allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905336#comment-16905336 ] mck edited comment on CASSANDRA-15260 at 8/13/19 2:52 PM: -- Thanks [~blambov]. The rename is done. ||branch||circleci||asf jenkins testall|| |[CASSANDRA-15260|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk__allocate_tokens_for_dc_rf]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk__allocate_tokens_for_dc_rf]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/43//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/43/]| I've opened the ticket, and will 'Submit Patch' it after I get some unit tests in. was (Author: michaelsembwever): Thanks [~blambov]. The rename is done. ||branch||circleci||asf jenkins testall|| |[CASSANDRA-15260|https://github.com/thelastpickle/cassandra/commit/4513af58a532b91ab4449161a79e70f78b7ebcfc]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk__allocate_tokens_for_dc_rf]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/43//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/43/]| I've opened the ticket, and will 'Submit Patch' it after I get some unit tests in. > Add `allocate_tokens_for_dc_rf` yaml option for token allocation > > > Key: CASSANDRA-15260 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15260 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: mck >Assignee: mck >Priority: Normal > Fix For: 4.x > > > Similar to DSE's option: {{allocate_tokens_for_local_replication_factor}} > Currently the > [ReplicationAwareTokenAllocator|https://www.datastax.com/dev/blog/token-allocation-algorithm] > requires a defined keyspace and a replica factor specified in the current > datacenter. 
> This is problematic in a number of ways. The real keyspace cannot be used > when adding new datacenters as, in practice, all its nodes need to be up and > running before it has the capacity to replicate data into it. Adding new > datacenters (or lift-and-shifting a cluster via a datacenter migration) > therefore has to be done using a dummy keyspace that duplicates the > replication strategy and factor of the real keyspace. This gets even more > difficult come version 4.0, as the replication factor cannot even be defined > in new datacenters before those datacenters are up and running. > These issues are removed by avoiding the keyspace definition and lookup, and > presuming the replication strategy is per datacenter, i.e. NTS. This can be > done with the use of an {{allocate_tokens_for_dc_rf}} option. > It may also be worth considering whether {{allocate_tokens_for_dc_rf=3}} > should become the default, as this is the replication factor for the vast > majority of datacenters in production. I suspect this would be a good > improvement over the existing randomly generated tokens algorithm. > Initial patch is available in > [https://github.com/thelastpickle/cassandra/commit/fc4865b0399570e58f11215565ba17dc4a53da97] > The patch does not remove the existing {{allocate_tokens_for_keyspace}} > option, as that provides the codebase for handling different replication > strategies. > > fyi [~blambov] [~jay.zhuang] [~chovatia.jayd...@gmail.com] [~alokamvenki] > [~alexchueshev] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
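For readers following the proposal above: the new knob would live in cassandra.yaml on a node being bootstrapped into the new datacenter. The fragment below is a hypothetical sketch based only on this ticket (the option is not in a released cassandra.yaml; {{num_tokens}} is an ordinary existing setting shown for context):

```yaml
# Hypothetical cassandra.yaml fragment, per the patch proposed in this ticket.
# Assumes NetworkTopologyStrategy-style per-DC replication; no keyspace needs
# to exist in the new datacenter before the node bootstraps.
num_tokens: 16
allocate_tokens_for_dc_rf: 3

# Existing alternative, which the patch keeps: requires the named keyspace to
# already be defined with a replication factor in this datacenter.
# allocate_tokens_for_keyspace: my_keyspace
```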
[jira] [Commented] (CASSANDRA-14336) sstableloader fails if sstables contains removed columns
[ https://issues.apache.org/jira/browse/CASSANDRA-14336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906161#comment-16906161 ] Barriere commented on CASSANDRA-14336: -- Hello, any update on this bug? I have exactly the same issue on Cassandra 3.0.9, using sstableloader to restore a keyspace with previously deleted tables onto another cluster. For my part, neither compaction nor upgradesstables works. Thanks for your feedback > sstableloader fails if sstables contains removed columns > > > Key: CASSANDRA-14336 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14336 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools >Reporter: Hannu Kröger >Assignee: Jaydeepkumar Chovatia >Priority: Normal > > If I copy the schema and try to load in sstables with sstableloader, loading > sometimes fails with > {code:java} > Exception in thread "main" org.apache.cassandra.tools.BulkLoadException: > java.lang.RuntimeException: Failed to list files in /tmp/test/bug3_dest-acdc > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:93) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48) > Caused by: java.lang.RuntimeException: Failed to list files in > /tmp/test/bug3_dest-acdc > at > org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:77) > at > org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:561) > at > org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:76) > at > org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:165) > at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:80) > ... 
1 more > Caused by: java.lang.RuntimeException: Unknown column d during deserialization > at > org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:321) > at > org.apache.cassandra.io.sstable.format.SSTableReader.openForBatch(SSTableReader.java:440) > at > org.apache.cassandra.io.sstable.SSTableLoader.lambda$openSSTables$0(SSTableLoader.java:121) > at > org.apache.cassandra.db.lifecycle.LogAwareFileLister.lambda$innerList$2(LogAwareFileLister.java:99) > at > java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174) > at java.util.TreeMap$EntrySpliterator.forEachRemaining(TreeMap.java:2969) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:101) > at > org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:73) > ... 5 more{code} > This requires that we have dropped columns in the source table and sstables > exist from the "old schema" time. > This can be very easily reproduced. 
I used following script: > {code:java} > KS=test > SRCTABLE=bug3_source > DESTTABLE=bug3_dest > DATADIR=/usr/local/var/lib/cassandra/data > TMPDIR=/tmp > cqlsh -e "CREATE TABLE $KS.$SRCTABLE(a int primary key, b int, c int, d int);" > cqlsh -e "CREATE TABLE $KS.$DESTTABLE(a int primary key, b int, c int);" > cqlsh -e "INSERT INTO $KS.$SRCTABLE(a,b,c,d) values(1,2,3,4);" > nodetool flush $KS $SRCTABLE > cqlsh -e "ALTER TABLE $KS.$SRCTABLE DROP d;" > nodetool flush $KS $SRCTABLE > mkdir -p $TMPDIR/$KS/$DESTTABLE-acdc > cp $DATADIR/$KS/$SRCTABLE-*/* $TMPDIR/$KS/$DESTTABLE-acdc > sstableloader -d 127.0.0.1 $TMPDIR/$KS/$DESTTABLE-acdc{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15276) SSTableLoader failes if loading large table
[ https://issues.apache.org/jira/browse/CASSANDRA-15276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Danko updated CASSANDRA-15276: - Description: I'm trying to restore backups created by OpsCenter. I am restoring to a fresh cluster, so it is necessary to use sstableloader. Backups were created from a DSE cluster on version 6.7.3 and restored to 6.7.4. The restore procedure hangs on large tables (190GB). The cluster is tuned and configured to withstand a higher load than normal. Maintenance features like nodesync and compactions are throttled to the minimum. All configuration changes are: * disabled noderepair * nodesync: rate_in_kb: 64 #originally 8192 * compaction_throughput_mb_per_sec: 1 * concurrent_compactors: 1 * memtable_allocation_type: offheap_objects * memtable_space_in_mb: 2048 * max_heapsize_mb: 32786 * streaming_keep_alive_period_in_secs: 72000 * streaming_socket_timeout_in_ms: 8640 #tested, but dse 6.7.4 won't start up with this setting * dse_cassandra_streaming_connections_per_host: 3 #originally 1 Logs from system.log {code:java} INFO [STREAM-INIT-/10.132.16.10:38620] 2019-08-12 14:16:51,722 StreamResultFuture.java:129 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d, ID#0] Received streaming plan for Bulk Load INFO [STREAM-IN-/10.132.16.10:38620] 2019-08-12 14:16:51,741 StreamResultFuture.java:194 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d ID#0] Prepare completed. 
Receiving 17 files(191.218GiB), sending 0 files(0.000KiB) ERROR [STREAM-IN-/10.132.16.10:38620] 2019-08-12 14:18:00,640 StreamSession.java:650 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d] Streaming error occurred on session with peer 10.132.16.10 java.net.SocketException: End-of-stream reached at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:107) at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:318) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) INFO [STREAM-IN-/10.132.16.10:38620] 2019-08-12 14:18:00,648 StreamResultFuture.java:208 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d] Session with /10.132.16.10 is complete WARN [STREAM-IN-/10.132.16.10:38620] 2019-08-12 14:18:00,650 StreamResultFuture.java:235 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d] Stream failed INFO [STREAM-INIT-/10.132.16.8:44336] 2019-08-12 14:21:48,227 StreamResultFuture.java:122 - [Stream #7dfbc520-bd0c-11e9-9cf8-d3315f3fffd7 ID#0] Creating new streaming plan for Bulk Load INFO [STREAM-INIT-/10.132.16.8:44336] 2019-08-12 14:21:48,229 StreamResultFuture.java:129 - [Stream #7dfbc520-bd0c-11e9-9cf8-d3315f3fffd7, ID#0] Received streaming plan for Bulk Load INFO [STREAM-INIT-/10.132.16.8:44342] 2019-08-12 14:21:48,230 StreamResultFuture.java:129 - [Stream #7dfbc520-bd0c-11e9-9cf8-d3315f3fffd7, ID#0] Received streaming plan for Bulk Load INFO [STREAM-IN-/10.132.16.8:44342] 2019-08-12 14:21:48,243 StreamResultFuture.java:194 - [Stream #7dfbc520-bd0c-11e9-9cf8-d3315f3fffd7 ID#0] Prepare completed. Receiving 13 files(185.643GiB), sending 0 files(0.000KiB) INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 14:24:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 0B (0B/s), 0% was inconsistent. 
INFO [STREAM-INIT-/10.132.16.9:48802] 2019-08-12 14:27:12,562 StreamResultFuture.java:122 - [Stream #3f501c30-bd0d-11e9-9641-6d81d6b3fb08 ID#0] Creating new streaming plan for Bulk Load INFO [STREAM-INIT-/10.132.16.9:48802] 2019-08-12 14:27:12,564 StreamResultFuture.java:129 - [Stream #3f501c30-bd0d-11e9-9641-6d81d6b3fb08, ID#0] Received streaming plan for Bulk Load INFO [STREAM-INIT-/10.132.16.9:48806] 2019-08-12 14:27:12,565 StreamResultFuture.java:129 - [Stream #3f501c30-bd0d-11e9-9641-6d81d6b3fb08, ID#0] Received streaming plan for Bulk Load INFO [STREAM-IN-/10.132.16.9:48806] 2019-08-12 14:27:12,582 StreamResultFuture.java:194 - [Stream #3f501c30-bd0d-11e9-9641-6d81d6b3fb08 ID#0] Prepare completed. Receiving 15 files(188.633GiB), sending 0 files(0.000KiB) INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 14:34:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 29MB (50kB/s), 0% was inconsistent. INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 14:44:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 98MB (168kB/s), 0% was inconsistent. INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 14:54:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 20MB (35kB/s), 0% was inconsistent. INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 15:04:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 0B (0B/s), 0% was inconsistent. ERROR [STREAM-IN-/10.132.16.8:44342] 2019-08-12 15:14:36,634 StreamSession.java:650 - [Stream #7dfbc520-bd0c-11e9-9cf8-d3315f3fffd7] Streaming er
[jira] [Created] (CASSANDRA-15276) SSTableLoader failes if loading large table
Roman Danko created CASSANDRA-15276: --- Summary: SSTableLoader failes if loading large table Key: CASSANDRA-15276 URL: https://issues.apache.org/jira/browse/CASSANDRA-15276 Project: Cassandra Issue Type: Bug Components: Legacy/Tools Reporter: Roman Danko I'm trying to restore backups created by OpsCenter. I am restoring to a fresh cluster, so it is necessary to use sstableloader. Backups were created from a DSE cluster on version 6.7.3 and restored to 6.7.4. The restore procedure hangs on large tables (190GB). The cluster is tuned and configured to withstand a higher load than normal. Maintenance features like nodesync and compactions are throttled to the minimum. All configuration changes are: * disabled noderepair * nodesync: rate_in_kb: 64 #originally 8192 * compaction_throughput_mb_per_sec: 1 * concurrent_compactors: 1 * memtable_allocation_type: offheap_objects * memtable_space_in_mb: 2048 * max_heapsize_mb: 32786 * streaming_keep_alive_period_in_secs: 72000 * streaming_socket_timeout_in_ms: 8640 #tested, but dse 6.7.4 won't start up with this setting * dse_cassandra_streaming_connections_per_host: 3 #originally 1 Logs from system.log {code:java} INFO [STREAM-INIT-/10.132.16.10:38620] 2019-08-12 14:16:51,722 StreamResultFuture.java:129 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d, ID#0] Received streaming plan for Bulk Load INFO [STREAM-IN-/10.132.16.10:38620] 2019-08-12 14:16:51,741 StreamResultFuture.java:194 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d ID#0] Prepare completed. 
Receiving 17 files(191.218GiB), sending 0 files(0.000KiB) ERROR [STREAM-IN-/10.132.16.10:38620] 2019-08-12 14:18:00,640 StreamSession.java:650 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d] Streaming error occurred on session with peer 10.132.16.10 java.net.SocketException: End-of-stream reached at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:107) at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:318) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) INFO [STREAM-IN-/10.132.16.10:38620] 2019-08-12 14:18:00,648 StreamResultFuture.java:208 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d] Session with /10.132.16.10 is complete WARN [STREAM-IN-/10.132.16.10:38620] 2019-08-12 14:18:00,650 StreamResultFuture.java:235 - [Stream #ccaadb30-bd0b-11e9-b05b-75f1a4f2840d] Stream failed INFO [STREAM-INIT-/10.132.16.8:44336] 2019-08-12 14:21:48,227 StreamResultFuture.java:122 - [Stream #7dfbc520-bd0c-11e9-9cf8-d3315f3fffd7 ID#0] Creating new streaming plan for Bulk Load INFO [STREAM-INIT-/10.132.16.8:44336] 2019-08-12 14:21:48,229 StreamResultFuture.java:129 - [Stream #7dfbc520-bd0c-11e9-9cf8-d3315f3fffd7, ID#0] Received streaming plan for Bulk Load INFO [STREAM-INIT-/10.132.16.8:44342] 2019-08-12 14:21:48,230 StreamResultFuture.java:129 - [Stream #7dfbc520-bd0c-11e9-9cf8-d3315f3fffd7, ID#0] Received streaming plan for Bulk Load INFO [STREAM-IN-/10.132.16.8:44342] 2019-08-12 14:21:48,243 StreamResultFuture.java:194 - [Stream #7dfbc520-bd0c-11e9-9cf8-d3315f3fffd7 ID#0] Prepare completed. Receiving 13 files(185.643GiB), sending 0 files(0.000KiB) INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 14:24:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 0B (0B/s), 0% was inconsistent. 
INFO [STREAM-INIT-/10.132.16.9:48802] 2019-08-12 14:27:12,562 StreamResultFuture.java:122 - [Stream #3f501c30-bd0d-11e9-9641-6d81d6b3fb08 ID#0] Creating new streaming plan for Bulk Load INFO [STREAM-INIT-/10.132.16.9:48802] 2019-08-12 14:27:12,564 StreamResultFuture.java:129 - [Stream #3f501c30-bd0d-11e9-9641-6d81d6b3fb08, ID#0] Received streaming plan for Bulk Load INFO [STREAM-INIT-/10.132.16.9:48806] 2019-08-12 14:27:12,565 StreamResultFuture.java:129 - [Stream #3f501c30-bd0d-11e9-9641-6d81d6b3fb08, ID#0] Received streaming plan for Bulk Load INFO [STREAM-IN-/10.132.16.9:48806] 2019-08-12 14:27:12,582 StreamResultFuture.java:194 - [Stream #3f501c30-bd0d-11e9-9641-6d81d6b3fb08 ID#0] Prepare completed. Receiving 15 files(188.633GiB), sending 0 files(0.000KiB) INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 14:34:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 29MB (50kB/s), 0% was inconsistent. INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 14:44:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 98MB (168kB/s), 0% was inconsistent. INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 14:54:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 20MB (35kB/s), 0% was inconsistent. INFO [NodeSyncMaintenanceTasks:1] 2019-08-12 15:04:42,740 NodeSyncMaintenanceTasks.java:172 - In last 10m: validated 0B (0B/s),
[jira] [Commented] (CASSANDRA-15193) Add ability to cap max negotiable protocol version
[ https://issues.apache.org/jira/browse/CASSANDRA-15193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906086#comment-16906086 ] Alex Petrov commented on CASSANDRA-15193: - The patch looks good to me, +1. One improvement we can make in {{Frame}}, when throwing a {{ProtocolException}}, is to use {{versionCap}} instead of {{CURRENT_VERSION}} in the error message. I just have a couple of minor comments: * should we add a protocol negotiation test for cqlsh? It's a minor behaviour change, and we might want to ensure we preserve it. * should we stick to "max native protocol version" or to "max negotiable protocol version"? I think both have similar semantics / meaning, but it might not be obvious at first glance. * {{maybeUpdateVersion}} can be private * in {{maybeUpdateVersion}}, we can avoid the doubly-nested {{if}} by checking for {{!enforceV3Cap}} and returning early. I wouldn't say it makes a huge change. Please feel free to ignore. * in {{Server.java}}, we can use the {{ProtocolVersionLimit}} interface instead of the {{ConfiguredLimit}} class, and make {{ConfiguredLimit}} package-private. * in {{ProtocolNegotiationTest}}, we can avoid calling {{setStaticLimitInConfig}} in {{finally}}, because it's already called in {{Before}}. > Add ability to cap max negotiable protocol version > -- > > Key: CASSANDRA-15193 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15193 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > 3.0 and native protocol V4 introduced a change to how PagingState is > serialized. Unfortunately that can break requests during upgrades: since > paging states are opaque, it's possible for a client to receive a paging > state encoded as V3 on a 2.1 node, and then send it to a 3.0 node on a V4 > session. 
The version of the current session will be used to deserialize the > paging state, instead of the actual version used to serialize it, and the > request will fail. > CASSANDRA-15176 solves half of this problem by enabling 3.0 nodes to > serialize mis-versioned PagingStates. To address the other side of the issue, > 2.1 nodes receiving V4 PagingStates, we can introduce a property to cap the > max native protocol version that the 3.0 nodes will negotiate with clients. > If we cap this to V3 during upgrades, no V4 connections will be established > and so no incompatible PagingStates will be sent to clients. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
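The review suggestions above can be sketched in isolation. Everything below is a simplified, hypothetical stand-in (the real {{ConfiguredLimit}}, {{Frame}}, and {{ProtocolException}} differ): it only illustrates the early-return shape of {{maybeUpdateVersion}}, coding against the {{ProtocolVersionLimit}} interface rather than the concrete class, and rejecting a too-high version with a message that names the cap instead of {{CURRENT_VERSION}}.

```java
// Simplified stand-ins for the classes discussed in the review above;
// not the actual Cassandra patch.
interface ProtocolVersionLimit
{
    int getMaxVersion();
}

class ConfiguredLimit implements ProtocolVersionLimit
{
    static final int V3 = 3;
    static final int CURRENT_VERSION = 4; // stand-in for the newest native protocol version

    private final boolean enforceV3Cap;
    private volatile int maxVersion;

    ConfiguredLimit(boolean enforceV3Cap)
    {
        this.enforceV3Cap = enforceV3Cap;
        maybeUpdateVersion(CURRENT_VERSION);
    }

    public int getMaxVersion()
    {
        return maxVersion;
    }

    // Early return on !enforceV3Cap instead of a doubly-nested if,
    // as suggested in the review; private, also per the review.
    private void maybeUpdateVersion(int discoveredVersion)
    {
        if (!enforceV3Cap)
        {
            maxVersion = CURRENT_VERSION;
            return;
        }
        maxVersion = Math.min(discoveredVersion, V3);
    }

    // Frame-style check: the rejection message names the negotiated cap,
    // not CURRENT_VERSION. IllegalArgumentException stands in for
    // Cassandra's ProtocolException here.
    static void checkVersion(int requested, ProtocolVersionLimit limit)
    {
        if (requested > limit.getMaxVersion())
            throw new IllegalArgumentException(
                "Invalid or unsupported protocol version (" + requested +
                "); supported versions are up to " + limit.getMaxVersion());
    }
}
```

Callers such as a {{Server}}-like class would hold only the {{ProtocolVersionLimit}} interface, which is what lets {{ConfiguredLimit}} become package-private.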
[jira] [Commented] (CASSANDRA-15263) LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null
[ https://issues.apache.org/jira/browse/CASSANDRA-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906036#comment-16906036 ] Benedict commented on CASSANDRA-15263: -- Sorry [~ferozshaik...@gmail.com], I should have been clearer. This may have a user visible impact, with the affected queries not responding from 3.0 nodes contacted by a 2.1 node when this error occurs. This will have an availability impact, which might result in failed queries. Once fully upgraded the problem will resolve, and there will be no lasting impact. Depending on your cluster topology, you might be able to upgrade without any queries failing, only having reduced tolerance to node failures until the upgrade completes. > LegacyLayout RangeTombstoneList throws java.lang.NullPointerException: null > --- > > Key: CASSANDRA-15263 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15263 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: feroz shaik >Assignee: Benedict >Priority: Normal > Labels: 2.1.16, 3.11.4 > Attachments: sample.system.log, schema.txt, > sstabledump_sal_purge_d03.json, sstablemetadata_sal_purge_d03, > stack_trace.txt, system.log, system.log, system.log, system.log, > system_latest.log > > > We have hit a problem today while upgrading from 2.1.16 to 3.11.4. > we encountered this as soon as the first node started up with 3.11.4 > The full error stack is attached - [^stack_trace.txt] > > The below errors continued in the log file as long as the process was up. 
> ERROR [Native-Transport-Requests-12] 2019-08-06 03:00:47,135 > ErrorMessage.java:384 - Unexpected exception during request > java.lang.NullPointerException: null > ERROR [Native-Transport-Requests-8] 2019-08-06 03:00:48,778 > ErrorMessage.java:384 - Unexpected exception during request > java.lang.NullPointerException: null > ERROR [Native-Transport-Requests-13] 2019-08-06 03:00:57,454 > > The nodetool version says 3.11.4 and the number of connections on native port > 9042 was similar to other nodes. The exceptions were scary enough that we had > to call off the change. Any help and insight into this problem from the > community is appreciated. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15275) Repair failed due to data corruption in one of the production table
Sumit created CASSANDRA-15275: - Summary: Repair failed due to data corruption in one of the production table Key: CASSANDRA-15275 URL: https://issues.apache.org/jira/browse/CASSANDRA-15275 Project: Cassandra Issue Type: Bug Components: Consistency/Repair Reporter: Sumit Attachments: cassandra_repair_logs Issue description: We found data corruption in one of the tables while performing cluster syncing. Corrective actions: We performed 'nodetool repair' to sync up the nodes after removing the corrupted table. Two of the four nodes in the cluster were down before we triggered the 'repair'. However, during the repair all the servers were live and active. We are using Cassandra 3.11.1. We would appreciate your help to pinpoint the issue and prevent it from recurring. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org