Re: SSTableloader questions

2020-11-12 Thread Erick Ramirez
>
> Can the sstableloader job run from outside a Cassandra node, or does it have
> to be run from inside a Cassandra node?
>

Yes, it can. In fact, I'm a fan of running sstableloader on a server that is
not one of the nodes in the cluster. You can maximise throughput by running
multiple instances of sstableloader, each loading SSTables from separate
sources/filesystems.
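
For example, something along these lines (the IPs and staging directories are
placeholders; you'd add the same authentication/SSL flags as in the command you
posted):

$ # terminal 1: load the copy of the SSTables staged under /data1
$ bin/sstableloader -d ip1,ip2,ip3 /data1/keyspace1/table1

$ # terminal 2: a second instance loading a separate copy from /data2
$ bin/sstableloader -d ip1,ip2,ip3 /data2/keyspace1/table1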

My suspicion is that the failed connection to the nodes is due to the SSL
options, so check that you've specified the truststore/keystore correctly.
Cheers!



Re: SSTableloader questions

2020-11-12 Thread Jai Bheemsen Rao Dhanwada
Hello Erick,

I have one more question.

Can the sstableloader job run from outside a Cassandra node, or does it have
to be run from inside a Cassandra node?

When I tried it from a Cassandra node it worked, but when I try to run it
from outside the Cassandra cluster (a standalone machine which doesn't have
any Cassandra process running) using the below command, it fails with a
streaming error.

*Command:*

> $ /root/apache-cassandra-3.11.6/bin/sstableloader -d ip1,ip2,ip3
> keyspace1/table1 --truststore truststore.p12 --truststore-password
> cassandra --keystore-password cassandra --keystore keystore.p12 -v -u user
> -pw password --ssl-storage-port 7001 -prtcl TLS


*Errors:*

> ERROR 21:48:22,078 [Stream #be7a0de0-2530-11eb-bc56-c7c5c59d560b]
> Streaming error occurred on session with peer 10.66.129.194
> java.net.ConnectException: Connection refused
> at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_272]
> at sun.nio.ch.Net.connect(Net.java:482) ~[na:1.8.0_272]
> at sun.nio.ch.Net.connect(Net.java:474) ~[na:1.8.0_272]
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:647)
> ~[na:1.8.0_272]
> at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
> ~[na:1.8.0_272]
> at
> org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60)
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at
> org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:283)
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at
> org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86)
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at
> org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:270)
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at
> org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:269)
> [apache-cassandra-3.11.6.jar:3.11.6]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [na:1.8.0_272]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_272]
> at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> [apache-cassandra-3.11.6.jar:3.11.6]
> at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]
> progress: total: 100% 0.000KiB/s (avg: 0.000KiB/s)


On Mon, Nov 9, 2020 at 3:08 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Thanks Erick, I will go through the posts and get back if I have any
> questions.
>
> On Mon, Nov 9, 2020 at 1:58 PM Erick Ramirez wrote:
>
>> A few months ago, I was asked a similar question so I wrote instructions
>> for this. It depends on whether the clusters are identical or not. The
>> posts define what "identical" means.
>>
>> If the source and target cluster are identical in configuration, follow
>> the procedure here -- https://community.datastax.com/questions/4534/.
>>
>> If the source and target cluster have different configurations, follow
>> the procedure here -- https://community.datastax.com/questions/4477/.
>> Cheers!
>>
>


RE: Cassandra in a container - what to do (sequence of events) to snapshot the storage volume?

2020-11-12 Thread Manu Chadha
Hi Jeff

By “dropping” the periodic time, do you mean setting it to 0, commenting out
the commit log settings, or changing the commit log to batch mode? Referring to
the comments in cassandra.yaml: if I use commitlog_sync in batch mode with a
2 ms window, does that mean a write is immediately flushed to disk, so that when
a snapshot is taken the disk has all the data except for any new writes which
might come within the 2 ms window? I suppose the DB would be much slower now
compared to the 10 s periodic window.

# commitlog_sync may be either "periodic" or "batch."
#
# When in batch mode, Cassandra won't ack writes until the commit log
# has been fsynced to disk.  It will wait
# commitlog_sync_batch_window_in_ms milliseconds between fsyncs.
# This window should be kept short because the writer threads will
# be unable to do extra work while waiting.  (You may need to increase
# concurrent_writes for the same reason.)
#
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2

thanks

From: Jeff Jirsa
Sent: 11 November 2020 00:24
To: cassandra
Subject: Re: Cassandra in a container - what to do (sequence of events) to 
snapshot the storage volume?

The commitlog defaults to periodic mode, which writes a sync marker to the file
and fsyncs the data to disk every 10s by default.

`nodetool flush` will force a sync marker / fsync
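
For example, before taking the storage snapshot of the container's data volume,
the flush could be as simple as this (the container name "cassandra" is just an
assumption):

$ docker exec cassandra nodetool flush   # write a sync marker, fsync, flush memtables to SSTables
$ # ...then snapshot the volume that backs the Cassandra data directory...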

Data written since the last fsync will not be replayed on startup and will be 
lost.

If you drop the periodic time, the number of writes you lose on restart 
decreases.

Alternatively, you can switch to group/batch commitlog, and it goes to zero, 
but you'll fsync far more frequently.
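
In cassandra.yaml terms, a sketch of the knobs being discussed (the values are
only illustrative, and group mode is only available in newer Cassandra releases):

# periodic (the default): writes are acked immediately; the commitlog is
# fsynced every commitlog_sync_period_in_ms
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000   # lower this to shrink the loss window

# batch: writes are not acked until the commitlog has been fsynced
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 2

# group (newer releases): wait up to the window, fsync, then ack the pending writes
# commitlog_sync: group
# commitlog_sync_group_window_in_ms: 10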



On Tue, Nov 10, 2020 at 4:19 PM Florin Andrei <flo...@andrei.myip.org> wrote:
That sounds great! Now here's my question:

I do "nodetool flush", then snapshot the storage. Meanwhile, the DB is
under heavy read/write traffic, with lots of writes per second. What's
the worst that could happen, lose a few writes?


On 2020-11-10 15:59, Jeff Jirsa wrote:
> If you want all of the instances to be consistent with each other,
> this is much harder, but if you only want a container that can stop
> and resume, you don't have to do anything more than flush + snapshot
> the storage. The data files on cassandra should ALWAYS be in a state
> where the database will restart, because they have to be to tolerate
> power outage.
>
> On Tue, Nov 10, 2020 at 3:39 PM Florin Andrei <flo...@andrei.myip.org> wrote:
>
>> Running Apache Cassandra 3 in Docker. I need to snapshot the storage
>> volumes. Obviously, I want to be able to re-launch Cassandra from the
>> snapshots later on. So the snapshots need to be in a consistent state.
>>
>> With most DBs, the sequence of events is this:
>>
>> - flush the DB to disk
>> - "freeze" the DB
>> - snapshot the storage
>> - "unfreeze" the DB
>>
>> What does that sequence translate to, in Cassandra parlance?
>>
>> What is the sequence of events that needs to happen when I bring the DB
>> up from an old snapshot? Will there be a restore procedure, or can I just
>> start it as usual?
>>
>> --
>> Florin Andrei
>> https://florin.myip.org/
>>

--
Florin Andrei
https://florin.myip.org/
