Re: all the hosts are not reachable when running massive deletes

2016-04-04 Thread daemeon reiydelle
Network issues. Could be jumbo frames not consistent or other.

sent from my mobile

Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872


Re: Speeding up "nodetool rebuild"

2016-04-04 Thread Alain RODRIGUEZ
+1, unless it is already logged and I am simply not aware of it.

Cassandra being open source feel free to participate by creating an issue
and/or sharing a patch to do whatever you think would improve Cassandra.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-04 17:57 GMT+02:00 Anubhav Kale :

> Thanks. Would it be better to log it clearly, or to expose it as a metric or
> something else that can be easily automated?
>
>
>
> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> *Sent:* Friday, April 1, 2016 1:55 AM
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: Speeding up "nodetool rebuild"
>
>
>
> Hi,
>
>
>
> is there any way to determine that rebuild is complete
>
>
>
> If you ran it from a screen (
> https://www.gnu.org/software/screen/manual/screen.html
> )
> or a similar tool, you should see the command return.
>
>
>
> Also, 'nodetool netstats | grep -v 100%' will show you the remaining streams.
> No streams = rebuild finished (look for possible errors in the logs though...).
>
>
>
> Last tip: you should be able to estimate how big the dataset is going to
> be, so checking the on-disk size gives good progress information too. This
> is not really accurate though.
>
>
>
> C*heers,
>
> ---
>
> Alain Rodriguez - al...@thelastpickle.com
>
> France
>
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
> 
>
>
>
> 2016-03-31 23:19 GMT+02:00 Anubhav Kale :
>
> Thanks, is there any way to determine that rebuild is complete.
>
> Based on following line in StorageService.java, it's not logged. So, any
> other way to check besides checking data size through nodetool status ?
>
> finally
> {
>     // rebuild is done (successfully or not)
>     isRebuilding.set(false);
> }
>
>
> -Original Message-
> From: Eric Evans [mailto:eev...@wikimedia.org]
> Sent: Thursday, March 31, 2016 9:50 AM
> To: user@cassandra.apache.org
> Subject: Re: Speeding up "nodetool rebuild"
>
> On Wed, Mar 30, 2016 at 3:44 PM, Anubhav Kale 
> wrote:
> > Any other ways to make the “rebuild” faster ?
>
> TL;DR add more nodes
>
> If you're encountering a per-stream bottleneck (easy to do if using
> compression), then having a higher node count will translate to higher
> stream concurrency, and greater throughput.
>
> Another thing to keep in mind, the streamthroughput value is *outbound*,
> it doesn't matter what you have that set to on the rebuilding/bootstrapping
> node, it *does* matter what it is set to on the nodes that are sending to
> it (
> https://issues.apache.org/jira/browse/CASSANDRA-11303
> aims to introduce an inbound tunable though).
>
>
> --
> Eric Evans
> eev...@wikimedia.org
>
>
>


Re: all the hosts are not reachable when running massive deletes

2016-04-04 Thread Alain RODRIGUEZ
Hola Paco,


> the mutation stages pending column grows without stop, could be that the
> problem



> CPU (near 96%)
>

Yes, basically I think you are overusing this cluster.

but two of them have high cpu load, especially the 232 because I am running
> a lot of deletes using cqlsh in that node.
>

Solutions would be to run the deletes at a slower, constant pace against all
the nodes using a balancing policy, or to add capacity if all the nodes are
facing the issue and you can't slow the deletes down. You should also have a
look at iowait and steal, to see if the CPUs are really 100% busy or are
masking another issue (a disk not answering fast enough, or a hardware /
shared-instance issue). I had some noisy neighbours at some point while
using Cassandra on AWS.
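A minimal sketch of that pacing idea, independent of any driver — the
`execute` callable, the table name, and the rate are placeholders a real
client would supply (e.g. a datastax session bound to a load-balancing
policy):

```python
import itertools
import time

def run_paced_deletes(keys, hosts, execute, deletes_per_second=100.0):
    """Issue one delete per key, rotating coordinator hosts round-robin
    and sleeping between statements to hold a constant pace, so no single
    node (like the .232 one here) absorbs the whole mutation load."""
    interval = 1.0 / deletes_per_second
    host_cycle = itertools.cycle(hosts)
    for key in keys:
        host = next(host_cycle)
        # A real client would route this statement to `host`; here
        # `execute` is whatever callable the caller provides.
        execute(host, "DELETE FROM ks.tbl WHERE id = %s", key)
        time.sleep(interval)
```

Throttling client-side like this keeps the pending mutation stage from
growing without bound on the coordinators.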

 I cannot find the reason that originates the timeouts.


I don't find that so strange while some or all of the nodes are being overused.

I have already increased the timeouts, but I do not think that is a
> solution because the timeouts indicate another type of error


Any relevant logs in Cassandra nodes (other than dropped mutations INFO)?

7 nodes version 2.0.17


Note: Be aware that this Cassandra version is quite old and no longer
supported. Plus, you might be facing issues that were already solved. I know
that upgrading is not straightforward, but 2.0 --> 2.1 brings an amazing set
of optimisations and some fixes too. You should try it out :-).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com




Re: Hardware used

2016-04-04 Thread Alain RODRIGUEZ
It depends on your use case.

I believe nice options on AWS would be the following (if you are not using
AWS, this should still give you a rough idea of what machines to use):


I2 family - Fast (SSD), relatively limited capacity (800 GB+), expensive.
D2 family - Relatively fast, many HDs (consider using RAID-0), very high
storage capacity per node. Cheaper.

If using a very small dataset, or if you want to go with EBS, use the 'C'
or 'R' families.

Basically, it is very hard to answer. It is more complicated than telling
you what vehicle you need without knowing where you are, where you are
going, and how much time or money you have and want to spend on the trip...

This might help you too:
https://docs.datastax.com/en/cassandra/2.1/cassandra/planning/architecturePlanningHardware_c.html

My questions are primarily on capacity utilized vs allocated., to justify
> hardware proposal requirements.


Regarding this I would say:

Use 50 to 80% max of the available disk space per node, depending on the
compaction strategy (50% for STCS/DTCS, 80% if using LCS).
For CPU I would not go higher than 70 or 80%, but in many clusters this is
not the bottleneck.
Memory is automatically used at near 100% (heap, off-heap memory directly
used by Cassandra, plus the system disk page cache). So the more memory, the
better (set the heap size carefully though).
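Those disk headroom rules of thumb can be expressed as a tiny sizing helper
(a sketch only — the 50%/80% figures come from the guidance above, not from
any official formula):

```python
def usable_disk_per_node(raw_gb, compaction_strategy):
    """Rough usable data size per node, leaving headroom for compaction:
    STCS/DTCS may need to rewrite the largest SSTables, so cap at 50%;
    LCS works in small fixed-size SSTables, so ~80% is workable."""
    headroom = {"STCS": 0.50, "DTCS": 0.50, "LCS": 0.80}
    return raw_gb * headroom[compaction_strategy]
```

For example, an i2.2xlarge-class node with 800 GB of SSD would hold roughly
400 GB of data under STCS but about 640 GB under LCS.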

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-04 0:41 GMT+02:00 Mohankalyan, Sathyanarayanan <
s.mohankal...@irco.com>:

> Hi
> Please comment on what sort of hardwares you have configured for Cassandra?
> My questions are primarily on capacity utilized vs allocated., to justify
> hardware proposal requirements.
> Thanks
> Sathya
>
>
> --
>
> The information contained in this message is privileged and intended only
> for the recipients named. If the reader is not a representative of the
> intended recipient, any review, dissemination or copying of this message or
> the information it contains is prohibited. If you have received this
> message in error, please immediately notify the sender, and delete the
> original message and attachments.
>


all the hosts are not reachable when running massive deletes

2016-04-04 Thread Paco Trujillo
Hi everyone

We are having problems with our cluster (7 nodes, version 2.0.17) when running
"massive deletes" on one of the nodes (via the cql command line). At the beginning
everything is fine, but after a while we start getting constant
NoHostAvailableException errors from the datastax driver:

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All 
host(s) tried for query failed (tried: /172.31.7.243:9042 
(com.datastax.driver.core.exceptions.DriverException: Timeout while trying to 
acquire available connection (you may want to increase the driver number of 
per-host connections)), /172.31.7.245:9042 
(com.datastax.driver.core.exceptions.DriverException: Timeout while trying to 
acquire available connection (you may want to increase the driver number of 
per-host connections)), /172.31.7.246:9042 
(com.datastax.driver.core.exceptions.DriverException: Timeout while trying to 
acquire available connection (you may want to increase the driver number of 
per-host connections)), /172.31.7.247:9042, /172.31.7.232:9042, 
/172.31.7.233:9042, /172.31.7.244:9042 [only showing errors of first 3 hosts, 
use getErrors() for more details])


All the nodes are running:

UN  172.31.7.244  152.21 GB  256 14.5%  
58abea69-e7ba-4e57-9609-24f3673a7e58  RAC1
UN  172.31.7.245  168.4 GB   256 14.5%  
bc11b4f0-cf96-4ca5-9a3e-33cc2b92a752  RAC1
UN  172.31.7.246  177.71 GB  256 13.7%  
8dc7bb3d-38f7-49b9-b8db-a622cc80346c  RAC1
UN  172.31.7.247  158.57 GB  256 14.1%  
94022081-a563-4042-81ab-75ffe4d13194  RAC1
UN  172.31.7.243  176.83 GB  256 14.6%  
0dda3410-db58-42f2-9351-068bdf68f530  RAC1
UN  172.31.7.233  159 GB 256 13.6%  
01e013fb-2f57-44fb-b3c5-fd89d705bfdd  RAC1
UN  172.31.7.232  166.05 GB  256 15.0%  4d009603-faa9-4add-b3a2-fe24ec16a7c1

but two of them have high cpu load, especially the 232 because I am running a 
lot of deletes using cqlsh in that node.

I know that deletes generate tombstones, but with 7 nodes in the cluster I do
not think it is normal that all the hosts become inaccessible.

We have a replication factor of 3, and for the deletes I am not specifying any
consistency (so it is using the default ONE).

I checked the nodes with a lot of CPU (near 96%) and the gc activity remains at
1.6% (using only 3 GB of the 10 that are assigned). But looking at the
thread pool stats, the mutation stage pending column grows without stop; could
that be the problem?

I cannot find the reason for the timeouts. I have already increased the
timeouts, but I do not think that is a solution because the timeouts indicate
another type of error. Does anyone have a tip to help determine where the
problem is?

Thanks in advance


Re: Cassandra sstable to Mysql

2016-04-04 Thread Abhishek Aggarwal
Thanks Bryan,

I don't want to read the data back through Cassandra, as it would increase the
load on the cluster. I want to use the data files created with CQLSSTableWriter
directly when migrating to MySQL.
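If the sstable-to-JSON route suggested below is taken, the dump still has to be
deduplicated before loading into MySQL, since multiple SSTables can carry
overwrites of the same cell. A minimal sketch of that step (the flat
(key, column, value, timestamp) row shape is an assumption about how the JSON
dump gets flattened, not the tool's exact format):

```python
def latest_cells(rows):
    """Collapse duplicate cells from one or more SSTable dumps, keeping only
    the write with the highest timestamp per (partition key, column) --
    roughly what compaction would have done before an export to MySQL."""
    latest = {}
    for key, column, value, ts in rows:
        cell = (key, column)
        # Keep a cell only if it is the newest write seen so far.
        if cell not in latest or ts > latest[cell][1]:
            latest[cell] = (value, ts)
    return {cell: value for cell, (value, _) in latest.items()}
```

The surviving cells can then be batched into MySQL INSERT statements.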

Abhishek Aggarwal

*Senior Software Engineer*
*M*: +91 8861212073 , 8588840304
*T*: 0124 6600600 *EXT*: 12128
ASF Center -A, ASF Center Udyog Vihar Phase IV,

On Sun, Apr 3, 2016 at 12:30 AM, Bryan Cheng  wrote:

> You have SSTables and you want to get importable data?
>
> You could use a tool like sstable2json to get JSON-formatted data
> directly from the sstables; however, unless they've been perfectly
> compacted, there will be duplicates and updates interleaved that will not
> be properly ordered.
>
> If this is a full dump from a single machine that has a complete dataset
> (eg. with RF=n) you could spin up a new machine with just itself as a seed
> but all other configuration intact. If this new machine gets an identical
> copy of the cassandra data directory, it will start itself as a clone of
> the machine the dump came off of, but walled off from the previous cluster.
> (I have tested this with vnodes, but I believe it is more involved without
> vnodes). Then you can use CQL COPY or an application to bulk load into
> MySQL.
>
> On Fri, Apr 1, 2016 at 2:55 AM, Abhishek Aggarwal <
> abhishek.aggarwa...@snapdeal.com> wrote:
>
>> Hi ,
>>
>> We have the data dump into directory  taken from Mysql using the
>> CQLSSTableWriter.
>>
>> Our requirement is to read this data and load it into MySql. We don't
>> want to use Cassandra as it will lead to read traffic and this operation is
>> just for some validation .
>>
>> Can anyone help us with the solution.
>>
>> Abhishek Aggarwal
>>
>> *Senior Software Engineer*
>> *M*: +91 8861212073 , 8588840304
>> *T*: 0124 6600600 *EXT*: 12128
>> ASF Center -A, ASF Center Udyog Vihar Phase IV,
>>
>
>