Re: Urgent Problem - Disk full

2018-04-04 Thread Jürgen Albersdorfer
Thank You All for your hints on this.
I added another data folder on the commitlog Disk to relieve the immediate urgency.

Next Step will be to reorganize and deduplicate the data into a 2nd table,
then drop the original one, clear the snapshot, consolidate all data Files back
off the commitlog Disk, and set up Monitoring ;)

Thank You, regards
Jürgen 

RE: Urgent Problem - Disk full

2018-04-04 Thread Kenneth Brotman
Agreed that you add capacity to existing nodes, or add new nodes, once you know 
you have no unneeded data left in the cluster.

 


Re: datastax cassandra minimum hardware recommendation

2018-04-04 Thread Ben Bromhead
Also, DS charges by core ;)

Anecdotally, we run a large fleet of Apache C* nodes on AWS with a good
portion of supported instances that run with 16GB of RAM and 4 cores, which
is fine for those workloads.

On Wed, Apr 4, 2018 at 11:08 AM sujeet jog  wrote:

> Thanks Alain
>
> On Wed, Apr 4, 2018 at 3:12 PM, Alain RODRIGUEZ 
> wrote:
>
>> Hello.
>>
>> For questions to Datastax, I recommend you to ask them directly. I often
>> had a quick answer and they probably can answer this better than we do :).
>>
>> Apache Cassandra (and probably DSE-Cassandra) can work with 8 CPU (and
>> less!). I would not go much lower though. I believe the memory amount and
>> good disk throughputs are more important. It also depends on the
>> workload type and intensity, encryption, compression etc.
>>
>> 8 CPUs is probably just fine if well tuned, and here in the mailing list,
>> we 'support' any fancy configuration settings, but with no guarantee on the
>> response time and without taking the responsibility for your cluster :).
>>
>> It reminds me of my own start with Apache Cassandra. I started with
>> t1.micro back then on AWS, and people were still helping me here, of course
>> after a couple of jokes such as 'you should rather try to play a
>> PlayStation 4 game in your Gameboy', that's fair enough I guess :). Well it
>> was working in prod and I learned how to tune Apache Cassandra, I had no
>> other options to have this working.
>>
>> Having more CPU probably improves resiliency to some problems and reduces
>> the importance of having a cluster perfectly tuned.
>>
>> Benchmark your workload, test it. This would be the most accurate answer
>> here given the details we have.
>>
>> C*heers,
>> ---
>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>> France / Spain
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2018-04-04 9:44 GMT+01:00 sujeet jog :
>>
>>> the datastax site has a hardware recommendation of 16CPU / 32G RAM for
>>> DSE Enterprise,  Any idea what is the minimum hardware recommendation
>>> supported, can each node be 8CPU and the support covering it ?..
>>>
>>
>>
> --
Ben Bromhead
CTO | Instaclustr 
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer


Re: Urgent Problem - Disk full

2018-04-04 Thread Alain RODRIGUEZ
Hi,

When the disks are full, here are the options I can think of depending on
the situation and how 'full' the disk really is:

- Add capacity - Add a disk and use JBOD by adding a second data directory for
the sstables, move some of them over, then restart Cassandra. Or add a
new node.
- Reduce disk space used. Some options come to my mind to reduce space used:

1 - Clean tombstones *if any* (use sstablemetadata, for example, to check the
number of tombstones). If you have some that are not being purged, my first
guess would be to set 'unchecked_tombstone_compaction' to 'true' at the node
level. Yet be aware that this will trigger some compactions which, before
freeing space, will temporarily take some more!
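
For example, a sketch of what that change could look like (keyspace, table and
window settings are placeholders; an ALTER TABLE applies cluster-wide, while a
per-node override would go through JMX, e.g. the table's CompactionParametersJson
attribute):

    sstablemetadata /var/lib/cassandra/data/my_keyspace/my_timeseries-*/mc-*-big-Data.db \
        | grep "Estimated droppable tombstones"

    ALTER TABLE my_keyspace.my_timeseries
      WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                         'compaction_window_unit': 'DAYS',   -- keep your existing settings
                         'compaction_window_size': '1',
                         'unchecked_tombstone_compaction': 'true'};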

If remaining space is really low on one node, you can choose to compact only
the sstables with the highest tombstone ratio that still fit in the disk space
you have left, after you have made the change above. It can even be scripted.
It worked for me in the past with a disk 100% full. If you do so, you might
have to disable/re-enable automatic compactions at key moments as well.
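
A rough sketch of such a script (paths are assumptions; it ranks sstables by
the droppable-tombstone estimate so you can feed the best candidates, one at a
time, to a user-defined compaction via the CompactionManager MBean's
forceUserDefinedCompaction operation):

    for f in /var/lib/cassandra/data/my_keyspace/my_timeseries-*/*-Data.db; do
        ratio=$(sstablemetadata "$f" 2>/dev/null | awk '/Estimated droppable tombstones/ {print $4}')
        echo "$ratio $(du -h "$f" | cut -f1) $f"
    done | sort -rn | head -20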

2 - If you added nodes recently to the data center, you can consider running
'nodetool cleanup', but here again, it will start by using more space for
temporary sstables, and it might have no positive impact if the node only owns
data for its token ranges.
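
For instance (keyspace and table names are placeholders; check the free space
before and after, since cleanup rewrites sstables and temporarily needs room
for the new ones):

    df -h /var/lib/cassandra
    nodetool cleanup my_keyspace my_timeseries
    df -h /var/lib/cassandra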

3 - Another common way to easily reclaim space is to clear snapshots that are
not needed and might have been forgotten, or were taken automatically by
Cassandra: 'nodetool clearsnapshot'. The only risk is removing a backup you
still need.
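
For example (the snapshot tag is a placeholder; listsnapshots shows the "true
size" each snapshot still pins on disk):

    nodetool listsnapshots
    nodetool clearsnapshot -t 1522849123456 my_keyspace   # one named snapshot
    nodetool clearsnapshot                                # or everything, if you are sure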

4 - Effectively delete data from this table (or another) by directly removing
sstables - which is an option since you use TWCS - if you don't need that data
anyway.

5 - Truncate one of those other tables we tend to have that are written
'just in case' and never actually read for months. It has been a powerful way
out of this situation for me in the past too :). In short: make sure the disk
space is being used for data you actually need.
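
To see where the space actually goes before dropping or truncating anything,
something like this helps (names are placeholders; remember that TRUNCATE takes
an automatic snapshot when auto_snapshot is enabled, which is the default, so
clear that snapshot afterwards or no space is actually freed):

    nodetool tablestats my_keyspace | grep -E 'Table:|Space used \(live\)'
    cqlsh -e "TRUNCATE my_keyspace.just_in_case_table;"
    nodetool clearsnapshot my_keyspace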


> There is zero reason to believe a full repair would make this better and a
> lot of reason to believe it’ll make it worse

I second that too, just in case. Really, do not run a repair. The only
thing it could do is bring more data to a node that really doesn't need it
right now.

Finally, when this is behind you, disk usage is something you could consider
monitoring, as it is far easier to fix before the disk is completely full, and
it can then be fixed preemptively. Usually, keeping 20 to 50% of the disk free
is recommended, depending on your use case.
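
Even a minimal cron check is better than nothing while proper monitoring is
being set up; a sketch (GNU df and a local mail command assumed, threshold and
address are placeholders):

    usage=$(df --output=pcent /var/lib/cassandra | tail -1 | tr -dc '0-9')
    if [ "$usage" -ge 70 ]; then
        echo "Cassandra data disk ${usage}% full on $(hostname)" | mail -s "C* disk warning" ops@example.com
    fi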

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-04-04 15:34 GMT+01:00 Kenneth Brotman :

> There's also the old snapshots to remove that could be a significant
> amount of memory.
>
> -Original Message-
> From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID]
> Sent: Wednesday, April 04, 2018 7:28 AM
> To: user@cassandra.apache.org
> Subject: RE: Urgent Problem - Disk full
>
> Jeff,
>
> Just wondering: why wouldn't the answer be to:
> 1. move anything you want to archive to colder storage off the
> cluster,
> 2. nodetool cleanup
> 3. snapshot
> 4. use delete command to remove archived data.
>
> Kenneth Brotman
>
> -Original Message-
> From: Jeff Jirsa [mailto:jji...@gmail.com]
> Sent: Wednesday, April 04, 2018 7:10 AM
> To: user@cassandra.apache.org
> Subject: Re: Urgent Problem - Disk full
>
> Yes, this works in TWCS.
>
> Note though that if you have tombstone compaction subproperties set, there
> may be sstables with newer filesystem timestamps that actually hold older
> Cassandra data, in which case sstablemetadata can help finding the sstables
> with truly old timestamps
>
> Also if you’ve expanded the cluster over time and you see an imbalance of
> disk usage on the oldest hosts, “nodetool cleanup” will likely free up some
> of that data
>
>
>
> --
> Jeff Jirsa
>
>
> > On Apr 4, 2018, at 4:32 AM, Jürgen Albersdorfer <
> juergen.albersdor...@zweiradteile.net> wrote:
> >
> > Hi,
> >
> > I have an urgent Problem. - I will run out of disk space in near future.
> > Largest Table is a Time-Series Table with TimeWindowCompactionStrategy
> (TWCS) and default_time_to_live = 0
> > Keyspace Replication Factor RF=3. I run C* Version 3.11.2
> > We have grown the Cluster over time, so SSTable files have different
> Dates on different Nodes.
> >
> > From Application Standpoint it would be safe to loose some of the oldest
> Data.
> >
> > Is it safe to delete some of the oldest SSTable Files, which will no
> longer get touched by TWCS Compaction any more, while Node is clean
> Shutdown? - And doing so for one Node after another?
> >
> > Or maybe there is a different way to free some disk space? - Any
> suggestions?
> >
> > best regards
> > Jürgen Albersdorfer
> >
> > 

Re: Text or....

2018-04-04 Thread Jon Haddad
Depending on the compression rate, I think it would generate less garbage on 
the Cassandra side if you compressed it client side.  Something to test out.


> On Apr 4, 2018, at 7:19 AM, Jeff Jirsa  wrote:
> 
> Compressing server side and validating checksums is hugely important in the 
> more frequently used versions of cassandra - so since you probably want to 
> run compression on the server anyway, I’m not sure why you’d compress it 
> twice 
> 
> -- 
> Jeff Jirsa
> 
> 
> On Apr 4, 2018, at 6:23 AM, DuyHai Doan wrote:
> 
>> Compressing client-side is better because it will save:
>> 
>> 1) a lot of bandwidth on the network
>> 2) a lot of Cassandra CPU because no decompression server-side
>> 3) a lot of Cassandra HEAP because the compressed blob should be relatively 
>> small (text data compress very well) compared to the raw size
>> 
>> On Wed, Apr 4, 2018 at 2:59 PM, Jeronimo de A. Barros wrote:
>> Hi,
>> 
>> We use a pseudo file-system table where the chunks are blobs of 64 KB and we 
>> never had any performance issue.
>> 
>> Primary-key structure is ((file-uuid), chunck-id).
>> 
>> Jero
>> 
>> On Wed, Apr 4, 2018 at 9:25 AM, shalom sagges wrote:
>> Hi All, 
>> 
>> A certain application is writing ~55,000 characters for a single row. Most 
>> of these characters are entered to one column with "text" data type. 
>> 
>> This looks insanely large for one row. 
>> Would you suggest to change the data type from "text" to BLOB or any other 
>> option that might fit this scenario?
>> 
>> Thanks!
>> 
>> 



Re: datastax cassandra minimum hardware recommendation

2018-04-04 Thread sujeet jog
Thanks Alain

On Wed, Apr 4, 2018 at 3:12 PM, Alain RODRIGUEZ  wrote:

> Hello.
>
> For questions to Datastax, I recommend you to ask them directly. I often
> had a quick answer and they probably can answer this better than we do :).
>
> Apache Cassandra (and probably DSE-Cassandra) can work with 8 CPU (and
> less!). I would not go much lower though. I believe the memory amount and
> good disk throughputs are more important. It also depends on the workload
> type and intensity, encryption, compression etc.
>
> 8 CPUs is probably just fine if well tuned, and here in the mailing list,
> we 'support' any fancy configuration settings, but with no guarantee on the
> response time and without taking the responsibility for your cluster :).
>
> It reminds me of my own start with Apache Cassandra. I started with
> t1.micro back then on AWS, and people were still helping me here, of course
> after a couple of jokes such as 'you should rather try to play a
> PlayStation 4 game in your Gameboy', that's fair enough I guess :). Well it
> was working in prod and I learned how to tune Apache Cassandra, I had no
> other options to have this working.
>
> Having more CPU probably improves resiliency to some problems and reduces
> the importance of having a cluster perfectly tuned.
>
> Benchmark your workload, test it. This would be the most accurate answer
> here given the details we have.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2018-04-04 9:44 GMT+01:00 sujeet jog :
>
>> the datastax site has a hardware recommendation of 16CPU / 32G RAM for
>> DSE Enterprise,  Any idea what is the minimum hardware recommendation
>> supported, can each node be 8CPU and the support covering it ?..
>>
>
>


RE: Urgent Problem - Disk full

2018-04-04 Thread Kenneth Brotman
There are also the old snapshots to remove, which could account for a 
significant amount of disk space.

-Original Message-
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, April 04, 2018 7:28 AM
To: user@cassandra.apache.org
Subject: RE: Urgent Problem - Disk full

Jeff,

Just wondering: why wouldn't the answer be to:
1. move anything you want to archive to colder storage off the cluster, 
2. nodetool cleanup
3. snapshot
4. use delete command to remove archived data.

Kenneth Brotman

-Original Message-
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Wednesday, April 04, 2018 7:10 AM
To: user@cassandra.apache.org
Subject: Re: Urgent Problem - Disk full

Yes, this works in TWCS. 

Note though that if you have tombstone compaction subproperties set, there may 
be sstables with newer filesystem timestamps that actually hold older Cassandra 
data, in which case sstablemetadata can help finding the sstables with truly 
old timestamps

Also if you’ve expanded the cluster over time and you see an imbalance of disk 
usage on the oldest hosts, “nodetool cleanup” will likely free up some of that 
data



-- 
Jeff Jirsa


> On Apr 4, 2018, at 4:32 AM, Jürgen Albersdorfer 
>  wrote:
> 
> Hi,
> 
> I have an urgent Problem. - I will run out of disk space in near future.
> Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
> and default_time_to_live = 0
> Keyspace Replication Factor RF=3. I run C* Version 3.11.2
> We have grown the Cluster over time, so SSTable files have different Dates on 
> different Nodes.
> 
> From Application Standpoint it would be safe to loose some of the oldest Data.
> 
> Is it safe to delete some of the oldest SSTable Files, which will no longer 
> get touched by TWCS Compaction any more, while Node is clean Shutdown? - And 
> doing so for one Node after another?
> 
> Or maybe there is a different way to free some disk space? - Any suggestions?
> 
> best regards
> Jürgen Albersdorfer
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: Urgent Problem - Disk full

2018-04-04 Thread Kenneth Brotman
Jeff,

Just wondering: why wouldn't the answer be to:
1. move anything you want to archive to colder storage off the cluster, 
2. nodetool cleanup
3. snapshot
4. use delete command to remove archived data.

Kenneth Brotman

-Original Message-
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Wednesday, April 04, 2018 7:10 AM
To: user@cassandra.apache.org
Subject: Re: Urgent Problem - Disk full

Yes, this works in TWCS. 

Note though that if you have tombstone compaction subproperties set, there may 
be sstables with newer filesystem timestamps that actually hold older Cassandra 
data, in which case sstablemetadata can help finding the sstables with truly 
old timestamps

Also if you’ve expanded the cluster over time and you see an imbalance of disk 
usage on the oldest hosts, “nodetool cleanup” will likely free up some of that 
data



-- 
Jeff Jirsa


> On Apr 4, 2018, at 4:32 AM, Jürgen Albersdorfer 
>  wrote:
> 
> Hi,
> 
> I have an urgent Problem. - I will run out of disk space in near future.
> Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
> and default_time_to_live = 0
> Keyspace Replication Factor RF=3. I run C* Version 3.11.2
> We have grown the Cluster over time, so SSTable files have different Dates on 
> different Nodes.
> 
> From Application Standpoint it would be safe to loose some of the oldest Data.
> 
> Is it safe to delete some of the oldest SSTable Files, which will no longer 
> get touched by TWCS Compaction any more, while Node is clean Shutdown? - And 
> doing so for one Node after another?
> 
> Or maybe there is a different way to free some disk space? - Any suggestions?
> 
> best regards
> Jürgen Albersdorfer
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Text or....

2018-04-04 Thread Jeff Jirsa
Compressing server side and validating checksums is hugely important in the 
more frequently used versions of cassandra - so since you probably want to run 
compression on the server anyway, I’m not sure why you’d compress it twice 
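
For reference, keeping compression and checksum validation on in the table
schema looks something like this (table name is a placeholder; LZ4 with 64 KB
chunks is the 3.x default, and crc_check_chance = 1.0 validates checksums on
every read):

    ALTER TABLE my_keyspace.documents
      WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64}
      AND crc_check_chance = 1.0;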

-- 
Jeff Jirsa


> On Apr 4, 2018, at 6:23 AM, DuyHai Doan  wrote:
> 
> Compressing client-side is better because it will save:
> 
> 1) a lot of bandwidth on the network
> 2) a lot of Cassandra CPU because no decompression server-side
> 3) a lot of Cassandra HEAP because the compressed blob should be relatively 
> small (text data compress very well) compared to the raw size
> 
>> On Wed, Apr 4, 2018 at 2:59 PM, Jeronimo de A. Barros 
>>  wrote:
>> Hi,
>> 
>> We use a pseudo file-system table where the chunks are blobs of 64 KB and we 
>> never had any performance issue.
>> 
>> Primary-key structure is ((file-uuid), chunck-id).
>> 
>> Jero
>> 
>>> On Wed, Apr 4, 2018 at 9:25 AM, shalom sagges  
>>> wrote:
>>> Hi All, 
>>> 
>>> A certain application is writing ~55,000 characters for a single row. Most 
>>> of these characters are entered to one column with "text" data type. 
>>> 
>>> This looks insanely large for one row. 
>>> Would you suggest to change the data type from "text" to BLOB or any other 
>>> option that might fit this scenario?
>>> 
>>> Thanks!
>> 
> 


Re: Urgent Problem - Disk full

2018-04-04 Thread Jeff Jirsa
There is zero reason to believe a full repair would make this better and a lot 
of reason to believe it’ll make it worse

For casual observers following along at home, this is probably not the answer 
you’re looking for.

-- 
Jeff Jirsa


> On Apr 4, 2018, at 4:37 AM, Rahul Singh  wrote:
> 
> Nothing a full repair won’t be able to fix. 
> 
>> On Apr 4, 2018, 7:32 AM -0400, Jürgen Albersdorfer 
>> , wrote:
>> Hi,
>> 
>> I have an urgent Problem. - I will run out of disk space in near future.
>> Largest Table is a Time-Series Table with TimeWindowCompactionStrategy 
>> (TWCS) and default_time_to_live = 0
>> Keyspace Replication Factor RF=3. I run C* Version 3.11.2
>> We have grown the Cluster over time, so SSTable files have different Dates 
>> on different Nodes.
>> 
>> From Application Standpoint it would be safe to loose some of the oldest 
>> Data.
>> 
>> Is it safe to delete some of the oldest SSTable Files, which will no longer 
>> get touched by TWCS Compaction any more, while Node is clean Shutdown? - And 
>> doing so for one Node after another?
>> 
>> Or maybe there is a different way to free some disk space? - Any suggestions?
>> 
>> best regards
>> Jürgen Albersdorfer
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org


Re: Urgent Problem - Disk full

2018-04-04 Thread Jeff Jirsa
Yes, this works in TWCS. 

Note though that if you have tombstone compaction subproperties set, there may 
be sstables with newer filesystem timestamps that actually hold older Cassandra 
data, in which case sstablemetadata can help finding the sstables with truly 
old timestamps

Also if you’ve expanded the cluster over time and you see an imbalance of disk 
usage on the oldest hosts, “nodetool cleanup” will likely free up some of that 
data
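
A sketch of using sstablemetadata that way (paths are assumptions; the
timestamps are printed in microseconds since the epoch, so the lowest maximum
marks the sstable whose data is truly oldest):

    for f in /var/lib/cassandra/data/my_keyspace/my_timeseries-*/*-Data.db; do
        sstablemetadata "$f" 2>/dev/null \
            | awk -v f="$f" '/Minimum timestamp/ {min=$3} /Maximum timestamp/ {max=$3} END {print min, max, f}'
    done | sort -n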



-- 
Jeff Jirsa


> On Apr 4, 2018, at 4:32 AM, Jürgen Albersdorfer 
>  wrote:
> 
> Hi,
> 
> I have an urgent Problem. - I will run out of disk space in near future.
> Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
> and default_time_to_live = 0
> Keyspace Replication Factor RF=3. I run C* Version 3.11.2
> We have grown the Cluster over time, so SSTable files have different Dates on 
> different Nodes.
> 
> From Application Standpoint it would be safe to loose some of the oldest Data.
> 
> Is it safe to delete some of the oldest SSTable Files, which will no longer 
> get touched by TWCS Compaction any more, while Node is clean Shutdown? - And 
> doing so for one Node after another?
> 
> Or maybe there is a different way to free some disk space? - Any suggestions?
> 
> best regards
> Jürgen Albersdorfer
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Text or....

2018-04-04 Thread DuyHai Doan
Compressing client-side is better because it will save:

1) a lot of bandwidth on the network
2) a lot of Cassandra CPU because no decompression server-side
3) a lot of Cassandra HEAP because the compressed blob should be relatively
small (text data compress very well) compared to the raw size
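
As an illustration of the client-side approach, a minimal Python sketch (the
table, column names and the python cassandra-driver usage are assumptions about
the client stack; the 'body' column is a blob):

    import uuid, zlib
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_keyspace')
    insert = session.prepare("INSERT INTO documents (id, body) VALUES (?, ?)")

    def save_document(text):
        # compress before it ever leaves the client: less network, less server CPU/heap
        session.execute(insert, (uuid.uuid4(), zlib.compress(text.encode('utf-8'), 6)))

    def load_document(doc_id):
        row = session.execute("SELECT body FROM documents WHERE id = %s", (doc_id,)).one()
        return zlib.decompress(row.body).decode('utf-8')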

On Wed, Apr 4, 2018 at 2:59 PM, Jeronimo de A. Barros <
jeronimo.bar...@gmail.com> wrote:

> Hi,
>
> We use a pseudo file-system table where the chunks are blobs of 64 KB and
> we never had any performance issue.
>
> Primary-key structure is ((file-uuid), chunck-id).
>
> Jero
>
> On Wed, Apr 4, 2018 at 9:25 AM, shalom sagges 
> wrote:
>
>> Hi All,
>>
>> A certain application is writing ~55,000 characters for a single row.
>> Most of these characters are entered to one column with "text" data type.
>>
>> This looks insanely large for one row.
>> Would you suggest to change the data type from "text" to BLOB or any
>> other option that might fit this scenario?
>>
>> Thanks!
>>
>
>


Re: Text or....

2018-04-04 Thread Jeronimo de A. Barros
Hi,

We use a pseudo file-system table where the chunks are blobs of 64 KB and
we never had any performance issue.

Primary-key structure is ((file-uuid), chunk-id).
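
A sketch of what such a table could look like in CQL (names are guesses
matching the key structure above):

    CREATE TABLE files (
        file_uuid uuid,
        chunk_id  int,
        data      blob,              -- one 64 KB chunk
        PRIMARY KEY ((file_uuid), chunk_id)
    );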

Jero

On Wed, Apr 4, 2018 at 9:25 AM, shalom sagges 
wrote:

> Hi All,
>
> A certain application is writing ~55,000 characters for a single row. Most
> of these characters are entered to one column with "text" data type.
>
> This looks insanely large for one row.
> Would you suggest to change the data type from "text" to BLOB or any other
> option that might fit this scenario?
>
> Thanks!
>


Re: Text or....

2018-04-04 Thread Nicolas Guyomar
Hi Shalom,

You might want to compress on the application side before inserting into
Cassandra, using the algorithm of your choice, based on the compression ratio
and speed you find acceptable for your use case.

On 4 April 2018 at 14:38, shalom sagges  wrote:

> Thanks DuyHai!
>
> I'm using the default table compression. Is there anything else I should
> look into?
> Regarding the table compression, I understand that for write heavy tables,
> it's best to keep the default and not compress it further. Have I
> understood correctly?
>
> On Wed, Apr 4, 2018 at 3:28 PM, DuyHai Doan  wrote:
>
>> Compress it and stores it as a blob.
>> Unless you ever need to index it but I guess even with SASI indexing a so
>> huge text block is not a good idea
>>
>> On Wed, Apr 4, 2018 at 2:25 PM, shalom sagges 
>> wrote:
>>
>>> Hi All,
>>>
>>> A certain application is writing ~55,000 characters for a single row.
>>> Most of these characters are entered to one column with "text" data type.
>>>
>>> This looks insanely large for one row.
>>> Would you suggest to change the data type from "text" to BLOB or any
>>> other option that might fit this scenario?
>>>
>>> Thanks!
>>>
>>
>>
>


Re: Text or....

2018-04-04 Thread shalom sagges
Thanks DuyHai!

I'm using the default table compression. Is there anything else I should
look into?
Regarding the table compression, I understand that for write heavy tables,
it's best to keep the default and not compress it further. Have I
understood correctly?

On Wed, Apr 4, 2018 at 3:28 PM, DuyHai Doan  wrote:

> Compress it and stores it as a blob.
> Unless you ever need to index it but I guess even with SASI indexing a so
> huge text block is not a good idea
>
> On Wed, Apr 4, 2018 at 2:25 PM, shalom sagges 
> wrote:
>
>> Hi All,
>>
>> A certain application is writing ~55,000 characters for a single row.
>> Most of these characters are entered to one column with "text" data type.
>>
>> This looks insanely large for one row.
>> Would you suggest to change the data type from "text" to BLOB or any
>> other option that might fit this scenario?
>>
>> Thanks!
>>
>
>


RE: Urgent Problem - Disk full

2018-04-04 Thread Kenneth Brotman
Assuming the data model is good and there haven’t been any sudden jumps in 
memory use, it seems like the normal thing to do is archive some of the old 
time series data that you don’t care about.

 

Kenneth Brotman

 

From: Rahul Singh [mailto:rahul.xavier.si...@gmail.com] 
Sent: Wednesday, April 04, 2018 4:38 AM
To: user@cassandra.apache.org; user@cassandra.apache.org
Subject: Re: Urgent Problem - Disk full

 

Nothing a full repair won’t be able to fix. 


On Apr 4, 2018, 7:32 AM -0400, Jürgen Albersdorfer 
, wrote:



Hi,

I have an urgent Problem. - I will run out of disk space in near future.
Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
and default_time_to_live = 0
Keyspace Replication Factor RF=3. I run C* Version 3.11.2
We have grown the Cluster over time, so SSTable files have different Dates on 
different Nodes.

>From Application Standpoint it would be safe to loose some of the oldest Data.

Is it safe to delete some of the oldest SSTable Files, which will no longer get 
touched by TWCS Compaction any more, while Node is clean Shutdown? - And doing 
so for one Node after another?

Or maybe there is a different way to free some disk space? - Any suggestions?

best regards
Jürgen Albersdorfer

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Text or....

2018-04-04 Thread DuyHai Doan
Compress it and stores it as a blob.
Unless you ever need to index it but I guess even with SASI indexing a so
huge text block is not a good idea

On Wed, Apr 4, 2018 at 2:25 PM, shalom sagges 
wrote:

> Hi All,
>
> A certain application is writing ~55,000 characters for a single row. Most
> of these characters are entered to one column with "text" data type.
>
> This looks insanely large for one row.
> Would you suggest to change the data type from "text" to BLOB or any other
> option that might fit this scenario?
>
> Thanks!
>


Text or....

2018-04-04 Thread shalom sagges
Hi All,

A certain application is writing ~55,000 characters for a single row. Most
of these characters are entered to one column with "text" data type.

This looks insanely large for one row.
Would you suggest to change the data type from "text" to BLOB or any other
option that might fit this scenario?

Thanks!


Re: Urgent Problem - Disk full

2018-04-04 Thread Rahul Singh
Nothing a full repair won’t be able to fix.

On Apr 4, 2018, 7:32 AM -0400, Jürgen Albersdorfer 
, wrote:
> Hi,
>
> I have an urgent Problem. - I will run out of disk space in near future.
> Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
> and default_time_to_live = 0
> Keyspace Replication Factor RF=3. I run C* Version 3.11.2
> We have grown the Cluster over time, so SSTable files have different Dates on 
> different Nodes.
>
> From Application Standpoint it would be safe to loose some of the oldest Data.
>
> Is it safe to delete some of the oldest SSTable Files, which will no longer 
> get touched by TWCS Compaction any more, while Node is clean Shutdown? - And 
> doing so for one Node after another?
>
> Or maybe there is a different way to free some disk space? - Any suggestions?
>
> best regards
> Jürgen Albersdorfer
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org


Urgent Problem - Disk full

2018-04-04 Thread Jürgen Albersdorfer
Hi,

I have an urgent Problem. - I will run out of disk space in the near future.
Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
and default_time_to_live = 0
Keyspace Replication Factor RF=3. I run C* Version 3.11.2
We have grown the Cluster over time, so SSTable files have different Dates on 
different Nodes.

From Application Standpoint it would be safe to lose some of the oldest Data.

Is it safe to delete some of the oldest SSTable Files, which will no longer get 
touched by TWCS Compaction any more, while the Node is cleanly shut down? - And 
doing so for one Node after another?

Or maybe there is a different way to free some disk space? - Any suggestions?

best regards
Jürgen Albersdorfer

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


Re: datastax cassandra minimum hardware recommendation

2018-04-04 Thread Rahul Singh
Agree with Alain.

Remember that DSE is not Cassandra. It includes Cassandra, Solr, Spark, and 
Graph, so if you run all or some of them, it's more than just Cassandra.

OpsCenter is another thing altogether.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 4, 2018, 5:42 AM -0400, Alain RODRIGUEZ , wrote:
> Hello.
>
> For questions to Datastax, I recommend you to ask them directly. I often had 
> a quick answer and they probably can answer this better than we do :).
>
> Apache Cassandra (and probably DSE-Cassandra) can work with 8 CPU (and 
> less!). I would not go much lower though. I believe the memory amount and 
> good disk throughputs are more important. It also depends on the workload 
> type and intensity, encryption, compression etc.
>
> 8 CPUs is probably just fine if well tuned, and here in the mailing list, we 
> 'support' any fancy configuration settings, but with no guarantee on the 
> response time and without taking the responsibility for your cluster :).
>
> It reminds me of my own start with Apache Cassandra. I started with t1.micro 
> back then on AWS, and people were still helping me here, of course after a 
> couple of jokes such as 'you should rather try to play a PlayStation 4 game 
> in your Gameboy', that's fair enough I guess :). Well it was working in prod 
> and I learned how to tune Apache Cassandra, I had no other options to have 
> this working.
>
> Having more CPU probably improves resiliency to some problems and reduces the 
> importance of having a cluster perfectly tuned.
>
> Benchmark your workload, test it. This would be the most accurate answer here 
> given the details we have.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> > 2018-04-04 9:44 GMT+01:00 sujeet jog :
> > > the datastax site has a hardware recommendation of 16CPU / 32G RAM for 
> > > DSE Enterprise,  Any idea what is the minimum hardware recommendation 
> > > supported, can each node be 8CPU and the support covering it ?..
>


Re: datastax cassandra minimum hardware recommendation

2018-04-04 Thread Alain RODRIGUEZ
Hello.

For questions to DataStax, I recommend asking them directly. I have often had a
quick answer, and they can probably answer this better than we do :).

Apache Cassandra (and probably DSE-Cassandra) can work with 8 CPU (and
less!). I would not go much lower though. I believe the memory amount and
good disk throughputs are more important. It also depends on the workload
type and intensity, encryption, compression etc.

8 CPUs is probably just fine if well tuned, and here in the mailing list,
we 'support' any fancy configuration settings, but with no guarantee on the
response time and without taking the responsibility for your cluster :).

It reminds me of my own start with Apache Cassandra. I started with
t1.micro back then on AWS, and people were still helping me here, of course
after a couple of jokes such as 'you should rather try to play a
PlayStation 4 game in your Gameboy', that's fair enough I guess :). Well it
was working in prod and I learned how to tune Apache Cassandra, I had no
other options to have this working.

Having more CPU probably improves resiliency to some problems and reduces
the importance of having a cluster perfectly tuned.

Benchmark your workload, test it. This would be the most accurate answer
here given the details we have.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-04-04 9:44 GMT+01:00 sujeet jog :

> the datastax site has a hardware recommendation of 16CPU / 32G RAM for DSE
> Enterprise,  Any idea what is the minimum hardware recommendation
> supported, can each node be 8CPU and the support covering it ?..
>


datastax cassandra minimum hardware recommendation

2018-04-04 Thread sujeet jog
The DataStax site has a hardware recommendation of 16 CPU / 32 GB RAM for DSE
Enterprise. Any idea what the minimum supported hardware recommendation is?
Can each node be 8 CPU and still be covered by support?