I started to respond, then realized that the other posters and I are not
thinking about the same thing: what is the business case for availability and
data loss/reload/recoverability? You all argue for higher availability and
damn the cost. But no one has asked, "Can you lose access, for 20 minutes, to
a portion of the data, 10 times a year, on a 250-node cluster in AWS, if the
data is not actually lost?" Or would you rather lose access only 1-2 times a
year, for the cost of a 500-node cluster holding the same data?
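
To put rough numbers on that trade-off: 10 outages of 20 minutes is about 200
minutes a year, i.e. roughly 99.96% availability for the affected portion of
the data. A quick back-of-the-envelope check, using only the numbers above:

    # availability of the affected data, given 10 outages of 20 minutes/year
    outages=10; minutes_per_outage=20
    awk -v o=$outages -v m=$minutes_per_outage \
      'BEGIN { printf "%.3f%% available\n", 100 * (1 - (o * m) / 525600) }'
    # -> 99.962% available (for that portion of the data only)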

Then we can discuss 32/64 GB JVM heaps and SSDs.
Arthur C. Clarke famously said that "technology sufficiently advanced is
indistinguishable from magic." Magic is coming, and it's coming for all of
us....

Daemeon Reiydelle
email: daeme...@gmail.com
LI: https://www.linkedin.com/in/daemeonreiydelle/
San Francisco 1.415.501.0198 / Skype daemeon.c.m.reiydelle


On Thu, Aug 17, 2023 at 1:53 PM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> I was assuming Reaper did incremental repairs?  That was probably a bad assumption.
>
> nodetool repair -pr
> I know it well now!
>
> :)
>
> -Joe
>
> On 8/17/2023 4:47 PM, Bowen Song via user wrote:
> > I don't have experience with Cassandra on Kubernetes, so I can't
> > comment on that.
> >
> > For repairs, may I interest you in incremental repairs? They will make
> > repairs a hell of a lot faster. Of course, an occasional full repair is
> > still needed, but that's another story.
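> >
> > For example (assuming Cassandra 2.2 or later, where incremental repair is
> > the default unless -full is given), something along these lines on each node:
> >
> >     # incremental repair (the default when -full is not specified)
> >     nodetool repair
> >     # an occasional full repair, limited to this node's primary token ranges
> >     nodetool repair -pr -full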
> >
> >
> > On 17/08/2023 21:36, Joe Obernberger wrote:
> >> Thank you.  Enjoying this conversation.
> >> Agree on blade servers, where each blade has a small number of SSDs.
> >> Yea or nay to a Kubernetes approach, assuming fast persistent storage?  I
> >> think that might be easier to manage.
> >>
> >> In my current benchmarks, the performance is excellent, but the
> >> repairs are painful.  I come from the Hadoop world where it was all
> >> about large servers with lots of disk.
> >> A relatively small number of tables, but some have a high number of
> >> rows (10 billion+); we use Spark to run across all the data.
> >>
> >> -Joe
> >>
> >> On 8/17/2023 12:13 PM, Bowen Song via user wrote:
> >>> The optimal node size largely depends on the table schema and
> >>> read/write pattern. In some cases 500 GB per node is too large, but
> >>> in some other cases 10TB per node works totally fine. It's hard to
> >>> estimate that without benchmarking.
> >>>
> >>> Again, just pointing out the obvious: you did not count the off-heap
> >>> memory and the page cache. 1TB of RAM for a 24GB heap * 40 instances is
> >>> definitely not enough. You'll most likely need between 1.5 and 2 TB of
> >>> memory for 40x 24GB-heap nodes. You may be better off with blade
> >>> servers than a single server with gigantic memory and disk sizes.
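> >>>
> >>> As a rough sketch of that estimate (the off-heap and page-cache numbers
> >>> below are assumptions, not measurements; benchmark to get real figures):
> >>>
> >>>     # per-server memory for 40 instances with a 24 GB heap each
> >>>     instances=40
> >>>     heap_gb=24
> >>>     offheap_gb=8   # assumed: memtables, bloom filters, index summaries, etc.
> >>>     cache_gb=8     # assumed: per-instance share of the OS page cache
> >>>     echo "~$(( instances * (heap_gb + offheap_gb + cache_gb) )) GB total"  # ~1600 GB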
> >>>
> >>>
> >>> On 17/08/2023 15:46, Joe Obernberger wrote:
> >>>> Thanks for this - yeah - duh - forgot about replication in my example!
> >>>> So - is 2 TBytes per Cassandra instance advisable?  Better to use
> >>>> more/less?  Modern 2U servers can be had with 24 3.8-TByte SSDs; so
> >>>> assuming 80 TBytes per server, you could do:
> >>>> (1024*3)/80 = 39 servers, but you'd have to run 40 instances of
> >>>> Cassandra on each server; maybe 24 GB of heap per instance, so a
> >>>> server with 1 TByte of RAM would work.
> >>>> Is this what folks would do?
> >>>>
> >>>> -Joe
> >>>>
> >>>> On 8/17/2023 9:13 AM, Bowen Song via user wrote:
> >>>>> Just pointing out the obvious, for 1PB of data on nodes with 2TB
> >>>>> disk each, you will need far more than 500 nodes.
> >>>>>
> >>>>> 1, it is unwise to run Cassandra with a replication factor of 1. It
> >>>>> usually makes sense to use RF=3, so 1PB of data will cost 3PB of
> >>>>> storage space, a minimum of 1500 such nodes.
> >>>>>
> >>>>> 2, depending on the compaction strategy you use and the write
> >>>>> access pattern, there's disk space amplification to consider.
> >>>>> For example, with STCS, the disk usage can be many times the
> >>>>> actual live data size.
> >>>>>
> >>>>> 3, you will need some extra free disk space as temporary space for
> >>>>> running compactions.
> >>>>>
> >>>>> 4, the data is rarely going to be perfectly evenly distributed
> >>>>> among all nodes, and you need to take that into consideration and
> >>>>> size the nodes based on the node with the most data.
> >>>>>
> >>>>> 5, enough of the bad news, here's a good one: compression will save
> >>>>> you a lot of disk space!
> >>>>>
> >>>>> With all of the above considered, you will probably end up with a lot
> >>>>> more nodes than the 500 you initially thought. Your choice of
> >>>>> compaction strategy and compression ratio can dramatically affect
> >>>>> this calculation.
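> >>>>>
> >>>>> As a rough sketch of that calculation (the compression, compaction
> >>>>> headroom and imbalance factors below are assumptions you would need to
> >>>>> benchmark for your own schema and workload):
> >>>>>
> >>>>>     # estimated node count for 1 PB of live data at ~2 TB per node
> >>>>>     raw_tb=1024       # 1 PB of live data
> >>>>>     rf=3              # replication factor
> >>>>>     compression=0.5   # assumed on-disk ratio after compression
> >>>>>     headroom=1.6      # assumed compaction / space-amplification headroom
> >>>>>     imbalance=1.2     # assumed: busiest node holds ~20% more than average
> >>>>>     per_node_tb=2
> >>>>>     awk -v r=$raw_tb -v rf=$rf -v c=$compression -v h=$headroom \
> >>>>>         -v i=$imbalance -v p=$per_node_tb \
> >>>>>         'BEGIN { printf "~%d nodes\n", (r * rf * c * h * i) / p + 0.999 }'
> >>>>>     # -> ~1475 nodes with these assumptions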
> >>>>>
> >>>>>
> >>>>> On 16/08/2023 16:33, Joe Obernberger wrote:
> >>>>>> General question on how to configure Cassandra.  Say I have
> >>>>>> 1PByte of data to store.  The general rule of thumb is that each
> >>>>>> node (or at least instance of Cassandra) shouldn't handle more
> >>>>>> than 2TBytes of disk.  That means 500 instances of Cassandra.
> >>>>>>
> >>>>>> Assuming you have very fast persistent storage (such as NetApp,
> >>>>>> Portworx, etc.), would using Kubernetes or some orchestration
> >>>>>> layer to handle those nodes be a viable approach?  If each
> >>>>>> worker node had enough RAM to run 4 instances (pods) of
> >>>>>> Cassandra, you would need 125 servers.
> >>>>>> Another approach is to build your servers with 5 (or more) SSD
> >>>>>> devices - one for the OS, and one for each of the four instances of
> >>>>>> Cassandra running on that server.  Then build some
> >>>>>> scripts/Ansible/Puppet to manage Cassandra starts/stops and
> >>>>>> other maintenance items.
> >>>>>>
> >>>>>> Where I think this runs into problems is with repairs or sstable
> >>>>>> scrubs, which can take days to run on a single instance.  How
> >>>>>> is that handled 'in the real world'?  With seed nodes, how many
> >>>>>> would you have in such a configuration?
> >>>>>> Thanks for any thoughts!
> >>>>>>
> >>>>>> -Joe
> >>>>>>
> >>>>>>
> >>>>
> >>
>
