Wanted to add some that I remembered:

* https://issues.apache.org/jira/browse/CASSANDRA-12811 - data resurrection,
  but marked as Normal because it was discovered by a test. Should've been
  marked critical.
* https://issues.apache.org/jira/browse/CASSANDRA-12956 - data loss (the
  commit log isn't replayed on a custom 2i exception)
* https://issues.apache.org/jira/browse/CASSANDRA-12144 - undeletable/duplicate
  rows; can be considered data resurrection and/or sstable corruption.
On Thu, May 7, 2020 at 6:55 PM Joshua McKenzie <jmcken...@apache.org> wrote:

> "ML is plaintext bro" - thanks Mick. ಠ_ಠ
>
> Since we're stuck in the late 90's, here are some links to a gsheet:
>
> Defects by month:
> https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=1584867240
> Defects by component:
> https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=1946109279
> Defects by type:
> https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=385136105
>
> On Thu, May 7, 2020 at 12:31 PM Joshua McKenzie <joshua.mcken...@gmail.com>
> wrote:
>
>> Hearing the images got killed by the web server. Trying from gmail (sorry
>> for spam). Time to see if it's the apache smtp server or the list culling
>> images:
>>
>> -------------------------------------------
>> I did a little analysis on this data (any defect marked with fixversion
>> 4.0 that rose to the level of critical in terms of availability,
>> correctness, or corruption/loss) and charted some things the rest of the
>> project community might find interesting:
>>
>> 1: Critical (availability, correctness, corruption/loss) defects fixed
>> per month, starting about 6 months before 3.11.0:
>> [image: monthly.png]
>>
>> 2: Components in which critical defects arose (note: bright red bar ==
>> sum of the 3 dark red):
>> [image: Total Defects by Component.png]
>>
>> 3: Type of defect found and fixed (bright red: cluster down or permaloss,
>> dark red: temp corrupt/loss, yellow: incorrect response):
>> [image: Total Defects by Type.png]
>>
>> My personal takeaway from this: a ton of great defect-fixing work has
>> gone into 4.0.
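For anyone who wants to reproduce tallies like the linked "Defects by month" sheet, a minimal sketch of the aggregation is below. The column names ("Resolved", "Component/s") are assumptions about a JIRA CSV export, and the sample rows are entirely made up; this is not the actual export or tickets behind the sheets.

```python
# Hedged sketch: count critical defects per month from a JIRA CSV export.
# Column names and all sample rows below are hypothetical, not the real
# data behind the linked gsheet.
import csv
import io
from collections import Counter

def tally_by_month(csv_text: str) -> Counter:
    """Count issues per YYYY-MM, keyed on the assumed 'Resolved' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Resolved"][:7] for row in reader if row["Resolved"])

# Made-up export rows (ticket numbers and dates are illustrative only).
sample = """\
Key,Resolved,Component/s
CASSANDRA-11111,2017-02-13,Local Write-Read Paths
CASSANDRA-22222,2017-01-20,Commit Log
CASSANDRA-33333,2017-02-02,Compaction
"""

print(sorted(tally_by_month(sample).items()))
# [('2017-01', 1), ('2017-02', 2)]
```

The same Counter, grouped on "Component/s" instead of the month prefix, would give the by-component view.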
I'd love it if we had both code coverage analysis for
>> testing on the codebase, as well as data to surface where the defect
>> hotspots are in the code that might need further testing (caveat: many in
>> this project community have voiced skepticism about the value of this type
>> of data in the past, so that's probably another conversation to have on
>> another thread).
>>
>> Hope someone else finds the above interesting, if not useful.
>>
>> --
>> Joshua McKenzie
>>
>> On Thu, May 7, 2020 at 12:24 PM Joshua McKenzie <jmcken...@apache.org>
>> wrote:
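The defect-hotspot data mentioned above could be approximated from version-control history by counting how many bug-fix commits touched each file. A hedged sketch with made-up commit data follows; the ticket numbers and file paths are illustrative, and the `git log` extraction is only suggested in a comment, not performed.

```python
# Hedged sketch of the defect-hotspot idea: rank files by the number of
# bug-fix commits that touched them. In practice the input could come from
# something like `git log --name-only --grep=CASSANDRA-`; here it is a
# hand-written, hypothetical list.
from collections import Counter

def defect_hotspots(fix_commits):
    """fix_commits: iterable of (commit message, [touched file paths])."""
    counts = Counter()
    for _message, files in fix_commits:
        counts.update(files)
    return counts.most_common()  # hottest files first

# Hypothetical bug-fix commits (tickets and paths are illustrative only).
commits = [
    ("CASSANDRA-11111: fix read repair bug", ["service/ReadRepair.java"]),
    ("CASSANDRA-22222: fix commit log replay", ["db/CommitLog.java"]),
    ("CASSANDRA-33333: fix repair regression", ["service/ReadRepair.java"]),
]

print(defect_hotspots(commits))
# [('service/ReadRepair.java', 2), ('db/CommitLog.java', 1)]
```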
>>>
>>> ~Josh
>>>
>>> On Wed, May 6, 2020 at 3:38 PM Dinesh Joshi <djo...@apache.org> wrote:
>>>
>>>> Hi Sankalp,
>>>>
>>>> Thanks for bringing this up. At the very minimum, I hope we have
>>>> regression tests for the specific issues we have fixed.
>>>>
>>>> I personally think the project should focus on building a
>>>> comprehensive test suite. However, some of these issues can only be
>>>> detected at scale. We need users to test* C* in their environment for
>>>> their use cases. Ideally these folks stand up large clusters, tee their
>>>> traffic to the new cluster, and report issues.
>>>>
>>>> If we had an automated test suite that everyone can run at large
>>>> scale, that would be even better.
>>>>
>>>> Thanks,
>>>>
>>>> Dinesh
>>>>
>>>> * test != starting C* on a few nodes and looking at logs.
>>>>
>>>> > On May 6, 2020, at 10:11 AM, sankalp kohli <kohlisank...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi,
>>>> >     I want to share some of the serious issues that were found and
>>>> fixed in
>>>> > 3.0.x. I have created this list from JIRA to help us identify areas
>>>> for
>>>> > validating 4.0. This will also give the dev community some insight.
>>>> >
>>>> > Let us know if anyone has suggestions on how to better use this data
>>>> in
>>>> > validating 4.0. Also, this list might be missing some issues
>>>> identified
>>>> > early on in 3.0.x, as well as some of the latest ones.
>>>> >
>>>> > Link: https://tinyurl.com/30seriousissues
>>>> >
>>>> > Thanks,
>>>> > Sankalp
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>

--
alex p
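The traffic-teeing idea Dinesh describes (mirror production requests to a candidate cluster, serve responses only from the primary, and record any divergence) can be sketched in miniature. The dicts below stand in for real client sessions, and none of the names reflect an actual Cassandra driver API; it is only a shape-of-the-idea sketch.

```python
# Minimal sketch of request teeing for validation. Plain dicts stand in
# for the old (primary) and new (shadow) clusters; this is not a real
# driver API, just an illustration of the comparison loop.
def tee_request(primary, shadow, mismatches, key, value=None):
    """Apply a write to both stores, or read from both and log divergence.

    The client only ever sees the primary's answer; the shadow's answer is
    used purely for comparison.
    """
    if value is not None:            # write path: mirror to both clusters
        primary[key] = value
        shadow[key] = value
        return value
    result = primary.get(key)        # read path: primary answers the client
    if shadow.get(key) != result:    # shadow result is compared, not served
        mismatches.append(key)
    return result

primary, shadow, mismatches = {}, {}, []
tee_request(primary, shadow, mismatches, "k", "v1")   # mirrored write
shadow["k"] = "stale"                # simulate a bug in the new cluster
assert tee_request(primary, shadow, mismatches, "k") == "v1"
print(mismatches)
# ['k']
```

A real deployment would tee at the driver or proxy layer and compare full result sets, but the core loop (serve from primary, diff against shadow, report) is the same.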