Re: [DISCUSS] Cassandra and future Java

2018-05-30 Thread kurt greaves
I suppose so - I guess I'm just highlighting that we shouldn't dedicate
many testing resources to 4.0 on Java 11.

I will note, however, that if we wanted to start making changes to support
11, they'd have to be (as Robert alluded to) based on an incomplete 11, as
I'm assuming we'd make no further changes after 4.0 is frozen and they'd
instead go to 4.1.

On Wed., 30 May 2018, 20:40 Stefan Podkowinski,  wrote:

> That's probably not far off what Robert suggested:
>
> "The idea here is to default to Java 8, but the code also runs on 11"
>
> "Initially, only the combination of C* 4.0 + Java 8 would be labeled as
> "stable" and the combination of C* 4.0 + Java 11 as "experimental"."
>
> If Robert wants to go ahead making the code "also run on Java 11", while
> we keep testing for and officially supporting Java 8, then I can't really
> think of any argument against that, as long as we don't end up with tons
> of version-based code toggles or too-risky changes regarding stability
> in general. We would still release 4.0 for Java 8 and afterwards
> officially switch to 11 in 4.1, based on the work done in #9608 and
> already merged into 4.0. Downloads and packages for 4.0 would be
> released for Java 8, while running 4.0 with Java 11 would produce a
> warning message indicating that it's experimental.
>
>
> On 30.05.2018 08:54, kurt greaves wrote:
> > So for anyone who missed it, Java 11 will be released in September 2018.
> >
> > I'd prefer we target one Java release only. This is purely because we
> > don't have the capacity or capability to test both releases. We hardly
> > do a good enough job as it is of testing, and lumping another JVM into
> > the mix is just going to complicate things a lot; all the effort we'd
> > expend testing both releases is better spent focusing on one.
> >
> > At this point I don't think there is *much* value in supporting 11 for
> > 4.0, seeing as we won't be able to fully utilise features in 11, as our
> > feature freeze for 4.0 will occur before 11 is released. There is
> > obviously the support problem, but AdoptOpenJDK claims they'll support
> > Java 8 until September 2022 (https://adoptopenjdk.net/support.html) -
> > which, on top of all the existing releases, is probably good enough for
> > us, and 2022 is far enough away that hopefully 4.0 will be EOL'd by
> > then. I don't think it's a big risk that support for Java 8 will stop
> > anytime soon; it's pretty widespread and it's going to take people a
> > *long* time to get off 8.
> >
> > It would make much more sense to me to support 11 in 4.1; that way we
> > can actually utilise any benefits of 11.
> >
> > On 29 May 2018 at 12:22, Robert Stupp  wrote:
> >
> >> Ideally, CI would run against both Java 8 and 11. I’ve no clue about
> >> b.a.o (builds.apache.org) though.
> >>
> >> There will definitely be a lot of smaller issues - both for OpenJDK 8
> >> and 11.
> >> I think it’s sufficient to deal with the Linux distros' (RH/deb)
> >> openjdk dependencies - just making sure that we’re using the right
> >> Java version - and not letting the package manager just pull the
> >> newest available.
> >> The version string from adoptopenjdk, for example, is one of these
> >> “minor issues”...
> >>
> >> —
> >> Robert Stupp
> >> @snazy
> >>
> >> On 28. May 2018, at 15:46, Stefan Podkowinski  wrote:
> >>
> >> The main issue that I see, for supporting both Java 8 + 11, is testing.
> >> We should first decide how this would affect builds.apache.org, or how
> >> we're going to do CI testing in general for that situation.
> >>
> >> There are probably also smaller issues that we're not aware of yet, such
> >> as which Java dependency to use for our deb and rpm packages,
> >> differences in Java distributions (Oracle, AdoptOpenJDK, Redhat,..) and
> >> so on. I'd expect we could deal with this on the Java side, but the
> >> infra, scripting and testing implications give me a greater headache
> >> when thinking of it.
> >>
> >>
> >> On 25.05.2018 15:33, J. D. Jordan wrote:
> >>
> >> +1 for “Option 3: both 8 + 11”. It shouldn’t be too hard to maintain
> >> code-wise, and it leaves people’s options open.
> >>
> >> -Jeremiah
> >>
> >> On May 25, 2018, at 6:31 AM, Robert Stupp  wrote:
> >>
> >> I'd like to bring up the C*/Java discussion again. It's been a while
> >> since we've discussed this.
> >>
> >> To me it sounds like there's still the question about which version(s)
> >> of Java we want to support beginning with C* 4.0.
> >>
> >> I assume that it's legitimate (and probably very necessary) to assume
> >> that OpenJDK is now (i.e. after Java 6) considered "production ready"
> >> for C*. The public (and legal and free) availability of Oracle's Java 8
> >> will end in January 2019 (unless you're using it privately on your
> >> desktop). Java 9 and 10 are not a thing, as both will be EOL when the
> >> C* 4.0 branch is about to be cut. The most recent available Java
> >> version will be 11, which is meant to be publicly available from
> >> Oracle until March 2019 and should get LTS support for OpenJDK 11 from
> >> major Linux distros (RHEL and derivatives, Ubuntu, Azul Zulu).

Re: [DISCUSS] Cassandra and future Java

2018-05-30 Thread Stefan Podkowinski
That's probably not far off what Robert suggested:

"The idea here is to default to Java 8, but the code also runs on 11"

"Initially, only the combination of C* 4.0 + Java 8 would be labeled as
"stable" and the combination of C* 4.0 + Java 11 as "experimental"."

If Robert wants to go ahead making the code "also run on Java 11", while we
keep testing for and officially supporting Java 8, then I can't really
think of any argument against that, as long as we don't end up with tons
of version-based code toggles or too-risky changes regarding stability
in general. We would still release 4.0 for Java 8 and afterwards
officially switch to 11 in 4.1, based on the work done in #9608 and
already merged into 4.0. Downloads and packages for 4.0 would be
released for Java 8, while running 4.0 with Java 11 would produce a
warning message indicating that it's experimental.
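The plan above (default to Java 8, warn that running on Java 11 is experimental) implies a small startup check on the JVM version. Here is a minimal illustrative sketch; this is not actual Cassandra code, and parsing the `java.specification.version` system property is an assumption about how such a check could be done:

```java
public class StartupJavaCheck {
    // Hypothetical sketch, not Cassandra code. "java.specification.version"
    // reports "1.8" on Java 8 and "11" on Java 11, so we normalise both
    // forms to a single major-version number.
    static int majorVersion(String spec) {
        if (spec.startsWith("1."))                    // old scheme: "1.8" -> 8
            return Integer.parseInt(spec.substring(2));
        int dot = spec.indexOf('.');                  // new scheme: "11" or "11.0.2" -> 11
        return Integer.parseInt(dot < 0 ? spec : spec.substring(0, dot));
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        if (major > 8)
            System.out.println("WARN: running on Java " + major
                    + " is experimental; Java 8 is the supported runtime");
        else
            System.out.println("Running on supported Java " + major);
    }
}
```

Such a warning-only gate keeps the "stable on 8, experimental on 11" labeling without version-based code toggles in the hot path.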


On 30.05.2018 08:54, kurt greaves wrote:
> So for anyone who missed it, Java 11 will be released in September 2018.
> 
> I'd prefer we target one Java release only. This is purely because we don't
> have the capacity or capability to test both releases. We hardly do a good
> enough job as it is of testing, and lumping another JVM into the mix is
> just going to complicate things a lot; all the effort we'd expend testing
> both releases is better spent focusing on one.
> 
> At this point I don't think there is *much* value in supporting 11 for 4.0,
> seeing as we won't be able to fully utilise features in 11, as our feature
> freeze for 4.0 will occur before 11 is released. There is obviously the
> support problem, but AdoptOpenJDK claims they'll support Java 8 until
> September 2022 (https://adoptopenjdk.net/support.html) - which, on top of
> all the existing releases, is probably good enough for us, and 2022 is far
> enough away that hopefully 4.0 will be EOL'd by then. I don't think it's a
> big risk that support for Java 8 will stop anytime soon; it's pretty
> widespread and it's going to take people a *long* time to get off 8.
> 
> It would make much more sense to me to support 11 in 4.1; that way we can
> actually utilise any benefits of 11.
> 
> On 29 May 2018 at 12:22, Robert Stupp  wrote:
> 
>> Ideally, CI would run against both Java 8 and 11. I’ve no clue about
>> b.a.o (builds.apache.org) though.
>>
>> There will definitely be a lot of smaller issues - both for OpenJDK 8 and
>> 11.
>> I think it’s sufficient to deal with the Linux distros' (RH/deb) openjdk
>> dependencies - just making sure that we’re using the right Java version -
>> and not letting the package manager just pull the newest available.
>> The version string from adoptopenjdk, for example, is one of these “minor
>> issues”...
>>
>> —
>> Robert Stupp
>> @snazy
>>
>> On 28. May 2018, at 15:46, Stefan Podkowinski  wrote:
>>
>> The main issue that I see, for supporting both Java 8 + 11, is testing.
>> We should first decide how this would affect builds.apache.org, or how
>> we're going to do CI testing in general for that situation.
>>
>> There are probably also smaller issues that we're not aware of yet, such
>> as which Java dependency to use for our deb and rpm packages,
>> differences in Java distributions (Oracle, AdoptOpenJDK, Redhat,..) and
>> so on. I'd expect we could deal with this on the Java side, but the
>> infra, scripting and testing implications give me a greater headache
>> when thinking of it.
>>
>>
>> On 25.05.2018 15:33, J. D. Jordan wrote:
>>
>> +1 for “Option 3: both 8 + 11”. It shouldn’t be too hard to maintain
>> code-wise, and it leaves people’s options open.
>>
>> -Jeremiah
>>
>> On May 25, 2018, at 6:31 AM, Robert Stupp  wrote:
>>
>> I'd like to bring up the C*/Java discussion again. It's been a while since
>> we've discussed this.
>>
>> To me it sounds like there's still the question about which version(s) of
>> Java we want to support beginning with C* 4.0.
>>
>> I assume that it's legitimate (and probably very necessary) to assume that
>> OpenJDK is now (i.e. after Java 6) considered "production ready" for C*.
>> The public (and legal and free) availability of Oracle's Java 8 will end in
>> January 2019 (unless you're using it privately on your desktop). Java 9 and
>> 10 are not a thing, as both will be EOL when the C* 4.0 branch is about to
>> be cut. The most recent available Java version will be 11, which is meant
>> to be publicly available from Oracle until March 2019 and should get LTS
>> support for OpenJDK 11 from major Linux distros (RHEL and derivatives,
>> Ubuntu, Azul Zulu).
>>
>> (Side note: AdoptOpenJDK is different here, because it does not include
>> the patch version in the version banner (java.version=1.8.0-adoptopenjdk),
>> so it's difficult to check the minimum patch version on startup of C*.)
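The side note's problem can be made concrete with a small sketch. This is a hypothetical helper, not Cassandra code: on Java 8 the update number conventionally follows an underscore in `java.version` (e.g. 1.8.0_171), so a banner like 1.8.0-adoptopenjdk leaves nothing to compare against a required minimum:

```java
import java.util.OptionalInt;

public class PatchVersionCheck {
    // Illustrative only. Extracts the Java 8 style update number from a
    // "java.version" string: "1.8.0_171" -> 171, "1.8.0_171-b11" -> 171.
    // AdoptOpenJDK's "1.8.0-adoptopenjdk" has no underscore-delimited
    // update number, so there is nothing to check - exactly the issue
    // described above.
    static OptionalInt updateNumber(String javaVersion) {
        int idx = javaVersion.indexOf('_');
        if (idx < 0)
            return OptionalInt.empty();               // e.g. "1.8.0-adoptopenjdk"
        // keep only the leading digits after the underscore
        String tail = javaVersion.substring(idx + 1).replaceAll("[^0-9].*$", "");
        return tail.isEmpty() ? OptionalInt.empty()
                              : OptionalInt.of(Integer.parseInt(tail));
    }

    public static void main(String[] args) {
        System.out.println(updateNumber(System.getProperty("java.version")));
    }
}
```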
>>
>> (Attn, rant: I'm not particularly happy with the new release and support
>> model for Java, because developing something now, that's about to release
>> end of the year on a Java version that has not even reached
>> feature-complete status, is, ge

Re: secondary index table - tombstones surviving compactions

2018-05-30 Thread Jordan West
Hi Roman,

I was able to reproduce the issue you described. I filed
https://issues.apache.org/jira/browse/CASSANDRA-14479. More details there.

Thanks for reporting!
Jordan


On Wed, May 23, 2018 at 12:06 AM, Roman Bielik <
roman.bie...@openmindnetworks.com> wrote:

> Hi,
>
> I apologise for the late response; I wanted to run some further tests so I
> could provide more information to you.
>
> @Jeff, no, I don't set the "only_purge_repaired_tombstone" option. It
> should be at the default: False.
> But no, I don't run repairs during the tests.
>
> @Eric, I understand that rapid deletes/inserts are some kind of
> antipattern; nevertheless, I'm not experiencing any problems with that
> (except for the 2nd indices).
>
> Update: I ran a new test where I delete the indexed columns separately,
> plus delete the whole row at the end.
> Surprisingly, this test scenario works fine. Using nodetool flush +
> compact (in order to expedite the test) seems to always purge the index
> table.
> So that's great, because I seem to have found a workaround; on the other
> hand, could there be a bug in Cassandra - a leaking index table?
>
> Test details:
> Create table with LeveledCompactionStrategy;
> 'tombstone_compaction_interval': 60; gc_grace_seconds=60
> There are two indexed columns for comparison: column1, column2
> Insert keys {1..x} with random values in column1 & column2
> Delete {key:column2} (but not column1)
> Delete {key}
> Repeat n-times from the inserts
> Wait 1 minute
> nodetool flush
> nodetool compact (sometimes compact <keyspace> <table>)
> nodetool cfstats
>
> What I observe is that the data table is empty, the column2 index table
> is also empty, and the column1 index table has non-zero (leaked) "space
> used" and "estimated rows".
>
> Roman
>
> On 18 May 2018 at 16:13, Jeff Jirsa  wrote:
>
> > This would matter for the base table, but would be less likely for the
> > secondary index, where the partition key is the value of the base row.
> >
> > Roman: there’s a config option related to only purging repaired
> > tombstones - do you have that enabled? If so, are you running repairs?
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On May 18, 2018, at 6:41 AM, Eric Stevens  wrote:
> > >
> > > The answer to Question 3 is "yes."  One of the more subtle points
> > > about tombstones is that Cassandra won't remove them during compaction
> > > if there is a bloom filter on any SSTable on that replica indicating
> > > that it contains the same partition (not primary) key.  Even if it is
> > > older than gc_grace, and would otherwise be a candidate for cleanup.
> > >
> > > If you're recycling partition keys, your tombstones may never be able
> > > to be cleaned up, because in this scenario there is a high probability
> > > that an SSTable not involved in that compaction also contains the same
> > > partition key, and so compaction cannot have confidence that it's safe
> > > to remove the tombstone (it would have to fully materialize every
> > > record in the compaction, which is too expensive).
> > >
> > > In general it is an antipattern in Cassandra to write to a given
> > > partition indefinitely for this and other reasons.
> > >
> > > On Fri, May 18, 2018 at 2:37 AM Roman Bielik <
> > > roman.bie...@openmindnetworks.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I have a Cassandra 3.11 table (with compact storage) and am using
> > >> secondary indices with rather unique data stored in the indexed
> > >> columns. There are many inserts and deletes, so in order to avoid
> > >> tombstones piling up I'm re-using primary keys from a pool (which
> > >> works fine).
> > >> I'm aware that this design pattern is not ideal, but for now I cannot
> > >> change it easily.
> > >>
> > >> The problem is, the size of 2nd index tables keeps growing (filled
> > >> with tombstones) no matter what.
> > >>
> > >> I tried some aggressive configuration (just for testing) in order to
> > >> expedite the tombstone removal, but with little-to-zero effect:
> > >> COMPACTION = { 'class': 'LeveledCompactionStrategy',
> > >> 'unchecked_tombstone_compaction': 'true',
> > >> 'tombstone_compaction_interval': 600 }
> > >> gc_grace_seconds = 600
> > >>
> > >> I'm aware that perhaps materialized views could provide a solution
> > >> to this, but I'm bound to the Thrift interface, so cannot use them.
> > >>
> > >> Questions:
> > >> 1. Is there something I'm missing? How come compaction does not
> > >> remove the obsolete indices/tombstones from the 2nd index tables?
> > >> Can I trigger the cleanup manually somehow?
> > >> I have tried nodetool flush, compact, and rebuild_index on both the
> > >> data table and the internal index table, but with no result.
> > >>
> > >> 2. When deleting a record I'm deleting the whole row at once - which
> > >> would create one tombstone for the whole record, if I'm correct.
> > >> Would it help to delete the indexed columns separately, creating an
> > >> extra tombstone for each cell?
> > >> As I understand the underlying mechan