Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-04-24 Thread Dan Jatnieks
New failures from Build Lead week 16 (17 Apr - 21 Apr, 2023)

* The only new failures observed were timeouts and I did not open new
issues just for that reason alone (if I should have done so, please let me
know).

Otherwise, I tagged the failing tests in Butler with the previously opened
CASSANDRA-18426 and noted in that ticket that the 3.11 tests are now
passing. I expect the affected 3.0 tests to pass the next time there is
a build.

Dan

On Mon, Apr 17, 2023 at 10:20 AM Josh McKenzie  wrote:

> Sorry for the delay on the summary of build lead for those two weeks.
>
> We had quite a few new failures during the 2 week period:
>
> CASSANDRA-18427
>
> Test failure: pdtest:
> dtest-novnode.ttl_test.TestDistributedTTL.test_ttl_is_respected_on_delayed_replication
>
> CASSANDRA-18426
>
> Test failure: pdtest:
> dtest-upgrade.upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_3_11_x.test_clustering_order_and_functions
>
> CASSANDRA-18425
>
> Test failure: utest:
> org.apache.cassandra.db.RepairedDataTombstonesTest.readTestPartitionTombstones-.jdk11
>
> CASSANDRA-18393
>
> Flaky test:
> org.apache.cassandra.cql3.validation.operations.InsertUpdateIfConditionTest.testConditionalUpdate[1:
> clusterMinVersion=3.11]-compression.jdk1.8 on trunk
>
> CASSANDRA-18392
>
> flaky test
> org.apache.cassandra.net.ConnectionTest.testMessageDeliveryOnReconnect-compression.jdk1.8
> on trunk
>
> CASSANDRA-18391
>
> consistent timeout:
> dtest-upgrade.upgrade_tests.cql_tests.cls.test_cql3_non_compound_range_tombstones
> on trunk
>
> Quite a few other failures were timeouts as well. You can see the
> spikiness on Butler here for trunk: https://butler.cassandra.apache.org/#/
> However, it looks like we've settled back down to ~6 failures right now.
>
> On Mon, Mar 27, 2023, at 12:27 PM, Josh McKenzie wrote:
>
> I'll take build lead for the next 2 weeks.
>
> On Sat, Mar 25, 2023, at 4:50 PM, Mick Semb Wever wrote:
>
> Here is the Cassandra CI status for 2023-3-13 - 2023-3-17:
>
> *** CASSANDRA-18338
> <https://issues.apache.org/jira/browse/CASSANDRA-18338> -
> dtest.bootstrap_test.TestBootstrap.test_cleanup
> trunk
> *** CASSANDRA-18338
> <https://issues.apache.org/jira/browse/CASSANDRA-18338> - test:
> org.apache.cassandra.distributed.test.ByteBuddyExamplesTest.countTest,
> this failed twice with JDK 8 and JDK 11, on trunk and 4.1
> The others are timeout exceptions.
>
>
>
> New failures from Week 12
> *** A possible regression from CASSANDRA-18328 on 2.x to 3.x dtest upgrades
>
> Otherwise, all test failures are timeouts.
>
> We need volunteers for Build Lead in the weeks ahead.
>
>
>
>


Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-04-17 Thread Josh McKenzie
Sorry for the delay on the summary of build lead for those two weeks.

We had quite a few new failures during the 2 week period:

CASSANDRA-18427

Test failure: pdtest: 
dtest-novnode.ttl_test.TestDistributedTTL.test_ttl_is_respected_on_delayed_replication

CASSANDRA-18426

Test failure: pdtest: 
dtest-upgrade.upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_3_11_x.test_clustering_order_and_functions

CASSANDRA-18425

Test failure: utest: 
org.apache.cassandra.db.RepairedDataTombstonesTest.readTestPartitionTombstones-.jdk11


CASSANDRA-18393

Flaky test: 
org.apache.cassandra.cql3.validation.operations.InsertUpdateIfConditionTest.testConditionalUpdate[1:
 clusterMinVersion=3.11]-compression.jdk1.8 on trunk

CASSANDRA-18392

flaky test 
org.apache.cassandra.net.ConnectionTest.testMessageDeliveryOnReconnect-compression.jdk1.8
 on trunk

CASSANDRA-18391

consistent timeout: 
dtest-upgrade.upgrade_tests.cql_tests.cls.test_cql3_non_compound_range_tombstones
 on trunk


Quite a few other failures were timeouts as well. You can see the spikiness on
Butler here for trunk: https://butler.cassandra.apache.org/#/
However, it looks like we've settled back down to ~6 failures right now.

On Mon, Mar 27, 2023, at 12:27 PM, Josh McKenzie wrote:
> I'll take build lead for the next 2 weeks.
> 
> On Sat, Mar 25, 2023, at 4:50 PM, Mick Semb Wever wrote:
>>> Here is the Cassandra CI status for 2023-3-13 - 2023-3-17:
>>> 
>>> *** CASSANDRA-18338 <https://issues.apache.org/jira/browse/CASSANDRA-18338>
>>> - dtest.bootstrap_test.TestBootstrap.test_cleanup trunk
>>> *** CASSANDRA-18338
>>> <https://issues.apache.org/jira/browse/CASSANDRA-18338> - test:
>>> org.apache.cassandra.distributed.test.ByteBuddyExamplesTest.countTest,
>>> this failed twice with JDK 8 and JDK 11, on trunk and 4.1
>>> The others are timeout exceptions.
>> 
>> 
>> New failures from Week 12
>> *** A possible regression from CASSANDRA-18328 on 2.x to 3.x dtest upgrades
>> 
>> Otherwise, all test failures are timeouts.
>> 
>> We need volunteers for Build Lead in the weeks ahead.
>> 
>> 
>> 


Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-03-27 Thread Josh McKenzie
I'll take build lead for the next 2 weeks.

On Sat, Mar 25, 2023, at 4:50 PM, Mick Semb Wever wrote:
>> Here is the Cassandra CI status for 2023-3-13 - 2023-3-17:
>> 
>> *** CASSANDRA-18338 <https://issues.apache.org/jira/browse/CASSANDRA-18338>
>> - dtest.bootstrap_test.TestBootstrap.test_cleanup trunk
>> *** CASSANDRA-18338 <https://issues.apache.org/jira/browse/CASSANDRA-18338>
>> - test:
>> org.apache.cassandra.distributed.test.ByteBuddyExamplesTest.countTest, this
>> failed twice with JDK 8 and JDK 11, on trunk and 4.1
>> The others are timeout exceptions.
> 
> 
> New failures from Week 12
> *** A possible regression from CASSANDRA-18328 on 2.x to 3.x dtest upgrades
> 
> Otherwise, all test failures are timeouts.
> 
> We need volunteers for Build Lead in the weeks ahead.
> 
> 
> 

Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-03-25 Thread Mick Semb Wever
>
> Here is the Cassandra CI status for 2023-3-13 - 2023-3-17:
>
> *** CASSANDRA-18338
> <https://issues.apache.org/jira/browse/CASSANDRA-18338> -
> dtest.bootstrap_test.TestBootstrap.test_cleanup
> trunk
> *** CASSANDRA-18338
> <https://issues.apache.org/jira/browse/CASSANDRA-18338> - test:
> org.apache.cassandra.distributed.test.ByteBuddyExamplesTest.countTest,
> this failed twice with JDK 8 and JDK 11, on trunk and 4.1
> The others are timeout exceptions.
>


New failures from Week 12
*** A possible regression from CASSANDRA-18328 on 2.x to 3.x dtest upgrades

Otherwise, all test failures are timeouts.

We need volunteers for Build Lead in the weeks ahead.


Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-03-21 Thread guo Maxwell
Hi all,
Here is the Cassandra CI status for 2023-3-13 - 2023-3-17:

*** CASSANDRA-18338 <https://issues.apache.org/jira/browse/CASSANDRA-18338>
 - dtest.bootstrap_test.TestBootstrap.test_cleanup trunk
*** CASSANDRA-18338 <https://issues.apache.org/jira/browse/CASSANDRA-18338>
 - test:
org.apache.cassandra.distributed.test.ByteBuddyExamplesTest.countTest,
this failed twice with JDK 8 and JDK 11, on trunk and 4.1
The others are timeout exceptions.


On Tue, Feb 28, 2023 at 05:37, Mick Semb Wever wrote:

> New failures from Build Lead week 8
>
> *** CASSANDRA-18290 – SecondaryIndexTest.testUpdatesToMemtableData
> 4.1, row did not delete
>
> *** CASSANDRA-18289 –
>
> sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ssl_client_auth_required_fail
> trunk, error not found in logs
>
> *** CASSANDRA-18287 –
> InsertUpdateIfConditionTest.testMultiExistConditionOnSameRowClustering
> trunk,
>
> *** CASSANDRA-18288 – TopPartitionsTest.basicRegularTombstonesTest
> trunk, missing tombstones
>
> ***CASSANDRA-18286 – TTLTest.testCapWarnExpirationOverflowPolicy
> negative deletion and expiry value
> 3.0, MarshalException: A local expiration time should not be negative
>
>
>
> On Tue, 21 Feb 2023 at 03:41, guo Maxwell  wrote:
> >
> > Hi all,
> > Here is the Cassandra CI status for 2023-2-13 - 2023-2-17:
> >
> > *** CASSANDRA-18274 - Test Failures:
> > org.apache.cassandra.utils.binlog.BinLogTest.testTruncationReleasesLogSpace-compression
> > - linked in 4.1
> > The other tests below are timeout exceptions. We can ignore them because
> > they are considered test-infrastructure failures, which we are working on
> > separately (CASSANDRA-18137), and I have already updated this notice on
> > the Build Lead page.
> > *** CASSANDRA-18273: Timeout occurred. Please note the time in the
> > report does not reflect the time until the timeout. - linked in trunk, 4.0
> > *** CASSANDRA-11493: dtest failure in
> > consistency_test.TestAccuracy.test_simple_strategy_users - this is a
> > timeout exception, so I did not reopen CASSANDRA-11493.
> >
> >
> >
> > On Tue, Feb 14, 2023 at 00:29, German Eichberger via dev wrote:
> >>
> >> First, one of my learnings was that a ticket assigned to an issue in
> one branch of butler doesn't carry to another. So always search.
> >>
> >> New failures from build lead week 7:
> >>
> >> I created a Jira filter for finding the tickets I created:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20component%20in%20(%22Test%2Fdtest%2Fjava%22%2C%20%22Test%2Fdtest%2Fpython%22%2C%20%22Test%2Ffuzz%22%2C%20%22Test%2Funit%22)%20AND%20created%20%3E%3D%20-7d%20AND%20reporter%20in%20(xgerman42)
> >>
> >> *** CASSANDRA-18257 - Test Failures: 
> >> org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome
> - linked in 4.0, 4.1, trunk
> >> *** CASSANDRA-18253 - Test Failures: dtest
> repair_tests.repair_test.TestRepair.test_simple_sequential_repair - linked
> in 4.0, trunk
> >> *** CASSANDRA-18246 - Test Failures:
> org.apache.cassandra.cql3.validation.operations.TTLTest.testCapNoWarnExpirationOverflowPolicy
> - linked in 3.11
> >> *** CASSANDRA-18245 - Test Failures:
> org.apache.cassandra.db.compaction.CompactionsTest.testDontPurgeAccidentally
> - linked in 3.11
> >> -
> >>
> >> 
> >> From: Dan Jatnieks 
> >> Sent: Friday, February 10, 2023 2:42 PM
> >> To: dev@cassandra.apache.org ; Claude
> Warren, Jr 
> >> Subject: [EXTERNAL] Re: Cassandra CI Status 2023-01-07
> >>
> >> New Failures from Build Lead Week 6:
> >>
> >> *** CASSANDRA-18021 - Flaky
> org.apache.cassandra.distributed.test.ReprepareTestOldBehaviour#testReprepareMixedVersionWithoutReset
> >> - This existing ticket has been linked in butler to new failures on 3.11
> >>
> >> *** CASSANDRA-17608 - Fix testMetricsWithRebuildAndStreamingToTwoNodes
> >> - Re-opened as intermittent failure occurred in build 1445 on trunk
> >>
> >> Several new failures had only a single occurrence; no new tickets were
> opened during this time.
> >>
> >>
> >>
> >> On Fri, Feb 10, 2023 at 12:44 AM Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
> >>
> >> New Failures from Build Lead Week 5
> >>
> >> *** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute
> 'io'" reported in multiple tests

Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-02-27 Thread Mick Semb Wever
New failures from Build Lead week 8

*** CASSANDRA-18290 – SecondaryIndexTest.testUpdatesToMemtableData
4.1, row did not delete

*** CASSANDRA-18289 –
sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ssl_client_auth_required_fail
trunk, error not found in logs

*** CASSANDRA-18287 –
InsertUpdateIfConditionTest.testMultiExistConditionOnSameRowClustering
trunk,

*** CASSANDRA-18288 – TopPartitionsTest.basicRegularTombstonesTest
trunk, missing tombstones

***CASSANDRA-18286 – TTLTest.testCapWarnExpirationOverflowPolicy
negative deletion and expiry value
3.0, MarshalException: A local expiration time should not be negative



On Tue, 21 Feb 2023 at 03:41, guo Maxwell  wrote:
>
> Hi all,
> Here is the Cassandra CI status for 2023-2-13 - 2023-2-17:
>
> *** CASSANDRA-18274 - Test Failures:
> org.apache.cassandra.utils.binlog.BinLogTest.testTruncationReleasesLogSpace-compression
> - linked in 4.1
> The other tests below are timeout exceptions. We can ignore them because
> they are considered test-infrastructure failures, which we are working on
> separately (CASSANDRA-18137), and I have already updated this notice on the
> Build Lead page.
> *** CASSANDRA-18273: Timeout occurred. Please note the time in the report
> does not reflect the time until the timeout. - linked in trunk, 4.0
> *** CASSANDRA-11493: dtest failure in
> consistency_test.TestAccuracy.test_simple_strategy_users - this is a timeout
> exception, so I did not reopen CASSANDRA-11493.
>
>
>
> On Tue, Feb 14, 2023 at 00:29, German Eichberger via dev wrote:
>>
>> First, one of my learnings was that a ticket assigned to an issue in one 
>> branch of butler doesn't carry to another. So always search.
>>
>> New failures from build lead week 7:
>>
>> I created a Jira filter for finding the tickets I created: 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20component%20in%20(%22Test%2Fdtest%2Fjava%22%2C%20%22Test%2Fdtest%2Fpython%22%2C%20%22Test%2Ffuzz%22%2C%20%22Test%2Funit%22)%20AND%20created%20%3E%3D%20-7d%20AND%20reporter%20in%20(xgerman42)
>>
>> *** CASSANDRA-18257 - Test Failures: 
>> org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome - linked 
>> in 4.0, 4.1, trunk
>> *** CASSANDRA-18253 - Test Failures: dtest 
>> repair_tests.repair_test.TestRepair.test_simple_sequential_repair - linked 
>> in 4.0, trunk
>> *** CASSANDRA-18246 - Test Failures: 
>> org.apache.cassandra.cql3.validation.operations.TTLTest.testCapNoWarnExpirationOverflowPolicy
>>  - linked in 3.11
>> *** CASSANDRA-18245 - Test Failures: 
>> org.apache.cassandra.db.compaction.CompactionsTest.testDontPurgeAccidentally 
>> - linked in 3.11
>> -
>>
>> 
>> From: Dan Jatnieks 
>> Sent: Friday, February 10, 2023 2:42 PM
>> To: dev@cassandra.apache.org ; Claude Warren, Jr 
>> 
>> Subject: [EXTERNAL] Re: Cassandra CI Status 2023-01-07
>>
>> New Failures from Build Lead Week 6:
>>
>> *** CASSANDRA-18021 - Flaky 
>> org.apache.cassandra.distributed.test.ReprepareTestOldBehaviour#testReprepareMixedVersionWithoutReset
>> - This existing ticket has been linked in butler to new failures on 3.11
>>
>> *** CASSANDRA-17608 - Fix testMetricsWithRebuildAndStreamingToTwoNodes
>> - Re-opened as intermittent failure occurred in build 1445 on trunk
>>
>> Several new failures had only a single occurrence; no new tickets were 
>> opened during this time.
>>
>>
>>
>> On Fri, Feb 10, 2023 at 12:44 AM Claude Warren, Jr via dev 
>>  wrote:
>>
>> New Failures from Build Lead Week 5
>>
>> *** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute 'io'" 
>> reported in multiple tests
>> - reported in 4.1, 3.11, and 3.0
>> - identified as a possible class loader issue associated with CASSANDRA-18150
>>
>> *** CASSANDRA-18191 - Native Transport SSL tests failing
>> - TestNativeTransportSSL.test_connect_to_ssl and 
>> TestNativeTransportSSL.test_connect_to_ssl (novnode)
>> - TestNativeTransportSSL.test_connect_to_ssl_optional and 
>> TestNativeTransportSSL.test_connect_to_ssl_optional (novnode)
>>
>>
>> On Mon, Jan 23, 2023 at 10:10 PM Caleb Rackliffe  
>> wrote:
>>
>> New failures from Build Lead Week 4:
>>
>> *** CASSANDRA-18188 - Test failure in 
>> upgrade_tests.cql_tests.cls.test_limit_ranges
>> - trunk
>> - AttributeError: module 'py' has no attribute 'io'
>>
>> *** CASSAN

Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-02-20 Thread guo Maxwell
Hi all,
Here is the Cassandra CI status for 2023-2-13 - 2023-2-17:

*** CASSANDRA-18274 <https://issues.apache.org/jira/browse/CASSANDRA-18274>
 - Test Failures:
org.apache.cassandra.utils.binlog.BinLogTest.testTruncationReleasesLogSpace-compression
 - linked in 4.1
The other tests below are timeout exceptions. We can ignore them because they
are considered test-infrastructure failures, which we are working on separately
(CASSANDRA-18137 <https://issues.apache.org/jira/browse/CASSANDRA-18137>),
and I have already updated this notice on the Build Lead page.
*** CASSANDRA-18273 <https://issues.apache.org/jira/browse/CASSANDRA-18273>
: Timeout occurred. Please note the time in the report does not reflect the
time until the timeout. - linked in trunk, 4.0
*** CASSANDRA-11493
<https://issues.apache.org/jira/browse/CASSANDRA-11493>: dtest
failure in consistency_test.TestAccuracy.test_simple_strategy_users - this is
a timeout exception, so I did not reopen CASSANDRA-11493.



On Tue, Feb 14, 2023 at 00:29, German Eichberger via dev wrote:

> First, one of my learnings was that a ticket assigned to an issue in one
> branch of butler doesn't carry to another. So always search.
>
> New failures from build lead week 7:
>
> I created a Jira filter for finding the tickets I created:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20component%20in%20(%22Test%2Fdtest%2Fjava%22%2C%20%22Test%2Fdtest%2Fpython%22%2C%20%22Test%2Ffuzz%22%2C%20%22Test%2Funit%22)%20AND%20created%20%3E%3D%20-7d%20AND%20reporter%20in%20(xgerman42)
>
> *** CASSANDRA-18257
> <https://issues.apache.org/jira/browse/CASSANDRA-18257> - Test Failures:
> org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome -
> linked in 4.0, 4.1, trunk
> *** CASSANDRA-18253
> <https://issues.apache.org/jira/browse/CASSANDRA-18253> - Test Failures:
> dtest repair_tests.repair_test.TestRepair.test_simple_sequential_repair -
> linked in 4.0, trunk
> *** CASSANDRA-18246
> <https://issues.apache.org/jira/browse/CASSANDRA-18246> - Test Failures:
> org.apache.cassandra.cql3.validation.operations.TTLTest.testCapNoWarnExpirationOverflowPolicy
> - linked in 3.11
> *** CASSANDRA-18245
> <https://issues.apache.org/jira/browse/CASSANDRA-18245> - Test Failures:
> org.apache.cassandra.db.compaction.CompactionsTest.testDontPurgeAccidentally
> - linked in 3.11
> -
>
> --
> *From:* Dan Jatnieks 
> *Sent:* Friday, February 10, 2023 2:42 PM
> *To:* dev@cassandra.apache.org ; Claude Warren,
> Jr 
> *Subject:* [EXTERNAL] Re: Cassandra CI Status 2023-01-07
>
> New Failures from Build Lead Week 6:
>
> *** CASSANDRA-18021 - Flaky
> org.apache.cassandra.distributed.test.ReprepareTestOldBehaviour#testReprepareMixedVersionWithoutReset
> - This existing ticket has been linked in butler to new failures on 3.11
>
> *** CASSANDRA-17608 - Fix testMetricsWithRebuildAndStreamingToTwoNodes
> - Re-opened as intermittent failure occurred in build 1445 on trunk
>
> Several new failures had only a single occurrence; no new tickets were
> opened during this time.
>
>
>
> On Fri, Feb 10, 2023 at 12:44 AM Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
> New Failures from Build Lead Week 5
>
> *** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute 'io'"
> reported in multiple tests
> - reported in 4.1, 3.11, and 3.0
> - identified as a possible class loader issue associated with
> CASSANDRA-18150
>
> *** CASSANDRA-18191 - Native Transport SSL tests failing
> - TestNativeTransportSSL.test_connect_to_ssl and
> TestNativeTransportSSL.test_connect_to_ssl (novnode)
> - TestNativeTransportSSL.test_connect_to_ssl_optional and
> TestNativeTransportSSL.test_connect_to_ssl_optional (novnode)
>
>
> On Mon, Jan 23, 2023 at 10:10 PM Caleb Rackliffe 
> wrote:
>
> New failures from Build Lead Week 4:
>
> *** CASSANDRA-18188 - Test failure in
> upgrade_tests.cql_tests.cls.test_limit_ranges
> - trunk
> - AttributeError: module 'py' has no attribute 'io'
>
> *** CASSANDRA-18189 - Test failure in
> cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_with_timeouts
> - 4.0
> - assert 10 == 94764
> - other failures currently open in this test class, but at least
> superficially, different errors (see CASSANDRA-17322, CASSANDRA-18162)
>
> Timeouts continue to manifest in many places.
>
> On Sun, Jan 15, 2023 at 6:02 AM Mick Semb Wever  wrote:
>
> *** The But

Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-02-13 Thread German Eichberger via dev
First, one of my learnings was that a ticket assigned to an issue in one branch 
of butler doesn't carry to another. So always search.

New failures from build lead week 7:

I created a Jira filter for finding the tickets I created: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20component%20in%20(%22Test%2Fdtest%2Fjava%22%2C%20%22Test%2Fdtest%2Fpython%22%2C%20%22Test%2Ffuzz%22%2C%20%22Test%2Funit%22)%20AND%20created%20%3E%3D%20-7d%20AND%20reporter%20in%20(xgerman42)
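
A minimal sketch, assuming only Python's standard library, of how the
percent-encoded filter URL above decodes into readable JQL. The variable
names are illustrative and not taken from the thread.

```python
from urllib.parse import urlsplit, parse_qs

# The Jira filter URL quoted in the message above.
url = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA"
    "%20AND%20issuetype%20%3D%20Bug%20AND%20component%20in%20("
    "%22Test%2Fdtest%2Fjava%22%2C%20%22Test%2Fdtest%2Fpython%22%2C%20"
    "%22Test%2Ffuzz%22%2C%20%22Test%2Funit%22)"
    "%20AND%20created%20%3E%3D%20-7d%20AND%20reporter%20in%20(xgerman42)"
)

# parse_qs percent-decodes each query parameter; "jql" holds the filter text.
jql = parse_qs(urlsplit(url).query)["jql"][0]
print(jql)
# project = CASSANDRA AND issuetype = Bug AND component in ("Test/dtest/java",
# "Test/dtest/python", "Test/fuzz", "Test/unit") AND created >= -7d AND
# reporter in (xgerman42)   (printed on a single line)
```

Read this way, the filter selects test-component bugs created in the last
seven days by one reporter, so another build lead would presumably swap in
their own Jira username in the final clause.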

*** CASSANDRA-18257<https://issues.apache.org/jira/browse/CASSANDRA-18257> - 
Test Failures: 
org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome - linked in 
4.0, 4.1, trunk
*** CASSANDRA-18253<https://issues.apache.org/jira/browse/CASSANDRA-18253> - 
Test Failures: dtest 
repair_tests.repair_test.TestRepair.test_simple_sequential_repair - linked in 
4.0, trunk
*** CASSANDRA-18246<https://issues.apache.org/jira/browse/CASSANDRA-18246> - 
Test Failures: 
org.apache.cassandra.cql3.validation.operations.TTLTest.testCapNoWarnExpirationOverflowPolicy
 - linked in 3.11
*** CASSANDRA-18245<https://issues.apache.org/jira/browse/CASSANDRA-18245> - 
Test Failures: 
org.apache.cassandra.db.compaction.CompactionsTest.testDontPurgeAccidentally - 
linked in 3.11
-


From: Dan Jatnieks 
Sent: Friday, February 10, 2023 2:42 PM
To: dev@cassandra.apache.org ; Claude Warren, Jr 

Subject: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

New Failures from Build Lead Week 6:

*** CASSANDRA-18021 - Flaky 
org.apache.cassandra.distributed.test.ReprepareTestOldBehaviour#testReprepareMixedVersionWithoutReset
- This existing ticket has been linked in butler to new failures on 3.11

*** CASSANDRA-17608 - Fix testMetricsWithRebuildAndStreamingToTwoNodes
- Re-opened as intermittent failure occurred in build 1445 on trunk

Several new failures had only a single occurrence; no new tickets were opened 
during this time.



On Fri, Feb 10, 2023 at 12:44 AM Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:
New Failures from Build Lead Week 5

*** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute 'io'" 
reported in multiple tests
- reported in 4.1, 3.11, and 3.0
- identified as a possible class loader issue associated with CASSANDRA-18150

*** CASSANDRA-18191 - Native Transport SSL tests failing
- TestNativeTransportSSL.test_connect_to_ssl and 
TestNativeTransportSSL.test_connect_to_ssl (novnode)
- TestNativeTransportSSL.test_connect_to_ssl_optional and 
TestNativeTransportSSL.test_connect_to_ssl_optional (novnode)


On Mon, Jan 23, 2023 at 10:10 PM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
New failures from Build Lead Week 4:

*** CASSANDRA-18188 - Test failure in 
upgrade_tests.cql_tests.cls.test_limit_ranges
- trunk
- AttributeError: module 'py' has no attribute 'io'

*** CASSANDRA-18189 - Test failure in 
cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_with_timeouts
- 4.0
- assert 10 == 94764
- other failures currently open in this test class, but at least superficially, 
different errors (see CASSANDRA-17322, CASSANDRA-18162)

Timeouts continue to manifest in many places.

On Sun, Jan 15, 2023 at 6:02 AM Mick Semb Wever <m...@apache.org> wrote:
*** The Butler (Build Lead)

The introduction of Butler and the Build Lead was a wonderful
improvement to our CI efforts.  It has brought a lot of hygiene in
listing out flakies as they happened.  Noted that this has in-turn
increased the burden in getting our major releases out, but that's to
be seen as a one-off cost.


New Failures from Build Lead Week 3.


*** CASSANDRA-18156 – 
repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification
 - AssertionError: Node logs don't have an error message for the failed repair
 - hard regression
 - 3.0, 3.11,

*** CASSANDRA-18164 – CASTest Message serializedSize(12) does not match what 
was written with serialize(out, 12) for verb PAXOS2_COMMIT_AND_PREPARE_RSP
 - serializer class org.apache.cassandra.net.Message$Serializer; expected 1077, 
actual 1079
 - 4.1, trunk

*** CASSANDRA-18158 – 
org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck
 - Cannot achieve consistency level ALL
 - 3.11, trunk

*** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair
  - AssertionError: null in MemtablePool$SubPool.released(MemtablePool.java:193)
 - 3.11, 4.0, 4.1, trunk

*** CASSANDRA-18160 – 
cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space
 - Found orphaned index file in after CDC state not in former
 - 4.1, trunk

*** CASSANDRA-18

Re: Cassandra CI Status 2023-01-07

2023-02-10 Thread Dan Jatnieks
New Failures from Build Lead Week 6:

*** CASSANDRA-18021 - Flaky
org.apache.cassandra.distributed.test.ReprepareTestOldBehaviour#testReprepareMixedVersionWithoutReset
- This existing ticket has been linked in butler to new failures on 3.11

*** CASSANDRA-17608 - Fix testMetricsWithRebuildAndStreamingToTwoNodes
- Re-opened as intermittent failure occurred in build 1445 on trunk

Several new failures had only a single occurrence; no new tickets were
opened during this time.



On Fri, Feb 10, 2023 at 12:44 AM Claude Warren, Jr via dev <
dev@cassandra.apache.org> wrote:

> New Failures from Build Lead Week 5
>
> *** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute 'io'"
> reported in multiple tests
> - reported in 4.1, 3.11, and 3.0
> - identified as a possible class loader issue associated with
> CASSANDRA-18150
>
> *** CASSANDRA-18191 - Native Transport SSL tests failing
> - TestNativeTransportSSL.test_connect_to_ssl and
> TestNativeTransportSSL.test_connect_to_ssl (novnode)
> - TestNativeTransportSSL.test_connect_to_ssl_optional and
> TestNativeTransportSSL.test_connect_to_ssl_optional (novnode)
>
>
> On Mon, Jan 23, 2023 at 10:10 PM Caleb Rackliffe 
> wrote:
>
>> New failures from Build Lead Week 4:
>>
>> *** CASSANDRA-18188 - Test failure in
>> upgrade_tests.cql_tests.cls.test_limit_ranges
>> - trunk
>> - AttributeError: module 'py' has no attribute 'io'
>>
>> *** CASSANDRA-18189 - Test failure in
>> cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_with_timeouts
>> - 4.0
>> - assert 10 == 94764
>> - other failures currently open in this test class, but at least
>> superficially, different errors (see CASSANDRA-17322, CASSANDRA-18162)
>>
>> Timeouts continue to manifest in many places.
>>
>> On Sun, Jan 15, 2023 at 6:02 AM Mick Semb Wever  wrote:
>>
>>> *** The Butler (Build Lead)

>>>> The introduction of Butler and the Build Lead was a wonderful
>>>> improvement to our CI efforts.  It has brought a lot of hygiene in
>>>> listing out flakies as they happened.  Noted that this has in-turn
>>>> increased the burden in getting our major releases out, but that's to
>>>> be seen as a one-off cost.

>>>
>>>
>>> New Failures from Build Lead Week 3.
>>>
>>>
>>> *** CASSANDRA-18156
>>> – 
>>> repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification
>>>  - AssertionError: Node logs don't have an error message for the failed
>>> repair
>>>  - hard regression
>>>  - 3.0, 3.11,
>>>
>>> *** CASSANDRA-18164 – CASTest Message serializedSize(12) does not match
>>> what was written with serialize(out, 12) for verb
>>> PAXOS2_COMMIT_AND_PREPARE_RSP
>>>  - serializer class org.apache.cassandra.net.Message$Serializer;
>>> expected 1077, actual 1079
>>>  - 4.1, trunk
>>>
>>> *** CASSANDRA-18158
>>> – 
>>> org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck
>>>  - Cannot achieve consistency level ALL
>>>  - 3.11, trunk
>>>
>>> *** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair
>>>   - AssertionError: null
>>> in MemtablePool$SubPool.released(MemtablePool.java:193)
>>>  - 3.11, 4.0, 4.1, trunk
>>>
>>> *** CASSANDRA-18160
>>> – 
>>> cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space
>>>  - Found orphaned index file in after CDC state not in former
>>>  - 4.1, trunk
>>>
>>> *** CASSANDRA-18161 –
>>>  
>>> org.apache.cassandra.transport.CQLConnectionTest.handleCorruptionOfLargeMessageFrame
>>>  - AssertionFailedError in
>>> CQLConnectionTest.testFrameCorruption(CQLConnectionTest.java:491)
>>>  - 4.0, 4.1, trunk
>>>
>>> *** CASSANDRA-18162 –
>>> cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_non_prepared_statements
>>> - Inet address 127.0.0.3:7000 is not available: [Errno 98] Address
>>> already in use
>>> - 3.0, 3.11, 4.0, 4.1, trunk
>>>
>>> *** CASSANDRA-18163 –
>>>  
>>> transient_replication_test.TestTransientReplicationRepairLegacyStreaming.test_speculative_write_repair_cycle
>>>  - AssertionError Incoming stream entireSSTable
>>>  - 4.0, 4.1, trunk
>>>
>>>
>>> While writing these up, some thoughts…
>>>  - While Butler reports failures against multiple branches, there's no
>>> feedback/sync that the ticket needs its fixVersions updated when failures
>>> happen in other branches after the ticket is created.
>>>  - In 4.0 onwards, a majority of the failures are timeouts (>900s),
>>> reinforcing that the current main problem we are facing in ci-cassandra.a.o
>>> is saturation/infra
>>>
>>>
>>>
>>>
>>>


Re: Cassandra CI Status 2023-01-07

2023-02-10 Thread Claude Warren, Jr via dev
New Failures from Build Lead Week 5

*** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute 'io'"
reported in multiple tests
- reported in 4.1, 3.11, and 3.0
- identified as a possible class loader issue associated with
CASSANDRA-18150

*** CASSANDRA-18191 - Native Transport SSL tests failing
- TestNativeTransportSSL.test_connect_to_ssl and
TestNativeTransportSSL.test_connect_to_ssl (novnode)
- TestNativeTransportSSL.test_connect_to_ssl_optional and
TestNativeTransportSSL.test_connect_to_ssl_optional (novnode)


On Mon, Jan 23, 2023 at 10:10 PM Caleb Rackliffe 
wrote:

> New failures from Build Lead Week 4:
>
> *** CASSANDRA-18188 - Test failure in
> upgrade_tests.cql_tests.cls.test_limit_ranges
> - trunk
> - AttributeError: module 'py' has no attribute 'io'
>
> *** CASSANDRA-18189 - Test failure in
> cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_with_timeouts
> - 4.0
> - assert 10 == 94764
> - other failures currently open in this test class, but at least
> superficially, different errors (see CASSANDRA-17322, CASSANDRA-18162)
>
> Timeouts continue to manifest in many places.
>
> On Sun, Jan 15, 2023 at 6:02 AM Mick Semb Wever  wrote:
>
>> *** The Butler (Build Lead)
>>>
>>> The introduction of Butler and the Build Lead was a wonderful
>>> improvement to our CI efforts.  It has brought a lot of hygiene in
>>> listing out flakies as they happened.  Noted that this has in-turn
>>> increased the burden in getting our major releases out, but that's to
>>> be seen as a one-off cost.
>>>
>>
>>
>> New Failures from Build Lead Week 3.
>>
>>
>> *** CASSANDRA-18156
>> – 
>> repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification
>>  - AssertionError: Node logs don't have an error message for the failed
>> repair
>>  - hard regression
>>  - 3.0, 3.11,
>>
>> *** CASSANDRA-18164 – CASTest Message serializedSize(12) does not match
>> what was written with serialize(out, 12) for verb
>> PAXOS2_COMMIT_AND_PREPARE_RSP
>>  - serializer class org.apache.cassandra.net.Message$Serializer;
>> expected 1077, actual 1079
>>  - 4.1, trunk
>>
>> *** CASSANDRA-18158
>> – 
>> org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck
>>  - Cannot achieve consistency level ALL
>>  - 3.11, trunk
>>
>> *** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair
>>   - AssertionError: null
>> in MemtablePool$SubPool.released(MemtablePool.java:193)
>>  - 3.11, 4.0, 4.1, trunk
>>
>> *** CASSANDRA-18160
>> – 
>> cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space
>>  - Found orphaned index file in after CDC state not in former
>>  - 4.1, trunk
>>
>> *** CASSANDRA-18161 –
>>  
>> org.apache.cassandra.transport.CQLConnectionTest.handleCorruptionOfLargeMessageFrame
>>  - AssertionFailedError in
>> CQLConnectionTest.testFrameCorruption(CQLConnectionTest.java:491)
>>  - 4.0, 4.1, trunk
>>
>> *** CASSANDRA-18162 –
>> cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_non_prepared_statements
>> - Inet address 127.0.0.3:7000 is not available: [Errno 98] Address
>> already in use
>> - 3.0, 3.11, 4.0, 4.1, trunk
>>
>> *** CASSANDRA-18163 –
>>  
>> transient_replication_test.TestTransientReplicationRepairLegacyStreaming.test_speculative_write_repair_cycle
>>  - AssertionError Incoming stream entireSSTable
>>  - 4.0, 4.1, trunk
>>
>>
>> While writing these up, some thoughts…
>>  - While Butler reports failures against multiple branches, there's no
>> feedback/sync that the ticket needs its fixVersions updated when failures
>> happen in other branches after the ticket is created.
>>  - In 4.0 onwards, a majority of the failures are timeouts (>900s),
>> reinforcing that the current main problem we are facing in ci-cassandra.a.o
>> is saturation/infra
>>
>>
>>
>>
>>


Re: Cassandra CI Status 2023-01-07

2023-01-23 Thread Caleb Rackliffe
New failures from Build Lead Week 4:

*** CASSANDRA-18188 - Test failure in
upgrade_tests.cql_tests.cls.test_limit_ranges
- trunk
- AttributeError: module 'py' has no attribute 'io'

*** CASSANDRA-18189 - Test failure in
cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_with_timeouts
- 4.0
- assert 10 == 94764
- other failures currently open in this test class, but at least
superficially, different errors (see CASSANDRA-17322, CASSANDRA-18162)

Timeouts continue to manifest in many places.

On Sun, Jan 15, 2023 at 6:02 AM Mick Semb Wever  wrote:

> *** The Butler (Build Lead)
>>
>> The introduction of Butler and the Build Lead was a wonderful
>> improvement to our CI efforts.  It has brought a lot of hygiene in
>> listing out flakies as they happened.  Noted that this has in-turn
>> increased the burden in getting our major releases out, but that's to
>> be seen as a one-off cost.
>>
>
>
> New Failures from Build Lead Week 3.
>
>
> *** CASSANDRA-18156
> – 
> repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification
>  - AssertionError: Node logs don't have an error message for the failed
> repair
>  - hard regression
>  - 3.0, 3.11,
>
> *** CASSANDRA-18164 – CASTest Message serializedSize(12) does not match
> what was written with serialize(out, 12) for verb
> PAXOS2_COMMIT_AND_PREPARE_RSP
>  - serializer class org.apache.cassandra.net.Message$Serializer; expected
> 1077, actual 1079
>  - 4.1, trunk
>
> *** CASSANDRA-18158
> – 
> org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck
>  - Cannot achieve consistency level ALL
>  - 3.11, trunk
>
> *** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair
>   - AssertionError: null
> in MemtablePool$SubPool.released(MemtablePool.java:193)
>  - 3.11, 4.0, 4.1, trunk
>
> *** CASSANDRA-18160
> – 
> cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space
>  - Found orphaned index file in after CDC state not in former
>  - 4.1, trunk
>
> *** CASSANDRA-18161 –
>  
> org.apache.cassandra.transport.CQLConnectionTest.handleCorruptionOfLargeMessageFrame
>  - AssertionFailedError in
> CQLConnectionTest.testFrameCorruption(CQLConnectionTest.java:491)
>  - 4.0, 4.1, trunk
>
> *** CASSANDRA-18162 –
> cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_non_prepared_statements
> - Inet address 127.0.0.3:7000 is not available: [Errno 98] Address
> already in use
> - 3.0, 3.11, 4.0, 4.1, trunk
>
> *** CASSANDRA-18163 –
>  
> transient_replication_test.TestTransientReplicationRepairLegacyStreaming.test_speculative_write_repair_cycle
>  - AssertionError Incoming stream entireSSTable
>  - 4.0, 4.1, trunk
>
>
> While writing these up, some thoughts…
>  - While Butler reports failures against multiple branches, there's no
> feedback/sync that the ticket needs its fixVersions updated when failures
> happen in other branches after the ticket is created.
>  - In 4.0 onwards, a majority of the failures are timeouts (>900s),
> reinforcing that the current main problem we are facing in ci-cassandra.a.o
> is saturation/infra
>
>
>
>
>


Re: Cassandra CI Status 2023-01-07

2023-01-15 Thread Mick Semb Wever
>
> *** The Butler (Build Lead)
>
> The introduction of Butler and the Build Lead was a wonderful
> improvement to our CI efforts.  It has brought a lot of hygiene in
> listing out flakies as they happened.  Noted that this has in-turn
> increased the burden in getting our major releases out, but that's to
> be seen as a one-off cost.
>


New Failures from Build Lead Week 3.


*** CASSANDRA-18156
– 
repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification
 - AssertionError: Node logs don't have an error message for the failed
repair
 - hard regression
 - 3.0, 3.11,

*** CASSANDRA-18164 – CASTest Message serializedSize(12) does not match
what was written with serialize(out, 12) for verb
PAXOS2_COMMIT_AND_PREPARE_RSP
 - serializer class org.apache.cassandra.net.Message$Serializer; expected
1077, actual 1079
 - 4.1, trunk

*** CASSANDRA-18158
– 
org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck
 - Cannot achieve consistency level ALL
 - 3.11, trunk

*** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair
  - AssertionError: null
in MemtablePool$SubPool.released(MemtablePool.java:193)
 - 3.11, 4.0, 4.1, trunk

*** CASSANDRA-18160
– 
cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space
 - Found orphaned index file in after CDC state not in former
 - 4.1, trunk

*** CASSANDRA-18161 –
 
org.apache.cassandra.transport.CQLConnectionTest.handleCorruptionOfLargeMessageFrame
 - AssertionFailedError in
CQLConnectionTest.testFrameCorruption(CQLConnectionTest.java:491)
 - 4.0, 4.1, trunk

*** CASSANDRA-18162 –
cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_non_prepared_statements
- Inet address 127.0.0.3:7000 is not available: [Errno 98] Address already
in use
- 3.0, 3.11, 4.0, 4.1, trunk

*** CASSANDRA-18163 –
 
transient_replication_test.TestTransientReplicationRepairLegacyStreaming.test_speculative_write_repair_cycle
 - AssertionError Incoming stream entireSSTable
 - 4.0, 4.1, trunk


While writing these up, some thoughts…
 - While Butler reports failures against multiple branches, there's no
feedback/sync that the ticket needs its fixVersions updated when failures
happen in other branches after the ticket is created.
 - In 4.0 onwards, a majority of the failures are timeouts (>900s),
reinforcing that the current main problem we are facing in ci-cassandra.a.o
is saturation/infra


Re: Cassandra CI Status 2023-01-07

2023-01-10 Thread Josh McKenzie
> I don't believe it warrants a CEP, speak up if you disagree. 
I agree with this but I'm also biased having been working w/you on this for a 
bit.

My instinct is that most folks on the project want CI that works consistently, 
quickly, and is minimally complex to modify. So the less disruptive and more 
well documented and streamlined we can make interacting with this process the 
better. 

On Mon, Jan 9, 2023, at 2:06 PM, Mick Semb Wever wrote:
>Happy 2023 everyone!
> 
> With only four months in front of us before the first 5.0 release I'm
> hoping we can re-energize our focus on CI and Stable Trunk.
> 
> This post covers the following
> * Recap of CI improvements
> * State of Affair
> * The Butler (Build Lead)
> * Proposal for a Repeatable Containerised CI
> 
> and it calls for the following actions
> ** we need you to sign up for a week's rotation as Build Lead !
> ** please reply in-thread any CI issues I've forgotten,
> ** does CASSANDRA-18137 warrant a CEP?
> 
> 
> *** Recap of CI improvements
> 
> It's been over two years since my last CI Status post, with Adam and
> Josh covering much of it in their general Status emails (which are
> deeply appreciated).  I'm hoping we can continue with both, given
> their importance to a successful 5.0 release and the debt cost we face
> otherwise going from the initial alpha release to the eventual GA.
> 
> 
> We have made good efforts on moving towards a Stable Trunk.
> Special mentions to
> - improving parity between CircleCI and ci-cassandra.a.o (CASSANDRA-17930)
> - introducing Butler and the Build Lead role
> - pre-commit workflow, and automated multiplexing, in CircleCI
> (CASSANDRA-16625)
> - single digit flaky failures per build on 4.0, 4.1 and trunk
> ci-cassandra.a.o !!
> - CircleCI is as stable on Large as XLarge containers (CASSANDRA-18127)
> 
> 
> *** State of Affair
> 
> None of our CI systems are consistently green yet.  Flakies occur in
> both CircleCI and ci-cassandra.a.o.  We had to lower the 4.1 release
> CI criteria to accept three consecutive green runs on CircleCI, as
> it would have been unlikely to achieve the same on ci-cassandra.a.o.
> While the flaky rate is lower than 4.0, the higher number of tests we
> run is making it harder to get those green runs.
> 
> Despite the overhead we continue to face with flakies and getting
> major releases out, 4.1 saw fewer releases to GA than 4.0, and I think
> all will agree things are improving.  But the challenge in front of us up
> to the 5.0 release is huge with nine CEPs slated to land.  Pre-commit
> and post-commit CI needs investing in if we want our stable trunk
> efforts to continue to improve.
> 
> 
> *** The Butler (Build Lead)
> 
> The introduction of Butler and the Build Lead was a wonderful
> improvement to our CI efforts.  It has brought a lot of hygiene in
> listing out flakies as they happened.  Noted that this has in-turn
> increased the burden in getting our major releases out, but that's to
> be seen as a one-off cost.  This initiative lost traction and
> volunteers mid last year.
> 
> We really need you to take part in the Build Lead weekly rotation.
> 
> I've signed myself up for this week, please jump in and sign yourself
> up for the weeks ahead.  If you are a coach/manager for a team, please
> permit and encourage your engineers to be involved in this activity,
> it shouldn't be more than an hour over the week.  Further instructions
> found at https://cwiki.apache.org/confluence/display/CASSANDRA/Build+Lead
> 
> If it's your first time being a Build Lead the community is here to
> help you, just reach out.  It's also a great way into our community
> for newcomers!
> 
> When it comes to Butler, its history UX is a bit clumsy.  TIL that
> you can indeed list the full history of failures per test, see 'Full
> History' under a test page*.  Please use this information to help
> create jira tickets on flakies, specifically the versions it applies
> to and the rough rate of failure so far observed.
> 
> *) e.g. 
> https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-trunk/failure/snapshot_test/TestArchiveCommitlog/test_archive_commitlog_point_in_time_ln
> 
> 
> *** Proposal for a Repeatable Containerised CI
> 
> Building on what Josh writes in his "Cassandra project status, Year in
> Review Holiday Edition" post, and many discussions offline with many
> folk, I've written up the ticket epic for creating a reproducible
> containerised ci-cassandra.a.o
> 
> Please read https://issues.apache.org/jira/browse/CASSANDRA-18137
> 
> The tl;dr of it is to create a script that, using the jenkins k8s
> operator, can set up a ci-cassandra.a.o clone in your k8s context.
> 
> The ticket is lengthy, despite being in bullet form.  I don't believe
> it warrants a CEP, speak up if you disagree.  The idea is to provide
> us a turnkey solution: the jenkins k8s operator based script (create
> ci-cassandra.a.o clone, run pipeline, save results, tear down clone);
> to bring our ex

Cassandra CI Status 2023-01-07

2023-01-09 Thread Mick Semb Wever
   Happy 2023 everyone!

With only four months in front of us before the first 5.0 release I'm
hoping we can re-energize our focus on CI and Stable Trunk.

This post covers the following
 * Recap of CI improvements
 * State of Affair
 * The Butler (Build Lead)
 * Proposal for a Repeatable Containerised CI

and it calls for the following actions
 ** we need you to sign up for a week's rotation as Build Lead !
 ** please reply in-thread any CI issues I've forgotten,
 ** does CASSANDRA-18137 warrant a CEP?


 *** Recap of CI improvements

It's been over two years since my last CI Status post, with Adam and
Josh covering much of it in their general Status emails (which are
deeply appreciated).  I'm hoping we can continue with both, given
their importance to a successful 5.0 release and the debt cost we face
otherwise going from the initial alpha release to the eventual GA.


We have made good efforts on moving towards a Stable Trunk.
Special mentions to
 - improving parity between CircleCI and ci-cassandra.a.o (CASSANDRA-17930)
 - introducing Butler and the Build Lead role
 - pre-commit workflow, and automated multiplexing, in CircleCI
(CASSANDRA-16625)
 - single digit flaky failures per build on 4.0, 4.1 and trunk
ci-cassandra.a.o !!
 - CircleCI is as stable on Large as XLarge containers (CASSANDRA-18127)


*** State of Affair

None of our CI systems are consistently green yet.  Flakies occur in
both CircleCI and ci-cassandra.a.o.  We had to lower the 4.1 release
CI criteria to accept three consecutive green runs on CircleCI, as
it would have been unlikely to achieve the same on ci-cassandra.a.o.
While the flaky rate is lower than 4.0, the higher number of tests we
run is making it harder to get those green runs.

Despite the overhead we continue to face with flakies and getting
major releases out, 4.1 saw fewer releases to GA than 4.0, and I think
all will agree things are improving.  But the challenge in front of us up
to the 5.0 release is huge with nine CEPs slated to land.  Pre-commit
and post-commit CI needs investing in if we want our stable trunk
efforts to continue to improve.


*** The Butler (Build Lead)

The introduction of Butler and the Build Lead was a wonderful
improvement to our CI efforts.  It has brought a lot of hygiene in
listing out flakies as they happened.  Noted that this has in-turn
increased the burden in getting our major releases out, but that's to
be seen as a one-off cost.  This initiative lost traction and
volunteers mid last year.

We really need you to take part in the Build Lead weekly rotation.

I've signed myself up for this week, please jump in and sign yourself
up for the weeks ahead.  If you are a coach/manager for a team, please
permit and encourage your engineers to be involved in this activity,
it shouldn't be more than an hour over the week.  Further instructions
found at https://cwiki.apache.org/confluence/display/CASSANDRA/Build+Lead

If it's your first time being a Build Lead the community is here to
help you, just reach out.  It's also a great way into our community
for newcomers!

When it comes to Butler, its history UX is a bit clumsy.  TIL that
you can indeed list the full history of failures per test, see 'Full
History' under a test page*.  Please use this information to help
create jira tickets on flakies, specifically the versions it applies
to and the rough rate of failure so far observed.

*) e.g. 
https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-trunk/failure/snapshot_test/TestArchiveCommitlog/test_archive_commitlog_point_in_time_ln
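
As a rough illustration of the "rough rate of failure" mentioned above, here
is a minimal Python sketch that tallies failures per branch from a test's
history. It assumes the history has been copied out by hand as
(branch, passed) pairs; that input shape and the function name
failure_rates are hypothetical and are not part of Butler.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_rates(history: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, int, float]]:
    """Return {branch: (failures, runs, failure_rate)} for one test.

    `history` is a hypothetical, hand-collected list of (branch, passed)
    pairs; it is not a format that Butler is known to export.
    """
    counts = defaultdict(lambda: [0, 0])  # branch -> [failures, runs]
    for branch, passed in history:
        counts[branch][1] += 1
        if not passed:
            counts[branch][0] += 1
    return {branch: (f, r, f / r) for branch, (f, r) in counts.items()}
```

The per-branch keys map onto the versions to list on the jira ticket, and the
failures/runs pair gives the rate to quote in the description.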


*** Proposal for a Repeatable Containerised CI

Building on what Josh writes in his "Cassandra project status, Year in
Review Holiday Edition" post, and many discussions offline with many
folk, I've written up the ticket epic for creating a reproducible
containerised ci-cassandra.a.o

Please read https://issues.apache.org/jira/browse/CASSANDRA-18137

The tl;dr of it is to create a script that, using the jenkins k8s
operator, can set up a ci-cassandra.a.o clone in your k8s context.

The ticket is lengthy, despite being in bullet form.  I don't believe
it warrants a CEP, speak up if you disagree.  The idea is to provide
us a turnkey solution: the jenkins k8s operator based script (create
ci-cassandra.a.o clone, run pipeline, save results, tear down clone);
to bring our existing build and test scripts (including their docker
images) from cassandra-builds to be in-tree to give us a declarative
jenkins pipeline that (in a simple intuitive manner) maps stages to
CI-agnostic build and test scripts (that can be run locally without a
CI system if you so desire), where all branch specific testing context
(jdks, pythons, dists) is defined outside of the CI code.  Its success
depends upon providing a CI system that is stable and fast for
pre-commit testing.
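
To make the lifecycle in the parentheses above concrete (create clone, run
pipeline, save results, tear down clone), here is a small Python sketch. It
is only an outline under assumptions: the four callables and the stage and
script names are hypothetical placeholders, and nothing here talks to
Jenkins or k8s.

```python
from typing import Any, Callable, Dict

# Hypothetical mapping of pipeline stages to CI-agnostic scripts.
# The paths are placeholders, not files that exist in the repository.
STAGES: Dict[str, str] = {
    "build": "scripts/build.sh",
    "unit-test": "scripts/unit-test.sh",
    "dtest": "scripts/dtest.sh",
}


def run_in_clone(
    create: Callable[[], Any],
    run_stage: Callable[[Any, str], int],
    save: Callable[[Dict[str, int]], None],
    teardown: Callable[[Any], None],
    stages: Dict[str, str],
) -> Dict[str, int]:
    """Create a clone, run each stage in order, save results, always tear down."""
    clone = create()
    try:
        # Dicts preserve insertion order, so stages run in the declared order.
        results = {name: run_stage(clone, script) for name, script in stages.items()}
        save(results)
        return results
    finally:
        # The clone is removed even when a stage raises.
        teardown(clone)
```

Keeping the stage mapping as plain data is what would let the same scripts
be run locally without any CI system.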


Re: Cassandra CI Status – 2020-08-08

2020-08-10 Thread Charles Cao
This is awesome! Thanks Mick for all the great work.

~Charles

On Mon, Aug 10, 2020 at 11:41 AM Ekaterina Dimitrova
 wrote:
>
> Thank you Mick for all the work you are doing for the C* CI and not only!
> I am especially excited about the addition of the upgrade tests.
> It’s also great that things got documented.
>
> Best regards,
> Ekaterina
>
> On Sat, 8 Aug 2020 at 6:41, Mick Semb Wever  wrote:
>
> > The following is a summary of changes, status, and suggestions to our
> > community CI, ci-cassandra.apache.org
> > Please reply with questions, as well as any input on CircleCI status anyone
> > has to offer.
> >
> > This post will touch on…
> > * Upgrade Tests
> > * Build Times and Improvements
> > * Pre Commit Builds
> > * Stand-alone Pipeline Runs
> > * Standardising our CI Build Scripts
> > * JDK11
> > * Nightly Build Artefacts
> > * CI Documentation
> > * 4.0 QA Status
> >
> >
> > ** Upgrade Tests
> >
> > Both in-jvm and normal upgrade tests have been added to
> > ci-cassandra.apache.org
> >
> > Those in-jvm upgrade tests have been included in the pipeline builds.  The
> > normal upgrade tests currently remain stand-alone. Trunk’s version is found
> > here https://ci-cassandra.apache.org/job/Cassandra-trunk-dtest-upgrade/
> >
> >
> > ** Build Times and Improvements
> >
> > DTests have been parallelised. By default DTest jobs are divided into 64
> > splits now, with the exception of the Large DTest jobs which due to having
> > far fewer tests have only 8 splits.
> >
> > This brings dtest runs from ~12 hours down to ~45 minutes. It brings whole
> > pipeline builds from ~14 hours down to ~2.5 hours. Some patch (devbranch)
> > builds have completed in 90 minutes.
> >
> > For more speed we can split more, but ci-cassandra currently has 36 agents
> > (72 executors) and is now often saturated, with large build queues. We can
> > also look into Unit Tests which are only ever using one runner on both
> > ci-cassandra and circleci. A problem here is that using more runners breaks
> > some of the unit tests. Another possible improvement is to avoid the
> > compiling in all the test stages, by re-using the built artefacts in the
> > beginning of each pipeline run.
> >
> >
> > ** Pre Commit Builds
> >
> > ci-cassandra.apache.org is less frequently used for pre-commit builds, for
> > reasons of resource limits and access reserved to committers. More
> > information on this can be found in
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=153815764
> >
> > Some trusted contributors have been given Jenkins API tokens, for use with
> > the script found here
> > https://the-asf.slack.com/archives/C0162JU2CKY/p1595703740008400
> >
> > These tokens do rotate, and how/when still remains a little unclear.
> > Committers can generate tokens via their Jenkins profile pages. There’s
> > also investigation as to whether we can create jenkins accounts with build
> > permissions for trusted contributors. Again with the resources available,
> > especially in contrast to additional testing that will probably appear in
> > the 4.0 beta testing phase, we are restricted to what is feasible here.
> >
> > Notifications for all devbranch pipeline builds now go to
> > #cassandra-builds-patches. These report the build url, the commit SHA and
> > message, and the patch repository.
> >
> >
> > ** Stand-alone Pipeline Runs
> >
> > It has been raised a number of times that it would be great if users
> > (companies) could run the complete pipeline on their own resources/cloud,
> > from a single command line, including the setup and teardown of the jenkins
> > platform. This would be a big win for the community with standardised
> > testing and test reports to share, and helping to test all the possible
> > configuration combinations possible. There is some work involved to do this
> > but we appear to be moving in that direction anyway. For it to happen, all
> > the stage jobs inside the pipeline need to be moved, from being generated
> > in the dsl script, to being defined in the in-tree Jenkinsfile.
> >
> >
> > ** Standardising our CI Build Scripts
> >
> > Today we have a lot of duplication of build scripts. Those in
> > cassandra-builds/build-scripts/ and those embedded into each of the
> > circleci config files in-tree.
> >
> > I would like to suggest we move the build-scripts in-tree, and start
> > migrating circleci to re-use the same build scripts. There are differences
> > from how test lists are split (round-robin `split` to circleci timings
> > based splitting) to how parallelisation works (circle’s containers vs the
> > jenkins matrix plugin), but I suspect by focusing on the easy stuff there’s
> > a lot that can be standardised.
> >
> >
> > ** JDK11
> >
> > JDK11 builds have been contributed, thanks to Shylaja. Trunk’s pipeline now
> > builds both JDK 8 and 11 artefacts.
> >
> > Adding JDK11 test runs hit a hurdle with how the JDK labels are named and
> > our tests are no

Re: Cassandra CI Status – 2020-08-08

2020-08-10 Thread Ekaterina Dimitrova
Thank you Mick for all the work you are doing for the C* CI and not only!
I am especially excited about the addition of the upgrade tests.
It’s also great that things got documented.

Best regards,
Ekaterina

On Sat, 8 Aug 2020 at 6:41, Mick Semb Wever  wrote:

> The following is a summary of changes, status, and suggestions to our
> community CI, ci-cassandra.apache.org
> Please reply with questions, as well as any input on CircleCI status anyone
> has to offer.
>
> This post will touch on…
> * Upgrade Tests
> * Build Times and Improvements
> * Pre Commit Builds
> * Stand-alone Pipeline Runs
> * Standardising our CI Build Scripts
> * JDK11
> * Nightly Build Artefacts
> * CI Documentation
> * 4.0 QA Status
>
>
> ** Upgrade Tests
>
> Both in-jvm and normal upgrade tests have been added to
> ci-cassandra.apache.org
>
> Those in-jvm upgrade tests have been included in the pipeline builds.  The
> normal upgrade tests currently remain stand-alone. Trunk’s version is found
> here https://ci-cassandra.apache.org/job/Cassandra-trunk-dtest-upgrade/
>
>
> ** Build Times and Improvements
>
> DTests have been parallelised. By default DTest jobs are divided into 64
> splits now, with the exception of the Large DTest jobs which due to having
> far fewer tests have only 8 splits.
>
> This brings dtest runs from ~12 hours down to ~45 minutes. It brings whole
> pipeline builds from ~14 hours down to ~2.5 hours. Some patch (devbranch)
> builds have completed in 90 minutes.
>
> For more speed we can split more, but ci-cassandra currently has 36 agents
> (72 executors) and is now often saturated, with large build queues. We can
> also look into Unit Tests which are only ever using one runner on both
> ci-cassandra and circleci. A problem here is that using more runners breaks
> some of the unit tests. Another possible improvement is to avoid the
> compiling in all the test stages, by re-using the built artefacts in the
> beginning of each pipeline run.
>
>
> ** Pre Commit Builds
>
> ci-cassandra.apache.org is less frequently used for pre-commit builds, for
> reasons of resource limits and access reserved to committers. More
> information on this can be found in
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=153815764
>
> Some trusted contributors have been given Jenkins API tokens, for use with
> the script found here
> https://the-asf.slack.com/archives/C0162JU2CKY/p1595703740008400
>
> These tokens do rotate, and how/when still remains a little unclear.
> Committers can generate tokens via their Jenkins profile pages. There’s
> also investigation as to whether we can create jenkins accounts with build
> permissions for trusted contributors. Again with the resources available,
> especially in contrast to additional testing that will probably appear in
> the 4.0 beta testing phase, we are restricted to what is feasible here.
>
> Notifications for all devbranch pipeline builds now go to
> #cassandra-builds-patches. These report the build url, the commit SHA and
> message, and the patch repository.
>
>
> ** Stand-alone Pipeline Runs
>
> It has been raised a number of times that it would be great if users
> (companies) could run the complete pipeline on their own resources/cloud,
> from a single command line, including the setup and teardown of the jenkins
> platform. This would be a big win for the community with standardised
> testing and test reports to share, and helping to test all the possible
> configuration combinations possible. There is some work involved to do this
> but we appear to be moving in that direction anyway. For it to happen, all
> the stage jobs inside the pipeline need to be moved, from being generated
> in the dsl script, to being defined in the in-tree Jenkinsfile.
>
>
> ** Standardising our CI Build Scripts
>
> Today we have a lot of duplication of build scripts. Those in
> cassandra-builds/build-scripts/ and those embedded into each of the
> circleci config files in-tree.
>
> I would like to suggest we move the build-scripts in-tree, and start
> migrating circleci to re-use the same build scripts. There are differences
> from how test lists are split (round-robin `split` to circleci timings
> based splitting) to how parallelisation works (circle’s containers vs the
> jenkins matrix plugin), but I suspect by focusing on the easy stuff there’s
> a lot that can be standardised.
>
>
> ** JDK11
>
> JDK11 builds have been contributed, thanks to Shylaja. Trunk’s pipeline now
> builds both JDK 8 and 11 artefacts.
>
> Adding JDK11 test runs hit a hurdle with how the JDK labels are named and
> our tests are not friendly with directory names containing spaces.
>
>
> ** Nightly Build Artefacts
>
> Build artefacts of Tarballs, as well as Debian and RedHat packages, are
> attached to the artefact stages inside each pipeline, for both JDK8 and
> JDK11 builds.
>
> To download the latest successful JDK8 build of these, use the following
> links.
>  Trunk:
>
> 

Cassandra CI Status – 2020-08-08

2020-08-08 Thread Mick Semb Wever
The following is a summary of changes, status, and suggestions to our
community CI, ci-cassandra.apache.org
Please reply with questions, as well as any input on CircleCI status anyone
has to offer.

This post will touch on…
* Upgrade Tests
* Build Times and Improvements
* Pre Commit Builds
* Stand-alone Pipeline Runs
* Standardising our CI Build Scripts
* JDK11
* Nightly Build Artefacts
* CI Documentation
* 4.0 QA Status


** Upgrade Tests

Both in-jvm and normal upgrade tests have been added to
ci-cassandra.apache.org

Those in-jvm upgrade tests have been included in the pipeline builds.  The
normal upgrade tests currently remain stand-alone. Trunk’s version is found
here https://ci-cassandra.apache.org/job/Cassandra-trunk-dtest-upgrade/


** Build Times and Improvements

DTests have been parallelised. By default DTest jobs are divided into 64
splits now, with the exception of the Large DTest jobs which due to having
far fewer tests have only 8 splits.

This brings dtest runs from ~12 hours down to ~45 minutes. It brings whole
pipeline builds from ~14 hours down to ~2.5 hours. Some patch (devbranch)
builds have completed in 90 minutes.

For more speed we can split more, but ci-cassandra currently has 36 agents
(72 executors) and is now often saturated, with large build queues. We can
also look into Unit Tests which are only ever using one runner on both
ci-cassandra and circleci. A problem here is that using more runners breaks
some of the unit tests. Another possible improvement is to avoid the
compiling in all the test stages, by re-using the built artefacts in the
beginning of each pipeline run.


** Pre Commit Builds

ci-cassandra.apache.org is less frequently used for pre-commit builds, for
reasons of resource limits and access reserved to committers. More
information on this can be found in
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=153815764

Some trusted contributors have been given Jenkins API tokens, for use with
the script found here
https://the-asf.slack.com/archives/C0162JU2CKY/p1595703740008400

These tokens do rotate, and how/when still remains a little unclear.
Committers can generate tokens via their Jenkins profile pages. There’s
also investigation as to whether we can create jenkins accounts with build
permissions for trusted contributors. Again with the resources available,
especially in contrast to additional testing that will probably appear in
the 4.0 beta testing phase, we are restricted to what is feasible here.

Notifications for all devbranch pipeline builds now go to
#cassandra-builds-patches. These report the build url, the commit SHA and
message, and the patch repository.


** Stand-alone Pipeline Runs

It has been raised a number of times that it would be great if users
(companies) could run the complete pipeline on their own resources/cloud,
from a single command line, including the setup and teardown of the jenkins
platform. This would be a big win for the community with standardised
testing and test reports to share, and helping to test all the possible
configuration combinations possible. There is some work involved to do this
but we appear to be moving in that direction anyway. For it to happen, all
the stage jobs inside the pipeline need to be moved, from being generated
in the dsl script, to being defined in the in-tree Jenkinsfile.


** Standardising our CI Build Scripts

Today we have a lot of duplication of build scripts. Those in
cassandra-builds/build-scripts/ and those embedded into each of the
circleci config files in-tree.

I would like to suggest we move the build-scripts in-tree, and start
migrating circleci to re-use the same build scripts. There are differences
from how test lists are split (round-robin `split` to circleci timings
based splitting) to how parallelisation works (circle’s containers vs the
jenkins matrix plugin), but I suspect by focusing on the easy stuff there’s
a lot that can be standardised.
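
To illustrate the difference between the two splitting approaches mentioned
above, here is a minimal Python sketch. It is not the code used by either CI
system: round_robin_split deals tests out in turn, and timing_split is a
simple greedy balancer standing in for timings-based splitting. Both function
names and the sample timings are assumptions made for illustration.

```python
from typing import Dict, List


def round_robin_split(tests: List[str], n_splits: int) -> List[List[str]]:
    """Deal tests out like cards: test i goes to split i % n_splits."""
    splits: List[List[str]] = [[] for _ in range(n_splits)]
    for i, test in enumerate(tests):
        splits[i % n_splits].append(test)
    return splits


def timing_split(timings: Dict[str, float], n_splits: int) -> List[List[str]]:
    """Greedy balancing: longest test first, always into the lightest split."""
    splits: List[List[str]] = [[] for _ in range(n_splits)]
    loads = [0.0] * n_splits
    # Sort by descending duration, then by name so the result is deterministic.
    for test, seconds in sorted(timings.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = loads.index(min(loads))
        splits[lightest].append(test)
        loads[lightest] += seconds
    return splits
```

With hypothetical timings of a=10s, d=8s, b=1s and c=1s over two splits,
round-robin on the list a, b, c, d puts a and c together (11s vs 9s), while
the timing-based version puts a alone and d, b, c together (10s each). A
shared build script could take the splitter as a parameter and keep
everything else identical.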


** JDK11

JDK11 builds have been contributed, thanks to Shylaja. Trunk’s pipeline now
builds both JDK 8 and 11 artefacts.

Adding JDK11 test runs hit a hurdle with how the JDK labels are named and
our tests are not friendly with directory names containing spaces.


** Nightly Build Artefacts

Build artefacts of Tarballs, as well as Debian and RedHat packages, are
attached to the artefact stages inside each pipeline, for both JDK8 and
JDK11 builds.

To download the latest successful JDK8 build of these, use the following
links.
 Trunk:
https://ci-cassandra.apache.org/job/Cassandra-trunk-artifacts/jdk=JDK%201.8%20(latest),label=cassandra/lastSuccessfulBuild/artifact/
 3.11:
https://ci-cassandra.apache.org/job/Cassandra-3.11-artifacts/jdk=JDK%201.8%20(latest),label=cassandra/lastSuccessfulBuild/artifact/
 3.0:
https://ci-cassandra.apache.org/job/Cassandra-3.0-artifacts/jdk=JDK%201.8%20(latest),label=cassandra/lastSuccessfulBuild/artifact/
 2.2:
https://ci-cassandra.apache.org/job/Cas

Cassandra CI Status – 2020-05-06

2020-05-06 Thread Mick Semb Wever
The following is a summary of changes and status over the past ~month.

Shout outs go to David Capwell, Ekaterina Dimitrova, Eduard
Tudenhöfner, Berenguer Blasi, Anthony Grasso, and Dinesh Joshi, for
help with testing framework issues.


** New CI-Cassandra landing page

The default landing page for https://ci-cassandra.apache.org/  has
changed. Now it shows just the pipeline builds for each Cassandra
branch, and the additional sub-projects. This should make it as simple
as possible for developers to know the status of test results for
their development branch base.


** Aggregated Test Reports and a Permanent Archive of every build

Aggregated Test Reports have had a clean up. Jobs that re-run test
suites with different parameters (cqlshlib and dtests) now distinguish
those results by prefixing the test's package name with those
differing parameter values. This makes it possible to differentiate
those test results in the aggregated test report.
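
As a rough illustration of the idea, the sketch below prefixes the classname
of every test case in a JUnit-style XML report with the job variant. It is
not the script the jobs use; the function name, the sample report, and the
exact prefix format are assumptions.

```python
import xml.etree.ElementTree as ET


def prefix_classnames(junit_xml: str, variant: str) -> str:
    """Return the JUnit XML with every testcase classname prefixed by the variant."""
    root = ET.fromstring(junit_xml)
    for case in root.iter("testcase"):
        classname = case.get("classname", "")
        case.set("classname", f"{variant}.{classname}" if classname else variant)
    return ET.tostring(root, encoding="unicode")


# Hypothetical report fragment; the test and variant names are examples only.
report = '<testsuite><testcase classname="ttl_test.TestTTL" name="test_ttl"/></testsuite>'
print(prefix_classnames(report, "dtest-novnode"))
```

Once each run carries its own prefix, two runs of the same test no longer
collapse into one entry when the reports are merged.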

The latest HTML test report is available here:
https://ci-cassandra.apache.org/job/Cassandra-trunk/lastSuccessfulBuild/testReport/

This report is also made available in plaintext form, with a permanent
archive of these, for ~every commit, being sent to
https://lists.apache.org/list.html?bui...@cassandra.apache.org


** Emails on Broken Builds

Broken build notifications are now enabled on the whole pipeline
builds. Now any failures in any of the artifacts, stress-test, or
fqltool-test stages will result in email notifications.


** Build Times and Improvements

The `dtest-large` job was previously the normal dtest run including
the resource-intensive annotated dtests. With the addition of the
`--only-resource-intensive-tests` flag the `dtest-large` job now only
runs the resource-intensive dtests. This reduces the total pipeline
build time by ~10 hours.
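
The effect of the flag can be sketched as a simple filter. This is only a
model of the selection logic as described above, not the dtest framework's
own code; the `DTest` type, the `select` function, and the test names are
hypothetical.

```python
from typing import Iterable, List, NamedTuple


class DTest(NamedTuple):
    name: str
    resource_intensive: bool


def select(tests: Iterable[DTest], only_resource_intensive: bool) -> List[str]:
    """Pick the tests a job should run.

    With the flag set, only the resource-intensive tests are kept (the new
    `dtest-large` behaviour). Without it, everything is kept, which models
    the previous `dtest-large` run as described in this mail.
    """
    return [t.name for t in tests if t.resource_intensive or not only_resource_intensive]


tests = [
    DTest("example_small_test", False),  # hypothetical test names
    DTest("example_large_test", True),
]
print(select(tests, only_resource_intensive=True))
print(select(tests, only_resource_intensive=False))
```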

One of the additional subproject builds is the cassandra-website
build. As Anthony mentioned in a separate thread, it now generates and
deploys every pushed change on master to cassandra.staged.apache.org.
No one needs to commit changes under the "content/" directory anymore.
If the commit was to the docs in the C* repo, you can manually trigger the
build to update cassandra.staged.apache.org.


** Cassandra devbranch (patches)

While CircleCI is much better for pre-merge CI, at least for those
that have access to premium accounts and the large containers,
CI-Cassandra can still be used for committers and contributors that
don't have access to CircleCI.

An improvement has been made so that all the parameters are verified
at the beginning of the pipeline. This was based on feedback and
observations that these parameters were fiddly to get right.
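
A fail-fast check of this kind could look roughly like the sketch below. The
parameter names (`REPO`, `BRANCH`) and the patterns are hypothetical and are
not the ones the devbranch job uses; the point is only that every problem is
reported up front, before any stage starts.

```python
import re
from typing import Dict, List

# Hypothetical parameter names and rules, for illustration only.
RULES = {
    "REPO": re.compile(r"^[A-Za-z0-9_.-]+$"),
    "BRANCH": re.compile(r"^[A-Za-z0-9_./-]+$"),
}


def validate(params: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the run may proceed."""
    problems = []
    for name, pattern in RULES.items():
        value = params.get(name, "").strip()
        if not value:
            problems.append(f"{name} is missing")
        elif not pattern.match(value):
            problems.append(f"{name} has an unexpected value: {value!r}")
    return problems


print(validate({"REPO": "thelastpickle", "BRANCH": "mck/jenkins-dsl-pipeline"}))
print(validate({"REPO": "", "BRANCH": "has space"}))
```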


** Help Wanted

The following items remain open; any help would be appreciated.
 - posting build summary from pipeline builds back to jira tickets,
 - adding tests for rpm/deb packaging,
 - adding the Driver tests,
 - jdk11 jobs in ci-cassandra.a.o,
 - upgrade dtests in ci-cassandra.a.o,
 - further comparison and oversight on slow, flaky, and ignored tests,
 - parallelising dtests (because 12 hours is still wild), and
 - efforts to standardise CI scripts (making them as CI system
agnostic as possible).

If you would like to help, feel free to reach out to me on mail/slack and
we can create a jira ticket for scope, ideas, and visibility. Input
from more developers on what their needs from the CI systems are would
also be awesome.

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Cassandra CI Status – 2020-04-06

2020-04-07 Thread Joshua McKenzie
I spoke with Mick offline about this a bit but wanted to relay it here for
posterity:

> ** Why are cdc and compression unit tests run separately?
>
In the case of cdc, I erred on the side of caution in the implementation,
assuming that most people would not be using it, so any degradation in
performance for the non-cdc case would be a net negative for the majority of
users. As such, the CDC mutation allocation and segment management codepaths
are separate from the regular ones. Were we to only run tests with cdc
enabled, we'd never exercise the code-paths for what would presumably be the
majority use-case of the CommitLog. Which I'd say would be bad. :)

This amounts to checking a bool in Mutation a couple of times and
synchronizing on an object during new segment creation and deletion, so it
may be worth considering unifying the commit log allocator paths on the CDC
path and removing the need for separate testing in 5.0. There are quite a few
different users running CDC out in the world today without significant
issue, and some robust system level benchmarking and micro-benchmarking of
CommitLog behavior at boundary conditions should establish further
confidence in the acceptable nature of a change like that.

On Mon, Apr 6, 2020 at 3:20 AM Mick Semb Wever  wrote:

> It's been three weeks since the last update¹. What follows is a quick
> state of affairs and a few questions for anyone to jump in and help
> with.
>
> The ASF CI is running at
> https://ci-cassandra.apache.org/view/branches/ and will soon have 36
> builds agents in total.  Thanks to Instaclustr, Amazon, iland, and
> DataStax. The previous `builds.apache.org` is now retired for all
> Cassandra builds.
>
>
> ** Broken Builds
>
> Tests results continue to improve. Last week at one point we saw trunk
> down to 24 test failures for the whole pipeline.  That's including
> three runs of unit tests (normal, cdc, compression) and three runs of
> dtests (normal, novnode, offheap).   Because of that it was easy to
> see the breakage CASSANDRA-15684. Breakages happen, but to be able to
> spot them so quickly is a win!
>
>
> ** Emails on Broken Builds
>
> Currently you will only get a broken build email if you break the
> artifacts stage of the pipeline.
>
> Pipeline stages that are now consistently passing and are candidates
> for adding broken build email notifications to are:
>   - Cassandra-2.2-jvm-dtest
>   - Cassandra-2.2-long-test
>   - Cassandra-3.0-jvm-dtest
>   - Cassandra-3.0-long-test
>   - Cassandra-3.0-test
>   - Cassandra-3.11-jvm-dtest
>   - Cassandra-3.11-long-test
>   - Cassandra-3.11-test
>   - Cassandra-3.11-stress-test
>   - Cassandra-3.11-test-compression
>   - Cassandra-trunk-fqltool-test
>   - Cassandra-trunk-stress-test
>   - Cassandra-trunk-test
>   - Cassandra-trunk-test-compression
>   - Cassandra-trunk-test-burn
>
> Given no objection, I will add broken email notification to these,
> starting with the smaller stages that have been longest stable.
> Please make sure that you are not filtering out emails from
> jenk...@builds.apache.org so you get to know you broke it before
> someone has to tell you.
>
>
> ** Our ci-cassandra Jenkins versus CircleCI
>
> Thanks to David Capwell, who spent some time comparing ci-cassandra to
> CircleCI.
>
> His findings were…
>   - There's a reasonable match across the unit, stress, fql, and
> jvm-dtest tests.
>   - CircleCI does not run the cdc unit, burn, long unit, cqlsh tests,
> or offheap dtests.
>   - ci-cassandra does not have JDK 11 test runs, or the upgrade
> (jvm-dtest and dtest) tests.
>   - The cqlsh tests: which only ci-cassandra runs; are broken in trunk
> with the recent python3 upgrade.
>
>
> ** Why are cdc and compression unit tests run separately?
>
> Why do we duplicate unit test runs for cdc and compression?
> Is there any ML or ticket to provide history on this?
> Can we not instead just enable these settings in the normal unit run?
>
>
> regards,
> Mick
>
>
> ¹)
> https://lists.apache.org/thread.html/re8122e4fdd8629e7fbca2abf27d72054b3bc0e3690ece8b8e66f618b%40%3Cdev.cassandra.apache.org%3E
>


Cassandra CI Status – 2020-04-06

2020-04-06 Thread Mick Semb Wever
It's been three weeks since the last update¹. What follows is a quick
state of affairs and a few questions for anyone to jump in and help
with.

The ASF CI is running at
https://ci-cassandra.apache.org/view/branches/ and will soon have 36
build agents in total, thanks to Instaclustr, Amazon, iland, and
DataStax. The previous `builds.apache.org` is now retired for all
Cassandra builds.


** Broken Builds

Test results continue to improve. At one point last week we saw trunk
down to 24 test failures for the whole pipeline. That includes
three runs of unit tests (normal, cdc, compression) and three runs of
dtests (normal, novnode, offheap). Because of that it was easy to
see the breakage CASSANDRA-15684. Breakages happen, but to be able to
spot them so quickly is a win!


** Emails on Broken Builds

Currently you will only get a broken build email if you break the
artifacts stage of the pipeline.

Pipeline stages that are now consistently passing and are candidates
for adding broken build email notifications to are:
  - Cassandra-2.2-jvm-dtest
  - Cassandra-2.2-long-test
  - Cassandra-3.0-jvm-dtest
  - Cassandra-3.0-long-test
  - Cassandra-3.0-test
  - Cassandra-3.11-jvm-dtest
  - Cassandra-3.11-long-test
  - Cassandra-3.11-test
  - Cassandra-3.11-stress-test
  - Cassandra-3.11-test-compression
  - Cassandra-trunk-fqltool-test
  - Cassandra-trunk-stress-test
  - Cassandra-trunk-test
  - Cassandra-trunk-test-compression
  - Cassandra-trunk-test-burn

Given no objection, I will add broken build email notifications to these,
starting with the smaller stages that have been stable the longest.
Please make sure that you are not filtering out emails from
jenk...@builds.apache.org so you get to know you broke it before
someone has to tell you.


** Our ci-cassandra Jenkins versus CircleCI

Thanks to David Capwell, who spent some time comparing ci-cassandra to CircleCI.

His findings were…
  - There's a reasonable match across the unit, stress, fql, and
jvm-dtest tests.
  - CircleCI does not run the cdc unit, burn, long unit, cqlsh tests,
or offheap dtests.
  - ci-cassandra does not have JDK 11 test runs, or the upgrade
(jvm-dtest and dtest) tests.
  - The cqlsh tests, which only ci-cassandra runs, are broken in trunk
since the recent python3 upgrade.


** Why are cdc and compression unit tests run separately?

Why do we duplicate unit test runs for cdc and compression?
Is there any ML or ticket to provide history on this?
Can we not instead just enable these settings in the normal unit run?


regards,
Mick


¹) 
https://lists.apache.org/thread.html/re8122e4fdd8629e7fbca2abf27d72054b3bc0e3690ece8b8e66f618b%40%3Cdev.cassandra.apache.org%3E




Re: Cassandra CI Status – 2020-03-13

2020-03-13 Thread Jon Haddad
Thanks for working on this, Mick.  I think I'll have some time next week to
help out.

On Fri, Mar 13, 2020 at 5:38 AM Mick Semb Wever  wrote:

>
> The ASF new Jenkins setup is almost complete.
> See https://ci-cassandra.apache.org/view/branches/
> The setup has been tracked in INFRA-19951
>
> The next step here is to retire our usage of `builds.apache.org` and move
> all existing agents to the new setup. There are also new agents getting
> ready to being donated, that will be added to this new setup.
>
> If anyone has a problem with builds.apache.org going away please SPEAK UP
> NOW.
>
> On Jenkins, the main code's unit tests is close to (a japanese) green,
> with only 3 failures on both 2.2 and trunk. It's been noticed that some of
> the other unit tests (eg compression) are mixing up their run and/or
> reports. This, along with the formatting of  reports, still remains a todo.
>
> Otherwise there's been a huge amount of momentum in fixing and improving
> the tests. It's only to take a quick glance at the commit log to see test
> fixes coming in from everyone and everywhere!
>
> Some ideas for subsequent steps are:
>  - clean up report formats (slack notifications and emails sent),
>  - put together some oversight on slow, flaky, and ignored tests (McFadin,
> were you still keen to do this?),
>  - look into sending failure emails out on unit failures (if over time
> there's no more flaky units),
>  - look into posting test result summary from each pipeline back to jira
> as a comment,
>  - parallelise dtests (because 12 hours is still wild)
>  - look into testing building of rpm/deb packages
>  - look into testing of website building?
>  - look into pipelines only kicking off stages that are relevant to what
> changed in the commit,
>  - more comparison between CircleCI and Jenkins, and from any internal
> testing systems if folk can say,
>  - look into tying in the Driver smoke tests.
>
> If any contributors are keen on any of these things, do reach out.
> Anything to help C* be rock-solid.
>
> regards,
> Mick
>
>


Cassandra CI Status – 2020-03-13

2020-03-13 Thread Mick Semb Wever


The new ASF Jenkins setup is almost complete.
See https://ci-cassandra.apache.org/view/branches/
The setup has been tracked in INFRA-19951.

The next step here is to retire our usage of `builds.apache.org` and move all 
existing agents to the new setup. There are also new agents getting ready to
be donated, which will be added to this new setup.

If anyone has a problem with builds.apache.org going away please SPEAK UP NOW.

On Jenkins, the main code's unit tests are close to (a japanese) green, with
only 3 failures on both 2.2 and trunk. It's been noticed that some of the other
unit tests (eg compression) are mixing up their runs and/or reports. This, along
with the formatting of reports, still remains a todo.

Otherwise there's been a huge amount of momentum in fixing and improving the
tests. It only takes a quick glance at the commit log to see test fixes
coming in from everyone and everywhere!

Some ideas for subsequent steps are:
 - clean up report formats (slack notifications and emails sent),
 - put together some oversight on slow, flaky, and ignored tests (McFadin, were 
you still keen to do this?),
 - look into sending failure emails out on unit failures (if over time there's 
no more flaky units),
 - look into posting test result summary from each pipeline back to jira as a 
comment,
 - parallelise dtests (because 12 hours is still wild)
 - look into testing building of rpm/deb packages
 - look into testing of website building?
 - look into pipelines only kicking off stages that are relevant to what 
changed in the commit,
 - more comparison between CircleCI and Jenkins, and from any internal testing 
systems if folk can say,
 - look into tying in the Driver smoke tests.

If any contributors are keen on any of these things, do reach out. Anything to 
help C* be rock-solid.

regards,
Mick





Re: Cassandra CI Status

2020-01-27 Thread Patrick McFadin
I would love to get involved promoting those if and when a list is produced.
Could this be something like a Cassandra Confluence page? I would be happy to
volunteer some time keeping that up-to-date.

Patrick

On Fri, Jan 24, 2020 at 6:49 AM Joshua McKenzie 
wrote:

> >
> > an entry in the progress report?
>
> That'd be slick. I've had some people pinging me on slack asking about the
> easiest way to get involved with the project and ramp up, and I think
> refactoring and cleaning up a dtest or two would be another vector for
> people to get their feet wet. I like it!
>
>
> On Fri, Jan 24, 2020 at 12:38 AM Mick Semb Wever  wrote:
>
> >
> > > >  - parallelise dtests (because 12 hours is wild)
> > >
> > > That's one word for it. :)
> > >
> > >  We used to ad hoc take a crack at sorting the individual test times by
> > > longest and taking top-N and seeing if there was LHF to shave off that.
> > > Being on a flight atm, not having that data handy right now, and that not
> > > being in the linked logs from that pipeline run here (awesome work btw!),
> > > do we think that might be something worth doing periodically on the
> > > project?
> >
> >
> > Yes I think so! Maybe even the longest dtest(s) can be an entry in the
> > progress report? Especially now we can rewrite dtests into either quick
> > "unit" tests using jvm-dtests or event diagnostics.
> >
> > Alongside the focus on dtest execution time, it would be nice to shore up
> > the flakey unit tests (there's only a handful), so that there are more
> > steps in the pipeline that hard fail (and fail-fast), giving faster
> > feedback to the contributor/reviewer.
> >


Re: Cassandra CI Status

2020-01-24 Thread Joshua McKenzie
>
> an entry in the progress report?

That'd be slick. I've had some people pinging me on slack asking about the
easiest way to get involved with the project and ramp up, and I think
refactoring and cleaning up a dtest or two would be another vector for
people to get their feet wet. I like it!


On Fri, Jan 24, 2020 at 12:38 AM Mick Semb Wever  wrote:

>
> > >  - parallelise dtests (because 12 hours is wild)
> >
> > That's one word for it. :)
> >
> >  We used to ad hoc take a crack at sorting the individual test times by
> > longest and taking top-N and seeing if there was LHF to shave off that.
> > Being on a flight atm, not having that data handy right now, and that not
> > being in the linked logs from that pipeline run here (awesome work btw!),
> > do we think that might be something worth doing periodically on the
> > project?
>
>
> Yes I think so! Maybe even the longest dtest(s) can be an entry in the
> progress report? Especially now we can rewrite dtests into either quick
> "unit" tests using jvm-dtests or event diagnostics.
>
> Alongside the focus on dtest execution time, it would be nice to shore up the
> flakey unit tests (there's only a handful), so that there are more steps in
> the pipeline that hard fail (and fail-fast), giving faster feedback to the
> contributor/reviewer.
>


Re: Cassandra CI Status

2020-01-23 Thread Mick Semb Wever


> >  - parallelise dtests (because 12 hours is wild)
> 
> That's one word for it. :)
> 
>  We used to ad hoc take a crack at sorting the individual test times by
> longest and taking top-N and seeing if there was LHF to shave off that.
> Being on a flight atm, not having that data handy right now, and that not
> being in the linked logs from that pipeline run here (awesome work btw!),
> do we think that might be something worth doing periodically on the project?


Yes I think so! Maybe even the longest dtest(s) can be an entry in the progress 
report? Especially now we can rewrite dtests into either quick "unit" tests 
using jvm-dtests or event diagnostics.

Alongside the focus on dtest execution time, it would be nice to shore up the
flakey unit tests (there's only a handful), so that there are more steps in the
pipeline that hard fail (and fail-fast), giving faster feedback to the
contributor/reviewer.




Re: Cassandra CI Status

2020-01-23 Thread Joshua McKenzie
>
>  - parallelise dtests (because 12 hours is wild)

That's one word for it. :)

 We used to ad hoc take a crack at sorting the individual test times by
longest and taking top-N and seeing if there was LHF to shave off that.
Being on a flight atm, not having that data handy right now, and that not
being in the linked logs from that pipeline run here (awesome work btw!),
do we think that might be something worth doing periodically on the project?
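
For what it's worth, the "sort by longest and take top-N" step is easy to
script against JUnit-style XML reports. The sketch below assumes the reports
carry a `time` attribute (in seconds) on each `testcase` element; the
function name and the sample data are made up.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def slowest_tests(junit_xml: str, top_n: int) -> List[Tuple[str, float]]:
    """Return the top_n (test id, seconds) pairs, slowest first."""
    root = ET.fromstring(junit_xml)
    timed = [
        (f"{case.get('classname', '')}.{case.get('name', '')}", float(case.get("time", "0")))
        for case in root.iter("testcase")
    ]
    return sorted(timed, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical report; names and timings are for illustration only.
sample = """<testsuite>
  <testcase classname="example_test.TestA" name="test_fast" time="12.5"/>
  <testcase classname="example_test.TestA" name="test_slow" time="840.0"/>
  <testcase classname="example_test.TestB" name="test_medium" time="95.25"/>
</testsuite>"""

for test_id, seconds in slowest_tests(sample, 2):
    print(f"{seconds:8.1f}s  {test_id}")
```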

On Thu, Jan 23, 2020 at 8:37 AM Mick Semb Wever  wrote:

>
> > > If I don't hear any objection, I'll commit this. Off this, as it
> > > aggregates test reports, it's now possible to start test posting
> > > emails with the test report summary, as well as bringing in the dtest
> > > builds into the pipeline.
> >
> >
> > Based on the pipeline approach I've gotten notifications to slack and
> > email working.
> >
> > Does anyone object if I send these to #cassandra-builds (a brand new
> > slack room) and to bui...@cassandra.apache.org ?
> >
> > This is not meant as anything perfect or finished, just to get
> > something out there, on which a pragmatic discussion can continue…
>
>
> Closing the loop on this^, these commits have been made and each of our
> release branches have a pipeline build.
>
> An example of a pipeline build result is
>
> https://builds.apache.org/blue/organizations/jenkins/Cassandra-trunk/detail/Cassandra-trunk/6/pipeline
>
> Notifications from these are being sent to slack #cassandra-builds and to
> the builds@c.a.o ML
>
> Next steps (I see) are:
>  - remove scm polling on the other builds, so that only pipelines trigger
> off new commits,
>  - look into posting test result summary from each pipeline back to jira
> as a comment (there will be a separate comment for each release branch a
> patch is pushed to),
>  - stabilise disk usage on the jenkins nodes (nodes fall over from full
> disks), some discussion on builds@a.o is happening on this,
>  - parallelise dtests (because 12 hours is wild)
>
>
> And thanks to David, Josh, Dinesh, and Michael, for your input.
>


Re: Cassandra CI Status

2020-01-23 Thread Mick Semb Wever


> > If I don't hear any objection, I'll commit this. Off this, as it 
> > aggregates test reports, it's now possible to start test posting emails 
> > with the test report summary, as well as bringing in the dtest builds 
> > into the pipeline. 
> 
> 
> Based on the pipeline approach I've gotten notifications to slack and 
> email working.
> 
> Does anyone object if I send these to #cassandra-builds (a brand new 
> slack room) and to bui...@cassandra.apache.org ?
> 
> This is not meant as anything perfect or finished, just to get 
> something out there, on which a pragmatic discussion can continue…


Closing the loop on this^, these commits have been made and each of our release
branches has a pipeline build.

An example of a pipeline build result is
https://builds.apache.org/blue/organizations/jenkins/Cassandra-trunk/detail/Cassandra-trunk/6/pipeline

Notifications from these are being sent to slack #cassandra-builds and to the 
builds@c.a.o ML

Next steps (I see) are:
 - remove scm polling on the other builds, so that only pipelines trigger off 
new commits,
 - look into posting test result summary from each pipeline back to jira as a 
comment (there will be a separate comment for each release branch a patch is 
pushed to),
 - stabilise disk usage on the jenkins nodes (nodes fall over from full disks), 
some discussion on builds@a.o is happening on this,
 - parallelise dtests (because 12 hours is wild)


And thanks to David, Josh, Dinesh, and Michael, for your input.




Re: Cassandra CI Status

2020-01-13 Thread Mick Semb Wever


> If I don't hear any objection, I'll commit this. Off this, as it 
> aggregates test reports, it's now possible to start test posting emails 
> with the test report summary, as well as bringing in the dtest builds 
> into the pipeline. 


Based on the pipeline approach I've gotten notifications to slack and email 
working.

Does anyone object if I send these to #cassandra-builds (a brand new slack 
room) and to bui...@cassandra.apache.org ?

This is not meant as anything perfect or finished, just to get something out 
there, on which a pragmatic discussion can continue…


An example of the email is below.
-- 

Build complete: Cassandra-devbranch-pipeline #92 [UNSTABLE]

GENERAL INFO

BUILD UNSTABLE
Build URL: https://builds.apache.org/job/Cassandra-devbranch-pipeline/92/
Project: Cassandra-devbranch-pipeline
Date of build: Mon, 13 Jan 2020 13:11:16 +
Build duration: 1 hr 16 min and counting



JUNIT RESULTS

Name: (root) Failed: 14 test(s), Passed: 0 test(s), Skipped: 0 test(s), Total: 14 test(s)
  Failed: …
Name: junit.framework Failed: 1 test(s), Passed: 0 test(s), Skipped: 42 test(s), Total: 43 test(s)
  Failed: junit.framework.TestSuite.org.apache.cassandra.io.sstable.CQLSSTableWriterTest-cdc
Name: org.apache.cassandra.audit Failed: 0 test(s), Passed: 285 test(s), Skipped: 0 test(s), Total: 285 test(s)
Name: org.apache.cassandra.auth Failed: 0 test(s), Passed: 119 test(s), Skipped: 0 test(s), Total: 119 test(s)
Name: org.apache.cassandra.auth.jmx Failed: 0 test(s), Passed: 89 test(s), Skipped: 0 test(s), Total: 89 test(s)
Name: org.apache.cassandra.batchlog Failed: 0 test(s), Passed: 37 test(s), Skipped: 0 test(s), Total: 37 test(s)
Name: org.apache.cassandra.cache Failed: 0 test(s), Passed: 16 test(s), Skipped: 0 test(s), Total: 16 test(s)
Name: org.apache.cassandra.concurrent Failed: 0 test(s), Passed: 38 test(s), Skipped: 0 test(s), Total: 38 test(s)
Name: org.apache.cassandra.config Failed: 0 test(s), Passed: 40 test(s), Skipped: 0 test(s), Total: 40 test(s)
Name: org.apache.cassandra.cql.jdbc Failed: 0 test(s), Passed: 4 test(s), Skipped: 0 test(s), Total: 4 test(s)
Name: org.apache.cassandra.cql3 Failed: 0 test(s), Passed: 832 test(s), Skipped: 16 test(s), Total: 848 test(s)
Name: org.apache.cassandra.cql3.conditions Failed: 0 test(s), Passed: 16 test(s), Skipped: 0 test(s), Total: 16 test(s)
Name: org.apache.cassandra.cql3.functions Failed: 0 test(s), Passed: 124 test(s), Skipped: 0 test(s), Total: 124 test(s)
Name: org.apache.cassandra.cql3.restrictions Failed: 0 test(s), Passed: 92 test(s), Skipped: 0 test(s), Total: 92 test(s)
Name: org.apache.cassandra.cql3.selection Failed: 0 test(s), Passed: 40 test(s), Skipped: 0 test(s), Total: 40 test(s)
Name: org.apache.cassandra.cql3.statements Failed: 0 test(s), Passed: 48 test(s), Skipped: 0 test(s), Total: 48 test(s)
Name: org.apache.cassandra.cql3.validation.entities Failed: 0 test(s), Passed: 1416 test(s), Skipped: 4 test(s), Total: 1420 test(s)
Name: org.apache.cassandra.cql3.validation.miscellaneous Failed: 0 test(s), Passed: 224 test(s), Skipped: 0 test(s), Total: 224 test(s)
Name: org.apache.cassandra.cql3.validation.operations Failed: 0 test(s), Passed: 1564 test(s), Skipped: 0 test(s), Total: 1564 test(s)
Name: org.apache.cassandra.db Failed: 0 test(s), Passed: 1434 test(s), Skipped: 2 test(s), Total: 1436 test(s)
Name: org.apache.cassandra.db.aggregation Failed: 0 test(s), Passed: 24 test(s), Skipped: 0 test(s), Total: 24 test(s)
Name: org.apache.cassandra.db.columniterator Failed: 0 test(s), Passed: 4 test(s), Skipped: 0 test(s), Total: 4 test(s)
Name: org.apache.cassandra.db.commitlog Failed: 0 test(s), Passed: 1452 test(s), Skipped: 4 test(s), Total: 1456 test(s)
Name: org.apache.cassandra.db.compaction Failed: 0 test(s), Passed: 692 test(s), Skipped: 4 test(s), Total: 696 test(s)
Name: org.apache.cassandra.db.composites Failed: 0 test(s), Passed: 12 test(s), Skipped: 0 test(s), Total: 12 test(s)
Name: org.apache.cassandra.db.context Failed: 0 test(s), Passed: 32 test(s), Skipped: 0 test(s), Total: 32 test(s)
Name: org.apache.cassandra.db.filter Failed: 0 test(s), Passed: 28 test(s), Skipped: 0 test(s), Total: 28 test(s)
Name: org.apache.cassandra.db.lifecycle Failed: 0 test(s), Passed: 268 test(s), Skipped: 0 test(s), Total: 268 test(s)
Name: org.apache.cassandra.db.marshal Failed: 0 test(s), Passed: 444 test(s), Skipped: 0 test(s), Total: 444 test(s)
Name: org.apache.cassandra.db.monitoring Failed: 0 test(s), Passed: 60 test(s), Skipped: 0 test(s), Total: 60 test(s)
Name: org.apache.cassandra.db.partition Failed: 0 test(s), Passed: 40 test(s), Skipped: 0 test(s), Total: 40 test(s)
Name: org.apache.cassandra.db.partitions Failed: 0 test(s), Passed: 32 test(s), Skipped: 0 test(s), Total: 32 test(s)
Name: org.apache.cassandra.db.repair Failed: 1 test(s), Passed: 87 test(s), Skipped: 0 test(s), Total: 88 test(s)
  Failed: org.apache.cassandra.db.repair.PendingAntiCompactionBytemanTest.testException

Re: Cassandra CI Status

2020-01-12 Thread mck

> Moving to pure pipeline jobs with all build & test runs in one pipeline, 
> with a final single JIRA comment post operation of all the results in 
> one comment would be the only way I can think of to prevent the 
> multi-post problem. 


For the pipeline build, this is what I have so far putting together just the 
non-dtest builds.
https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch-pipeline/detail/Cassandra-devbranch-pipeline/84/pipeline
 

…and this is now being generated from the generated DSL file.
https://github.com/apache/cassandra-builds/compare/master...thelastpickle:mck/jenkins-dsl-pipeline
 


If I don't hear any objection, I'll commit this. Building on this, since it
aggregates test reports, it's now possible to start posting emails with the
test report summary, as well as bringing the dtest builds into the pipeline.

And I'm aware that "Unit Testing" isn't an accurate column name here, but… :shrug:




Re: Cassandra CI Status

2020-01-11 Thread Michael Shuler




On 1/11/20 10:02 AM, Mick Semb Wever wrote:



This brings up the issue that links to builds on tickets should
ideally refer to information that is permanent. This could be
done by configuring builds to keep status and logs but not the
built artefacts (and/or adding bigger disks). I will now update
the builds to discard their artefacts quicker, since (afaik) no
one is using artefacts built by jenkins.


This is already being done, we only keep the latest job's
artifacts.



It was only being done for the artifact builds. I applied the same
artifactNumToKeep setting to all the builds in 5528779




If we wanted a more permanent place to keep results, we would need
some sort of result upload to somewhere. IMO, a jira comment on
some CASSANDRA-123456 that just showed up in CHANGES.txt that "this
patch passed/failed, here's that info" could be that permanent
place? If the link to the job 404s at some point, who cares, we
already have the feedback and could be re-run, if someone desires.



I'd rather not link, if the link will always quickly be stale.

Uploading a complete summary is a good idea. It's particularly the
test report that is of interest to me. Would each of the different
builds each upload their report to the jira ticket? That's a few
comments to append… Maybe something will come out of the devbranch
pipeline build that I've been playing with for patches; an aggregated
summary would be nice, along with a link to "rebuild" that sha in the
pipeline build.


First, it takes commit discipline to trigger the correct ticket number, 
but we do that already for the most part in the commit message.


Moving to pure pipeline jobs with all build & test runs in one pipeline, 
with a final single JIRA comment post operation of all the results in
one comment would be the only way I can think of to prevent the 
multi-post problem. (Which does get super annoying pretty quickly)
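
To illustrate the "one comment for the whole pipeline" idea, here is a small
sketch that folds the per-stage results into a single comment body. How the
comment would actually be posted to JIRA is deliberately left out; the
function name, the stage names, and the URL are placeholders.

```python
from typing import Dict, List


def build_comment(build_url: str, stages: Dict[str, List[str]]) -> str:
    """Fold every stage's failed-test list into one comment body."""
    lines = [f"Pipeline build: {build_url}"]
    for stage in sorted(stages):
        failures = stages[stage]
        status = "passed" if not failures else f"{len(failures)} failed"
        lines.append(f"- {stage}: {status}")
        lines.extend(f"    {name}" for name in failures)
    return "\n".join(lines)


# Placeholder data, for illustration only.
print(build_comment(
    "https://example.invalid/job/1/",
    {"unit": [], "jvm-dtest": ["FooTest.bar"]},
))
```

Because the comment is built once, after every stage has reported, a ticket
gets one post per pipeline run rather than one per job.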


--
Michael




Re: Cassandra CI Status

2020-01-11 Thread Michael Shuler




On 1/11/20 9:17 AM, Michael Shuler wrote:



On 1/11/20 4:48 AM, Mick Semb Wever wrote:


This brings up the issue that links to builds on tickets should 
ideally refer to information that is permanent.
This could be done by configuring builds to keep status and logs but 
not the built artefacts (and/or adding bigger disks). I will now 
update the builds to discard their artefacts quicker, since (afaik) no 
one is using artefacts built by Jenkins.


This is already being done, we only keep the latest job's artifacts. We 
are also keeping 5x more log history than INFRA really wants on 
builds.a.o. I got the ok to keep this many for our jobs, since they 
occasionally run scripts to globally modify job configs outside of DSL 
definitions. Basically, I've considered nothing on builds.a.o to remain 
permanent and I've had to run the DSL seed to rebuild jobs when they've 
been randomly edited by INFRA.


Job templates have:

     logRotator {
         numToKeep(50)
         artifactNumToKeep(1)
     }

If we wanted a more permanent place to keep results, we would need some 
sort of result upload to somewhere. IMO, a jira comment on some 
CASSANDRA-123456 that just showed up in CHANGES.txt that "this patch 
passed/failed, here's that info" could be that permanent place? If the 
link to the job 404s at some point, who cares, we already have the 
feedback and could be re-run, if someone desires.


Sorry, I was looking at the artifacts job template, which is the set of 
jobs that archive significant size on disk. I see you added 
artifactNumToKeep(1) to test run templates. Those jobs do not save much, 
other than the raw test output to debug test failures, so didn't take up 
much disk space compared to the artifact tarballs.


The disk space taken up by artifacts archiving is on the Jenkins master, 
so slave disk size (other than actual job run needs) plays no part in 
job history stuff.


Sorry I was on the wrong track.

--
Michael





Re: Cassandra CI Status

2020-01-11 Thread Mick Semb Wever


> > This brings up the issue that links to builds on tickets should ideally 
> > refer to information that is permanent.
> > This could be done by configuring builds to keep status and logs but not 
> > the built artefacts (and/or adding bigger disks). I will now update the 
> > builds to discard their artefacts quicker, since (afaik) no one is using 
> > artefacts built by jenkins.
> 
> This is already being done, we only keep the latest job's artifacts.


It was only being done for the artifact builds.
I applied the same artifactNumToKeep setting to all the builds in 5528779



> If we wanted a more permanent place to keep results, we would need some 
> sort of result upload to somewhere. IMO, a jira comment on some 
> CASSANDRA-123456 that just showed up in CHANGES.txt that "this patch 
> passed/failed, here's that info" could be that permanent place? If the 
> link to the job 404s at some point, who cares, we already have the 
> feedback and could be re-run, if someone desires.


I'd rather not link, if the link will always quickly be stale.

Uploading a complete summary is a good idea. It's particularly the test report 
that is of interest to me. Would each of the different builds upload its own 
report to the jira ticket? That's a few comments to append… Maybe something 
will come out of the devbranch pipeline build that I've been playing with for 
patches. An aggregated summary would be nice, along with a link to "rebuild" 
that sha in the pipeline build.




Re: Cassandra CI Status

2020-01-11 Thread Michael Shuler




On 1/11/20 4:48 AM, Mick Semb Wever wrote:


This brings up the issue that links to builds on tickets should ideally refer 
to information that is permanent.
This could be done by configuring builds to keep status and logs but not the 
built artefacts (and/or adding bigger disks). I will now update the builds to 
discard their artefacts quicker, since (afaik) no one is using artefacts built 
by jenkins.


This is already being done, we only keep the latest job's artifacts. We 
are also keeping 5x more log history than INFRA really wants on 
builds.a.o. I got the ok to keep this many for our jobs, since they 
occasionally run scripts to globally modify job configs outside of DSL 
definitions. Basically, I've considered nothing on builds.a.o to remain 
permanent and I've had to run the DSL seed to rebuild jobs when they've 
been randomly edited by INFRA.


Job templates have:

logRotator {
    numToKeep(50)
    artifactNumToKeep(1)
}
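
To make the effect of these two settings concrete, here is a simplified model in Python: the newest numToKeep builds keep their status and logs, and only the newest artifactNumToKeep builds keep their artifacts. This is a sketch of the retention behaviour as described in this thread, not Jenkins code, and `retained` is a hypothetical helper.

```python
from typing import Dict, List


def retained(build_numbers: List[int], num_to_keep: int,
             artifact_num_to_keep: int) -> Dict[str, List[int]]:
    """Simplified model: newest builds keep logs, fewer of them keep artifacts."""
    newest_first = sorted(build_numbers, reverse=True)
    return {
        "logs": newest_first[:num_to_keep],
        "artifacts": newest_first[:artifact_num_to_keep],
    }


if __name__ == "__main__":
    # 60 builds so far, with the values from the job template above.
    kept = retained(list(range(1, 61)), num_to_keep=50, artifact_num_to_keep=1)
    print(len(kept["logs"]), kept["artifacts"])
```

With these values a link to any of the last 50 builds still shows status and logs, while only the most recent build still has downloadable artifacts.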

If we wanted a more permanent place to keep results, we would need some 
sort of result upload to somewhere. IMO, a jira comment on some 
CASSANDRA-123456 that just showed up in CHANGES.txt that "this patch 
passed/failed, here's that info" could be that permanent place? If the 
link to the job 404s at some point, who cares, we already have the 
feedback and could be re-run, if someone desires.


--
Michael




Re: Cassandra CI Status

2020-01-11 Thread Mick Semb Wever


> I would still be in favour of adding the post-commit CI feedback 
> integration, though in a shorter format than what Hadoop does. It's a 
> nice finalising comment on the ticket. Maybe we can come back to this 
> once we stabilise Jenkins a bit more and ask what everyone thinks? (For 
> example, just now one of the agents, cassandra13, has filled up a disk 
> [INFRA-19701].)


This brings up the issue that links to builds on tickets should ideally refer 
to information that is permanent. 
This could be done by configuring builds to keep status and logs but not the 
built artefacts (and/or adding bigger disks). I will now update the builds to 
discard their artefacts quicker, since (afaik) no one is using artefacts built 
by jenkins.




Re: Cassandra CI Status

2020-01-10 Thread Mick Semb Wever


> Thank you for getting this fixed, Mick! Would it be possible to provide 
> CI feedback on Jira tickets?


Yes, I know that Hadoop does this.
For example 
https://issues.apache.org/jira/browse/HADOOP-16697?focusedCommentId=17012760&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17012760

But this would only work for testing what gets committed to the cassandra-2.2, 
cassandra-3.0, cassandra-3.11, and trunk branches. We still want to have CI 
results before the review is complete. That means either CircleCI or the manual 
(parameterised) Jenkins builds.

I would still be in favour of adding the post-commit CI feedback integration, 
though in a shorter format than what Hadoop does. It's a nice finalising 
comment on the ticket. Maybe we can come back to this once we stabilise Jenkins 
a bit more and ask what everyone thinks? (For example, just now one of the 
agents, cassandra13, has filled up a disk [INFRA-19701].)




Re: Cassandra CI Status

2020-01-10 Thread Dinesh Joshi
Thank you for getting this fixed, Mick! Would it be possible to provide CI 
feedback on Jira tickets?

Dinesh

> On Jan 8, 2020, at 12:34 AM, Mick Semb Wever  wrote:
> 
> 
> I'm chuffed to say that ASF Jenkins builds for branches 2.2, 3.0, 3.11, and 
> trunk, are now all working.
> 
> Trunk:  https://builds.apache.org/view/A-D/view/Cassandra%20trunk/
>  3.11:  https://builds.apache.org/view/A-D/view/Cassandra%203.11/
>   3.0:  https://builds.apache.org/view/A-D/view/Cassandra%203.0/
>   2.2:  https://builds.apache.org/view/A-D/view/Cassandra%202.2/
> 
> 
> The current setup for Cassandra's CI is a bit of a mess. Recently there's 
> been more attention on CircleCI, because it has been shown to be less flaky 
> than ASF's Jenkins, and it allows automated unit testing of any contributor's 
> patches. Still, CircleCI has shortcomings: it provides no canonical build 
> status for our supported branches, you have to hack commit the circleci yaml, 
> and unless you work for a (super) large company that is willing to give you 
> 2000+ CircleCI containers, the full test suite is out of your reach, which is 
> not cool for an OSS project. (If I'm wrong on that last point, please correct 
> me!)
> 
> So… I've gone against the grain and tried to give Jenkins a little love. All 
> the builds now are running again, except the dtest-large jobs which require 
> servers with 32GB+ ram which we no longer have. And the branches each have 
> their own view, as listed above.
> 
> In addition there's also a view for custom builds to test patches
>  https://builds.apache.org/view/A-D/view/Cassandra%20%C2%A0patches/
> and in there is a pipeline job I'm working on so that patches only have to be 
> entered once: 
> https://builds.apache.org/view/A-D/view/Cassandra%20%C2%A0patches/job/Cassandra-devbranch-pipeline/
> 
> If people have tickets with patches that they would like to test, ping me and 
> I'll happily enter them. And anyone with multiple patches I'd definitely 
> recommend setting up your own Jenkins, as Stefan Podkowinski has done a 
> pretty awesome job at automating the Jenkins job creation: 
> http://cassandra.apache.org/doc/latest/development/ci.html
> 
> One of the things I'd like to do next, as jobs stabilise, is to continue to 
> add email failure notifications so we're quicker to catch regressions on our 
> main branches. And I'm curious as to what others feel is needed and should be 
> the direction we head in.
> 
> regards,
> Mick
> 
> 
> 
> 





Re: Cassandra CI Status

2020-01-10 Thread Joshua McKenzie
>
> On my 16gb quad i7 laptop dtests take 4-5 hours

Hm. This is actually faster than I'd have expected so that's a positive for
my Friday :). Makes me wonder how things would behave on one of the new
threadrippers w/PCIe-4.

And yes, I'm looking for excuses to get one of those.

On Fri, Jan 10, 2020 at 1:25 AM Mick Semb Wever  wrote:

>
> > We had roughly 7 Jenkins slaves that DS had donated languish; I'm looking
> > into it (came up on slack yesterday morning), so hopefully we'll have some
> > more resources back in the pool soon.
>
>
> Thanks Josh!
>
> Currently all the hardware is provided by Instaclustr.
>
> Any hardware for running dtests, from anyone, would be hugely appreciated.
> On my 16gb quad i7 laptop dtests take 4-5 hours. On the jenkins agents we
> currently have it takes ~11 hours.
>
>
>
> > One other thing we've seen repeatedly over the years is startlingly
> > low-hanging-fruit of some badly performing tests (long stress generation,
> > hangs leading to timeout kills, etc) that greatly extend the testing
> > duration beyond what's strictly necessary to get the signal out of it.
>
>
> Right, indeed we're wasting a huge amount of resources with the "test-all"
> build plans. They are supposed to execute the "test", "long-test",
> "test-compression", "stress-test", and "fqltool-test" targets, but only
> the "test" target gets executed because some unit tests fail.
>
> The fix is in CASSANDRA-15496 – Split out Jenkins test-all builds to
> individual builds for each of the test targets.
>
> With that in place I can come back with a list of 'badly performing tests'.
>


Re: Cassandra CI Status

2020-01-10 Thread Mick Semb Wever


> We had roughly 7 Jenkins slaves that DS had donated languish; I'm looking
> into it (came up on slack yesterday morning), so hopefully we'll have some
> more resources back in the pool soon.


Thanks Josh! 

Currently all the hardware is provided by Instaclustr.

Any hardware for running dtests, from anyone, would be hugely appreciated. On 
my 16gb quad i7 laptop dtests take 4-5 hours. On the jenkins agents we 
currently have it takes ~11 hours. 



> One other thing we've seen repeatedly over the years is startlingly
> low-hanging-fruit of some badly performing tests (long stress generation,
> hangs leading to timeout kills, etc) that greatly extend the testing
> duration beyond what's strictly necessary to get the signal out of it.


Right, indeed we're wasting a huge amount of resources with the "test-all" 
build plans. They are supposed to execute the "test", "long-test", 
"test-compression", "stress-test", and "fqltool-test" targets, but only the 
"test" target gets executed because some unit tests fail. 

The fix is in CASSANDRA-15496 – Split out Jenkins test-all builds to individual 
builds for each of the test targets.
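
A minimal sketch of the difference, assuming (as the description above suggests) that a chained build stops at the first failing target. `run_chained`, `run_split`, and the fake runner are hypothetical names for illustration only; the real builds are Jenkins jobs invoking the build targets.

```python
from typing import Callable, Dict, List

TARGETS = ["test", "long-test", "test-compression", "stress-test", "fqltool-test"]


def run_chained(targets: List[str], run: Callable[[str], bool]) -> Dict[str, str]:
    """Run targets in order; once one fails, the remaining ones are skipped."""
    outcome: Dict[str, str] = {}
    failed = False
    for target in targets:
        if failed:
            outcome[target] = "skipped"
            continue
        ok = run(target)
        outcome[target] = "passed" if ok else "failed"
        failed = not ok
    return outcome


def run_split(targets: List[str], run: Callable[[str], bool]) -> Dict[str, str]:
    """Run every target as its own build, independent of the others."""
    return {target: ("passed" if run(target) else "failed") for target in targets}


if __name__ == "__main__":
    # Fake runner: pretend only the "test" target has failing unit tests.
    def fake_run(target: str) -> bool:
        return target != "test"

    print(run_chained(TARGETS, fake_run))
    print(run_split(TARGETS, fake_run))
```

In the chained case a failure in "test" hides the results of the other four targets; in the split case each target reports its own result.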

With that in place I can come back with a list of 'badly performing tests'.




Re: Cassandra CI Status

2020-01-08 Thread Joshua McKenzie
Thanks for taking this on Mick.

We had roughly 7 Jenkins slaves that DS had donated languish; I'm looking
into it (came up on slack yesterday morning), so hopefully we'll have some
more resources back in the pool soon.

One other thing we've seen repeatedly over the years is startlingly
low-hanging-fruit of some badly performing tests (long stress generation,
hangs leading to timeout kills, etc) that greatly extend the testing
duration beyond what's strictly necessary to get the signal out of it.
There may be value in further conversation about a periodic "health check"
on tests and pruning out / refactoring / cleaning up some of the most
egregious offenders in terms of inefficiency over time.

Also happens to be a really good way to ramp up on the code-base; for me,
working through rewriting unit tests for the StorageEngine rewrite
definitely gave me a broad overview of things I don't think I'd otherwise
have gotten.

On Wed, Jan 8, 2020 at 12:34 AM Mick Semb Wever  wrote:

>
> I'm chuffed to say that ASF Jenkins builds for branches 2.2, 3.0, 3.11,
> and trunk, are now all working.
>
> Trunk:  https://builds.apache.org/view/A-D/view/Cassandra%20trunk/
>  3.11:  https://builds.apache.org/view/A-D/view/Cassandra%203.11/
>   3.0:  https://builds.apache.org/view/A-D/view/Cassandra%203.0/
>   2.2:  https://builds.apache.org/view/A-D/view/Cassandra%202.2/
>
>
> The current setup for Cassandra's CI is a bit of a mess. Recently there's
> been more attention on CircleCI, because it has been shown to be less flaky
> than ASF's Jenkins, and it allows automated unit testing of any contributor's
> patches. Still, CircleCI has shortcomings: it provides no canonical build
> status for our supported branches, you have to hack commit the circleci
> yaml, and unless you work for a (super) large company that is willing to
> give you 2000+ CircleCI containers, the full test suite is out of your
> reach, which is not cool for an OSS project. (If I'm wrong on that last
> point, please correct me!)
>
> So… I've gone against the grain and tried to give Jenkins a little love.
> All the builds now are running again, except the dtest-large jobs which
> require servers with 32GB+ ram which we no longer have. And the branches
> each have their own view, as listed above.
>
> In addition there's also a view for custom builds to test patches
>  https://builds.apache.org/view/A-D/view/Cassandra%20%C2%A0patches/
> and in there is a pipeline job I'm working on so that patches only have to
> be entered once:
> https://builds.apache.org/view/A-D/view/Cassandra%20%C2%A0patches/job/Cassandra-devbranch-pipeline/
>
> If people have tickets with patches that they would like to test, ping me
> and I'll happily enter them. And anyone with multiple patches I'd
> definitely recommend setting up your own Jenkins, as Stefan Podkowinski has
> done a pretty awesome job at automating the Jenkins job creation:
> http://cassandra.apache.org/doc/latest/development/ci.html
>
> One of the things I'd like to do next, as jobs stabilise, is to continue
> to add email failure notifications so we're quicker to catch regressions on
> our main branches. And I'm curious as to what others feel is needed and
> should be the direction we head in.
>
> regards,
> Mick
>
>
>
>


Cassandra CI Status

2020-01-08 Thread Mick Semb Wever


I'm chuffed to say that ASF Jenkins builds for branches 2.2, 3.0, 3.11, and 
trunk, are now all working.

Trunk:  https://builds.apache.org/view/A-D/view/Cassandra%20trunk/
 3.11:  https://builds.apache.org/view/A-D/view/Cassandra%203.11/
  3.0:  https://builds.apache.org/view/A-D/view/Cassandra%203.0/
  2.2:  https://builds.apache.org/view/A-D/view/Cassandra%202.2/


The current setup for Cassandra's CI is a bit of a mess. Recently there's been 
more attention on CircleCI, because it has been shown to be less flaky than 
ASF's Jenkins, and it allows automated unit testing of any contributor's 
patches. Still, CircleCI has shortcomings: it provides no canonical build 
status for our supported branches, you have to hack commit the circleci yaml, 
and unless you work for a (super) large company that is willing to give you 
2000+ CircleCI containers, the full test suite is out of your reach, which is 
not cool for an OSS project. (If I'm wrong on that last point, please correct 
me!)

So… I've gone against the grain and tried to give Jenkins a little love. All 
the builds now are running again, except the dtest-large jobs which require 
servers with 32GB+ ram which we no longer have. And the branches each have 
their own view, as listed above.

In addition there's also a view for custom builds to test patches
 https://builds.apache.org/view/A-D/view/Cassandra%20%C2%A0patches/
and in there is a pipeline job I'm working on so that patches only have to be 
entered once: 
https://builds.apache.org/view/A-D/view/Cassandra%20%C2%A0patches/job/Cassandra-devbranch-pipeline/

If people have tickets with patches that they would like to test, ping me and 
I'll happily enter them. And anyone with multiple patches I'd definitely 
recommend setting up your own Jenkins, as Stefan Podkowinski has done a pretty 
awesome job at automating the Jenkins job creation: 
http://cassandra.apache.org/doc/latest/development/ci.html

One of the things I'd like to do next, as jobs stabilise, is to continue to add 
email failure notifications so we're quicker to catch regressions on our main 
branches. And I'm curious as to what others feel is needed and should be the 
direction we head in.

regards,
Mick



