[jira] [Updated] (CASSANDRA-17305) Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade

2022-06-01 Thread Ekaterina Dimitrova (Jira)


 [ https://issues.apache.org/jira/browse/CASSANDRA-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-17305:

Resolution: Cannot Reproduce
    Status: Resolved  (was: Open)

> Test Failure: 
> dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade
> -
>
> Key: CASSANDRA-17305
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17305
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.1-beta, 4.x
>
>
> Failed 2 times in the last 29 runs. Flakiness: 7%, Stability: 93%
> Error Message
> Failed: Timeout >900.0s
> {code}
> Stacktrace
> self = <upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD object at 0x7f19858d59d0>
>     def test_parallel_upgrade(self):
>         """
>         Test upgrading cluster all at once (requires cluster downtime).
>         """
> >       self.upgrade_scenario()
> upgrade_tests/upgrade_through_versions_test.py:313: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> upgrade_tests/upgrade_through_versions_test.py:435: in upgrade_scenario
>     cluster.stop()
> ../venv/lib/python3.8/site-packages/ccmlib/cluster.py:576: in stop
>     if not node.stop(wait=wait, signal_event=signal_event, **kwargs):
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> self = , wait = True
> wait_other_notice = False, signal_event = , kwargs = {}
> still_running = True, wait_time_sec = 16, i = 4
>     def stop(self, wait=True, wait_other_notice=False, signal_event=signal.SIGTERM, **kwargs):
>         """
>         Stop the node.
>           - wait: if True (the default), wait for the Cassandra process to be
>             really dead. Otherwise return after having sent the kill signal.
>           - wait_other_notice: return only when the other live nodes of the
>             cluster have marked this node has dead.
>           - signal_event: Signal event to send to Cassandra; default is to
>             let Cassandra clean up and shut down properly (SIGTERM [15])
>           - Optional:
>              + gently: Let Cassandra clean up and shut down properly; unless
>                false perform a 'kill -9' which shuts down faster.
>         """
>         if self.is_running():
>             if wait_other_notice:
>                 marks = [(node, node.mark_log()) for node in list(self.cluster.nodes.values()) if node.is_live() and node is not self]
> 
>             if common.is_win():
>                 # Just taskkill the instance, don't bother trying to shut it down gracefully.
>                 # Node recovery should prevent data loss from hard shutdown.
>                 # We have recurring issues with nodes not stopping / releasing files in the CI
>                 # environment so it makes more sense just to murder it hard since there's
>                 # really little downside.
> 
>                 # We want the node to flush its data before shutdown as some tests rely on small writes being present.
>                 # The default Periodic sync at 10 ms may not have flushed data yet, causing tests to fail.
>                 # This is not a hard requirement, however, so we swallow any exceptions this may throw and kill anyway.
>                 if signal_event is signal.SIGTERM:
>                     try:
>                         self.flush()
>                     except:
>                         common.warning("Failed to flush node: {0} on shutdown.".format(self.name))
>                         pass
> 
>                 os.system("taskkill /F /PID " + str(self.pid))
>                 if self._find_pid_on_windows():
>                     common.warning("Failed to terminate node: {0} with pid: {1}".format(self.name, self.pid))
>             else:
>                 # Determine if the signal event should be updated to keep API compatibility
>                 if 'gently' in kwargs and kwargs['gently'] is False:
>                     signal_event = signal.SIGKILL
> 
>                 os.kill(self.pid, signal_event)
> 
>             if wait_other_notice:
>                 for node, mark in marks:
>                     node.watch_log_for_death(self, from_mark=mark)
>             else:
>                 time.sleep(.1)
> 
>             still_running = self.is_running()
>             if still_running and wait:
>                 wait_time_sec = 1
>                 for i in xrange(0, 7):
>                     # we'll double the wait time each try and cassandra should
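
The quoted listing is cut off by the mail digest just as it reaches a retry loop, but the captured locals in that frame (still_running = True, wait_time_sec = 16, i = 4) suggest the loop doubles its sleep interval while re-checking is_running(), i.e. the node never confirmed it had stopped before pytest's 900 s timeout fired. A minimal, hypothetical sketch of that backoff shape (this is not the ccmlib source, which is truncated above; the function and parameter names are illustrative):

{code}
import time


def wait_for_stop(is_running, max_attempts=7, initial_wait_sec=1):
    """Poll is_running(), doubling the sleep between checks (1, 2, 4, ... seconds).

    Returns True once the process is gone, or False if it is still running
    after all attempts -- the shape implied by the locals above, where
    wait_time_sec has reached 16 on attempt i = 4.
    """
    wait_time_sec = initial_wait_sec
    for _attempt in range(max_attempts):
        if not is_running():
            return True
        time.sleep(wait_time_sec)
        wait_time_sec *= 2  # double the wait each try, as the truncated comment says
    return not is_running()
{code}

With seven attempts and a 1 s starting interval this waits at most roughly two minutes per node, so a parallel upgrade whose nodes repeatedly fail to stop can plausibly consume a large share of the 900 s test budget.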

[jira] [Updated] (CASSANDRA-17305) Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade

2022-05-11 Thread Ekaterina Dimitrova (Jira)


 [ https://issues.apache.org/jira/browse/CASSANDRA-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-17305:

Fix Version/s: 4.x

> Test Failure: 
> dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade
> -
>
> Key: CASSANDRA-17305
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17305
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.x, 4.1-beta
>
>
> Failed 2 times in the last 29 runs. Flakiness: 7%, Stability: 93%
> Error Message
> Failed: Timeout >900.0s

[jira] [Updated] (CASSANDRA-17305) Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade

2022-05-11 Thread Ekaterina Dimitrova (Jira)


 [ https://issues.apache.org/jira/browse/CASSANDRA-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-17305:

   Complexity: Normal
Discovered By: User Report
     Severity: Low
       Status: Open  (was: Triage Needed)

> Test Failure: 
> dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade
> -
>
> Key: CASSANDRA-17305
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17305
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.1-beta
>
>
> Failed 2 times in the last 29 runs. Flakiness: 7%, Stability: 93%
> Error Message
> Failed: Timeout >900.0s

[jira] [Updated] (CASSANDRA-17305) Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade

2022-05-11 Thread Josh McKenzie (Jira)


 [ https://issues.apache.org/jira/browse/CASSANDRA-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh McKenzie updated CASSANDRA-17305:
--
Fix Version/s: 4.1-beta

> Test Failure: 
> dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade
> -
>
> Key: CASSANDRA-17305
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17305
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.1-beta
>
>
> Failed 2 times in the last 29 runs. Flakiness: 7%, Stability: 93%
> Error Message
> Failed: Timeout >900.0s

[jira] [Updated] (CASSANDRA-17305) Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade

2022-01-26 Thread Josh McKenzie (Jira)


 [ https://issues.apache.org/jira/browse/CASSANDRA-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh McKenzie updated CASSANDRA-17305:
--
Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990)

> Test Failure: 
> dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_3_11_X_HEAD.test_parallel_upgrade
> -
>
> Key: CASSANDRA-17305
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17305
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
>
> Failed 2 times in the last 29 runs. Flakiness: 7%, Stability: 93%
> Error Message
> Failed: Timeout >900.0s