style95 opened a new pull request, #5432:
URL: https://github.com/apache/openwhisk/pull/5432

   ## Description
   This fixes a bug in which ETCD data for containers that no longer exist persists forever.
   
   This bug happens when a shared action is invoked.
   When the queue manager receives an activation message, it looks up the queue by `docId` only, which is wrong.
   As a result, the activation message can be sent to the wrong queue whenever another queue has the same `docId`.
   And the `docId` is the same for all bound actions of a shared action.
   
   For example:
   - original action: `/whisk.system/sharedPackage/hello`
   - bound action1: `/style95/myPackage/hello` (docId: `/whisk.system/sharedPackage/hello`)
   - bound action2: `/bdoyle/yourPackage/hello` (docId: `/whisk.system/sharedPackage/hello`)
   
   So an activation for `/style95/myPackage/hello` could be sent to the queue for `/bdoyle/yourPackage/hello`.
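
The collision can be illustrated with a minimal sketch. This is not the OpenWhisk code: the `Activation` class, the `queues` list, and both lookup functions are hypothetical names invented for illustration, and the second lookup only shows one plausible way to disambiguate, not necessarily what this PR changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activation:
    invocation_namespace: str  # namespace that invoked the bound action
    doc_id: str                # name of the original shared action


SHARED = "whisk.system/sharedPackage/hello"

# Hypothetical registry of running queues, one per (namespace, docId).
queues = [("bdoyle", SHARED), ("style95", SHARED)]


def find_queue_by_doc_id(msg):
    """Buggy lookup: the first queue with a matching docId wins."""
    return next((q for q in queues if q[1] == msg.doc_id), None)


def find_queue_by_namespace_and_doc_id(msg):
    """Lookup that also compares the invocation namespace."""
    key = (msg.invocation_namespace, msg.doc_id)
    return next((q for q in queues if q == key), None)


msg = Activation("style95", SHARED)
wrong = find_queue_by_doc_id(msg)
right = find_queue_by_namespace_and_doc_id(msg)
print("docId only:        ", wrong)
print("namespace + docId: ", right)
```

With the `docId`-only lookup, the activation from `style95` lands in the queue that belongs to `bdoyle`, because both queues share the same `docId`.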
   Then the memory queue for `/bdoyle/yourPackage/hello` sends the activation to one of its containers.
   That container was created for `/bdoyle/yourPackage/hello` and registered the ETCD data for the running container under that key.
   But after executing the activation for `/style95/myPackage/hello`, it tries to remove the ETCD data for the running container using the key prefix with `/style95/myPackage/hello`, because that key now resides in the container data (`WarmData`).
   Accordingly, the original ETCD data for the running container is never deleted.
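
The resulting leak can be sketched as follows. This is a simplified model under stated assumptions, not the invoker implementation: the `etcd` dict stands in for the real store, the `Container` class and its methods are hypothetical, and the revision and container ID are placeholders. Only the key layout is taken from the logs in this description.

```python
etcd = {}  # stand-in for the ETCD key-value store


def container_key(namespace, doc_id, revision, invoker, container_id):
    # Key layout as it appears in the WatchEndpoint log lines.
    return (f"whisk/namespace/{namespace}/{doc_id}/{revision}/"
            f"{invoker}/container/{container_id}")


class Container:
    """Hypothetical container proxy that tracks its own ETCD key."""

    def __init__(self, namespace, doc_id, revision, invoker, container_id):
        self.namespace = namespace  # namespace the container was created for
        self.rest = (doc_id, revision, invoker, container_id)
        etcd[self.key()] = "running"  # register on creation

    def key(self):
        return container_key(self.namespace, *self.rest)

    def run(self, activation_namespace):
        # After a run, the container data carries the namespace of the
        # activation it just executed (WarmData in the description).
        self.namespace = activation_namespace

    def pause(self):
        # Remove the registration using the key derived from current data.
        return etcd.pop(self.key(), None) is not None


# Placeholder revision and container ID, for illustration only.
c = Container("bdoyle", "whisk.system/sharedPackage/hello",
              "rev-1", "invoker0", "container-1")
original_key = c.key()
c.run("style95")      # misrouted activation from another namespace
removed = c.pause()   # tries to delete the style95 key, which never existed
print("removed:", removed)
print("leaked keys:", list(etcd))
```

The key registered under `bdoyle` stays in the store because the delete is issued against a key built from `style95`.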
   
   Please refer to the following logs that I found.
   (The shared action is `whisk.system/sharedPackage/hello`, and I replaced the names of the two namespaces with `style95` and `bdoyle` in the original logs to hide proprietary information.)
   
   ```
   [2023-08-04T16:31:49.392Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] Got activation message 98481e836089419d881e836089d19d00 for Namespace(style95,07e316f4-9393-4777-a1d3-650da22a83f0)/whisk.system/sharedPackage/hello@0.0.38 from kafka.
   [2023-08-04T16:31:49.392Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] [98481e836089419d881e836089d19d00] the key whisk/queue/style95/whisk.system/sharedPackage/hello/leader is not in the initRevisionMap. revision: 38-5e7fc51a452c685030fdfcec61e3bdf1
   [2023-08-04T16:31:49.392Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] [98481e836089419d881e836089d19d00] send activation to remote queue, key: whisk/queue/style95/whisk.system/sharedPackage/hello/leader revision: 38-5e7fc51a452c685030fdfcec61e3bdf1
   [2023-08-04T16:31:49.419Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] add a new actor selection to a map with key: whisk/queue/style95/whisk.system/sharedPackage/hello/leader
   [2023-08-04T16:31:49.419Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] Got activation message 98481e836089419d881e836089d19d00 for Namespace(style95,07e316f4-9393-4777-a1d3-650da22a83f0)/whisk.system/sharedPackage/hello@0.0.38 from remote queue manager.
   [2023-08-04T16:31:49.419Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [QueueManager] Queue for action whisk.system/sharedPackage/hello@0.0.38 is already recovered, skip
   [2023-08-04T16:31:49.419Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [MemoryQueue] [bdoyle:whisk.system/sharedPackage/hello@0.0.38:Running] got a new activation message 98481e836089419d881e836089d19d00
   ```
   
   As you can see above, the activation (`98481e836089419d881e836089d19d00`) was originally for `style95`, but it was ultimately sent to the queue for `bdoyle`.
   
   The queue then sent this activation to the container with ID `f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff`.
   ```
   [2023-08-04T16:31:49.686Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [MemoryQueue] [bdoyle:whisk.system/sharedPackage/hello@0.0.38:Running] Get activation request f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff, send one message: 98481e836089419d881e836089d19d00
   ```
   
   This container was originally created for `bdoyle`.
   
   ```
   [2023-08-04T16:31:49.623Z] [INFO] [#tid_sid_unknown] [FunctionPullingContainerPool] received a container creation message: f3dc23dbb34c43889c23dbb34c93882e
   [2023-08-04T16:31:49.683Z] [INFO] [#tid_sid_unknown] [WatcherService] watch endpoint: WatchEndpoint(whisk/namespace/bdoyle/whisk.system/sharedPackage/hello/38-5e7fc51a452c685030fdfcec61e3bdf1/invoker0/container/f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff,,false,data-management-service,Set(DeleteEvent))
   [2023-08-04T16:31:49.685Z] [INFO] [#tid_sid_unknown] [FPCInvokerReactive] Posted success ack of container creation f3dc23dbb34c43889c23dbb34c93882e for bdoyle/whisk.system/sharedPackage/hello@0.0.38
   ```
   
   But when the same container (`f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff`) is paused, it tries to unwatch the endpoint using the `style95` key, because it executed an activation for the `style95` namespace.
   
   ```
   [2023-08-04T16:31:49.688Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [FunctionPullingContainerProxy] received a message 98481e836089419d881e836089d19d00 for whisk.system/sharedPackage/hello@0.0.38 in ClientCreated
   [2023-08-04T16:31:49.688Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [DockerContainer] sending initialization to ContainerId(f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff) ContainerAddress(172.17.0.34,8080)
   [2023-08-04T16:31:49.888Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [DockerContainer] initialization result: ok
   [2023-08-04T16:31:49.888Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [DockerContainer] sending arguments to /whisk.system/sharedPackage/hello at ContainerId(f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff) ContainerAddress(172.17.0.34,8080)
   [2023-08-04T16:31:49.992Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [DockerContainer] running result: ok
   [2023-08-04T16:31:49.992Z] [INFO] [#tid_5de45169ee33b5678a8067431ba09ca8] [MessagingActiveAck] posted completion of activation 98481e836089419d881e836089d19d00
   
   
   [2023-08-04T16:32:00.007Z] [INFO] [#tid_sid_unknown] [FunctionPullingContainerProxy] No more run activation is coming in state: Running, action: ExecutableWhiskAction/whisk.system/sharedPackage/hello@0.0.38, container: ContainerId(f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff)
   [2023-08-04T16:32:00.472Z] [INFO] [#tid_sid_invokerNanny] [RuncClient] running /usr/bin/docker-runc pause f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff (timeout: 10 seconds)
   [2023-08-04T16:32:00.544Z] [INFO] [#tid_sid_unknown] [WatcherService] watch endpoint: WatchEndpoint(whisk/warmed/style95/whisk.system/sharedPackage/hello/38-5e7fc51a452c685030fdfcec61e3bdf1/invoker/0/container/f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff,,false,data-management-service,Set(DeleteEvent))
   [2023-08-04T16:32:00.544Z] [INFO] [#tid_sid_unknown] [WatcherService] unwatch endpoint: UnwatchEndpoint(whisk/namespace/style95/whisk.system/sharedPackage/hello/38-5e7fc51a452c685030fdfcec61e3bdf1/invoker0/container/f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff,false,data-management-service,true)
   ```
   
   Consequently, the data for `bdoyle` (`whisk/namespace/bdoyle/whisk.system/sharedPackage/hello/38-5e7fc51a452c685030fdfcec61e3bdf1/invoker0/container/f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff`) is never removed unless the invoker is restarted.
   
   ## Related issue and scope
   <!--- Please include a link to a related issue if there is one. -->
   - [ ] I opened an issue to propose and discuss this change (#????)
   
   ## My changes affect the following components
   <!--- Select below all system components are affected by your change. -->
   <!--- Enter an `x` in all applicable boxes. -->
   - [ ] API
   - [ ] Controller
   - [ ] Message Bus (e.g., Kafka)
   - [ ] Loadbalancer
   - [x] Scheduler
   - [x] Invoker
   - [ ] Intrinsic actions (e.g., sequences, conductors)
   - [ ] Data stores (e.g., CouchDB)
   - [ ] Tests
   - [ ] Deployment
   - [ ] CLI
   - [ ] General tooling
   - [ ] Documentation
   
   ## Types of changes
   <!--- What types of changes does your code introduce? Use `x` in all the boxes that apply: -->
   - [x] Bug fix (generally a non-breaking change which closes an issue).
   - [ ] Enhancement or new feature (adds new functionality).
   - [ ] Breaking change (a bug fix or enhancement which changes existing behavior).
   
   ## Checklist:
   <!--- Please review the points below which help you make sure you've covered all aspects of the change you're making. -->
   
   - [x] I signed an [Apache CLA](https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md).
   - [x] I reviewed the [style guides](https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md#coding-standards) and followed the recommendations (Travis CI will check :).
   - [ ] I added tests to cover my changes.
   - [ ] My changes require further changes to the documentation.
   - [ ] I updated the documentation where necessary.
   
   

