On Fri, Aug 8, 2025 at 3:52 AM Timpl, Markus <[email protected]> wrote:

> Hello,
>
> I have a test where multiple users (15 users join, wait 20 seconds,
> disconnect/rejoin, and so on) join the same connection (RDP in this
> case) in quick succession.
> This test reliably deadlocks guacd (1.6.0, unmodified). The two relevant
> threads are:
>
> #0 pthread_rwlock_rdlock from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 guac_rwlock_acquire_read_lock at rwlock.c:228
> #2 guac_display_layer_get_bounds at display-layer.c:51
> #3 guac_display_dup at display.c:259
> #4 guac_rdp_join_pending_handler at client.c:135
> #5 guac_client_promote_pending_users at client.c:178
> #6 guac_client_pending_users_thread at client.c:246
> #7 start_thread from /lib/x86_64-linux-gnu/libpthread.so.0
> #8 clone from /lib/x86_64-linux-gnu/libc.so.6
>
> --> holds __pending_users_lock (which blocks adding/removing users) and a
> read lock on last_frame.lock; waits for pending_frame.lock
>
>
> #0 pthread_rwlock_wrlock from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 guac_rwlock_acquire_write_lock at rwlock.c:186
> #2 guac_display_end_multiple_frames at display-flush.c:323
> #3 guac_display_worker_thread at display-worker.c:461
> #4 start_thread from /lib/x86_64-linux-gnu/libpthread.so.0
> #5 clone from /lib/x86_64-linux-gnu/libc.so.6
>
> --> holds pending_frame.lock; waits for last_frame.lock
>
> I tried to fix this by acquiring pending_frame.lock in guac_display_dup
> before acquiring last_frame.lock.
> This makes things a lot better.
>
> But now I reliably hit another issue after about a minute: a thread gets
> stuck in a socket write:
> #0 write from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 guac_socket_fd_write at socket-fd.c:109
> #2 guac_socket_fd_flush at socket-fd.c:189
> #3 guac_socket_fd_write_buffered at socket-fd.c:263
> #4 guac_socket_fd_write_handler at socket-fd.c:318
> #5 __guac_socket_write at socket.c:91
> #6 guac_socket_write at socket.c:107
> #7 __write_chunk_callback at socket-broadcast.c:135
> #8 guac_client_foreach_pending_user at client.c:560
> #9 __guac_socket_broadcast_write_handler at socket-broadcast.c:173
> #10 __guac_socket_write at socket.c:91
> #11 guac_socket_write at socket.c:107
> #12 guac_socket_flush_base64 at socket.c:341
> #13 guac_socket_write_base64 at socket.c:372
> #14 guac_protocol_send_blob at protocol.c:262
> #15 guac_png_flush_data at encode-png.c:79
> #16 guac_png_write_data at encode-png.c:114
> #17 guac_png_cairo_write_handler at encode-png.c:162
> #18 ?? from /usr/lib/x86_64-linux-gnu/libcairo.so.2
> #19 png_write_chunk_data from /usr/lib/x86_64-linux-gnu/libpng16.so.16
> #20 ?? from /usr/lib/x86_64-linux-gnu/libpng16.so.16
> #21 ?? from /usr/lib/x86_64-linux-gnu/libpng16.so.16
> #22 ?? from /usr/lib/x86_64-linux-gnu/libpng16.so.16
> #23 png_write_row from /usr/lib/x86_64-linux-gnu/libpng16.so.16
> #24 png_write_image from /usr/lib/x86_64-linux-gnu/libpng16.so.16
> #25 ?? from /usr/lib/x86_64-linux-gnu/libcairo.so.2
> #26 cairo_surface_write_to_png_stream from /usr/lib/x86_64-linux-gnu/libcairo.so.2
> #27 guac_png_cairo_write at encode-png.c:195
> #28 guac_png_write at encode-png.c:300
> #29 guac_client_stream_png at client.c:799
> #30 guac_display_dup at display.c:275
> #31 guac_rdp_join_pending_handler at client.c:135
> #32 guac_client_promote_pending_users at client.c:178
> #33 guac_client_pending_users_thread at client.c:246
> #34 start_thread from /lib/x86_64-linux-gnu/libpthread.so.0
> #35 clone from /lib/x86_64-linux-gnu/libc.so.6
>
> This thread never exits, even after all users have disconnected.
> No new users can join because the thread still holds
> __pending_users_lock.
> Join threads:
> #0 pthread_rwlock_wrlock from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 guac_rwlock_acquire_write_lock at rwlock.c:186
> #2 guac_client_add_pending_user at client.c:440
> #3 guac_client_add_user at client.c:479
> #4 guac_user_handle_connection at user-handshake.c:339
> #5 guacd_user_thread at proc.c:99
> #6 start_thread from /lib/x86_64-linux-gnu/libpthread.so.0
> #7 clone from /lib/x86_64-linux-gnu/libc.so.6
>
> Also, no existing users can be removed from the connection for the same
> reason.
> Remove threads:
> #0 pthread_rwlock_wrlock from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 guac_rwlock_acquire_write_lock at rwlock.c:186
> #2 guac_client_remove_user at client.c:497
> #3 guac_user_handle_connection at user-handshake.c:364
> #4 guacd_user_thread at proc.c:99
> #5 start_thread from /lib/x86_64-linux-gnu/libpthread.so.0
> #6 clone from /lib/x86_64-linux-gnu/libc.so.6
>
> I am not sure how to fix this. Any ideas? Adding timeouts to the currently
> blocking socket call is the only solution I can come up with, but judging
> by the discussion in
> https://lists.apache.org/thread/94xrxq9w3kd4otcpdn3fh0jwn603m4wp that
> might not be the preferred fix.
>
>
Hello, Markus,
It sounds like you're on the trail of a deadlock somewhere in guacd. I
don't have any great suggestions except to keep tracking down where a lock
is acquired and never released, or a race condition in which a lock is
acquired but can never be released. I think we have a couple of these in
1.6.0, as reported by you and at least one other person on the mailing
lists and/or in Jira.

Maybe adding some guac_client_log and/or guac_user_log calls in various
places would help track down where locks are acquired but not released?

-Nick
