[jira] [Commented] (QPID-5924) [linearstore] Qpidd Will Not Start with Large Number of Queues

Kim van der Riet (JIRA) Wed, 30 Jul 2014 07:12:06 -0700

    [ https://issues.apache.org/jira/browse/QPID-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079309#comment-14079309 ]


Kim van der Riet commented on QPID-5924:
----------------------------------------

I have examined the source code and made some measurements using {{stap}}, and 
find that at the moment the linearstore is consuming 2 file descriptors per 
queue during recovery. Once recovery is complete, this number drops to the 
expected value of 1 per queue.
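One way to cross-check a per-queue descriptor count without {{stap}} is to count the entries under /proc/self/fd before and after an operation. This assumes a Linux host with procfs mounted. The helper below is a hypothetical sketch and is not part of qpid. The directory handle used for the listing shows up in the listing itself, so only the difference between two calls is meaningful.

```cpp
#include <cstddef>
#include <filesystem>
#include <iterator>
#include <string>

// Hypothetical helper (Linux/procfs assumption, not qpid code): counts the
// entries in a process's fd directory. The default inspects this process;
// pass "/proc/<pid>/fd" to inspect another one, permissions permitting.
std::size_t countOpenFds(const std::string& fdDir = "/proc/self/fd") {
    namespace fs = std::filesystem;
    // When listing /proc/self/fd, the iterator's own directory handle appears
    // in the result, so compare two calls rather than trusting one value.
    return static_cast<std::size_t>(
        std::distance(fs::directory_iterator(fdDir), fs::directory_iterator{}));
}
```

Sampling a broker's /proc/<pid>/fd during recovery and again afterwards would be expected to show the difference between 2 and 1 descriptors per queue.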

The first of these file descriptors is a dedicated file handle that each queue 
keeps in its leading {{JournalFile}} object. This file descriptor is used for 
AIO write operations.

During recovery only, a second file handle is used to read the journals. 
Because recovery is performed serially, a single read handle could serve the 
entire recovery. However, because of an error in the code, these read handles 
are not closed explicitly when each queue completes recovery, so they remain 
open until the end of the entire recovery phase, at which point they are all 
released together. As a result, each queue consumes an additional file handle 
(temporarily, until the end of recovery).
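The simplified model below illustrates the difference. It uses a hypothetical TrackedReadHandle type that only counts how many read handles are open at once and does not touch real files. The hypothetical recoverLeaky() mirrors the behaviour described above, where each queue's reader survives until the whole recovery phase ends. The hypothetical recoverScoped() closes each reader as soon as its queue is done. This is a sketch of the idea, not the linearstore code or the actual patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a journal read handle: it tracks how many are
// open simultaneously instead of opening real files.
class TrackedReadHandle {
public:
    explicit TrackedReadHandle(const std::string& /*journalPath*/) {
        ++open_;
        peak_ = std::max(peak_, open_);
    }
    ~TrackedReadHandle() { --open_; }
    TrackedReadHandle(const TrackedReadHandle&) = delete;
    TrackedReadHandle& operator=(const TrackedReadHandle&) = delete;

    static std::size_t peak() { return peak_; }
    static void resetPeak() { peak_ = open_; }

private:
    static inline std::size_t open_ = 0;
    static inline std::size_t peak_ = 0;
};

// Models the reported bug: every queue's reader stays alive until the whole
// recovery phase ends, so N queues hold N read handles at the peak.
void recoverLeaky(const std::vector<std::string>& queues) {
    std::vector<std::unique_ptr<TrackedReadHandle>> readers;
    for (const auto& q : queues) {
        readers.push_back(std::make_unique<TrackedReadHandle>(q));
        // ... read and replay this queue's journal ...
    }
}  // all readers are released together here

// Models the fix: the reader is scoped to one queue, so it is closed before
// the next queue is recovered and at most one is ever open.
void recoverScoped(const std::vector<std::string>& queues) {
    for (const auto& q : queues) {
        TrackedReadHandle reader(q);
        // ... read and replay this queue's journal ...
    }  // reader is closed here, before moving on to the next queue
}
```

When recovering four queues, recoverLeaky() peaks at four simultaneously open read handles, while recoverScoped() never exceeds one.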

I have a proposed patch which does the following:
# The bug that keeps a file handle open for each queue during recovery has been 
fixed, and now only one file handle is used during the entire recovery process.
# The dedicated per-queue file handles in the {{JournalFile}} objects are no 
longer opened on initialization. Instead, they are opened on first use. This 
delays the consumption of file handles until they are needed, and defers it 
indefinitely for queues which are not in active use.
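The following is a minimal sketch of the open-on-first-use idea. It uses a hypothetical LazyJournalFile class. A plain blocking POSIX write() stands in for the AIO path that the linearstore actually uses. The descriptor is only acquired inside the first write(), so a queue that is never written to never holds one.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch, not linearstore code: a per-queue journal file whose
// descriptor is opened lazily on the first write instead of at construction.
class LazyJournalFile {
public:
    explicit LazyJournalFile(std::string path) : path_(std::move(path)) {}
    ~LazyJournalFile() {
        if (fd_ >= 0) ::close(fd_);
    }
    LazyJournalFile(const LazyJournalFile&) = delete;
    LazyJournalFile& operator=(const LazyJournalFile&) = delete;

    bool isOpen() const { return fd_ >= 0; }

    // Opens the file on first use, then appends the buffer.
    void write(const void* data, std::size_t len) {
        if (fd_ < 0) {
            fd_ = ::open(path_.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
            if (fd_ < 0)
                throw std::runtime_error("open " + path_ + ": " + std::strerror(errno));
        }
        if (::write(fd_, data, len) != static_cast<ssize_t>(len))
            throw std::runtime_error("write failed or was short: " + path_);
    }

private:
    std::string path_;
    int fd_ = -1;  // -1 means "not opened yet"
};
```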

These changes would optimise the consumption of file handles and, in addition, 
allow recovery to take place using a single file handle. However, there is 
still a practical limit to the number of file handles that can be used on a 
given hardware configuration. The larger goal of using a pool of file handles, 
so that the number of queues handled can be well in excess of the maximum 
number of available file descriptors, will have to be tackled as a later 
enhancement.
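The following sketch shows one possible shape for such a pool. A hypothetical FdPool caps the number of open journal descriptors. When the pool is full, it closes the least recently used descriptor before opening another. The names and the LRU policy are assumptions for illustration, not a design taken from the linearstore.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <list>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bounded pool of file descriptors with LRU eviction.
class FdPool {
public:
    explicit FdPool(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }
    ~FdPool() {
        for (const auto& entry : lru_) ::close(entry.second);
    }
    FdPool(const FdPool&) = delete;
    FdPool& operator=(const FdPool&) = delete;

    // Returns an open fd for `path`, opening it when necessary and evicting
    // the least recently used descriptor if the pool is already full.
    // The returned fd is only valid until a later acquire() evicts it.
    int acquire(const std::string& path) {
        auto it = index_.find(path);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // mark most recent
            return it->second->second;
        }
        if (lru_.size() == capacity_) {
            auto& victim = lru_.back();  // least recently used entry
            ::close(victim.second);
            index_.erase(victim.first);
            lru_.pop_back();
        }
        int fd = ::open(path.c_str(), O_RDWR | O_CREAT, 0644);
        if (fd < 0) throw std::runtime_error("cannot open " + path);
        lru_.emplace_front(path, fd);
        index_[path] = lru_.begin();
        return fd;
    }

    bool isOpen(const std::string& path) const { return index_.count(path) != 0; }
    std::size_t openCount() const { return lru_.size(); }

private:
    using Entry = std::pair<std::string, int>;
    std::size_t capacity_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

A real design would also need to avoid evicting a descriptor that still has AIO operations in flight. This sketch ignores that.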

> [linearstore] Qpidd Will Not Start with Large Number of Queues
> --------------------------------------------------------------
>
>                 Key: QPID-5924
>                 URL: https://issues.apache.org/jira/browse/QPID-5924
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: 0.22
>         Environment: qpid-cpp-server-0.22-42
> qpid-cpp-server-linearstore-0.22-42
>            Reporter: Brian Bouterse
>            Assignee: Kim van der Riet
>            Priority: Critical
>
> Pulp is an open source project that uses Qpidd. Pulp needs a large number of 
> queues (10K+), and these queues need to be durable. After creating a large 
> number of queues (thousands), qpidd will not start when it is restarted. 
> Here is how to reproduce:
> 1. Install qpid-cpp-server and qpid-cpp-server-store
> 2. Start qpidd
> 3. Create a crazy number of unique queues (10K) with durability
> 4. Restart Qpidd
> 5. Observe an error message such as the following
> Starting Qpid AMQP daemon: Daemon startup failed: Queue 
> pulp.agent.5752dc04-7536-4e5c-b406-a0cd5d9c9119: recoverMessages() failed: 
> jexception 0x0104 RecoveryManager::getFile() threw JERR__FILEIO: File read or 
> write failure. 
> (/var/lib/qpidd/qls/jrnl/pulp.agent.5752dc04-7536-4e5c-b406-a0cd5d9c9119/818fa4b0-3319-4478-b2b0-d2195f90f695.jrnl)
>  
> (/builddir/build/BUILD/qpid-0.22/cpp/src/qpid/linearstore/MessageStoreImpl.cpp:1004)
> Looking at the /var/lib/qpidd/qls/jrnl/ directory, there are 2676 jrnl files, 
> 2640 of which start with pulp.agent. In our case, most of the queues are named 
> 'pulp.agent.<UUID>'.
> The expected behavior is that qpidd would start and run awesome with a crazy 
> number of queues (1 Million +).
> Raising the number of file descriptors is a viable workaround, but eventually 
> those will run out too. It would be an architectural win if qpidd used a 
> constant number of file descriptors, unaffected by the number of queues it 
> manages.
> Perhaps this could be introduced as a new journal type that would run slower 
> but be more scalable. It could be introduced as 
> qpid-cpp-server-crazy-scalable-but-slower-store.
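For the workaround of raising the number of file descriptors, a process can inspect its own soft RLIMIT_NOFILE with the POSIX getrlimit()/setrlimit() calls. An unprivileged process can raise that soft limit only up to the hard limit. The hypothetical helper below does this. For a daemon, the limit is often set externally instead (for example with `ulimit -n` before starting it).

```cpp
#include <sys/resource.h>

// Hypothetical helper: tries to raise the soft RLIMIT_NOFILE to `wanted`,
// capped at the hard limit. Returns the soft limit in effect afterwards
// (unchanged if setrlimit fails), or 0 if the limit cannot be read.
rlim_t raiseFdSoftLimit(rlim_t wanted) {
    rlimit lim{};
    if (::getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    if (lim.rlim_cur >= wanted) return lim.rlim_cur;  // already high enough

    // Never ask for more than the hard limit allows.
    rlim_t target = (lim.rlim_max == RLIM_INFINITY || wanted < lim.rlim_max)
                        ? wanted
                        : lim.rlim_max;
    rlimit updated = lim;
    updated.rlim_cur = target;
    if (::setrlimit(RLIMIT_NOFILE, &updated) != 0) return lim.rlim_cur;
    return target;
}
```

Raising the soft limit only moves the ceiling. It does not remove the per-queue cost described above.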



--
This message was sent by Atlassian JIRA
(v6.2#6252)
