[jira] [Created] (ZOOKEEPER-2927) Local session reconnect validation not forward to leader
Qihong Xu created ZOOKEEPER-2927: Summary: Local session reconnect validation not forward to leader Key: ZOOKEEPER-2927 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2927 Project: ZooKeeper Issue Type: Improvement Components: java client, quorum, server Affects Versions: 3.5.3 Environment: configuration management system based on zookeeper 3.5.3 Reporter: Qihong Xu Priority: Minor When the zookeeper quorum recovers from a shutdown or crash, a client with a local session will reconnect to a random server in the quorum. If this randomly chosen server is not the leader and did not previously own the local session, it will forward the session to the leader for validation. If it is a global session, the leader will then update its owner; if not, the leader adds a Boolean false to the packet and does nothing. Since our system involves mostly local sessions and a large number of connections, this procedure may be redundant and adds potential pressure on the leader. Would it be reasonable, in the reconnect scenario, for a local session not to be forwarded to the leader but instead answered by the follower directly? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
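The proposed short-circuit can be sketched as follows. This is a minimal illustration only: SessionValidationSketch, addLocalSession and validateLocally are hypothetical names, not actual ZooKeeper APIs, and the sketch assumes the follower keeps its own set of local session ids.

```java
import java.util.HashSet;
import java.util.Set;

public class SessionValidationSketch {
    // Session ids this follower knows to be local sessions (hypothetical bookkeeping).
    private final Set<Long> localSessions = new HashSet<>();

    public void addLocalSession(long sessionId) {
        localSessions.add(sessionId);
    }

    /**
     * Proposed behavior on reconnect: if the session is local, answer the
     * validation on the follower directly instead of forwarding to the leader.
     * Returns true when the request was handled locally.
     */
    public boolean validateLocally(long sessionId) {
        if (localSessions.contains(sessionId)) {
            // Reply to the client here; no leader round-trip needed.
            return true;
        }
        // Global (or unknown) session: fall back to forwarding to the leader.
        return false;
    }
}
```

Under this sketch, only global or unknown sessions pay the leader round-trip, which is the reduction in leader load the report asks about.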
[jira] [Updated] (ZOOKEEPER-2927) Local session reconnect validation not forward to leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Xu updated ZOOKEEPER-2927: - Description: When the zookeeper quorum recovers from a shutdown or crash, a client with a local session will reconnect to a random server in the quorum. If this randomly chosen server is not the leader and did not previously own the local session, it will forward the session to the leader for validation. If it is a global session, the leader will then update its owner; if not, the leader adds a Boolean false to the packet and does nothing. Since our system involves mostly local sessions and a large number of connections, this procedure may be redundant and adds potential pressure on the leader. Would it be reasonable, in the reconnect scenario, for local session validation not to be forwarded to the leader but instead answered by the follower directly? was: When zookeeper quorum recovers from shutdown/crash, a client with a local session will reconnect to a random server in quorum. If this random-chosen server is not leader and does not own the local session previously, it will forward this session to leader for validation. And then if this is a global session, leader will update its owner, if not, leader adds Boolean false to packet and does nothing. Since our system involves mostly local session and has a large amount of connections, this procedure may be redundant and add potential pressure to leader. Is this reasonable for the reconnect scenario that local session does not forward to leader, instead return by follower directly?
[jira] [Created] (ZOOKEEPER-2964) dataDir and dataLogDir are printed opposingly
Qihong Xu created ZOOKEEPER-2964: Summary: dataDir and dataLogDir are printed opposingly Key: ZOOKEEPER-2964 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2964 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.3 Reporter: Qihong Xu Priority: Minor I found a bug: the "conf" command returns dataDir and dataLogDir swapped. This bug only exists in versions newer than 3.5. I only found that dumpConf in [ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188] prints these two paths swapped. Unlike ZOOKEEPER-2960, the actual paths are not affected and server function is OK.
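The nature of the bug can be shown with a minimal sketch. ConfDumpSketch and its method are illustrative, not the actual ZooKeeperServer code; the fix is simply pairing each label with its own value.

```java
public class ConfDumpSketch {
    // Illustrative stand-in for a dumpConf-style routine.
    public static String dumpConf(String dataDir, String dataLogDir) {
        // Buggy version printed the values swapped:
        //   "dataDir=" + dataLogDir + "\ndataLogDir=" + dataDir
        // Fixed version pairs each label with its own value:
        return "dataDir=" + dataDir + "\ndataLogDir=" + dataLogDir;
    }
}
```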
[jira] [Updated] (ZOOKEEPER-2964) dataDir and dataLogDir are printed opposingly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Xu updated ZOOKEEPER-2964: - Description: I found a bug: the "conf" command returns dataDir and dataLogDir swapped. This bug only exists in versions newer than 3.5. I only found that dumpConf in [ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188] prints these two paths swapped. Unlike ZOOKEEPER-2960, the actual paths are not affected and server function is OK. I made a small patch to fix this bug. Any review is appreciated. was: I foung a bug that "conf" command would return dataDir and dataLogDir opposingly. This bug only exists in versions newer than 3.5. I only found dumpConf in [ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188] print these two paths opposingly. Unlike ZOOKEEPER-2960, the actual paths are not affected and server function is ok.
[jira] [Updated] (ZOOKEEPER-2964) dataDir and dataLogDir are printed opposingly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Xu updated ZOOKEEPER-2964: - Attachment: ZOOKEEPER-2964.patch
[jira] [Updated] (ZOOKEEPER-2964) "Conf" command returns dataDir and dataLogDir opposingly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Xu updated ZOOKEEPER-2964: - Summary: "Conf" command returns dataDir and dataLogDir opposingly (was: dataDir and dataLogDir are printed opposingly)
[jira] [Created] (ARTEMIS-2806) deployQueue missing address argument
Qihong Xu created ARTEMIS-2806: -- Summary: deployQueue missing address argument Key: ARTEMIS-2806 URL: https://issues.apache.org/jira/browse/ARTEMIS-2806 Project: ActiveMQ Artemis Issue Type: Bug Affects Versions: 2.12.0 Reporter: Qihong Xu In ActiveMQServerControlImpl, the deployQueue method is missing the address argument, which results in creating mismatched addresses and queues. (For example, using deployQueue to create a subscriber A_0 under an existing address A actually returns a new topic A_0 with a subscriber A_0.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
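A minimal sketch of why the missing argument matters, assuming hypothetical names (DeployQueueSketch is not the actual ActiveMQServerControlImpl API): without an address parameter, the queue name has to double as the address, which is how a new address A_0 can appear instead of a binding under the existing address A.

```java
import java.util.HashMap;
import java.util.Map;

public class DeployQueueSketch {
    // Tracks which address each queue is bound to (illustrative bookkeeping).
    private final Map<String, String> queueToAddress = new HashMap<>();

    // Old shape: no address parameter, so the queue name doubles as the address.
    public void deployQueue(String queueName) {
        queueToAddress.put(queueName, queueName);
    }

    // Fixed shape: the caller states the address the queue belongs to.
    public void deployQueue(String address, String queueName) {
        queueToAddress.put(queueName, address);
    }

    public String addressOf(String queueName) {
        return queueToAddress.get(queueName);
    }
}
```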
[jira] [Created] (ARTEMIS-1700) Server stopped responding and killed itself while exiting paging state
Qihong Xu created ARTEMIS-1700: -- Summary: Server stopped responding and killed itself while exiting paging state Key: ARTEMIS-1700 URL: https://issues.apache.org/jira/browse/ARTEMIS-1700 Project: ActiveMQ Artemis Issue Type: Bug Components: Broker Affects Versions: 2.4.0 Reporter: Qihong Xu Attachments: artemis.log We are currently experiencing this error while running a stress test on artemis. Basic configuration: 1 broker, 1 topic, pub-sub mode. Journal type = MAPPED. Threadpool max size = 60. In order to test the throughput of artemis we use 300 producers and 300 consumers. However, we found that sometimes when artemis exits the paging state, it stops responding and kills itself. This situation happened on some specific servers. Details can be found in the attached dump file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARTEMIS-1700) Server stopped responding and killed itself while exiting paging state
[ https://issues.apache.org/jira/browse/ARTEMIS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375225#comment-16375225 ] Qihong Xu commented on ARTEMIS-1700: [~nigro@gmail.com] Yes, we just use the default setting here.
[jira] [Updated] (ARTEMIS-2214) Cache durable&priority in PagedReference to avoid blocks in consuming paged messages
[ https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Xu updated ARTEMIS-2214: --- Attachment: stacks.txt
[jira] [Updated] (ARTEMIS-2214) Cache durable&priority in PagedReference to avoid blocks in consuming paged messages
[ https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Xu updated ARTEMIS-2214: --- Attachment: 0001-Add-durable-and-priority-to-pagedReference.patch
[jira] [Updated] (ARTEMIS-2214) Cache durable&priority in PagedReference to avoid blocks in consuming paged messages
[ https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Xu updated ARTEMIS-2214: --- Attachment: (was: 0001-Add-durable-and-priority-to-pagedReference.patch)
[jira] [Created] (ARTEMIS-2214) Cache durable&priority in PagedReference to avoid blocks in consuming paged messages
Qihong Xu created ARTEMIS-2214: -- Summary: Cache durable&priority in PagedReference to avoid blocks in consuming paged messages Key: ARTEMIS-2214 URL: https://issues.apache.org/jira/browse/ARTEMIS-2214 Project: ActiveMQ Artemis Issue Type: Bug Components: Broker Affects Versions: 2.6.3 Reporter: Qihong Xu We recently performed a test on an artemis broker and found a severe performance issue. When paged messages are being consumed, decrementMetrics in QueuePendingMessageMetrics will call 'getMessage' to check whether they are durable. As a result the queue can be locked for a long time, because the page may have been GCed and must be reloaded entirely. Other operations that rely on the queue are blocked during this time, which causes a significant TPS drop. Detailed stacks are attached below. This also happens when a consumer is closed and messages are pushed back to the queue: artemis will check priority on return if these messages are paged. To solve the issue, durable and priority need to be cached in PagedReference, just like messageID, transactionID and so on. I have attached a patch to fix the issue. Any review is appreciated.
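The proposed caching can be sketched as below. PagedReferenceSketch and its fields are illustrative, not the actual Artemis PagedReference; the point is that durable and priority are captured once at construction, so later checks never need getMessage() or a page reload under the queue lock.

```java
public class PagedReferenceSketch {
    private final long messageId;
    // Cached at creation time, like messageID/transactionID, so that
    // durable/priority checks never force a page reload.
    private final boolean durable;
    private final byte priority;

    public PagedReferenceSketch(long messageId, boolean durable, byte priority) {
        this.messageId = messageId;
        this.durable = durable;
        this.priority = priority;
    }

    public long getMessageId() { return messageId; }
    public boolean isDurable() { return durable; }   // no getMessage() needed
    public byte getPriority()  { return priority; }  // safe on consumer close/requeue
}
```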
[jira] [Created] (ARTEMIS-2216) Use a specific executor for pageSyncTimer
Qihong Xu created ARTEMIS-2216: -- Summary: Use a specific executor for pageSyncTimer Key: ARTEMIS-2216 URL: https://issues.apache.org/jira/browse/ARTEMIS-2216 Project: ActiveMQ Artemis Issue Type: Improvement Affects Versions: 2.6.3 Reporter: Qihong Xu Improve paging throughput by using a specific executor for pageSyncTimer. Improving throughput in paging mode is one of our concerns since our cluster uses paging a lot. We found that pageSyncTimer in PagingStoreImpl shared the same executor with pageCursorProvider from the thread pool. In heavy load scenarios, like hundreds of consumers receiving messages simultaneously, it became difficult for pageSyncTimer to get the executor due to race conditions. Therefore page sync was delayed and producers suffered low throughput. To achieve higher performance we assigned a specific executor to pageSyncTimer to avoid racing, and ran a small-scale test on a single modified broker.
Broker: 4C/8G/500G SSD
Producer: 200 threads, non-transactional send
Consumer: 200 threads, transactional receive
Message text size: 100-200 bytes randomly
AddressFullPolicy: PAGE
Test result:
| |Only Send TPS|Only Receive TPS|Send&Receive TPS|
|Original ver|38k|33k|3k/30k|
|Modified ver|38k|34k|30k/12.5k|
The table above shows that on the modified broker send TPS improves from “poor” to “extremely fast”, while receive TPS drops from “extremely fast” to “not bad” under heavy load. Considering that consumer systems usually have a long processing chain after receiving messages, we don’t need a very fast receive TPS. Instead, we want to guarantee send TPS to cope with traffic peaks and lower the producer’s delay time. Moreover, send and receive TPS in total rises from 33k to about 43k. Given all of the above, this trade-off seems beneficial and acceptable.
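The executor change can be sketched as follows, with illustrative names (PageSyncExecutorSketch is not the actual PagingStoreImpl code): the sync timer gets a dedicated single-threaded executor, so its tasks are never queued behind cursor-provider work from a shared pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PageSyncExecutorSketch {
    // Dedicated, single-threaded: sync requests cannot race with cursor tasks
    // for a pooled thread.
    private final ExecutorService syncExecutor = Executors.newSingleThreadExecutor();
    private final AtomicInteger syncs = new AtomicInteger();

    public void scheduleSync() {
        // The counter increment stands in for the real page-sync (fsync) work.
        syncExecutor.execute(syncs::incrementAndGet);
    }

    // Drains the executor and reports how many syncs actually ran.
    public int completedSyncs() {
        syncExecutor.shutdown();
        try {
            syncExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return syncs.get();
    }
}
```

Single-threaded is enough here because page syncs are serialized anyway; the win is isolation from the cursor provider, not extra parallelism.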
[jira] [Updated] (ARTEMIS-2216) Use a specific executor for pageSyncTimer
[ https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Xu updated ARTEMIS-2216: --- Description: Improving throughput in paging mode is one of our concerns since our cluster uses paging a lot. We found that pageSyncTimer in PagingStoreImpl shared the same executor with pageCursorProvider from the thread pool. In heavy load scenarios, like hundreds of consumers receiving messages simultaneously, it became difficult for pageSyncTimer to get the executor due to race conditions. Therefore page sync was delayed and producers suffered low throughput. To achieve higher performance we assigned a specific executor to pageSyncTimer to avoid racing, and ran a small-scale test on a single modified broker.
Broker: 4C/8G/500G SSD
Producer: 200 threads, non-transactional send
Consumer: 200 threads, transactional receive
Message text size: 100-200 bytes randomly
AddressFullPolicy: PAGE
Test result:
| |Only Send TPS|Only Receive TPS|Send&Receive TPS|
|Original ver|38k|33k|3k/30k|
|Modified ver|38k|34k|30k/12.5k|
The table above shows that on the modified broker send TPS improves from “poor” to “extremely fast”, while receive TPS drops from “extremely fast” to “not bad” under heavy load. Considering that consumer systems usually have a long processing chain after receiving messages, we don’t need a very fast receive TPS. Instead, we want to guarantee send TPS to cope with traffic peaks and lower the producer’s delay time. Moreover, send and receive TPS in total rises from 33k to about 43k. Given all of the above, this trade-off seems beneficial and acceptable. was: Improve paging throughput by using a specific executor for pageSyncTimer Improving throughput on paging mode is one of our concerns since our cluster uses paging a lot. We found that pageSyncTimer in PagingStoreImpl shared the same executor with pageCursorProvider from thread pool. In heavy load scenario like hundreds of consumers receiving messages simultaneously, it became difficult for pageSyncTimer to get the executor due to race condition. Therefore page sync was delayed and producers suffered low throughput. To achieve higher performance we assign a specific executor to pageSyncTimer to avoid racing. And we run a small-scale test on a single modified broker. Broker: 4C/8G/500G SSD Producer: 200 threads, non-transactional send Consumer 200 threads, transactional receive Message text size: 100-200 bytes randomly AddressFullPolicy: PAGE Test result: | |Only Send TPS|Only Receive TPS|Send&Receive TPS| |Original ver|38k|33k|3k/30k| |Modified ver|38k|34k|30k/12.5k| The chart above shows that on modified broker send TPS improves from “poor” to “extremely fast”, while receive TPS drops from “extremely fast” to “not-bad” under heavy load. Considering consumer systems usually have a long processing chain after receiving messages, we don’t need too fast receive TPS. Instead, we want to guarantee send TPS to cope with traffic peak and lower producer’s delay time. Moreover, send and receive TPS in total raises from 33k to about 43k. From all above this trade-off seems beneficial and acceptable.
[jira] [Updated] (ARTEMIS-2214) ARTEMIS-2214 Cache durable&deliveryTime in PagedReference
[ https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Xu updated ARTEMIS-2214: --- Summary: ARTEMIS-2214 Cache durable&deliveryTime in PagedReference (was: Cache durable&priority in PagedReference to avoid blocks in consuming paged messages)
[jira] [Created] (ARTEMIS-2251) Large messages might not be deleted when server crashed
Qihong Xu created ARTEMIS-2251: -- Summary: Large messages might not be deleted when server crashed Key: ARTEMIS-2251 URL: https://issues.apache.org/jira/browse/ARTEMIS-2251 Project: ActiveMQ Artemis Issue Type: Bug Reporter: Qihong Xu When deleting large messages, artemis uses storePendingLargeMessage to insert a temporary record into the journal for reload, in case the server crashes and large messages stay around forever. But in storePendingLargeMessage, appendAddRecord inserts records asynchronously. This carries the risk that tasks in the executor get lost in a server crash, which may leave large messages that can never be deleted. To solve this problem, a Boolean is added to storePendingLargeMessage so that it is forced to use SimpleWaitIOCallback in the delete case.
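The fix can be sketched as below, using illustrative names (PendingRecordSketch stands in for the journal path; the real code uses SimpleWaitIOCallback): when the flag is set, the caller blocks until the asynchronous append has completed, so the record cannot be lost between submission and write.

```java
import java.util.concurrent.CountDownLatch;

public class PendingRecordSketch {
    private volatile boolean stored;

    public void storePendingRecord(boolean waitForCompletion) {
        CountDownLatch done = new CountDownLatch(1);
        // The background thread stands in for the journal's async appendAddRecord.
        Thread appender = new Thread(() -> {
            stored = true;       // record hits the journal
            done.countDown();
        });
        appender.setDaemon(true);
        appender.start();
        if (waitForCompletion) {
            // New flag: block like SimpleWaitIOCallback so a crash after this
            // call cannot lose the pending record.
            try {
                done.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // With waitForCompletion == false (old behavior), the method may return
        // before 'stored' is set, which is the window the bug report describes.
    }

    public boolean isStored() { return stored; }
}
```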