[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-08 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
FYI: we are working to fix the CI machines now so I won't be able ATM to 
run any job on it :(
I hope to have the CI machines up and running soon to continue :+1: 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-08 Thread qihongxu
Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  



> @qihongxu @michaelandrepearce
> I'm running now a CI job: it will take some time, but when it will be 
fine I will merge this 
> @qihongxu After all the relevant bits re paging will be merged I will 
send another PR with the same 2 commits I have sent to your branch: are you 
available to give some help to check the effects of that PR on your tests?


> @qihongxu big thanks for all the effort on this!! And providing the 
testing time


My pleasure :) We are glad to see any boost in perf, especially on paging 
mode.

I will keep a close watch on the new PR you mentioned and ran some more 
tests as we have done in this issue if needed.

Also thank you all for reviews and works on this PR! 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-08 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu big thanks for all the effort on this!!


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-08 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu @michaelandrepearce 
I'm running now a CI job: it will take some time, but when it will be fine 
I will merge this :+1: 
@qihongxu After all the relevant bits re paging will be merged I will send 
another PR with the same 2 commits I have sent to your branch: are you 
available to give some help to check the effects of that PR on your tests? 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-07 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
Yes if @qihongxu agree to give an hand :P re this pr probably make sense to 
include at least the second commit of my PR that was unrelated to the new cache 
impl and was just avoiding an unnecessary query that was always performed...


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-07 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@franz1981 i think well have to do some more real world testing with it 
(difference between a isolated bench and a full e2e test), with @qihongxu help 
hopefully, it might be something odd that by releasing some performance it 
causes an odd bottleneck somewhere else.

Are you ok, if once full CI passes we merge as is, and continue this on a 
separate pr?


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-07 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce I would like first to trigger a CI job of some kind, 
maybe @clebertsuconic can help with his superbox (just this time) to get an 
answer sooner?

Re the cache I was thinking already to send another PR, but I have verified 
that is virtually impossible that's the reason of the consumer slow-down. These 
are the numbers of a the bench comparing it with the original version:
```
Benchmark   (size)  (type)   Mode  Cnt  Score   
   Error  Units
CacheBench.getMessage1  32 chunked  thrpt   10  150039261.251 ± 
12504804.694  ops/s
CacheBench.getMessage1  32  linkedlist  thrpt   10   31776755.611 ±  
1405838.635  ops/s
CacheBench.getMessage11024 chunked  thrpt   10   31433127.788 ±  
3902498.303  ops/s
CacheBench.getMessage11024  linkedlist  thrpt   102638404.341 ±   
119171.758  ops/s
CacheBench.getMessage1  102400 chunked  thrpt   10 344799.911 ±
27267.965  ops/s
CacheBench.getMessage1  102400  linkedlist  thrpt   10  20020.925 ±
 5392.418  ops/s
CacheBench.getMessage3  32 chunked  thrpt   10  384605640.192 ± 
35164543.632  ops/s
CacheBench.getMessage3  32  linkedlist  thrpt   10   14124979.510 ±  
2875341.272  ops/s
CacheBench.getMessage31024 chunked  thrpt   10   90418506.375 ±  
4593688.556  ops/s
CacheBench.getMessage31024  linkedlist  thrpt   101562687.000 ±
91433.926  ops/s
CacheBench.getMessage3  102400 chunked  thrpt   10 978575.016 ±
44800.161  ops/s
CacheBench.getMessage3  102400  linkedlist  thrpt   10  21614.717 ±
 5344.742  ops/s
```
Where `getMessage1` is `LivePageCacheImpl::getMessage` called @ random 
positions by 1 thread and 
`getMessage3` is `LivePageCacheImpl::getMessage` called @ random positions 
by 3 threads.
`chunked` is the version and `linkedlist` the original version: the 
difference is quite large and the new version just scale linearly...


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-07 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@franz1981 based on this, shall we merge this pr as is which is quite 
impressive result at a combined 49.4k.

And then work on livepagecache changes separately? 





---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-07 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> A possible explanation might be that in our case consumers were reading 
old pages while producers were writing new pages. There is no intersection 
between these two groups such as a common page being writing by both of 
them.

So why it the consumer side is being slowed down? 
I need to check, but I remember that consumers were querying the live page 
cache, but I could be wrong.


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-07 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu Sorry to ask twice, but I haven't understood if `Ver no lock & 
new livePageCache` was making uses of this pr 
https://github.com/qihongxu/activemq-artemis/pull/1 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-07 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu ping! :) I'm too curious :)


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-07 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu 

> Ver no lock & new livePageCache

Is making use of the last 2 commits I have sent as a PR to your repository?
It sould be way lot faster then `Ver no lock`


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-07 Thread qihongxu
Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce @franz1981 
After we ran tests on both version (one with no lock and the other with new 
LivePageCache & no lock ), the result chart is as below.

  | Send | In Total
-- | -- | --
Ver Original | 30k/12.5k | 32.5k
Ver Modified checkDepage | 31.1k/11.3k | 42.4k
Ver Modified hasNext | 23.8k/23.5k | 47.3k
Ver no lock | **22.9k/26.5k** | **49.4k**
Ver no lock & new livePageCache | **24.5k/22k** | **46.5k**

Tests are implemented with same conditions before. Clients consumed and 
produced on the same queue with 10 million messages in it.
All version are based on Artemis-2214, which cache something in PagedRef.




---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread qihongxu
Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> @michaelandrepearce @qihongxu Ok, checked: the latest version of this PR 
+ my branch 
https://github.com/franz1981/activemq-artemis/tree/lock-free-live-page-cache is 
fully non-blocking :)

Nice work!!

> Be great to see a final stat in @qihongxu test env

As soon as I'm back Monday I will try implement same tests on both 
version(this and franz's).


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce Done, the PR has been sent, now we can just wait the 
perf results on it :)
I have improved quite a bit the live page cache behaviour/reliability 
(especially if OOME), but sadly I see that the most called method `getMessage` 
cannot be improved anymore without making the lock-free code a real nightmare.
The original version was O(n) depending which message was queried, because 
it needs to walk the entire linked list of paged messages. 
In my version I have amortized the cost by using an interesting hybrid 
between an ArrayList and a LinkedList, similar to 
https://en.wikipedia.org/wiki/Unrolled_linked_list, but (very) optimized for 
addition.
I'm mentioning this, because is a long time I want to design a 
single-threaded version of this same data-structure to be used as the main 
datastructure inside QueueImpl.



---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce Good idea! let me do it now



---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@franz1981 did you send a pr to @qihongxu branch so he can merge it and 
this pr picks it up?

Be great to see a final stat in @qihongxu test env


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce @qihongxu Ok, checked: the latest version of this PR + 
my branch 
https://github.com/franz1981/activemq-artemis/tree/lock-free-live-page-cache is 
never blocked :)


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce @qihongxu I have re-implemented the LivePageCache to be 
completly lock-free: 
https://github.com/franz1981/activemq-artemis/tree/lock-free-live-page-cache
Fell free to try it into this PR too and I can do a PR to your branch.
I will provide soon some charts of the contention state :+1: 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> whats the current PR look like in heat maps?

I'm finishing a thing to fix `LivePageCacheImpl` too, hopefully today and I 
will post: impl-wise seems good to me as well :+1: 



---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu looks good to me.

@franz1981 whats the current PR look like in heat maps?


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> em.Could you please tell us which issues? We need to verify how it 
affects our cluster.

The big issue im relating to, which became a night mare for my 
organisation, was that under high concurrency (high throughput and low latency 
broker setup), the buffers can get mixed up, and was causing index out of 
bounds issues.

Fixes were multiple:

https://github.com/apache/activemq-artemis/commit/024db5bd3c1656d265daf60c9e3a362d53b9088b

https://github.com/apache/activemq-artemis/commit/da7fb89037481fb6343c760010d4553ff28ac87e

Im also aware there have been some other concurrency fixes for smaller 
issues.



---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread qihongxu
Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce 
Removed readlock of isPaging().
Also as @franz1981 suggested now only volatile load 
addressFullMessagePolicy once on each call.
Please review and notify me if any more test needed:)


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread wy96f
Github user wy96f commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  

> Id agree, im just cautious as we've been hit a few times with concurrency 
issues that have been a nightmare to find (and quite recently as last month!).
em.Could you please tell us which issues? We need to verify how it 
affects our cluster.




---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> @michaelandrepearce
> 
> > Do you get this on master or this PR (im mean is that a typo)?
> 
> I've got that on master!

ok so i think we not need worry on this for realms of this PR. (probably 
needs looking into, but doesn;t need to be solved in this pr - imo) 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> If we return `true` from the dirty read we can just return it, while if 
we found the it `false` we could attempt to enter the read lock and validate 
that's not paging for real.

Ive literally gone through every case, what occurs is we call isPaging 
within an if statement, and then do some logic after, as such anyhow any action 
we do within these if statements anyhow will be based off a stale state. 

Im starting to just think we make isPaging not use a read lock  (aka make 
it dirty), as its only used in queueimpl like mentioned and for queuecontrol 
(aka the admin gui)


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce 
> Im starting to feel like Alice here and were going to end up going into a 
rabbit hole ;) and will end up with the original isPaging just being dirty.

+1 same exact feeling 
Quoting myself twice for @wy96f:
> LivePageCacheImpl (in violet) is now a major contention point

and 

> Compaction is stealing lot of cpu and I/O

Just a thought: we can just choose in what being less stale or not.
If we return `true` from the dirty read we can just return it, while if we 
found the it `false` we could attempt to enter the read lock and validate 
that's not paging for real.




---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce 
> Do you get this on master or this PR (im mean is that a typo)?

I've got that on master!


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-04 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> @michaelandrepearce @qihongxu Just a lil OT but I'm getting these warns 
on master:
> 
> ```
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,136 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,139 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,142 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,145 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,148 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,151 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,154 during compact replay
> ```

Do you get this on master? If not then that is a BIG worry


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> Looking at the CPU graph instead I can see many odd things ie Compaction 
is stealing lot of cpu and I/O:

Nice find.


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@franz1981 nice graphs, looks like essentially isPaging in QueueImpl unless 
everything is dirtyRead then we simply move the problem somewhere else, i think 
we probably need to make then a more general call if all these are safe to have 
dirty read or not. (Im starting to feel like  


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> > > CursorIterator:hasNext
> > 
> > 
> > Im bit concerned with this doing a dirty read, as this isnt something 
that is trigger an ascyn action, in actual fact the hasNext is purposefully 
synchronized. (especially as recently some concurrency issues have been found 
in paging (e.g. howards current pr) im sensitive to us being too hap hazard 
here.
> 
> I don't get why isPaging() in hasNext() needs to be consistent. The 
paging status can change after isPaging() unless we readlock isPaging() and 
subsequent operations. Without readlock, a) if isPaging() returns true, but the 
other thread set it to false, subsequent next() call reads no data and returns 
null; b) if isPaging() returns false, but the other thread set it to true(a new 
message coming), deliverAsync would be called later anyway.

if thats the case then the sync method could be removed also, though i 
think @clebertsuconic has given some further info to why this is like it is.


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
So out the two, the checkDepage is the safer one to use a dirty read, so id 
expect to see that changed first before anything else.


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread qihongxu
Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce 
> @qihongxu you're using JMS api not core api then?

Yes (sometimes for for compatibility concern)

> @qihongxu i dont see checkDepage using the dirtyRead in current commit

Sorry I probably did not make it clear enough in my last comment. As the 
chart shows we tried checkDepage using the dirtyRead, but found it makes no 
difference in perf. We even changed isPaging() method in PageSubscriptionImpl 
from `return pageStore.isPaging();` to `return pageStore.isPagingDirtyRead();`. 
In this way not only checkDepage, but also other methods that call 
PageSubscription.isPaging() (as @franz1981 shown in flame charts above) will 
all use dirtyRead. But still, same perf as before.
From all above we thought it's safer to leave others unchanged, only force 
CursorIterator:hasNext to use dirtyRead since it affects perf a lot. My latest 
RP is based on these concerns.


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread clebertsuconic
Github user clebertsuconic commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
during initial development years ago I needed to be careful when to leave 
paging. this needed some sync points to make sure depage would set it 
synchronously.

I'm not saying anything against the change (I'm still in vacation mode, I'm 
back on monday).. just saying what was the original intent when it was written.


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread wy96f
Github user wy96f commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> > CursorIterator:hasNext
> 
> Im bit concerned with this doing a dirty read, as this isnt something 
that is trigger an ascyn action, in actual fact the hasNext is purposefully 
synchronized. (especially as recently some concurrency issues have been found 
in paging (e.g. howards current pr) im sensitive to us being too hap hazard 
here.


I don't get why isPaging() in hasNext() needs to be consistent. The paging 
status can change after isPaging() unless we readlock isPaging() and subsequent 
operations. Without readlock, a) if isPaging() returns true, but the other 
thread set it to false, subsequent next() call reads no data and returns null; 
b) if isPaging() returns false, but the other thread set it to true(a new 
message coming), deliverAsync would be called later anyway.


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
Looking at the CPU graph instead I can see many odd things:

- Page writing is double copying the heap ByteBuffer into a native 
ByteBuffer to write on the page file:

![image](https://user-images.githubusercontent.com/13125299/50661999-308bce80-0fa5-11e9-9ee4-053327c36cb1.png)
- Compaction is stealing lot of cpu and I/O:

![image](https://user-images.githubusercontent.com/13125299/50662028-4ef1ca00-0fa5-11e9-9e13-613b569e4c58.png)



---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
I have taken a look on the version used for this PR + the change suggested 
by @michaelandrepearce 
 on checkDepage and I have found that:
LivePageCacheImpl (in violet) is now a major contention point:

![image](https://user-images.githubusercontent.com/13125299/50660994-fa991b00-0fa1-11e9-93cf-88218f07a2de.png)
While re `isPaging`:

![image](https://user-images.githubusercontent.com/13125299/50661277-df7adb00-0fa2-11e9-9ed1-94823381102a.png)
and

![image](https://user-images.githubusercontent.com/13125299/50661299-eefa2400-0fa2-11e9-9463-45a9b0a57548.png)

Are now source of contention







---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce @qihongxu Just a lil OT but I'm getting these warns on 
master:
```
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,136 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,139 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,142 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,145 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,148 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,151 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,154 during compact replay
```


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu i dont see checkDepage using the dirtyRead in current commit


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> The version I have shown is just master ie any read lock is there!

I get that, im more relfecting on this change in PR.


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
The version I have shown is just master ie any read lock is there!


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> CursorIterator:hasNext

Im bit concerned with this doing a dirty read, as this isnt something that 
is trigger an ascyn action, in actual fact the hasNext is purposefully 
synchronized. 

@franz1981 if checkDepage is removed from the lock, why would DepageRunner 
now be locking it up so bad? 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> > Instead of transaction consumer you could use client acknowledge or 
even individual acknowledge.
> 
> It seems that both client-acknowledge and individual-acknowledge mode 
will finally use Transaction at server side. Considering these modes have no 
obvious difference in performance, we choose to use transaction as it’s more 
reliable and supports rollback.

@qihongxu you're using JMS api not core then?


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread franz1981
Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu Just as a confirmation, I have used 
https://github.com/jvm-profiling-tools/async-profiler on lock events ie time 
while waiting to enter into a lock and I have found the same exact behaviours 
explained above:
QueueImpl::checkDepage

![image](https://user-images.githubusercontent.com/13125299/50647451-de34b880-0f78-11e9-99bc-101c830cbda2.png)
QueueImpl::DepageRunner

![image](https://user-images.githubusercontent.com/13125299/50647474-ef7dc500-0f78-11e9-857f-48d190a6e9c6.png)

both are calling `PagingStoreImpl::isPaging` and are blocking each others




---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread qihongxu
Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> Instead of transaction consumer you could use client acknowledge or even 
individual acknowledge.

It seems that both client-acknowledge and individual-acknowledge mode will 
finally use Transaction at server side. Considering these modes have no obvious 
difference in performance, we choose to use transaction as it’s more reliable 
and supports rollback.



---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2019-01-03 Thread qihongxu
Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> @qihongxu,
> 
> re:
> "
> According to this we have an internal branch that removes isPaging() 
method’s readlock in PagingStoreImpl, along with adding pageSyncTimer’s 
specific executor. After that we performed a same test and get a result of 
21k/25k Send–in total 46k TPS. This version runs smoothly in our use 
case but we are still exploring potential effects.
> "
> 
> Agree that as the QueueImpl:checkDepage which is called from deliver 
method, only schedules another depage asynchronously that the read barrier lock 
for that is over kill as it doesn't have to be strongly consistent correct.
> 
> I think changing the current method could be dangerous, unless we do a 
fuller analysis on all its uses, but i think we could easily add a new method 
named something like "isPagingDirtyRead" which can return without the readlock, 
it will mean the returned value isn't always 100% accurate but in cases such as 
checkDepage, we don't need it to be, its just a single, and we can call it 
where we know its safe to.
> 
> Would you be happy adding this, (will need to be exposed via 
PageSubscription)? And updating checkDepage to use that one?


As you suggested we tried update checkDepage with a new "isPagingDirtyRead" 
method which can return without the readlock. But after running same tests on 
this version it seems that checkDepage did not affect receive performance too 
much. Instead, CursorIterator#hasNext in PageSubscriptionImpl also called 
isPaging() and had a significant impact on receive performance. According to 
this we updated hasNext() and forced it to use the new “isPagingDirtyRead”. 
Below is the result chart.
(P.S the original ver is a control group, not exactly the “original” 
master-branch build. It has been applied with specific pageSyncTimer executor, 
and cache durable in PageReference - See PR Artemis-2214.)


  | Send | In Total
-- | -- | --
Ver Original | 30k/12.5k | 32.5k
Ver Modified checkDepage | 31.1k/11.3k | 42.4k
Ver Modified hasNext | 23.8k/23.5k | 47.3k

Detailed TPS chart can be seen in attachment. (Sorry we dont have enough 
time for running tests. Just got back to work today)
[TPS 
chart.docx](https://github.com/apache/activemq-artemis/files/2723731/TPS.chart.docx)



---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2018-12-31 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu,

re:
"
According to this we have an internal branch that removes isPaging() 
method’s readlock in PagingStoreImpl, along with adding pageSyncTimer’s 
specific executor. After that we performed a same test and get a result of 
21k/25k Send–in total 46k TPS. This version runs smoothly in our use 
case but we are still exploring potential effects.
"

Agree that as the QueueImpl:checkDepage which is called from deliver 
method, only schedules another depage asynchronously that the read barrier lock 
for that is over kill as it doesn't have to be strongly consistent correct.

I think changing the current method could be dangerous, unless we do a 
fuller analysis on all its uses, but i think we could easily add a new method 
named something like "isPagingDirtyRead" which can return without the readlock, 
it will mean the returned value isn't always 100% accurate but in cases such as 
checkDepage, we don't need it to be, its just a single.

Would you be happy adding this, (will need to be exposed via 
PageSubscription)? And updating checkDepage to use that one?






---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2018-12-30 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
The bit around lock contention on paging check is very interesting where 
@qihongxu removed it and performance rose, this would suggest consumer perf 
change isnt related to io but due to over lock contention in code. Def would be 
good to see if its safe to remove 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2018-12-30 Thread clebertsuconic
Github user clebertsuconic commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
The receiver could be slower because of more IO. That’s not a bad thing. 


I’m on vacation this week. I won’t be able to get to this.  Just saying 
why the receiver could be slower. 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2018-12-30 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
If you could supply more complete stats with ramp ups e.g pef at 1k, 2k, 
3k...5k...15k..20k.21k.22k...25k etc etc, over time. Rather than just a summary 
snapshot.


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2018-12-30 Thread qihongxu
Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce As presented in test result adding a specific executor 
do improve producer rates, and also means page() method in PagingStoreImpl will 
acquire writelock more often. At the same time QueueImpl::deliver will call 
isPaging() to check whether queue is paging or not, which acquire readlock and 
make it a race condition. This may explain why the increase of producer rates 
is accompanied by the decrease of consumer rates.
According to this we have an internal branch that removes isPaging() 
method’s readlock in PagingStoreImpl, along with adding pageSyncTimer’s 
specific executor. After that we performed a same test and get a result of 
21k/25k Send–in total 46k TPS. This version runs smoothly in our use 
case but we are still exploring potential effects.
The reason why we use transactional receive is to make sure every message 
indeed delivered and consumed. Say that if we do non-tx receive and consumer 
system fail in the following processing chain, this message may be delivered 
but actually “lost”. So we choose to let consumer decide when should they 
ack and commit, instead of auto-commit acks. 


---


[GitHub] activemq-artemis issue #2484: ARTEMIS-2216 Use a specific executor for pageS...

2018-12-29 Thread michaelandrepearce
Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
Comments on this are:

We have shared executor and thread pools as a way to pool these its part 
artemis design so that these can be controlled, eg the number of threads are 
configurable. And theres uses cases for why, we dont just add executors. If we 
add a new pool it should be configurable like others.

Whilst your use case this significantly improves things. As you note this 
change isnt entirely positive to all users as it does impact use cases where 
people may care more on the consumer side (i am one one of those). If anything 
if the broker is saturated its more important consumers for us can dequeue and 
if anything producers backpressure. As if consumers cant keep up and producer 
peak continues you can very soon end up situation where paging depth will just 
grow and grow.

It would be good to get more stats therefore on how as producer rates ramup 
how consumer rates ramp down. And crticial cross over points. Along with number 
of remote consumers and producers. All of this could be mute if there is a way 
to eliminate the negative effect this change has.


---