[jira] [Commented] (AMQ-5082) ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening

2015-04-02 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392618#comment-14392618
 ] 

anselme dewavrin commented on AMQ-5082:
---

Well, it depends... When do you think v5.12 is planned for release?

How do I patch a 5.11?

Thank you, we have been using ActiveMQ a lot in production for 20 months and it 
works well. We could give a testimonial.

Anselme

 ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
 ---

 Key: AMQ-5082
 URL: https://issues.apache.org/jira/browse/AMQ-5082
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0, 5.10.0
Reporter: Scott Feldstein
Assignee: Christian Posta
Priority: Critical
 Fix For: 5.12.0

 Attachments: 03-07.tgz, amq_5082_threads.tar.gz, 
 mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure, 
 zookeeper.out-cluster.failure


 I have a 3 node amq cluster and one zookeeper node using a replicatedLevelDB 
 persistence adapter.
 {code}
 <persistenceAdapter>
   <replicatedLevelDB
     directory="${activemq.data}/leveldb"
     replicas="3"
     bind="tcp://0.0.0.0:0"
     zkAddress="zookeep0:2181"
     zkPath="/activemq/leveldb-stores/"
     />
 </persistenceAdapter>
 {code}
 After about a day or so of sitting idle there are cascading failures and the 
 cluster completely stops listening altogether.
 I can reproduce this consistently on 5.9 and the latest 5.10 (commit 
 2360fb859694bacac1e48092e53a56b388e1d2f0).  I am going to attach logs from 
 the three mq nodes and the zookeeper logs that reflect the time when the 
 cluster starts having issues.
 The cluster stops listening on Mar 4, 2014 at 4:56:50 AM (within 5 seconds).
 The OSs are all CentOS 5.9 on one ESX server, so I doubt networking is an 
 issue.
 If you need more data it should be pretty easy to get whatever is needed 
 since it is consistently reproducible.
 This bug may be related to AMQ-5026, but looks different enough to file a 
 separate issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AMQ-5082) ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening

2015-04-02 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392950#comment-14392950
 ] 

anselme dewavrin commented on AMQ-5082:
---

Thanks, Jim, for these details.

Doing it very soon!

Anselme

 ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
 ---

 Key: AMQ-5082
 URL: https://issues.apache.org/jira/browse/AMQ-5082
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0, 5.10.0
Reporter: Scott Feldstein
Assignee: Christian Posta
Priority: Critical
 Fix For: 5.12.0

 Attachments: 03-07.tgz, amq_5082_threads.tar.gz, 
 mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure, 
 zookeeper.out-cluster.failure


 I have a 3 node amq cluster and one zookeeper node using a replicatedLevelDB 
 persistence adapter.
 {code}
 <persistenceAdapter>
   <replicatedLevelDB
     directory="${activemq.data}/leveldb"
     replicas="3"
     bind="tcp://0.0.0.0:0"
     zkAddress="zookeep0:2181"
     zkPath="/activemq/leveldb-stores/"
     />
 </persistenceAdapter>
 {code}
 After about a day or so of sitting idle there are cascading failures and the 
 cluster completely stops listening altogether.
 I can reproduce this consistently on 5.9 and the latest 5.10 (commit 
 2360fb859694bacac1e48092e53a56b388e1d2f0).  I am going to attach logs from 
 the three mq nodes and the zookeeper logs that reflect the time when the 
 cluster starts having issues.
 The cluster stops listening on Mar 4, 2014 at 4:56:50 AM (within 5 seconds).
 The OSs are all CentOS 5.9 on one ESX server, so I doubt networking is an 
 issue.
 If you need more data it should be pretty easy to get whatever is needed 
 since it is consistently reproducible.
 This bug may be related to AMQ-5026, but looks different enough to file a 
 separate issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AMQ-5082) ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening

2015-04-01 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390730#comment-14390730
 ] 

anselme dewavrin commented on AMQ-5082:
---

A big thank you, Jim, we were truly annoyed by this!

Christian, will this be included in the main branch of the downloadable 
binaries? In which version?

Thanks in advance,

Anselme

 ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
 ---

 Key: AMQ-5082
 URL: https://issues.apache.org/jira/browse/AMQ-5082
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0, 5.10.0
Reporter: Scott Feldstein
Assignee: Christian Posta
Priority: Critical
 Fix For: 5.12.0

 Attachments: 03-07.tgz, amq_5082_threads.tar.gz, 
 mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure, 
 zookeeper.out-cluster.failure


 I have a 3 node amq cluster and one zookeeper node using a replicatedLevelDB 
 persistence adapter.
 {code}
 <persistenceAdapter>
   <replicatedLevelDB
     directory="${activemq.data}/leveldb"
     replicas="3"
     bind="tcp://0.0.0.0:0"
     zkAddress="zookeep0:2181"
     zkPath="/activemq/leveldb-stores/"
     />
 </persistenceAdapter>
 {code}
 After about a day or so of sitting idle there are cascading failures and the 
 cluster completely stops listening altogether.
 I can reproduce this consistently on 5.9 and the latest 5.10 (commit 
 2360fb859694bacac1e48092e53a56b388e1d2f0).  I am going to attach logs from 
 the three mq nodes and the zookeeper logs that reflect the time when the 
 cluster starts having issues.
 The cluster stops listening on Mar 4, 2014 at 4:56:50 AM (within 5 seconds).
 The OSs are all CentOS 5.9 on one ESX server, so I doubt networking is an 
 issue.
 If you need more data it should be pretty easy to get whatever is needed 
 since it is consistently reproducible.
 This bug may be related to AMQ-5026, but looks different enough to file a 
 separate issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AMQ-5235) erroneous temp percent used

2015-02-03 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303376#comment-14303376
 ] 

anselme dewavrin commented on AMQ-5235:
---

Yes, the deletion is lazy. Several minutes sometimes.

Also, a single message can keep a whole data file (the .log) alive, and these 
files are 100MB each by default. We reduced this problem by setting logSize to 
10 MB (see the sketch below).

For empty queues that still keep the temp percent non-zero, maybe you should 
try to stop/restart your broker?

Anselme
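
For reference, a minimal sketch of the kind of change described above. This is 
illustrative only: the surrounding attributes are copied from the issue's example 
configuration, and the 10 MB value expressed in bytes is an assumption.

{code}
<persistenceAdapter>
  <replicatedLevelDB
    directory="${activemq.data}/leveldb"
    replicas="3"
    zkAddress="zookeep0:2181"
    zkPath="/activemq/leveldb-stores/"
    logSize="10485760"
    />
</persistenceAdapter>
{code}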

 erroneous temp percent used
 ---

 Key: AMQ-5235
 URL: https://issues.apache.org/jira/browse/AMQ-5235
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian (quality testing and production)
Reporter: anselme dewavrin

 Dear all,
 We have an activemq 5.9 configured with 1GB of tempUsage allowed, just as a 
 precaution, because we only use persistent messages (about 6000 messages per 
 day). After several days of use, the temp usage increases, and even shows 
 values that are above the total amount of the data on disk. Here it shows 45% 
 of its 1GB limit for the following files:
 find activemq-data -ls
 768098014 drwxr-xr-x   5 anselme  anselme  4096 Jun 19 10:24 
 activemq-data
 768098134 -rw-r--r--   1 anselme  anselme24 Jun 16 16:13 
 activemq-data/store-version.txt
 768098174 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
 activemq-data/dirty.index
 768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
 activemq-data/dirty.index/08.sst
 768098204 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
 activemq-data/dirty.index/CURRENT
 76809819   80 -rw-r--r--   1 anselme  anselme 80313 Jun 16 16:13 
 activemq-data/dirty.index/11.sst
 768098220 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
 activemq-data/dirty.index/LOCK
 76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
 activemq-data/dirty.index/05.sst
 76809821 2048 -rw-r--r--   1 anselme  anselme   2097152 Jun 19 11:30 
 activemq-data/dirty.index/12.log
 76809818 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
 activemq-data/dirty.index/MANIFEST-10
 768098160 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
 activemq-data/lock
 76809815 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 19 11:30 
 activemq-data/00f0faaf.log
 76809823 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 16 11:50 
 activemq-data/00385f46.log
 768098074 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
 activemq-data/00f0faaf.index
 76809808  420 -rw-r--r--   1 anselme  anselme429264 Jun 16 16:13 
 activemq-data/00f0faaf.index/09.log
 768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
 activemq-data/00f0faaf.index/08.sst
 768098124 -rw-r--r--   1 anselme  anselme   165 Jun 16 16:13 
 activemq-data/00f0faaf.index/MANIFEST-07
 768098094 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
 activemq-data/00f0faaf.index/CURRENT
 76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
 activemq-data/00f0faaf.index/05.sst
 76809814 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 12 21:06 
 activemq-data/.log
 768098024 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
 activemq-data/plist.index
 768098034 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
 activemq-data/plist.index/CURRENT
 768098060 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
 activemq-data/plist.index/LOCK
 76809805 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
 activemq-data/plist.index/03.log
 76809804 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
 activemq-data/plist.index/MANIFEST-02
 The problem is that in our production system it once blocked producers with a 
 tempUsage at 122%, even though the disk was empty.
 So we investigated and ran the broker in a debugger, and found how the 
 usage is calculated. It is in the Scala leveldb files: it is not based on 
 what is actually on disk, but on what the store thinks is on disk. It multiplies the 
 size of one log by the number of logs known by a certain hashmap.
 I think the entries of the hashmap are not removed when the log files are 
 purged.
 Could you confirm?
 Thanks in advance 
 Anselme



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AMQ-4987) io wait on replicated levelDB slaves

2014-06-30 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047684#comment-14047684
 ] 

anselme dewavrin commented on AMQ-4987:
---

still up and running in production.

 io wait on replicated levelDB slaves
 

 Key: AMQ-4987
 URL: https://issues.apache.org/jira/browse/AMQ-4987
 Project: ActiveMQ
  Issue Type: Test
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian VM 2.6.32-5-amd64, jdk7
Reporter: anselme dewavrin
Priority: Minor
 Fix For: 5.9.0


 Dear all,
 I set up a 3-node replicatedLevelDB activeMQ cluster on 3 different 
 machines, as explained on the activemq site (with zookeeper etc.).
 I made a message injector using the php stomp client described here: 
 http://stomp.fusesource.org/documentation/php/book.html
 Then I injected persistent messages as fast as possible (giving about 100 
 messages/s, each message is 10k).
 Everything works fine; then I measured the servers' activity with vmstat 1. 
 I saw no iowait on the master node, but 20% on both slaves. This would 
 impede scalability, I suppose. The iowait is explained by 3000 bo/s 
 (blocks out) in the vmstat report.
 The machines are not swapping (paging).
 Here is what I tried, without success:
 -specify explicitly sync=quorum_mem
 -JNI implementation of the leveldb store (and verified it is used)
 -setting flushDelay to 2000
 Does anyone have an idea I could try? Why are the leveldb slaves writing 
 so much to disk?
 Many thanks in advance
 Yours,
 Anselme



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (AMQ-5082) ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening

2014-06-24 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041885#comment-14041885
 ] 

anselme dewavrin commented on AMQ-5082:
---

Dear all,

We had the exact same problem due to backups at our hosting company, which we 
could not avoid.

We worked around it by increasing the tickTime in ZooKeeper.
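
For reference, a minimal sketch of that kind of ZooKeeper change (zoo.cfg); the 
values below are illustrative assumptions, not the exact ones we used:

{code}
# conf/zoo.cfg -- illustrative values only
# tickTime is ZooKeeper's base time unit (ms) for heartbeats and session timeouts;
# raising it makes sessions more tolerant of pauses such as backup-induced I/O stalls.
tickTime=10000
initLimit=10
syncLimit=5
{code}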

Anselme

 ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
 ---

 Key: AMQ-5082
 URL: https://issues.apache.org/jira/browse/AMQ-5082
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0, 5.10.0
Reporter: Scott Feldstein
Priority: Critical
 Attachments: 03-07.tgz, amq_5082_threads.tar.gz, 
 mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure, 
 zookeeper.out-cluster.failure


 I have a 3 node amq cluster and one zookeeper node using a replicatedLevelDB 
 persistence adapter.
 {code}
 <persistenceAdapter>
   <replicatedLevelDB
     directory="${activemq.data}/leveldb"
     replicas="3"
     bind="tcp://0.0.0.0:0"
     zkAddress="zookeep0:2181"
     zkPath="/activemq/leveldb-stores/"
     />
 </persistenceAdapter>
 {code}
 After about a day or so of sitting idle there are cascading failures and the 
 cluster completely stops listening altogether.
 I can reproduce this consistently on 5.9 and the latest 5.10 (commit 
 2360fb859694bacac1e48092e53a56b388e1d2f0).  I am going to attach logs from 
 the three mq nodes and the zookeeper logs that reflect the time when the 
 cluster starts having issues.
 The cluster stops listening on Mar 4, 2014 at 4:56:50 AM (within 5 seconds).
 The OSs are all CentOS 5.9 on one ESX server, so I doubt networking is an 
 issue.
 If you need more data it should be pretty easy to get whatever is needed 
 since it is consistently reproducible.
 This bug may be related to AMQ-5026, but looks different enough to file a 
 separate issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (AMQ-5105) leveldb fails to startup because of NoSuchMethodError

2014-06-24 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041890#comment-14041890
 ] 

anselme dewavrin commented on AMQ-5105:
---

I worked for us, thanks
Anselme

 leveldb fails to startup because of NoSuchMethodError
 -

 Key: AMQ-5105
 URL: https://issues.apache.org/jira/browse/AMQ-5105
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.10.0
 Environment: Any
Reporter: Netlancer
Priority: Minor

 leveldb persistence fails to start due to errors as given below 
  Caused by: java.lang.NoSuchMethodError: 
 com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
 at 
 com.google.common.cache.CacheBuilder.getKeyStrength(CacheBuilder.java:530)
 at com.google.common.cache.LocalCache.<init>(LocalCache.java:238)
 at 
 com.google.common.cache.LocalCache$LocalLoadingCache.<init>(LocalCache.java:4861)
 at com.google.common.cache.CacheBuilder.build(CacheBuilder.java:803)
 at org.iq80.leveldb.impl.TableCache.<init>(TableCache.java:46)
 at org.iq80.leveldb.impl.DbImpl.<init>(DbImpl.java:155)
 at org.iq80.leveldb.impl.Iq80DBFactory.open(Iq80DBFactory.java:59)
 at 
 org.apache.activemq.leveldb.LevelDBClient$$anonfun$replay_init$2.apply$mcV$sp(LevelDBClient.scala:661)
 at 
 org.apache.activemq.leveldb.LevelDBClient$$anonfun$replay_init$2.apply(LevelDBClient.scala:657)
 at 
 org.apache.activemq.leveldb.LevelDBClient$$anonfun$replay_init$2.apply(LevelDBClient.scala:657)
 at 
 org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:549)
 The problem seems to be because multiple jars contain the same classes:
 guava-12.jar
 pax-url-aether-1.5.2.jar
 The class present in pax-url-aether-1.5.2.jar gets loaded, causing leveldb to 
 fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (AMQ-5105) leveldb fails to startup because of NoSuchMethodError

2014-06-24 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041890#comment-14041890
 ] 

anselme dewavrin edited comment on AMQ-5105 at 6/24/14 8:58 AM:


It worked for us, thanks
Anselme


was (Author: adewavrin):
I worked for us, thanks
Anselme

 leveldb fails to startup because of NoSuchMethodError
 -

 Key: AMQ-5105
 URL: https://issues.apache.org/jira/browse/AMQ-5105
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.10.0
 Environment: Any
Reporter: Netlancer
Priority: Minor

 leveldb persistence fails to start due to errors as given below 
  Caused by: java.lang.NoSuchMethodError: 
 com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
 at 
 com.google.common.cache.CacheBuilder.getKeyStrength(CacheBuilder.java:530)
 at com.google.common.cache.LocalCache.<init>(LocalCache.java:238)
 at 
 com.google.common.cache.LocalCache$LocalLoadingCache.<init>(LocalCache.java:4861)
 at com.google.common.cache.CacheBuilder.build(CacheBuilder.java:803)
 at org.iq80.leveldb.impl.TableCache.<init>(TableCache.java:46)
 at org.iq80.leveldb.impl.DbImpl.<init>(DbImpl.java:155)
 at org.iq80.leveldb.impl.Iq80DBFactory.open(Iq80DBFactory.java:59)
 at 
 org.apache.activemq.leveldb.LevelDBClient$$anonfun$replay_init$2.apply$mcV$sp(LevelDBClient.scala:661)
 at 
 org.apache.activemq.leveldb.LevelDBClient$$anonfun$replay_init$2.apply(LevelDBClient.scala:657)
 at 
 org.apache.activemq.leveldb.LevelDBClient$$anonfun$replay_init$2.apply(LevelDBClient.scala:657)
 at 
 org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:549)
 The problem seems to be because multiple jars contain the same classes:
 guava-12.jar
 pax-url-aether-1.5.2.jar
 The class present in pax-url-aether-1.5.2.jar gets loaded, causing leveldb to 
 fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (AMQ-5082) ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening

2014-06-24 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041885#comment-14041885
 ] 

anselme dewavrin edited comment on AMQ-5082 at 6/24/14 9:03 AM:


Dear all,

We had the exact same problem due to backups at our hosting company, which we 
could not avoid.

So we can consider that the cluster behaves normally: when unable to 
communicate with each other, brokers try to fail over.

We increased the tickTime in ZooKeeper and it works perfectly. Four months in 
production already.

Anselme


was (Author: adewavrin):
Dear all,

We had the exact same problem due to backups at our hosting company, which we 
could not avoid.

We worked around it by increasing the tickTime in ZooKeeper.

Anselme

 ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
 ---

 Key: AMQ-5082
 URL: https://issues.apache.org/jira/browse/AMQ-5082
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0, 5.10.0
Reporter: Scott Feldstein
Priority: Critical
 Attachments: 03-07.tgz, amq_5082_threads.tar.gz, 
 mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure, 
 zookeeper.out-cluster.failure


 I have a 3 node amq cluster and one zookeeper node using a replicatedLevelDB 
 persistence adapter.
 {code}
 <persistenceAdapter>
   <replicatedLevelDB
     directory="${activemq.data}/leveldb"
     replicas="3"
     bind="tcp://0.0.0.0:0"
     zkAddress="zookeep0:2181"
     zkPath="/activemq/leveldb-stores/"
     />
 </persistenceAdapter>
 {code}
 After about a day or so of sitting idle there are cascading failures and the 
 cluster completely stops listening altogether.
 I can reproduce this consistently on 5.9 and the latest 5.10 (commit 
 2360fb859694bacac1e48092e53a56b388e1d2f0).  I am going to attach logs from 
 the three mq nodes and the zookeeper logs that reflect the time when the 
 cluster starts having issues.
 The cluster stops listening on Mar 4, 2014 at 4:56:50 AM (within 5 seconds).
 The OSs are all CentOS 5.9 on one ESX server, so I doubt networking is an 
 issue.
 If you need more data it should be pretty easy to get whatever is needed 
 since it is consistently reproducible.
 This bug may be related to AMQ-5026, but looks different enough to file a 
 separate issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (AMQ-5082) ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening

2014-06-24 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041885#comment-14041885
 ] 

anselme dewavrin edited comment on AMQ-5082 at 6/24/14 9:03 AM:


Dear all,

We had the exact same problem due to backups at our hosting company, which we 
could not avoid.

So we can consider that the cluster behaves normally: when unable to 
communicate with each other, the brokers try to fail over.

We increased the tickTime in ZooKeeper and it works perfectly. Four months in 
production already.

Anselme


was (Author: adewavrin):
Dear all,

We had the exact same problem due to backups at our hosting company, which we 
could not avoid.

So we can consider that the cluster behaves normally: when unable to 
communicate with each other, brokers try to fail over.

We increased the tickTime in ZooKeeper and it works perfectly. Four months in 
production already.

Anselme

 ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
 ---

 Key: AMQ-5082
 URL: https://issues.apache.org/jira/browse/AMQ-5082
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0, 5.10.0
Reporter: Scott Feldstein
Priority: Critical
 Attachments: 03-07.tgz, amq_5082_threads.tar.gz, 
 mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure, 
 zookeeper.out-cluster.failure


 I have a 3 node amq cluster and one zookeeper node using a replicatedLevelDB 
 persistence adapter.
 {code}
 <persistenceAdapter>
   <replicatedLevelDB
     directory="${activemq.data}/leveldb"
     replicas="3"
     bind="tcp://0.0.0.0:0"
     zkAddress="zookeep0:2181"
     zkPath="/activemq/leveldb-stores/"
     />
 </persistenceAdapter>
 {code}
 After about a day or so of sitting idle there are cascading failures and the 
 cluster completely stops listening altogether.
 I can reproduce this consistently on 5.9 and the latest 5.10 (commit 
 2360fb859694bacac1e48092e53a56b388e1d2f0).  I am going to attach logs from 
 the three mq nodes and the zookeeper logs that reflect the time when the 
 cluster starts having issues.
 The cluster stops listening on Mar 4, 2014 at 4:56:50 AM (within 5 seconds).
 The OSs are all CentOS 5.9 on one ESX server, so I doubt networking is an 
 issue.
 If you need more data it should be pretty easy to get whatever is needed 
 since it is consistently reproducible.
 This bug may be related to AMQ-5026, but looks different enough to file a 
 separate issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (AMQ-5235) erroneous temp percent used

2014-06-20 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038587#comment-14038587
 ] 

anselme dewavrin commented on AMQ-5235:
---

Timothy,

Thank you for your answer and suggestion.

We have the same behavior with the 5.10 snapshot of 20 December as we have in 
production.

Anselme

 erroneous temp percent used
 ---

 Key: AMQ-5235
 URL: https://issues.apache.org/jira/browse/AMQ-5235
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian (quality testing and production)
Reporter: anselme dewavrin

 Dear all,
 We have an activemq 5.9 configured with 1GB of tempUsage allowed, just as a 
 precaution, because we only use persistent messages (about 6000 messages per 
 day). After several days of use, the temp usage increases, and even shows 
 values that are above the total amount of the data on disk. Here it shows 45% 
 of its 1GB limit for the following files:
 find activemq-data -ls
 768098014 drwxr-xr-x   5 anselme  anselme  4096 Jun 19 10:24 
 activemq-data
 768098134 -rw-r--r--   1 anselme  anselme24 Jun 16 16:13 
 activemq-data/store-version.txt
 768098174 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
 activemq-data/dirty.index
 768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
 activemq-data/dirty.index/08.sst
 768098204 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
 activemq-data/dirty.index/CURRENT
 76809819   80 -rw-r--r--   1 anselme  anselme 80313 Jun 16 16:13 
 activemq-data/dirty.index/11.sst
 768098220 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
 activemq-data/dirty.index/LOCK
 76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
 activemq-data/dirty.index/05.sst
 76809821 2048 -rw-r--r--   1 anselme  anselme   2097152 Jun 19 11:30 
 activemq-data/dirty.index/12.log
 76809818 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
 activemq-data/dirty.index/MANIFEST-10
 768098160 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
 activemq-data/lock
 76809815 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 19 11:30 
 activemq-data/00f0faaf.log
 76809823 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 16 11:50 
 activemq-data/00385f46.log
 768098074 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
 activemq-data/00f0faaf.index
 76809808  420 -rw-r--r--   1 anselme  anselme429264 Jun 16 16:13 
 activemq-data/00f0faaf.index/09.log
 768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
 activemq-data/00f0faaf.index/08.sst
 768098124 -rw-r--r--   1 anselme  anselme   165 Jun 16 16:13 
 activemq-data/00f0faaf.index/MANIFEST-07
 768098094 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
 activemq-data/00f0faaf.index/CURRENT
 76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
 activemq-data/00f0faaf.index/05.sst
 76809814 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 12 21:06 
 activemq-data/.log
 768098024 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
 activemq-data/plist.index
 768098034 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
 activemq-data/plist.index/CURRENT
 768098060 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
 activemq-data/plist.index/LOCK
 76809805 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
 activemq-data/plist.index/03.log
 76809804 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
 activemq-data/plist.index/MANIFEST-02
 The problem is that in our production system it once blocked producers with a 
 tempUsage at 122%, even though the disk was empty.
 So we investigated and ran the broker in a debugger, and found how the 
 usage is calculated. It is in the Scala leveldb files: it is not based on 
 what is actually on disk, but on what the store thinks is on disk. It multiplies the 
 size of one log by the number of logs known by a certain hashmap.
 I think the entries of the hashmap are not removed when the log files are 
 purged.
 Could you confirm?
 Thanks in advance 
 Anselme



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (AMQ-5235) erroneous temp percent used

2014-06-20 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038602#comment-14038602
 ] 

anselme dewavrin commented on AMQ-5235:
---

In fact I would greatly appreciate it if an author could confirm that the tempUsage is 
computed in
LevelDBClient.scala
at the line:
  def size: Long = logRefs.size * store.logSize

Thank you in advance
Anselme
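
To illustrate the suspicion above, here is a small, self-contained Scala sketch 
(not the actual broker code; everything except the logRefs/logSize names is made 
up) showing how deriving usage from a map of tracked logs overstates the figure 
when entries are not removed after the underlying files are purged:

{code}
import scala.collection.mutable

// Toy model of the suspected accounting: usage is computed from the number of
// journal logs tracked in a map, multiplied by the configured logSize, rather
// than from the bytes actually present on disk.
object TempUsageSketch {
  val logSize: Long = 104857600L              // 100 MB default log size
  val logRefs = mutable.HashMap[Long, Int]()  // log position -> reference count

  def size: Long = logRefs.size * logSize     // mirrors the cited formula

  def main(args: Array[String]): Unit = {
    // Three journal logs are written and tracked.
    Seq(0L, 104857600L, 209715200L).foreach(pos => logRefs(pos) = 1)
    println(s"computed usage = $size bytes")  // reports 300 MB

    // Suppose two logs are then purged from disk but their entries stay in the
    // map: the computed usage still reports 300 MB, above what is really there.
    println(s"still computed = $size bytes")

    // Only removing the stale entries brings the figure back in line with disk.
    logRefs.remove(0L)
    logRefs.remove(104857600L)
    println(s"after cleanup  = $size bytes")  // reports 100 MB
  }
}
{code}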

 erroneous temp percent used
 ---

 Key: AMQ-5235
 URL: https://issues.apache.org/jira/browse/AMQ-5235
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian (quality testing and production)
Reporter: anselme dewavrin

 Dear all,
 We have an activemq 5.9 configured with 1GB of tempUsage allowed, just as a 
 precaution, because we only use persistent messages (about 6000 messages per 
 day). After several days of use, the temp usage increases, and even shows 
 values that are above the total amount of the data on disk. Here it shows 45% 
 of its 1GB limit for the following files:
 find activemq-data -ls
 768098014 drwxr-xr-x   5 anselme  anselme  4096 Jun 19 10:24 
 activemq-data
 768098134 -rw-r--r--   1 anselme  anselme24 Jun 16 16:13 
 activemq-data/store-version.txt
 768098174 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
 activemq-data/dirty.index
 768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
 activemq-data/dirty.index/08.sst
 768098204 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
 activemq-data/dirty.index/CURRENT
 76809819   80 -rw-r--r--   1 anselme  anselme 80313 Jun 16 16:13 
 activemq-data/dirty.index/11.sst
 768098220 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
 activemq-data/dirty.index/LOCK
 76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
 activemq-data/dirty.index/05.sst
 76809821 2048 -rw-r--r--   1 anselme  anselme   2097152 Jun 19 11:30 
 activemq-data/dirty.index/12.log
 76809818 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
 activemq-data/dirty.index/MANIFEST-10
 768098160 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
 activemq-data/lock
 76809815 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 19 11:30 
 activemq-data/00f0faaf.log
 76809823 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 16 11:50 
 activemq-data/00385f46.log
 768098074 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
 activemq-data/00f0faaf.index
 76809808  420 -rw-r--r--   1 anselme  anselme429264 Jun 16 16:13 
 activemq-data/00f0faaf.index/09.log
 768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
 activemq-data/00f0faaf.index/08.sst
 768098124 -rw-r--r--   1 anselme  anselme   165 Jun 16 16:13 
 activemq-data/00f0faaf.index/MANIFEST-07
 768098094 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
 activemq-data/00f0faaf.index/CURRENT
 76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
 activemq-data/00f0faaf.index/05.sst
 76809814 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 12 21:06 
 activemq-data/.log
 768098024 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
 activemq-data/plist.index
 768098034 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
 activemq-data/plist.index/CURRENT
 768098060 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
 activemq-data/plist.index/LOCK
 76809805 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
 activemq-data/plist.index/03.log
 76809804 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
 activemq-data/plist.index/MANIFEST-02
 The problem is that in our production system it once blocked producers with a 
 tempUsage at 122%, even though the disk was empty.
 So we investigated and ran the broker in a debugger, and found how the 
 usage is calculated. It is in the Scala leveldb files: it is not based on 
 what is actually on disk, but on what the store thinks is on disk. It multiplies the 
 size of one log by the number of logs known by a certain hashmap.
 I think the entries of the hashmap are not removed when the log files are 
 purged.
 Could you confirm?
 Thanks in advance 
 Anselme



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (AMQ-5235) erroneous temp percent used

2014-06-19 Thread anselme dewavrin (JIRA)
anselme dewavrin created AMQ-5235:
-

 Summary: erroneous temp percent used
 Key: AMQ-5235
 URL: https://issues.apache.org/jira/browse/AMQ-5235
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian (quality testing and production)
Reporter: anselme dewavrin


Dear all,

We have an activemq 5.9 configured with 1GB of tempUsage allowed, just as a 
precaution, because we only use persistent messages (about 6000 messages per day). 
After several days of use, the temp usage increases, and even shows values 
that are above the total amount of the data on disk. Here it shows 45% of its 
1GB limit for the following files:

find activemq-data -ls
768098014 drwxr-xr-x   5 anselme  anselme  4096 Jun 19 10:24 
activemq-data
768098134 -rw-r--r--   1 anselme  anselme24 Jun 16 16:13 
activemq-data/store-version.txt
768098174 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
activemq-data/dirty.index
768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
activemq-data/dirty.index/08.sst
768098204 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
activemq-data/dirty.index/CURRENT
76809819   80 -rw-r--r--   1 anselme  anselme 80313 Jun 16 16:13 
activemq-data/dirty.index/11.sst
768098220 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
activemq-data/dirty.index/LOCK
76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
activemq-data/dirty.index/05.sst
76809821 2048 -rw-r--r--   1 anselme  anselme   2097152 Jun 19 11:30 
activemq-data/dirty.index/12.log
76809818 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
activemq-data/dirty.index/MANIFEST-10
768098160 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
activemq-data/lock
76809815 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 19 11:30 
activemq-data/00f0faaf.log
76809823 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 16 11:50 
activemq-data/00385f46.log
768098074 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
activemq-data/00f0faaf.index
76809808  420 -rw-r--r--   1 anselme  anselme429264 Jun 16 16:13 
activemq-data/00f0faaf.index/09.log
768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
activemq-data/00f0faaf.index/08.sst
768098124 -rw-r--r--   1 anselme  anselme   165 Jun 16 16:13 
activemq-data/00f0faaf.index/MANIFEST-07
768098094 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
activemq-data/00f0faaf.index/CURRENT
76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
activemq-data/00f0faaf.index/05.sst
76809814 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 12 21:06 
activemq-data/.log
768098024 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
activemq-data/plist.index
768098034 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
activemq-data/plist.index/CURRENT
768098060 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
activemq-data/plist.index/LOCK
76809805 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
activemq-data/plist.index/03.log
76809804 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
activemq-data/plist.index/MANIFEST-02


The problem is that in our production system it once blocked producers with a 
tempUsage at 122%, even though the disk was empty.

So we investigated and ran the broker in a debugger, and found how the 
usage is calculated. It is in the Scala leveldb files: it is not based on what 
is actually on disk, but on what the store thinks is on disk. It multiplies the size of 
one log by the number of logs recorded in a hashmap.

I think the entries of the hashmap are not removed when the log files are 
purged.

Could you confirm?

Thanks in advance 
Anselme



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (AMQ-5235) erroneous temp percent used

2014-06-19 Thread anselme dewavrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMQ-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anselme dewavrin updated AMQ-5235:
--

Description: 
Dear all,

We have an activemq 5.9 configured with 1GB of tempUsage allowed, just as a 
precaution, because we only use persistent messages (about 6000 messages per day). 
After several days of use, the temp usage increases, and even shows values 
that are above the total amount of the data on disk. Here it shows 45% of its 
1GB limit for the following files:

find activemq-data -ls
768098014 drwxr-xr-x   5 anselme  anselme  4096 Jun 19 10:24 
activemq-data
768098134 -rw-r--r--   1 anselme  anselme24 Jun 16 16:13 
activemq-data/store-version.txt
768098174 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
activemq-data/dirty.index
768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
activemq-data/dirty.index/08.sst
768098204 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
activemq-data/dirty.index/CURRENT
76809819   80 -rw-r--r--   1 anselme  anselme 80313 Jun 16 16:13 
activemq-data/dirty.index/11.sst
768098220 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
activemq-data/dirty.index/LOCK
76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
activemq-data/dirty.index/05.sst
76809821 2048 -rw-r--r--   1 anselme  anselme   2097152 Jun 19 11:30 
activemq-data/dirty.index/12.log
76809818 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
activemq-data/dirty.index/MANIFEST-10
768098160 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
activemq-data/lock
76809815 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 19 11:30 
activemq-data/00f0faaf.log
76809823 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 16 11:50 
activemq-data/00385f46.log
768098074 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
activemq-data/00f0faaf.index
76809808  420 -rw-r--r--   1 anselme  anselme429264 Jun 16 16:13 
activemq-data/00f0faaf.index/09.log
768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
activemq-data/00f0faaf.index/08.sst
768098124 -rw-r--r--   1 anselme  anselme   165 Jun 16 16:13 
activemq-data/00f0faaf.index/MANIFEST-07
768098094 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
activemq-data/00f0faaf.index/CURRENT
76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 
activemq-data/00f0faaf.index/05.sst
76809814 102400 -rw-r--r--   1 anselme  anselme  104857600 Jun 12 21:06 
activemq-data/.log
768098024 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
activemq-data/plist.index
768098034 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
activemq-data/plist.index/CURRENT
768098060 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
activemq-data/plist.index/LOCK
76809805 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
activemq-data/plist.index/03.log
76809804 1024 -rw-r--r--   1 anselme  anselme   1048576 Jun 16 16:13 
activemq-data/plist.index/MANIFEST-02


The problem is that in our production system it once blocked producers with a 
tempUsage at 122%, even though the disk was empty.

So we investigated and ran the broker in a debugger, and found how the 
usage is calculated. It is in the Scala leveldb files: it is not based on what 
is actually on disk, but on what the store thinks is on disk. It multiplies the size of 
one log by the number of logs known by a certain hashmap.

I think the entries of the hashmap are not removed when the log files are 
purged.

Could you confirm?

Thanks in advance 
Anselme

  was:
Dear all,

We have an activemq 5.9 configured with 1GB of tempUsage allowed, just as a 
precaution, because we only use persistent messages (about 6000 messages per day). 
After several days of use, the temp usage increases, and even shows values 
that are above the total amount of the data on disk. Here it shows 45% of its 
1GB limit for the following files:

find activemq-data -ls
768098014 drwxr-xr-x   5 anselme  anselme  4096 Jun 19 10:24 
activemq-data
768098134 -rw-r--r--   1 anselme  anselme24 Jun 16 16:13 
activemq-data/store-version.txt
768098174 drwxr-xr-x   2 anselme  anselme  4096 Jun 16 16:13 
activemq-data/dirty.index
768098114 -rw-r--r--   2 anselme  anselme  2437 Jun 16 12:06 
activemq-data/dirty.index/08.sst
768098204 -rw-r--r--   1 anselme  anselme16 Jun 16 16:13 
activemq-data/dirty.index/CURRENT
76809819   80 -rw-r--r--   1 anselme  anselme 80313 Jun 16 16:13 
activemq-data/dirty.index/11.sst
768098220 -rw-r--r--   1 anselme  anselme 0 Jun 16 16:13 
activemq-data/dirty.index/LOCK
76809810  300 -rw-r--r--   2 anselme  anselme305206 Jun 16 11:51 

[jira] [Resolved] (AMQ-4987) io wait on replicated levelDB slaves

2014-02-18 Thread anselme dewavrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMQ-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anselme dewavrin resolved AMQ-4987.
---

   Resolution: Fixed
Fix Version/s: 5.9.0

This fsync suppression works perfectly.

Anselme

 io wait on replicated levelDB slaves
 

 Key: AMQ-4987
 URL: https://issues.apache.org/jira/browse/AMQ-4987
 Project: ActiveMQ
  Issue Type: Test
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian VM 2.6.32-5-amd64, jdk7
Reporter: anselme dewavrin
Priority: Minor
 Fix For: 5.9.0


 Dear all,
 I set up a 3-node replicatedLevelDB activeMQ cluster on 3 different 
 machines, as explained on the activemq site (with zookeeper etc.).
 I made a message injector using the php stomp client described here: 
 http://stomp.fusesource.org/documentation/php/book.html
 Then I injected persistent messages as fast as possible (giving about 100 
 messages/s, each message is 10k).
 Everything works fine; then I measured the servers' activity with vmstat 1. 
 I saw no iowait on the master node, but 20% on both slaves. This would 
 impede scalability, I suppose. The iowait is explained by 3000 bo/s 
 (blocks out) in the vmstat report.
 The machines are not swapping (paging).
 Here is what I tried, without success:
 -specify explicitly sync=quorum_mem
 -JNI implementation of the leveldb store (and verified it is used)
 -setting flushDelay to 2000
 Does anyone have an idea I could try? Why are the leveldb slaves writing 
 so much to disk?
 Many thanks in advance
 Yours,
 Anselme



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (AMQ-5063) when restarted, slaves do not really sync

2014-02-18 Thread anselme dewavrin (JIRA)
anselme dewavrin created AMQ-5063:
-

 Summary: when restarted, slaves do not really sync
 Key: AMQ-5063
 URL: https://issues.apache.org/jira/browse/AMQ-5063
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian 6, x86-64, jdk1.7
Reporter: anselme dewavrin


Dear All,

I worked several days on replication tests with 5.9 (and snapshot 5.10), using 
the replicatedLevelDB configuration as explained on the activemq website (details 
follow). Everything is replicated well as long as every node is up.

But if I 
-stop a slave,
-inject messages,
-restart it,
then I see in the logs that it caught up. But if I make it become the master, 
the new messages are not there...

Did I misunderstand the goal of replication? Is this normal?

Thank you all,
Anselme


 <persistenceAdapter>
   <replicatedLevelDB
     directory="/usr2/talend/activemq/data"
     sync="quorum_disk"
     weight="2"
     replicas="3"
     bind="tcp://auchanhmi-was02-prod.ope.cloud.mbs:1"
     zkAddress="a-was01-prod.ope.cloud.mbs:2190,a-was02-prod.ope.cloud.mbs:2190,ahmi-was10-prod.ope.cloud.mbs:2190"
     zkPassword="password"
     zkPath="/activemq/leveldb-stores"
     hostname="a-was02-prod.ope.cloud.mbs"
     verifyChecksums="true"
     paranoidChecks="true"
     />
 </persistenceAdapter>





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (AMQ-4987) io wait on replicated levelDB slaves

2014-02-18 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903946#comment-13903946
 ] 

anselme dewavrin edited comment on AMQ-4987 at 2/18/14 12:54 PM:
-

After 3 weeks this fsync suppression works perfectly.

Anselme


was (Author: adewavrin):
This fsync suppression works perfectly.

Anselme

 io wait on replicated levelDB slaves
 

 Key: AMQ-4987
 URL: https://issues.apache.org/jira/browse/AMQ-4987
 Project: ActiveMQ
  Issue Type: Test
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian VM 2.6.32-5-amd64, jdk7
Reporter: anselme dewavrin
Priority: Minor
 Fix For: 5.9.0


 Dear all,
 I set up a 3-node replicatedLevelDB activeMQ cluster on 3 different 
 machines, as explained on the activemq site (with zookeeper etc.).
 I made a message injector using the php stomp client described here: 
 http://stomp.fusesource.org/documentation/php/book.html
 Then I injected persistent messages as fast as possible (giving about 100 
 messages/s, each message is 10k).
 Everything works fine; then I measured the servers' activity with vmstat 1. 
 I saw no iowait on the master node, but 20% on both slaves. This would 
 impede scalability, I suppose. The iowait is explained by 3000 bo/s 
 (blocks out) in the vmstat report.
 The machines are not swapping (paging).
 Here is what I tried, without success:
 -specify explicitly sync=quorum_mem
 -JNI implementation of the leveldb store (and verified it is used)
 -setting flushDelay to 2000
 Does anyone have an idea I could try? Why are the leveldb slaves writing 
 so much to disk?
 Many thanks in advance
 Yours,
 Anselme



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (AMQ-4987) io wait on replicated levelDB slaves

2014-02-18 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903946#comment-13903946
 ] 

anselme dewavrin edited comment on AMQ-4987 at 2/18/14 12:55 PM:
-

After 3 weeks this fsync suppression still works perfectly.

Anselme


was (Author: adewavrin):
After 3 weeks this fsync suppression works perfectly.

Anselme

 io wait on replicated levelDB slaves
 

 Key: AMQ-4987
 URL: https://issues.apache.org/jira/browse/AMQ-4987
 Project: ActiveMQ
  Issue Type: Test
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian VM 2.6.32-5-amd64, jdk7
Reporter: anselme dewavrin
Priority: Minor
 Fix For: 5.9.0


 Dear all,
 I set up a 3-node replicatedLevelDB activeMQ cluster on 3 different 
 machines, as explained on the activemq site (with zookeeper etc.).
 I made a message injector using the php stomp client described here: 
 http://stomp.fusesource.org/documentation/php/book.html
 Then I injected persistent messages as fast as possible (giving about 100 
 messages/s, each message is 10k).
 Everything works fine; then I measured the servers' activity with vmstat 1. 
 I saw no iowait on the master node, but 20% on both slaves. This would 
 impede scalability, I suppose. The iowait is explained by 3000 bo/s 
 (blocks out) in the vmstat report.
 The machines are not swapping (paging).
 Here is what I tried, without success:
 -specify explicitly sync=quorum_mem
 -JNI implementation of the leveldb store (and verified it is used)
 -setting flushDelay to 2000
 Does anyone have an idea I could try? Why are the leveldb slaves writing 
 so much to disk?
 Many thanks in advance
 Yours,
 Anselme



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (AMQ-5063) when restarted, slaves do not really sync

2014-02-18 Thread anselme dewavrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMQ-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anselme dewavrin updated AMQ-5063:
--

Description: 
Dear All,

I worked several days on replication tests with 5.9 (and snapshot 5.10), using 
the replicatedLevelDB configuration as explained on the activemq website (details 
follow). Everything is replicated well as long as every node is up.

But if I 
-stop a slave,
-inject messages,
-restart it,
then I see in the logs that it caught up. But if I make it become the master, 
the messages are not there...

Did I misunderstand the goal of replication? Is this normal?

Thank you all,
Anselme


For instance, the conf of the 2nd node is:

 <persistenceAdapter>
   <replicatedLevelDB
     directory="/usr2/talend/activemq/data"
     sync="quorum_disk"
     weight="2"
     replicas="3"
     bind="tcp://awas02:1"
     zkAddress="awas01:2190,awas02:2190,awas10:2190"
     zkPassword="password"
     zkPath="/activemq/leveldb-stores"
     hostname="awas02"
     verifyChecksums="true"
     paranoidChecks="true"
     />
 </persistenceAdapter>


  was:
Dear All,

I worked several days on replication tests with 5.9 (and snapshot 5.10), using 
the replicatedLevelDB configuration as explained on the activemq website (details 
follow). Everything is replicated well as long as every node is up.

But if I 
-stop a slave,
-inject messages,
-restart it,
then I see in the logs that it caught up. But if I make it become the master, 
the new messages are not there...

Did I misunderstand the goal of replication? Is this normal?

Thank you all,
Anselme


 <persistenceAdapter>
   <replicatedLevelDB
     directory="/usr2/talend/activemq/data"
     sync="quorum_disk"
     weight="2"
     replicas="3"
     bind="tcp://auchanhmi-was02-prod.ope.cloud.mbs:1"
     zkAddress="a-was01-prod.ope.cloud.mbs:2190,a-was02-prod.ope.cloud.mbs:2190,ahmi-was10-prod.ope.cloud.mbs:2190"
     zkPassword="password"
     zkPath="/activemq/leveldb-stores"
     hostname="a-was02-prod.ope.cloud.mbs"
     verifyChecksums="true"
     paranoidChecks="true"
     />
 </persistenceAdapter>




 when restarted, slaves do not really sync
 -

 Key: AMQ-5063
 URL: https://issues.apache.org/jira/browse/AMQ-5063
 Project: ActiveMQ
  Issue Type: Bug
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian 6, x86-64, jdk1.7
Reporter: anselme dewavrin

 Dear All,
 I worked several days on replication tests with 5.9 (and snapshot 5.10), 
 using the replicatedLevelDB configuration as explained on the activemq website 
 (details follow). Everything is replicated well as long as every node is up.
 But if I 
 -stop a slave,
 -inject messages,
 -restart it,
 then I see in the logs that it caught up. But if I make it become the 
 master, the messages are not there...
 Did I misunderstand the goal of replication? Is this normal?
 Thank you all,
 Anselme
 For instance, the conf of the 2nd node is:
  <persistenceAdapter>
    <replicatedLevelDB
      directory="/usr2/talend/activemq/data"
      sync="quorum_disk"
      weight="2"
      replicas="3"
      bind="tcp://awas02:1"
      zkAddress="awas01:2190,awas02:2190,awas10:2190"
      zkPassword="password"
      zkPath="/activemq/leveldb-stores"
      hostname="awas02"
      verifyChecksums="true"
      paranoidChecks="true"
      />
  </persistenceAdapter>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (AMQ-4987) io wait on replicated levelDB slaves

2014-01-23 Thread anselme dewavrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMQ-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anselme dewavrin updated AMQ-4987:
--

Description: 
Dear all,
I set up a 3-node replicatedLevelDB activeMQ cluster as explained on the 
activemq site.

I made a message injector using the php stomp client described here: 
http://stomp.fusesource.org/documentation/php/book.html

Then I injected persistent messages as fast as possible (giving about 600 
messages/s).

Everything works fine; then I measured the servers' activity with vmstat 1. I 
saw no iowait on the master node, but 20% on both slaves. This would impede 
scalability, I suppose. The iowait is explained by 3000 bo/s (blocks out) in 
the vmstat report.

The machines are not swapping (paging).

Here is what I tried, without success:
-specify explicitly sync=quorum_mem
-JNI implementation of the leveldb store (and verified it is used)
-setting flushDelay to 2000

Does anyone have an idea I could try? Why are the leveldb slaves writing 
so much to disk?

Many thanks in advance
Yours,
Anselme

  was:
Dear all,
I set up a 3-node replicatedLevelDB activeMQ cluster as explained on the 
activemq site.

I made a message injector using the php stomp client described here: 
http://stomp.fusesource.org/documentation/php/book.html

Then I injected persistent messages as fast as possible (giving about 600 
messages/s).

Everything works fine; then I measured the servers' activity with vmstat 1. I 
saw no iowait on the master node, but 20% on both slaves. This would impede 
scalability, I suppose. 

The machines are not swapping (paging).

Here is what I tried, without success:
-specify explicitly sync=quorum_mem
-JNI implementation of the leveldb store (and verified it is used)
-setting flushDelay to 2000

Does anyone have an idea I could try?

Many thanks in advance
Yours,
Anselme


 io wait on replicated levelDB slaves
 

 Key: AMQ-4987
 URL: https://issues.apache.org/jira/browse/AMQ-4987
 Project: ActiveMQ
  Issue Type: Test
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian VM 2.6.32-5-amd64, jdk7
Reporter: anselme dewavrin
Priority: Minor

 Dear all,
 I set up a 3-node replicatedLevelDB activeMQ cluster as explained on the 
 activemq site.
 I made a message injector using the php stomp client described here: 
 http://stomp.fusesource.org/documentation/php/book.html
 Then I injected persistent messages as fast as possible (giving about 600 
 messages/s).
 Everything works fine; then I measured the servers' activity with vmstat 1. 
 I saw no iowait on the master node, but 20% on both slaves. This would 
 impede scalability, I suppose. The iowait is explained by 3000 bo/s 
 (blocks out) in the vmstat report.
 The machines are not swapping (paging).
 Here is what I tried, without success:
 -specify explicitly sync=quorum_mem
 -JNI implementation of the leveldb store (and verified it is used)
 -setting flushDelay to 2000
 Does anyone have an idea I could try? Why are the leveldb slaves writing 
 so much to disk?
 Many thanks in advance
 Yours,
 Anselme



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (AMQ-4987) io wait on replicated levelDB slaves

2014-01-23 Thread anselme dewavrin (JIRA)

[ 
https://issues.apache.org/jira/browse/AMQ-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879805#comment-13879805
 ] 

anselme dewavrin commented on AMQ-4987:
---

Dear All,

The iowait and huge disk activity are due to frequent fsyncs on the slaves. I 
demonstrated this by preloading a library that disables fsyncs (with 
LD_PRELOAD=libeatmydata.so). With this trick the iowait disappears.
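
For reference, the invocation described above might look like the following 
sketch; the library path and the broker launch command are assumptions, not the 
exact setup used here:

{code}
# Illustrative only -- library path and broker script location vary by system.
# libeatmydata intercepts fsync()/fdatasync() and turns them into no-ops, so the
# slave's LevelDB writes are no longer flushed synchronously to disk.
LD_PRELOAD=/usr/lib/libeatmydata/libeatmydata.so bin/activemq console
{code}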

I think the leveldb replicated store is in itself secure enough without 
fsync-ing each update, because of the synchronous replication (sync=quorum_mem) 
onto different machines, which would all have to fail within the same few 
seconds for data to be lost. This is why the many fsyncs on the slaves are not 
useful, in my opinion.

For my purpose I will live with the LD_PRELOAD, but it could be beneficial for 
the community to make the levelDB replicated store evolve.

Anselme

 io wait on replicated levelDB slaves
 

 Key: AMQ-4987
 URL: https://issues.apache.org/jira/browse/AMQ-4987
 Project: ActiveMQ
  Issue Type: Test
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian VM 2.6.32-5-amd64, jdk7
Reporter: anselme dewavrin
Priority: Minor

 Dear all,
 I set up a 3-node replicatedLevelDB activeMQ cluster as explained on the 
 activemq site.
 I made a message injector using the php stomp client described here: 
 http://stomp.fusesource.org/documentation/php/book.html
 Then I injected persistent messages as fast as possible (giving about 600 
 messages/s).
 Everything works fine; then I measured the servers' activity with vmstat 1. 
 I saw no iowait on the master node, but 20% on both slaves. This would 
 impede scalability, I suppose. The iowait is explained by 3000 bo/s 
 (blocks out) in the vmstat report.
 The machines are not swapping (paging).
 Here is what I tried, without success:
 -specify explicitly sync=quorum_mem
 -JNI implementation of the leveldb store (and verified it is used)
 -setting flushDelay to 2000
 Does anyone have an idea I could try? Why are the leveldb slaves writing 
 so much to disk?
 Many thanks in advance
 Yours,
 Anselme



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (AMQ-4987) io wait on replicated levelDB slaves

2014-01-23 Thread anselme dewavrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMQ-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anselme dewavrin updated AMQ-4987:
--

Description: 
Dear all,
I set up a 3-node replicatedLevelDB activeMQ cluster as explained on the
activemq site.

I made a message injector using the php stomp client described here:
http://stomp.fusesource.org/documentation/php/book.html

Then I injected persistent messages as fast as possible (giving about 100
messages/s, each message is 10k).

Everything works fine. Then I measured the servers' activity with vmstat 1:
I saw no iowait on the master node, but > 20% on both slaves. This would
impede scalability, I suppose. The iowait corresponds to about 3000 bo/s
(blocks out) in the vmstat report.

The machines are not swapping (paging).

Here is what I tried, without success:
- specifying sync=quorum_mem explicitly
- the JNI implementation of the leveldb store (and verified it is used)
- setting flushDelay to 2000

Does anyone have an idea I could try? Why are the leveldb slaves writing
so much to disk?

Many thanks in advance
Yours,
Anselme

  was:
Dear all,
I set up a 3-node replicatedLevelDB activeMQ cluster as explained on the
activemq site.

I made a message injector using the php stomp client described here:
http://stomp.fusesource.org/documentation/php/book.html

Then I injected persistent messages as fast as possible (giving about 600
messages/s).

Everything works fine. Then I measured the servers' activity with vmstat 1:
I saw no iowait on the master node, but > 20% on both slaves. This would
impede scalability, I suppose. The iowait corresponds to about 3000 bo/s
(blocks out) in the vmstat report.

The machines are not swapping (paging).

Here is what I tried, without success:
- specifying sync=quorum_mem explicitly
- the JNI implementation of the leveldb store (and verified it is used)
- setting flushDelay to 2000

Does anyone have an idea I could try? Why are the leveldb slaves writing
so much to disk?

Many thanks in advance
Yours,
Anselme


 io wait on replicated levelDB slaves
 

 Key: AMQ-4987
 URL: https://issues.apache.org/jira/browse/AMQ-4987
 Project: ActiveMQ
  Issue Type: Test
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian VM 2.6.32-5-amd64, jdk7
Reporter: anselme dewavrin
Priority: Minor

 Dear all,
 I set up a 3-node replicatedLevelDB activeMQ cluster as explained on the
 activemq site.
 I made a message injector using the php stomp client described here:
 http://stomp.fusesource.org/documentation/php/book.html
 Then I injected persistent messages as fast as possible (giving about 100
 messages/s, each message is 10k).
 Everything works fine. Then I measured the servers' activity with vmstat 1:
 I saw no iowait on the master node, but > 20% on both slaves. This would
 impede scalability, I suppose. The iowait corresponds to about 3000 bo/s
 (blocks out) in the vmstat report.
 The machines are not swapping (paging).
 Here is what I tried, without success:
 - specifying sync=quorum_mem explicitly
 - the JNI implementation of the leveldb store (and verified it is used)
 - setting flushDelay to 2000
 Does anyone have an idea I could try? Why are the leveldb slaves writing
 so much to disk?
 Many thanks in advance
 Yours,
 Anselme



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (AMQ-4987) io wait on replicated levelDB slaves

2014-01-23 Thread anselme dewavrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMQ-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anselme dewavrin updated AMQ-4987:
--

Description: 
Dear all,
I set up a 3-node replicatedLevelDB activeMQ cluster on 3 different machines,
as explained on the activemq site (with zookeeper etc.).

I made a message injector using the php stomp client described here:
http://stomp.fusesource.org/documentation/php/book.html

Then I injected persistent messages as fast as possible (giving about 100
messages/s, each message is 10k).

Everything works fine. Then I measured the servers' activity with vmstat 1:
I saw no iowait on the master node, but > 20% on both slaves. This would
impede scalability, I suppose. The iowait corresponds to about 3000 bo/s
(blocks out) in the vmstat report.

The machines are not swapping (paging).

Here is what I tried, without success:
- specifying sync=quorum_mem explicitly
- the JNI implementation of the leveldb store (and verified it is used)
- setting flushDelay to 2000

Does anyone have an idea I could try? Why are the leveldb slaves writing
so much to disk?

Many thanks in advance
Yours,
Anselme

  was:
Dear all,
I set up a 3-node replicatedLevelDB activeMQ cluster as explained on the
activemq site.

I made a message injector using the php stomp client described here:
http://stomp.fusesource.org/documentation/php/book.html

Then I injected persistent messages as fast as possible (giving about 100
messages/s, each message is 10k).

Everything works fine. Then I measured the servers' activity with vmstat 1:
I saw no iowait on the master node, but > 20% on both slaves. This would
impede scalability, I suppose. The iowait corresponds to about 3000 bo/s
(blocks out) in the vmstat report.

The machines are not swapping (paging).

Here is what I tried, without success:
- specifying sync=quorum_mem explicitly
- the JNI implementation of the leveldb store (and verified it is used)
- setting flushDelay to 2000

Does anyone have an idea I could try? Why are the leveldb slaves writing
so much to disk?

Many thanks in advance
Yours,
Anselme


 io wait on replicated levelDB slaves
 

 Key: AMQ-4987
 URL: https://issues.apache.org/jira/browse/AMQ-4987
 Project: ActiveMQ
  Issue Type: Test
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian VM 2.6.32-5-amd64, jdk7
Reporter: anselme dewavrin
Priority: Minor

 Dear all,
 I set up a 3-node replicatedLevelDB activeMQ cluster on 3 different
 machines, as explained on the activemq site (with zookeeper etc.).
 I made a message injector using the php stomp client described here:
 http://stomp.fusesource.org/documentation/php/book.html
 Then I injected persistent messages as fast as possible (giving about 100
 messages/s, each message is 10k).
 Everything works fine. Then I measured the servers' activity with vmstat 1:
 I saw no iowait on the master node, but > 20% on both slaves. This would
 impede scalability, I suppose. The iowait corresponds to about 3000 bo/s
 (blocks out) in the vmstat report.
 The machines are not swapping (paging).
 Here is what I tried, without success:
 - specifying sync=quorum_mem explicitly
 - the JNI implementation of the leveldb store (and verified it is used)
 - setting flushDelay to 2000
 Does anyone have an idea I could try? Why are the leveldb slaves writing
 so much to disk?
 Many thanks in advance
 Yours,
 Anselme



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (AMQ-4987) io wait on replicated levelDB slaves

2014-01-22 Thread anselme dewavrin (JIRA)
anselme dewavrin created AMQ-4987:
-

 Summary: io wait on replicated levelDB slaves
 Key: AMQ-4987
 URL: https://issues.apache.org/jira/browse/AMQ-4987
 Project: ActiveMQ
  Issue Type: Test
  Components: activemq-leveldb-store
Affects Versions: 5.9.0
 Environment: debian VM 2.6.32-5-amd64, jdk7
Reporter: anselme dewavrin
Priority: Minor


Dear all,
I set up a 3-node replicatedLevelDB activeMQ cluster as explained on the
activemq site.

I made a message injector using the php stomp client described here:
http://stomp.fusesource.org/documentation/php/book.html

Then I injected persistent messages as fast as possible (giving about 600
messages/s).

Everything works fine. Then I measured the servers' activity with vmstat 1:
I saw no iowait on the master node, but > 20% on both slaves. This would
impede scalability, I suppose.

The machines are not swapping (paging).

Here is what I tried, without success:
- specifying sync=quorum_mem explicitly
- the JNI implementation of the leveldb store (and verified it is used)
- setting flushDelay to 2000

Does anyone have an idea I could try?

Many thanks in advance
Yours,
Anselme



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)