Hi Lalit,

When data is pushed into the database that MCF uses, but that MCF instance is not the one doing the pushing, caches everywhere will not be properly invalidated. It may be more appropriate to have only one cluster with two members of each type (agents process, MCF UI, etc.), if that would be acceptable.
Karl

Sent from my Windows Phone
------------------------------
From: lalit jangra
Sent: 7/3/2014 1:23 PM
To: Karl Wright
Subject: Re: Zookeeper in Apache ManifoldCF

Hello Karl,

I have a set of two MCF servers, each with its own Tomcat server but both pointing to the same PostgreSQL DB. I have also configured a set of three ZooKeeper servers on each node of the cluster, started them, and configured properties.xml and properties-global.xml on both nodes. Finally, I ran the ZooKeeper example's start-agents.sh on both nodes.

When I run ./zkCli.sh -server localhost:2181 on both machines, I get different outputs. Is this normal, or am I missing something?

Node1.
[zk: localhost:2181(CONNECTED) 2] ls /
[org.apache.manifoldcf.service-AGENT, org.apache.manifoldcf.servicelock-AGENT, org.apache.manifoldcf.configuration, org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]

Node2.
[zk: localhost:2181(CONNECTED) 1] ls /
[org.apache.manifoldcf.locks-statslock-reindex-jobqueue, org.apache.manifoldcf.locks-_Cache_OUTPUTCONNECTION_Solr, org.apache.manifoldcf.service-AGENT, org.apache.manifoldcf.service-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent, org.apache.manifoldcf.resources-stats-reindex-jobqueue, org.apache.manifoldcf.serviceanon-_OUTPUTCONNECTORPOOL_Solr, org.apache.manifoldcf.locks-_Cache_JOBSTATUSES, org.apache.manifoldcf.locks-statslock-analyze-jobqueue, org.apache.manifoldcf.servicelock-AGENT, org.apache.manifoldcf.locks-_REPR_TRACKER_LOCK_, org.apache.manifoldcf.configuration, org.apache.manifoldcf.servicelock-_OUTPUTCONNECTORPOOL_Solr, org.apache.manifoldcf.locks-_STUFFERTHREAD_LOCK, org.apache.manifoldcf.service-_OUTPUTCONNECTORPOOL_Solr, org.apache.manifoldcf.resources-_REPR_MINDEPTH_, org.apache.manifoldcf.resources-_STUFFERTHREAD_LASTTIME, org.apache.manifoldcf.resources-stats-analyze-jobqueue, org.apache.manifoldcf.locks-_IDFACTORY_, org.apache.manifoldcf.locks-_JOBRESET_, org.apache.manifoldcf.servicelock-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
org.apache.manifoldcf.resources-cache-JOBSTATUSES, org.apache.manifoldcf.locks-_JOBSTOP_, org.apache.manifoldcf.locks-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr, zookeeper, org.apache.manifoldcf.resources-_IDFACTORY_, org.apache.manifoldcf.locks-_Cache_JOB_1404323519962, org.apache.manifoldcf.locks-_Cache_DB-mcfdb-TBL-outputconnectors, org.apache.manifoldcf.locks-_JOBRESUME_]

Also, in the clustered setup I noticed one strange behavior. If I create a job on, say, MCF1, it is created but not replicated to the MCF2 node; I need to restart MCF2 to get it replicated there. Is this expected? Please suggest.

Regards.

On Wed, Jul 2, 2014 at 10:49 PM, Karl Wright wrote:

> Hi Lalit,
>
> Each agents process in a cluster needs its own Id. Please look carefully
> at the multiprocess ZooKeeper example for details on how to do that. If you
> didn't intend for there to be multiple agents processes in one cluster, you
> did something wrong, because that is what you have.
>
> Karl
>
> ------------------------------
> From: lalit jangra
> Sent: 7/2/2014 2:11 PM
> To: Karl Wright
> Subject: Re: Zookeeper in Apache ManifoldCF
>
> Hello,
>
> I have configured 3 ZooKeeper instances on ports 2181, 2182 and 2183 on my
> server, and in mcf/dist/multiprocess-zk-example I have configured all three
> servers as a comma-separated list.
>
> Now I have started all three ZooKeeper instances and I can see all three
> running. Next I tried a crawl job, but in manifoldcf.log I see the
> following error.
>
> ERROR 2014-07-02 19:07:15,716 (Agents thread) - Exception tossed: Service '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:156)
>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:120)
>     at org.apache.manifoldcf.core.lockmanager.LockManager.registerServiceBeginServiceActivity(LockManager.java:69)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>     at org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>
> How can I validate whether these errors are related to ZooKeeper or not?
> Also, how can I tell whether MCF is integrated with ZooKeeper?
>
> Regards.
>
> On Tue, Jul 1, 2014 at 3:19 PM, Karl Wright wrote:
>
>> Hi Lalit,
>>
>> I presumed in my recommendation that your "active" and "passive"
>> ManifoldCF instances were using the same PostgreSQL server, but were using
>> different database instances within it. That is the only way it could
>> reasonably work.
>>
>> Any time you have a ZooKeeper cluster, they recommend you have three
>> instances. Effectively you are setting up two ManifoldCF clusters: an
>> "active" one, and a "passive" one. Each one has its own database instance
>> within PostgreSQL, and each one (if it is multiprocess) should have 3
>> ZooKeeper instances.
>>
>> I hope this is clear.
>>
>> Karl
>>
>> On Tue, Jul 1, 2014 at 9:54 AM, lalit jangra wrote:
>>
>>> Thanks Karl,
>>>
>>> I have a slight variation here: both MCF nodes would run Active/Active,
>>> pointing to the same DB. Is ZooKeeper still required?
>>>
>>> Also, by "two sets of three zookeeper machines", do you mean I need to
>>> set up three ZooKeeper servers on each node, so a total of 6 ZooKeeper
>>> nodes working across both machines in the same ensemble?
>>>
>>> Regards.
>>>
>>> On Mon, Jun 30, 2014 at 6:50 PM, Karl Wright wrote:
>>>
>>>> Hi Lalit,
>>>>
>>>> You can keep things really simple by having the active and passive MCF
>>>> instances each run as a single process, either under Jetty or using the
>>>> combined war under Tomcat. If that is not acceptable, you would need two
>>>> sets of three ZooKeeper machines, one set for each instance.
>>>>
>>>> Karl
>>>>
>>>> ------------------------------
>>>> From: lalit jangra
>>>> Sent: 6/30/2014 12:19 PM
>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>
>>>> Thanks Karl & Graeme,
>>>>
>>>> Let me elaborate on my scenario and what I am trying to achieve.
>>>>
>>>> I have two servers, each running MCF 1.5.1 individually. Both of them
>>>> are backed by the same PostgreSQL DB, so both MCF applications point to
>>>> the same DB at any point in time, without their own dedicated DBs. The
>>>> primary/active DB instance is periodically backed up from the active to
>>>> the passive instance.
>>>>
>>>> Only one DB instance will be active at any time, with the other DB
>>>> instance acting as an active standby. If the primary/active instance
>>>> breaks down, the passive/secondary takes over and becomes the
>>>> primary/active instance handling all DB transactions, making the old
>>>> primary the new secondary DB instance.
>>>>
>>>> Similarly, I have two Solr 4.6 instances in active/passive mode, with
>>>> periodic backup of the active/primary to the passive/secondary, active
>>>> standby, and failover.
>>>>
>>>> So my goal for clustering is high availability with failover; I will not
>>>> use both MCF instances in parallel or simultaneously.
>>>>
>>>> Finally, I am limited to only two instances, but as mentioned earlier,
>>>> we need at least three ZooKeeper instances for proper ZooKeeper
>>>> clustering.
>>>>
>>>> Is it still worth using ZooKeeper, or can I do simple clustering where
>>>> each MCF node is clustered using the same DB? Please suggest.
>>>>
>>>> Thanks for your help.
>>>>
>>>> Regards.
>>>>
>>>> On Fri, Jun 27, 2014 at 11:15 AM, Graeme Seaton wrote:
>>>>
>>>>> Hi Lalit,
>>>>>
>>>>> For production use, you will want to spin up your own ZK cluster using
>>>>> the instructions on the ZooKeeper site (as pointed out earlier, at least
>>>>> 3 instances are recommended).
>>>>>
>>>>> You then need to modify the properties.xml file in
>>>>> multiprocess-zk-example to point to the list of ZooKeeper servers. You
>>>>> also need to modify properties-global.xml with the appropriate global
>>>>> settings, i.e. logging levels, PostgreSQL database, etc., and then run
>>>>> setglobalproperties.sh to register the settings in ZK.
>>>>>
>>>>> To test that it is working, set up a crawl and then tail the
>>>>> manifoldcf.log file on each of your nodes to check that they are all
>>>>> crawling in parallel.
>>>>>
>>>>> HTH,
>>>>>
>>>>> Graeme
>>>>>
>>>>> On 25/06/14 12:19, Karl Wright wrote:
>>>>>
>>>>> Hi Lalit,
>>>>>
>>>>> ZooKeeper does not use a database; it keeps its data in the local
>>>>> file system. Each ZooKeeper node has its own local data, and everything
>>>>> else is socket communication between them.
>>>>>
>>>>> As for information: http://zookeeper.apache.org/
>>>>>
>>>>> Karl
>>>>>
>>>>> On Wed, Jun 25, 2014 at 6:56 AM, lalit jangra wrote:
>>>>>
>>>>>> Thanks Karl,
>>>>>>
>>>>>> Apologies, as I am not very familiar with ZooKeeper and am still trying
>>>>>> to figure it out.
>>>>>>
>>>>>> Is there any more documentation or pointers available? That would be
>>>>>> very helpful.
>>>>>>
>>>>>> Also, I have 2 Tomcat servers in a cluster, each with MCF 1.5.1 set up
>>>>>> and configured to point to the same PostgreSQL DB, and the DB is backed
>>>>>> up for failover. From your inputs, it seems that we need to configure a
>>>>>> separate standalone ZooKeeper server which will act as master, and both
>>>>>> nodes in the cluster will need to work as slaves and talk to the
>>>>>> standalone ZooKeeper master.
>>>>>>
>>>>>> Also, will the ZooKeeper server have its own DB, so that we host it
>>>>>> separately, or can we use the same Postgres DB?
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> On Wed, Jun 25, 2014 at 11:33 AM, Karl Wright wrote:
>>>>>>
>>>>>>> Hi Lalit,
>>>>>>>
>>>>>>> 1. ZooKeeper is already spun into MCF. In fact, you start a
>>>>>>> ZooKeeper instance when you run the MCF ZooKeeper example. They
>>>>>>> recommend, though, that for failover you have 3 instances, etc.
>>>>>>> 2. It looks like the documentation is out of date and something old
>>>>>>> is left in there.
>>>>>>> 3. ZooKeeper is a client/server kind of arrangement. You need at
>>>>>>> least ONE ZooKeeper server, and each cluster member includes a
>>>>>>> ZooKeeper client, which is configured to talk with ALL the ZooKeeper
>>>>>>> server instances you have.
>>>>>>> 4. There is ONE database instance; the instance may be supported
>>>>>>> by failover and redundant PostgreSQL, but it appears as one instance.
>>>>>>> To get failover from Postgres you need the Enterprise Edition, which
>>>>>>> costs money.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Wed, Jun 25, 2014 at 4:47 AM, lalit jangra wrote:
>>>>>>>
>>>>>>>> Thanks Karl,
>>>>>>>>
>>>>>>>> That was helpful.
>>>>>>>>
>>>>>>>> I am building a clustered setup on Tomcat, following the instructions at
>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#Simplified+multi-process+model+using+ZooKeeper-based+synchronization
>>>>>>>> and I need some suggestions.
>>>>>>>>
>>>>>>>> 1. Do we need to download ZooKeeper and put it in the
>>>>>>>> multiprocess-zk-example folder, or is it already spun into MCF so we
>>>>>>>> are good to go?
>>>>>>>> 2. The page says all jars under *processes* should be put on the
>>>>>>>> classpath, but I cannot see any *processes* folder under MCF.
>>>>>>>> 3. Do we need to set up ZooKeeper on both nodes or only on one
>>>>>>>> node? I assume we need to do it on both nodes.
>>>>>>>> 4. Do we also need to set up databases separately on both nodes?
>>>>>>>> Also, can ZooKeeper use the same PostgreSQL DB, or will it use its
>>>>>>>> own HSQL DB?
>>>>>>>>
>>>>>>>> Finally, how can I test that my ZooKeeper is set up and ready to
>>>>>>>> roll?
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> On Tue, Jun 24, 2014 at 1:56 PM, Karl Wright wrote:
>>>>>>>>
>>>>>>>>> Hi Lalit,
>>>>>>>>> ZooKeeper is standard for cluster deployments these days. See
>>>>>>>>> the multiprocess-zookeeper example for ideas about how to deploy it.
>>>>>>>>> It's also important to read the how-to-build-and-deploy page to
>>>>>>>>> understand the example.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Tue, Jun 24, 2014 at 8:04 AM, lalit jangra wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am planning to use MCF in cluster mode.
>>>>>>>>>> For this, I want to know whether ZooKeeper is of any help here.
>>>>>>>>>>
>>>>>>>>>> If yes, how can it be leveraged in distributed MCF servers?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Lalit Jangra.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Lalit Jangra.
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Lalit Jangra.
>>>>
>>>> --
>>>> Regards,
>>>> Lalit Jangra.
>>>
>>> --
>>> Regards,
>>> Lalit Jangra.
>
> --
> Regards,
> Lalit Jangra.

--
Regards,
Lalit Jangra.
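The recommendation of three ZooKeeper instances (from Karl and Graeme above) comes from ZooKeeper's majority quorum: an ensemble of n voting servers keeps serving only while a strict majority, n // 2 + 1, of them can reach each other. The Python sketch below only does that arithmetic; the function names are illustrative and not part of ZooKeeper or ManifoldCF.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with `ensemble_size` voting servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} server(s): quorum of {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

Run as a script, it shows that 2 servers tolerate 0 failures (no better than 1), 3 tolerate 1, and 6 in a single ensemble tolerate only 2, the same as 5. Karl's "two sets of three" refers to two separate three-server ensembles, one per ManifoldCF cluster, as his 7/1 reply clarifies, not one six-server ensemble.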
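Graeme's step of pointing properties.xml at the list of ZooKeeper servers amounts to writing a comma-separated host:port connect string. The sketch below builds one, reads it back out of a properties.xml-style document, and flags loopback entries. Assumptions: the property name org.apache.manifoldcf.zookeeper.connectstring and the <configuration>/<property name=... value=.../> layout are our recollection of the example's properties.xml, so check them against your copy; the hostnames zk1 and zk2 are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: property name as we recall it from multiprocess-zk-example's
# properties.xml; verify it against your own copy before relying on it.
CONNECT_PROPERTY = "org.apache.manifoldcf.zookeeper.connectstring"
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def build_connect_string(servers):
    """Join (host, port) pairs into ZooKeeper's comma-separated host:port form."""
    return ",".join(f"{host}:{port}" for host, port in servers)


def read_connect_string(properties_xml):
    """Return the connect-string value from a properties.xml document, or None."""
    root = ET.fromstring(properties_xml)
    for prop in root.iter("property"):
        if prop.get("name") == CONNECT_PROPERTY:
            return prop.get("value")
    return None


def loopback_entries(connect_string):
    """List host:port entries that can only reach the local machine."""
    entries = [e.strip() for e in connect_string.split(",") if e.strip()]
    return [e for e in entries if e.rsplit(":", 1)[0] in LOOPBACK_HOSTS]


# A single-machine layout like the one described on 7/2 (three ports on one host).
SAMPLE = f"""<configuration>
  <property name="{CONNECT_PROPERTY}" value="localhost:2181,localhost:2182,localhost:2183"/>
</configuration>"""

if __name__ == "__main__":
    wanted = build_connect_string([("localhost", 2181), ("localhost", 2182), ("localhost", 2183)])
    print(read_connect_string(SAMPLE) == wanted)
    print(loopback_entries(wanted))
```

If each machine's properties.xml lists only localhost entries, each MCF node talks to the ZooKeeper servers on its own machine, which gives two independent clusters. For the single cluster Karl suggests, every member should carry the same list of real hostnames for one shared ensemble.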
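Karl's reply to the "already active" error is that every agents process in one cluster must register under its own id; the empty service name '' in the logged exception is consistent with processes that were not given distinct ids. In the ZooKeeper example the id is, as far as we know, passed to each agents process as the org.apache.manifoldcf.processid system property; treat that as an assumption to check against your start-agents scripts. A minimal Python check over a hypothetical inventory of agents processes:

```python
def check_process_ids(agents):
    """Return problems found in (label, process_id) pairs for ONE cluster.

    `agents` is a list such as [("node1 agents", "A"), ("node2 agents", "B")];
    the labels are free-form and only used in the messages.
    """
    problems = []
    owner = {}  # process id -> label of the first process that claimed it
    for label, pid in agents:
        if not pid:
            problems.append(f"{label}: no process id set")
        elif pid in owner:
            problems.append(f"{label}: process id {pid!r} already used by {owner[pid]}")
        else:
            owner[pid] = label
    return problems


if __name__ == "__main__":
    print(check_process_ids([("node1 agents", "A"), ("node2 agents", "A")]))
    print(check_process_ids([("node1 agents", "A"), ("node2 agents", "B")]))
```

The check only makes sense within one cluster: agents processes attached to different ensembles never see each other's registrations, so they cannot collide (and cannot coordinate either).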
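The two ls / listings from 7/3 also bear on the earlier question of how to tell whether MCF is integrated with ZooKeeper: top-level znodes named org.apache.manifoldcf.* only appear once MCF has used that ensemble. All clients of one ensemble share a single namespace, so listings that keep disagreeing between nodes suggest the nodes are connected to different ensembles; some entries may reflect activity at the moment of listing, so compare more than once. A rough Python sketch that parses the pasted zkCli output (helper names are hypothetical; NODE2 is abridged to a few entries copied from the thread):

```python
MCF_PREFIX = "org.apache.manifoldcf."


def parse_ls_output(listing):
    """Parse the bracketed child list that zkCli prints for `ls /`."""
    body = listing.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed list such as [a, b]")
    return {item.strip() for item in body[1:-1].split(",") if item.strip()}


def mcf_znodes(children):
    """Keep only the top-level znodes whose names carry the ManifoldCF prefix."""
    return {child for child in children if child.startswith(MCF_PREFIX)}


def compare_listings(listing_a, listing_b):
    """Return (only in A, only in B) for the ManifoldCF znodes of two listings."""
    a = mcf_znodes(parse_ls_output(listing_a))
    b = mcf_znodes(parse_ls_output(listing_b))
    return sorted(a - b), sorted(b - a)


NODE1 = ("[org.apache.manifoldcf.service-AGENT, org.apache.manifoldcf.servicelock-AGENT, "
         "org.apache.manifoldcf.configuration, org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]")
# Abridged: a few entries copied from the much longer Node2 listing.
NODE2 = ("[org.apache.manifoldcf.service-AGENT, org.apache.manifoldcf.servicelock-AGENT, "
         "org.apache.manifoldcf.configuration, org.apache.manifoldcf.locks-_JOBRESUME_, zookeeper]")

if __name__ == "__main__":
    only_1, only_2 = compare_listings(NODE1, NODE2)
    print("MCF znodes present on Node1:", bool(mcf_znodes(parse_ls_output(NODE1))))
    print("only on Node1:", only_1)
    print("only on Node2:", only_2)
```

A persistent difference such as serviceactive-AGENT-A existing only on Node1 fits the picture Karl describes on 7/3: two separate clusters sharing one database.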
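Karl's explanation of why a job created on MCF1 only shows up on MCF2 after a restart can be illustrated with a toy model: each cluster invalidates caches only among its own members, so a write made by a different cluster to the shared database leaves the other cluster's cache stale. This is a deliberately simplified Python sketch with hypothetical class names, not ManifoldCF's actual cache manager.

```python
class SharedDatabase:
    """Stand-in for the one PostgreSQL database both MCF nodes point at."""

    def __init__(self):
        self.jobs = set()


class Cluster:
    """Toy cluster whose members share one invalidation channel (ZooKeeper's role)."""

    def __init__(self, db):
        self.db = db
        self.members = []

    def join(self, name):
        member = Member(name, self)
        self.members.append(member)
        return member

    def invalidate(self, key):
        # Only members of *this* cluster hear about the change.
        for member in self.members:
            member.cache.pop(key, None)


class Member:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.cache = {}

    def list_jobs(self):
        # Read through a local cache, as a long-running process would.
        if "jobs" not in self.cache:
            self.cache["jobs"] = sorted(self.cluster.db.jobs)
        return self.cache["jobs"]

    def create_job(self, job_id):
        self.cluster.db.jobs.add(job_id)
        self.cluster.invalidate("jobs")

    def restart(self):
        self.cache.clear()


def demo_two_clusters():
    db = SharedDatabase()
    mcf1 = Cluster(db).join("MCF1")
    mcf2 = Cluster(db).join("MCF2")   # separate cluster, same database
    mcf2.list_jobs()                  # MCF2 caches the (empty) job list
    mcf1.create_job("job-1")
    stale = list(mcf2.list_jobs())    # still empty: nobody told MCF2
    mcf2.restart()
    fresh = list(mcf2.list_jobs())    # the new job appears only after a restart
    return stale, fresh


def demo_one_cluster():
    cluster = Cluster(SharedDatabase())
    mcf1, mcf2 = cluster.join("MCF1"), cluster.join("MCF2")
    mcf2.list_jobs()
    mcf1.create_job("job-1")
    return mcf2.list_jobs()           # visible immediately, no restart needed
```

In the one-cluster variant, MCF1's write clears MCF2's cached list at once, which corresponds to Karl's suggestion of a single cluster with two members of each type.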
