Hello Karl,

I have got my cluster working, with each node in the cluster running three ZooKeeper instances, for six in total. I have connected both MCF1 and MCF2 to this cluster using the steps below.

1. Start ZooKeeper (using the *runzookeeper[.sh|.bat]* script)
2. Initialize the ManifoldCF shared configuration data (using *setglobalproperties[.sh|.bat]*)
3. Start the database (using *start-database[.sh|.bat]*)
4. Initialize the database (using *initialize[.sh|.bat]*)
5. Start the agents process (using *start-agents[.sh|.bat]*, and optionally *start-agents-2[.sh|.bat]*)
6. Modify the Tomcat startup script, or use the Tomcat service administration client, to set a Java "-Dorg.apache.manifoldcf.configfile" switch to point to the example's *properties.xml* file.
7. Start Tomcat.
8. Deploy and start the mcf-crawler-ui, mcf-authority-service, and mcf-api-service web applications, preferably using the Tomcat administration client.

Do we still need to start ZooKeeper as in step 1, given that we already have a cluster of ZooKeeper servers up and running?

Also, I want to confirm whether we need to update Tomcat as in step 6. I did not do that, but I am not getting any errors. Are there any implications of skipping it?

Finally, I would ask for a little help: I first did my setup using multiprocess-file-example, initializing the DB there, but then for ZooKeeper I moved to multiprocess-zk-example. Is it safe to use in production?

Regards.

On Thu, Jul 3, 2014 at 10:48 PM, lalit jangra <[email protected]> wrote:

> Thanks Karl,
>
> I am having one cluster with two MCF instances pointing to one single DB.
>
> Can you please elaborate a bit more?
>
> regards.
>
> On Thu, Jul 3, 2014 at 10:19 PM, Karl Wright <[email protected]> wrote:
>
>> Hi lalit,
>>
>> When data is pushed into the database that mcf uses but the mcf instance
>> is not doing the pushing, then caches everywhere will not be properly
>> invalidated. It may be more appropriate to have only one cluster with two
>> members of each type (agents process, mcf UI, etc), if that would be
>> acceptable.
>>
>> Karl
>>
>> Sent from my Windows Phone
>> ------------------------------
>> From: lalit jangra
>> Sent: 7/3/2014 1:23 PM
>> To: Karl Wright
>> Subject: Re: Zookeeper in Apache ManifoldCF
>>
>> Hello Karl,
>>
>> I have a set of two MCF servers each having its own tomcat server but
>> pointing to same Postgres DB.
>>
>> I have also configured set of three zookeeper servers on each node of
>> cluster, started them, configured properties.xml & properties-global.xml on
>> both nodes. Finally i started zookeeper's start-agents.sh on both nodes.
>>
>> While trying to run ./zkCli.sh -server localhost:2181 on both machines, i
>> am getting different outputs. Is it normal or i am missing something.
>>
>> Node1.
>>
>> [zk: localhost:2181(CONNECTED) 2] ls /
>>
>> [org.apache.manifoldcf.service-AGENT,
>> org.apache.manifoldcf.servicelock-AGENT,
>> org.apache.manifoldcf.configuration,
>> org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]
>>
>> Node2.
>>
>> [zk: localhost:2181(CONNECTED) 1] ls /
>>
>> [org.apache.manifoldcf.locks-statslock-reindex-jobqueue,
>> org.apache.manifoldcf.locks-_Cache_OUTPUTCONNECTION_Solr,
>> org.apache.manifoldcf.service-AGENT,
>> org.apache.manifoldcf.service-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
>> org.apache.manifoldcf.resources-stats-reindex-jobqueue,
>> org.apache.manifoldcf.serviceanon-_OUTPUTCONNECTORPOOL_Solr,
>> org.apache.manifoldcf.locks-_Cache_JOBSTATUSES,
>> org.apache.manifoldcf.locks-statslock-analyze-jobqueue,
>> org.apache.manifoldcf.servicelock-AGENT,
>> org.apache.manifoldcf.locks-_REPR_TRACKER_LOCK_,
>> org.apache.manifoldcf.configuration,
>> org.apache.manifoldcf.servicelock-_OUTPUTCONNECTORPOOL_Solr,
>> org.apache.manifoldcf.locks-_STUFFERTHREAD_LOCK,
>> org.apache.manifoldcf.service-_OUTPUTCONNECTORPOOL_Solr,
>> org.apache.manifoldcf.resources-_REPR_MINDEPTH_,
>> org.apache.manifoldcf.resources-_STUFFERTHREAD_LASTTIME,
>> org.apache.manifoldcf.resources-stats-analyze-jobqueue,
>> org.apache.manifoldcf.locks-_IDFACTORY_,
>> org.apache.manifoldcf.locks-_JOBRESET_,
>> org.apache.manifoldcf.servicelock-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
>> org.apache.manifoldcf.resources-cache-JOBSTATUSES,
>> org.apache.manifoldcf.locks-_JOBSTOP_,
>> org.apache.manifoldcf.locks-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr,
>> zookeeper, org.apache.manifoldcf.resources-_IDFACTORY_,
>> org.apache.manifoldcf.locks-_Cache_JOB_1404323519962,
>> org.apache.manifoldcf.locks-_Cache_DB-mcfdb-TBL-outputconnectors,
>> org.apache.manifoldcf.locks-_JOBRESUME_]
>>
>> Also in clustered setup, i noticed one strange behavior.
>>
>> If i created a job on say MCF1 in clustered setup, it is created but not
>> replicated to MCF2 node. I need to restart MCF2 node to get it replicated
>> there. Is it OK?
>>
>> Please suggest.
>>
>> Regards.
>>
>> On Wed, Jul 2, 2014 at 10:49 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi lalit,
>>>
>>> Each agents process in a cluster needs its own Id. Please look carefully
>>> at the multiprocess zookeeper example for details how to do that. If you
>>> didn't intend for there to be multiple agents processes in one cluster, you
>>> did something wrong, because that is what you have.
>>>
>>> Karl
>>>
>>> Sent from my Windows Phone
>>> ------------------------------
>>> From: lalit jangra
>>> Sent: 7/2/2014 2:11 PM
>>> To: Karl Wright
>>> Cc: [email protected]
>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>
>>> Hello,
>>>
>>> I have configured 3 zookeeper instances on port 2181, 2182, 2183 on my
>>> server and in mcf/dist/multiprocess-zk-example i have configured all three
>>> servers as comma separated list.
>>>
>>> Now i have started all three zookeeper instances and i could see all
>>> three running. Next i tried with a crawl job but in manifoldcf.logs, i can
>>> see below error.
>>>
>>> ERROR 2014-07-02 19:07:15,716 (Agents thread) - Exception tossed: Service '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
>>>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:156)
>>>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:120)
>>>     at org.apache.manifoldcf.core.lockmanager.LockManager.registerServiceBeginServiceActivity(LockManager.java:69)
>>>     at org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>>>     at org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>>>
>>> How can i validate whether these errors are related to zookeeper or
>>> not? Also how to know if MCF is integrated with zookeeper.
>>>
>>> Regards.
>>>
>>> On Tue, Jul 1, 2014 at 3:19 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Lalit,
>>>>
>>>> I presumed in my recommendation that your "active" and "passive"
>>>> manifoldcf instances were using the same PostgreSQL server, but were using
>>>> different database instances within it. That is the only way it could
>>>> reasonably work.
>>>>
>>>> Any time you have a Zookeeper cluster, they recommend you have three
>>>> instances. Effectively you are setting up two ManifoldCF clusters: an
>>>> "active" one, and a "passive" one. Each one has its own database instance
>>>> within PostgreSQL, and each one (if it is multiprocess) should have 3
>>>> zookeeper instances.
>>>>
>>>> I hope this is clear.
>>>>
>>>> Karl
>>>>
>>>> On Tue, Jul 1, 2014 at 9:54 AM, lalit jangra <[email protected]> wrote:
>>>>
>>>>> Thanks Karl,
>>>>>
>>>>> I have a little variation here and this is about having both MCF nodes
>>>>> in Active/Active nodes pointing to same DB, so still Zookeeper is
>>>>> required?
>>>>>
>>>>> Also does it mean by "two sets of three zookeeper machines", i need
>>>>> to setup three zookeepers onto each node so total 6 zookeeper nodes here
>>>>> working on both machines in same ensemble?
>>>>>
>>>>> Regards.
>>>>>
>>>>> On Mon, Jun 30, 2014 at 6:50 PM, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> Hi Lalit,
>>>>>>
>>>>>> You can keep things really simple by having both active and passive
>>>>>> mcf instances run each as a single process, either under jetty or using
>>>>>> the combined war under tomcat. If that is not acceptable, you would need two
>>>>>> sets of three zookeeper machines, one set for each instance.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> Sent from my Windows Phone
>>>>>> ------------------------------
>>>>>> From: lalit jangra
>>>>>> Sent: 6/30/2014 12:19 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>>>
>>>>>> Thanks Karl & Graeme,
>>>>>>
>>>>>> Let me elaborate my scenario and what i am trying to achieve.
>>>>>>
>>>>>> I have two servers each running MCF 1.5.1 individually. But both of
>>>>>> them are backed by same PostgreSQL DB so both of MCF applications are
>>>>>> pointing to same DB at any point of time, without having their own
>>>>>> dedicated DBs. Next, primary/active DB instance is backed up with
>>>>>> periodical backups from active to passive instance.
>>>>>>
>>>>>> Only one DB instance will be active at any time, with other DB
>>>>>> instance acting as active standby. In case of breakdown of primary/active
>>>>>> instance, passive/secondary will take over and becomes primary/active
>>>>>> instance handling all DB transactions, thus making primary as new
>>>>>> secondary DB instance.
>>>>>>
>>>>>> Similarly i have two solr 4.6 instances which act in active/passive
>>>>>> mode with periodic backup of active/primary to passive/secondary with
>>>>>> active standby and failover.
>>>>>>
>>>>>> So my intention of clustering is high availability of system with
>>>>>> failover but i will not use both of MCF instances in parallel or
>>>>>> simultaneously.
>>>>>>
>>>>>> Finally i am limited to having two instances only but as mentioned
>>>>>> earlier, we need at least three Zookeeper instances for a proper
>>>>>> Zookeeper clustering.
>>>>>>
>>>>>> Is it still worthy to go and use Zookeeper or i can do simple
>>>>>> clustering where each of MCF node is clustered using same DB. Please
>>>>>> suggest.
>>>>>>
>>>>>> Thanks for help.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> On Fri, Jun 27, 2014 at 11:15 AM, Graeme Seaton <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Lalit,
>>>>>>>
>>>>>>> For production use, you will want to spin up your own ZK cluster
>>>>>>> using the instructions on the zookeeper site (as pointed out earlier at
>>>>>>> least 3 is recommended)....
>>>>>>>
>>>>>>> You then need to modify the properties.xml file in
>>>>>>> multiprocess-zk-example to point to the list of Zookeeper servers. You
>>>>>>> also need to modify properties-global.xml with the appropriate global
>>>>>>> settings i.e. logging levels, Postgresql database etc. and then run
>>>>>>> setglobalproperties.sh to register the settings in ZK.
>>>>>>>
>>>>>>> To test that is working, set up a crawl and then tail the
>>>>>>> manifoldcf.log file on each of your nodes to check that they are all
>>>>>>> crawling in parallel.
>>>>>>>
>>>>>>> HTH,
>>>>>>>
>>>>>>> Graeme
>>>>>>>
>>>>>>> On 25/06/14 12:19, Karl Wright wrote:
>>>>>>>
>>>>>>> Hi Lalit,
>>>>>>>
>>>>>>> Zookeeper does not use a database; it keeps its stuff in the local
>>>>>>> file system. Each Zookeeper node has its own local data, and everything
>>>>>>> else is socket communication between them.
>>>>>>>
>>>>>>> As for information: http://zookeeper.apache.org/
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Wed, Jun 25, 2014 at 6:56 AM, lalit jangra <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Karl,
>>>>>>>>
>>>>>>>> Apologies as i am not very familiar with Zookeeper and trying to
>>>>>>>> figure out on same.
>>>>>>>>
>>>>>>>> Is there any more documentation/pointers available for same as that
>>>>>>>> would be more helpful.
>>>>>>>>
>>>>>>>> Also i have 2 tomcat servers in cluster, each having MCF 1.5.1
>>>>>>>> setup and configured to point to same PostgreSQL DB & DB is backed up
>>>>>>>> for failover. From your inputs, it seems that we need to configure a
>>>>>>>> separate standalone Zookeeper server which will act as Master and both nodes in
>>>>>>>> cluster will need to work as slaves and talk to standalone Zookeeper
>>>>>>>> master.
>>>>>>>>
>>>>>>>> Also the Zookeeper server will have its own DB so either we can
>>>>>>>> host it separately or we can use same Postgres DB?
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> On Wed, Jun 25, 2014 at 11:33 AM, Karl Wright <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Lalit,
>>>>>>>>>
>>>>>>>>> 1. zookeeper is already spun into MCF. in fact you start a
>>>>>>>>> zookeeper instance when you run the mcf zookeeper example. They
>>>>>>>>> recommend, though, that for failover you have 3 instances, etc.
>>>>>>>>> 2. Looks like the documentation is out of date and something old
>>>>>>>>> is left in there.
>>>>>>>>> 3. Zookeeper is a client/server kind of arrangement. You need at
>>>>>>>>> least ONE zookeeper server, and each cluster member includes a
>>>>>>>>> zookeeper client, which is configured to talk with ALL the zookeeper server
>>>>>>>>> instances you have.
>>>>>>>>> 4. There is ONE database instance; the instance may be supported
>>>>>>>>> by failover and redundant Postgresql, but it appears as one instance.
>>>>>>>>> To get failover from Postgres you need the Enterprise Edition, which
>>>>>>>>> costs money.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Wed, Jun 25, 2014 at 4:47 AM, lalit jangra <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Karl,
>>>>>>>>>>
>>>>>>>>>> That was helpful.
>>>>>>>>>>
>>>>>>>>>> I am setting clustered setup on Tomcats as i was following
>>>>>>>>>> instructions @
>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#Simplified+multi-process+model+using+ZooKeeper-based+synchronization
>>>>>>>>>> and i need some suggestions here.
>>>>>>>>>>
>>>>>>>>>> 1. Do we need to download zookeeper and put it in
>>>>>>>>>> multiprocess-zk-example folder or it is already spun into MCF and we
>>>>>>>>>> are good to go?
>>>>>>>>>> 2. It says all jars under *processes* should be put into
>>>>>>>>>> classpath but i can not see any *processes* folder under MCF?
>>>>>>>>>> 3. Do we need to setup Zookeeper on both nodes or only at one
>>>>>>>>>> node, i assume we need to do on both nodes ?
>>>>>>>>>> 4. Do we also need to setup databases separately on both nodes
>>>>>>>>>> again. Also can we setup Zookeeper DB using same PostgreSQL or it
>>>>>>>>>> will use its own HSQL DB?
>>>>>>>>>>
>>>>>>>>>> Finally how can i test that my Zookeeper is set up and ready to
>>>>>>>>>> roll?
>>>>>>>>>>
>>>>>>>>>> Thanks for your help.
>>>>>>>>>>
>>>>>>>>>> Regards.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 24, 2014 at 1:56 PM, Karl Wright <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Lalit,
>>>>>>>>>>> ZooKeeper is standard for cluster deployments these days. See
>>>>>>>>>>> the multiprocess-zookeeper example for ideas about how to deploy
>>>>>>>>>>> it. It's also important to read the how-to-build-and-deploy page to
>>>>>>>>>>> understand the example.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 24, 2014 at 8:04 AM, lalit jangra <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am planning to use MCF in cluster mode. For same, i want to
>>>>>>>>>>>> know if Zookeeper is of any help here?
>>>>>>>>>>>>
>>>>>>>>>>>> If yes, how can it be leveraged in distributed MCF servers?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Lalit Jangra.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Lalit Jangra.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Lalit Jangra.
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Lalit Jangra.
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Lalit Jangra.
>>>
>>> --
>>> Regards,
>>> Lalit Jangra.
>>
>> --
>> Regards,
>> Lalit Jangra.
>
> --
> Regards,
> Lalit Jangra.

--
Regards,
Lalit Jangra.
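
A note on ensemble sizing, since the thread mentions both a two-machine limit and a six-server ensemble. Assuming the default majority quorum, a ZooKeeper ensemble stays available only while a strict majority of its servers can reach each other. The sketch below is plain Python arithmetic, not ManifoldCF or ZooKeeper code, and the function names `quorum_size` and `tolerated_failures` are made up for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Sizes discussed in the thread: 2 machines, the recommended 3, and 6 in total.
    for n in (1, 2, 3, 5, 6):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under that rule a six-server ensemble tolerates no more failures than a five-server one, and six servers split three and three across two machines would not survive the loss of either machine, because the remaining three fall short of the quorum of four.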
