Hi Lalit,

If you are already running your own ZooKeeper cluster, you do not need to start ZooKeeper via the runzookeeper script.
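One way to confirm that an existing ensemble is actually up before skipping that step is ZooKeeper's four-letter `ruok` command, to which a healthy server replies `imok`. The following is a minimal Python sketch, not part of ManifoldCF; the function names are hypothetical, and it assumes every entry in the connect string carries an explicit port. Newer ZooKeeper releases may require `ruok` to be whitelisted in the server configuration, so a `False` result is only a hint, not proof of a failure.

```python
import socket


def parse_connect_string(connect: str) -> list[tuple[str, int]]:
    """Split a 'host:port,host:port' connect string into (host, port) pairs.

    Assumption: every entry has an explicit port.
    """
    servers = []
    for entry in connect.split(","):
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


def zk_is_ok(host: str, port: int, timeout: float = 2.0) -> bool:
    """Send 'ruok' to one server and report whether it answered 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False
    return reply == b"imok"


def check_ensemble(connect: str) -> dict[str, bool]:
    """Probe every server named in the connect string."""
    return {
        f"{host}:{port}": zk_is_ok(host, port)
        for host, port in parse_connect_string(connect)
    }
```

Calling `check_ensemble("localhost:2181,localhost:2182,localhost:2183")` on a machine hosting three local instances returns one boolean per server; any `False` entry is worth investigating before relying on the ensemble.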
Karl

On Fri, Jul 4, 2014 at 9:04 AM, lalit jangra <[email protected]> wrote:

> Hello Karl,
>
> I have got my cluster working, with each node in the cluster having three
> zookeeper nodes, so six nodes in total. I have connected both MCF1 & MCF2 to
> this cluster by using the steps below.
>
> 1. Start ZooKeeper (using the *runzookeeper[.sh|.bat]* script)
> 2. Initialize the ManifoldCF shared configuration data (using
>    *setglobalproperties[.sh|.bat]*)
> 3. Start the database (using *start-database[.sh|.bat]*)
> 4. Initialize the database (using *initialize[.sh|.bat]*)
> 5. Start the agents process (using *start-agents[.sh|.bat]*, and
>    optionally *start-agents-2[.sh|.bat]*)
> 6. Modify the Tomcat startup script, or use the Tomcat service
>    administration client, to set a Java "-Dorg.apache.manifoldcf.configfile"
>    switch to point to the example's *properties.xml* file.
> 7. Start Tomcat.
> 8. Deploy and start the mcf-crawler-ui, mcf-authority-service, and
>    mcf-api-service web applications, preferably using the Tomcat
>    administration client.
>
> I just want to ask if we need to start zookeeper as in step 1, as we
> already have a cluster of zookeeper servers up and running?
>
> Also I want to confirm if we need to update tomcat as in step 6, as I did
> not do that but I am not getting any error as such? Are there any
> implications for not using this?
>
> Finally I would ask for a little help, as first I did my setup using
> multiprocess-file-example by initializing the DB, but then for zookeeper I
> moved to multiprocess-zk-example.
>
> Is it safe to use in production?
>
> Regards.
>
> On Thu, Jul 3, 2014 at 10:48 PM, lalit jangra <[email protected]>
> wrote:
>
>> Thanks Karl,
>>
>> I have one cluster with two MCF instances pointing to one single DB.
>>
>> Can you please elaborate a bit more?
>>
>> regards.
>>
>> On Thu, Jul 3, 2014 at 10:19 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi lalit,
>>>
>>> When data is pushed into the database that mcf uses but the mcf instance
>>> is not doing the pushing, then caches everywhere will not be properly
>>> invalidated. It may be more appropriate to have only one cluster with two
>>> members of each type (agents process, mcf UI, etc), if that would be
>>> acceptable.
>>>
>>> Karl
>>>
>>> Sent from my Windows Phone
>>> ------------------------------
>>> From: lalit jangra
>>> Sent: 7/3/2014 1:23 PM
>>> To: Karl Wright
>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>
>>> Hello Karl,
>>>
>>> I have a set of two MCF servers, each having its own tomcat server but
>>> pointing to the same Postgres DB.
>>>
>>> I have also configured a set of three zookeeper servers on each node of
>>> the cluster, started them, and configured properties.xml &
>>> properties-global.xml on both nodes. Finally I started zookeeper's
>>> start-agents.sh on both nodes.
>>>
>>> While trying to run ./zkCli.sh -server localhost:2181 on both machines,
>>> I am getting different outputs. Is it normal or am I missing something?
>>>
>>> Node1.
>>>
>>> [zk: localhost:2181(CONNECTED) 2] ls /
>>>
>>> [org.apache.manifoldcf.service-AGENT,
>>> org.apache.manifoldcf.servicelock-AGENT,
>>> org.apache.manifoldcf.configuration,
>>> org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]
>>>
>>> Node2.
>>>
>>> [zk: localhost:2181(CONNECTED) 1] ls /
>>>
>>> [org.apache.manifoldcf.locks-statslock-reindex-jobqueue,
>>> org.apache.manifoldcf.locks-_Cache_OUTPUTCONNECTION_Solr,
>>> org.apache.manifoldcf.service-AGENT,
>>> org.apache.manifoldcf.service-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
>>> org.apache.manifoldcf.resources-stats-reindex-jobqueue,
>>> org.apache.manifoldcf.serviceanon-_OUTPUTCONNECTORPOOL_Solr,
>>> org.apache.manifoldcf.locks-_Cache_JOBSTATUSES,
>>> org.apache.manifoldcf.locks-statslock-analyze-jobqueue,
>>> org.apache.manifoldcf.servicelock-AGENT,
>>> org.apache.manifoldcf.locks-_REPR_TRACKER_LOCK_,
>>> org.apache.manifoldcf.configuration,
>>> org.apache.manifoldcf.servicelock-_OUTPUTCONNECTORPOOL_Solr,
>>> org.apache.manifoldcf.locks-_STUFFERTHREAD_LOCK,
>>> org.apache.manifoldcf.service-_OUTPUTCONNECTORPOOL_Solr,
>>> org.apache.manifoldcf.resources-_REPR_MINDEPTH_,
>>> org.apache.manifoldcf.resources-_STUFFERTHREAD_LASTTIME,
>>> org.apache.manifoldcf.resources-stats-analyze-jobqueue,
>>> org.apache.manifoldcf.locks-_IDFACTORY_,
>>> org.apache.manifoldcf.locks-_JOBRESET_,
>>> org.apache.manifoldcf.servicelock-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
>>> org.apache.manifoldcf.resources-cache-JOBSTATUSES,
>>> org.apache.manifoldcf.locks-_JOBSTOP_,
>>> org.apache.manifoldcf.locks-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr,
>>> zookeeper, org.apache.manifoldcf.resources-_IDFACTORY_,
>>> org.apache.manifoldcf.locks-_Cache_JOB_1404323519962,
>>> org.apache.manifoldcf.locks-_Cache_DB-mcfdb-TBL-outputconnectors,
>>> org.apache.manifoldcf.locks-_JOBRESUME_]
>>>
>>> Also, in the clustered setup, I noticed one strange behavior.
>>>
>>> If I create a job on, say, MCF1 in the clustered setup, it is created but
>>> not replicated to the MCF2 node. I need to restart the MCF2 node to get it
>>> replicated there. Is it OK?
>>>
>>> Please suggest.
>>>
>>> Regards.
>>>
>>> On Wed, Jul 2, 2014 at 10:49 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi lalit,
>>>>
>>>> Each agents process in a cluster needs its own Id. Please look
>>>> carefully at the multiprocess zookeeper example for details on how to do
>>>> that. If you didn't intend for there to be multiple agents processes in
>>>> one cluster, you did something wrong, because that is what you have.
>>>>
>>>> Karl
>>>>
>>>> Sent from my Windows Phone
>>>> ------------------------------
>>>> From: lalit jangra
>>>> Sent: 7/2/2014 2:11 PM
>>>> To: Karl Wright
>>>> Cc: [email protected]
>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>
>>>> Hello,
>>>>
>>>> I have configured 3 zookeeper instances on ports 2181, 2182, 2183 on my
>>>> server, and in mcf/dist/multiprocess-zk-example I have configured all
>>>> three servers as a comma separated list.
>>>>
>>>> Now I have started all three zookeeper instances and I could see all
>>>> three running. Next I tried a crawl job, but in manifoldcf.log I can
>>>> see the error below.
>>>>
>>>> ERROR 2014-07-02 19:07:15,716 (Agents thread) - Exception tossed:
>>>> Service '' of type
>>>> 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
>>>>
>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service ''
>>>> of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is
>>>> already active
>>>>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:156)
>>>>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:120)
>>>>     at org.apache.manifoldcf.core.lockmanager.LockManager.registerServiceBeginServiceActivity(LockManager.java:69)
>>>>     at org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>>>>     at org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>>>>
>>>> How can I validate whether these errors are related to zookeeper or
>>>> not? Also, how can I know if MCF is integrated with zookeeper?
>>>>
>>>> Regards.
>>>>
>>>> On Tue, Jul 1, 2014 at 3:19 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Lalit,
>>>>>
>>>>> I presumed in my recommendation that your "active" and "passive"
>>>>> manifoldcf instances were using the same PostgreSQL server, but were using
>>>>> different database instances within it. That is the only way it could
>>>>> reasonably work.
>>>>>
>>>>> Any time you have a Zookeeper cluster, they recommend you have three
>>>>> instances. Effectively you are setting up two ManifoldCF clusters: an
>>>>> "active" one, and a "passive" one. Each one has its own database instance
>>>>> within PostgreSQL, and each one (if it is multiprocess) should have 3
>>>>> zookeeper instances.
>>>>>
>>>>> I hope this is clear.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Tue, Jul 1, 2014 at 9:54 AM, lalit jangra <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Karl,
>>>>>>
>>>>>> I have a little variation here: both MCF nodes run in Active/Active
>>>>>> mode pointing to the same DB, so is Zookeeper still required?
>>>>>>
>>>>>> Also, by "two sets of three zookeeper machines", does it mean I need
>>>>>> to set up three zookeepers on each node, so 6 zookeeper nodes in total
>>>>>> working on both machines in the same ensemble?
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> On Mon, Jun 30, 2014 at 6:50 PM, Karl Wright <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Lalit,
>>>>>>>
>>>>>>> You can keep things really simple by having both active and passive
>>>>>>> mcf instances run each as a single process, either under jetty or using
>>>>>>> the combined war under tomcat. If that is not acceptable, you would
>>>>>>> need two sets of three zookeeper machines, one set for each instance.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> Sent from my Windows Phone
>>>>>>> ------------------------------
>>>>>>> From: lalit jangra
>>>>>>> Sent: 6/30/2014 12:19 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>>>>
>>>>>>> Thanks Karl & Graeme,
>>>>>>>
>>>>>>> Let me elaborate my scenario and what I am trying to achieve.
>>>>>>>
>>>>>>> I have two servers each running MCF 1.5.1 individually. But both of
>>>>>>> them are backed by the same PostgreSQL DB, so both MCF applications
>>>>>>> are pointing to the same DB at any point of time, without having their
>>>>>>> own dedicated DBs. Next, the primary/active DB instance is backed up
>>>>>>> with periodical backups from the active to the passive instance.
>>>>>>>
>>>>>>> Only one DB instance will be active at any time, with the other DB
>>>>>>> instance acting as active standby. In case of breakdown of the
>>>>>>> primary/active instance, the passive/secondary will take over and
>>>>>>> become the primary/active instance handling all DB transactions, thus
>>>>>>> making the old primary the new secondary DB instance.
>>>>>>>
>>>>>>> Similarly I have two solr 4.6 instances which act in active/passive
>>>>>>> mode with periodic backup of active/primary to passive/secondary with
>>>>>>> active standby and failover.
>>>>>>>
>>>>>>> So my intention of clustering is high availability of the system with
>>>>>>> failover, but I will not use both MCF instances in parallel or
>>>>>>> simultaneously.
>>>>>>>
>>>>>>> Finally, I am limited to having two instances only, but as mentioned
>>>>>>> earlier, we need at least three Zookeeper instances for proper
>>>>>>> Zookeeper clustering.
>>>>>>>
>>>>>>> Is it still worthwhile to use Zookeeper, or can I do simple
>>>>>>> clustering where each MCF node is clustered using the same DB? Please
>>>>>>> suggest.
>>>>>>>
>>>>>>> Thanks for help.
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>> On Fri, Jun 27, 2014 at 11:15 AM, Graeme Seaton <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Lalit,
>>>>>>>>
>>>>>>>> For production use, you will want to spin up your own ZK cluster
>>>>>>>> using the instructions on the zookeeper site (as pointed out earlier,
>>>>>>>> at least 3 is recommended).
>>>>>>>>
>>>>>>>> You then need to modify the properties.xml file in
>>>>>>>> multiprocess-zk-example to point to the list of Zookeeper servers. You
>>>>>>>> also need to modify properties-global.xml with the appropriate global
>>>>>>>> settings, i.e. logging levels, Postgresql database etc., and then run
>>>>>>>> setglobalproperties.sh to register the settings in ZK.
>>>>>>>>
>>>>>>>> To test that it is working, set up a crawl and then tail the
>>>>>>>> manifoldcf.log file on each of your nodes to check that they are all
>>>>>>>> crawling in parallel.
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>>
>>>>>>>> Graeme
>>>>>>>>
>>>>>>>> On 25/06/14 12:19, Karl Wright wrote:
>>>>>>>>
>>>>>>>> Hi Lalit,
>>>>>>>>
>>>>>>>> Zookeeper does not use a database; it keeps its stuff in the local
>>>>>>>> file system. Each Zookeeper node has its own local data, and
>>>>>>>> everything else is socket communication between them.
>>>>>>>>
>>>>>>>> As for information: http://zookeeper.apache.org/
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Wed, Jun 25, 2014 at 6:56 AM, lalit jangra <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Karl,
>>>>>>>>>
>>>>>>>>> Apologies, as I am not very familiar with Zookeeper and am trying
>>>>>>>>> to figure it out.
>>>>>>>>>
>>>>>>>>> Is there any more documentation or are there pointers available for
>>>>>>>>> it? That would be very helpful.
>>>>>>>>>
>>>>>>>>> Also, I have 2 tomcat servers in a cluster, each having MCF 1.5.1
>>>>>>>>> set up and configured to point to the same PostgreSQL DB, & the DB is
>>>>>>>>> backed up for failover. From your inputs, it seems that we need to
>>>>>>>>> configure a separate standalone Zookeeper server which will act as
>>>>>>>>> master, and both nodes in the cluster will need to work as slaves and
>>>>>>>>> talk to the standalone Zookeeper master.
>>>>>>>>>
>>>>>>>>> Also, the Zookeeper server will have its own DB, so can we either
>>>>>>>>> host it separately or use the same Postgres DB?
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> On Wed, Jun 25, 2014 at 11:33 AM, Karl Wright <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Lalit,
>>>>>>>>>>
>>>>>>>>>> 1. zookeeper is already spun into MCF. In fact you start a
>>>>>>>>>>    zookeeper instance when you run the mcf zookeeper example. They
>>>>>>>>>>    recommend, though, that for failover you have 3 instances, etc.
>>>>>>>>>> 2. Looks like the documentation is out of date and something old
>>>>>>>>>>    is left in there.
>>>>>>>>>> 3. Zookeeper is a client/server kind of arrangement. You need
>>>>>>>>>>    at least ONE zookeeper server, and each cluster member includes a
>>>>>>>>>>    zookeeper client, which is configured to talk with ALL the
>>>>>>>>>>    zookeeper server instances you have.
>>>>>>>>>> 4. There is ONE database instance; the instance may be
>>>>>>>>>>    supported by failover and redundant Postgresql, but it appears as
>>>>>>>>>>    one instance. To get failover from Postgres you need the
>>>>>>>>>>    Enterprise Edition, which costs money.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 25, 2014 at 4:47 AM, lalit jangra <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Karl,
>>>>>>>>>>>
>>>>>>>>>>> That was helpful.
>>>>>>>>>>>
>>>>>>>>>>> I am setting up a clustered setup on Tomcats, following the
>>>>>>>>>>> instructions @
>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#Simplified+multi-process+model+using+ZooKeeper-based+synchronization
>>>>>>>>>>> and I need some suggestions here.
>>>>>>>>>>>
>>>>>>>>>>> 1. Do we need to download zookeeper and put it in the
>>>>>>>>>>>    multiprocess-zk-example folder, or is it already spun into MCF
>>>>>>>>>>>    and we are good to go?
>>>>>>>>>>> 2. It says all jars under *processes* should be put into the
>>>>>>>>>>>    classpath, but I cannot see any *processes* folder under MCF?
>>>>>>>>>>> 3. Do we need to set up Zookeeper on both nodes or only on one
>>>>>>>>>>>    node? I assume we need to do it on both nodes.
>>>>>>>>>>> 4. Do we also need to set up databases separately on both nodes
>>>>>>>>>>>    again? Also, can we set up the Zookeeper DB using the same
>>>>>>>>>>>    PostgreSQL, or will it use its own HSQL DB?
>>>>>>>>>>>
>>>>>>>>>>> Finally, how can I test that my Zookeeper is set up and ready to
>>>>>>>>>>> roll?
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>
>>>>>>>>>>> Regards.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 24, 2014 at 1:56 PM, Karl Wright <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Lalit,
>>>>>>>>>>>> ZooKeeper is standard for cluster deployments these days. See
>>>>>>>>>>>> the multiprocess-zookeeper example for ideas about how to deploy
>>>>>>>>>>>> it. It's also important to read the how-to-build-and-deploy page
>>>>>>>>>>>> to understand the example.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 24, 2014 at 8:04 AM, lalit jangra <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am planning to use MCF in cluster mode. For same, I want to
>>>>>>>>>>>>> know if Zookeeper is of any help here?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If yes, how can it be leveraged in distributed MCF servers?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Lalit Jangra.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>> Lalit Jangra.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Lalit Jangra.
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Lalit Jangra.
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Lalit Jangra.
>>>>
>>>> --
>>>> Regards,
>>>> Lalit Jangra.
>>>
>>> --
>>> Regards,
>>> Lalit Jangra.
>>
>> --
>> Regards,
>> Lalit Jangra.
>
> --
> Regards,
> Lalit Jangra.
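
A note on the ensemble sizes discussed in this thread: with its default majority quorums, ZooKeeper stays available only while more than half of the configured servers are reachable. The Python sketch below works through that arithmetic for three servers and for six servers split evenly across two machines. The helper names and the `node1`/`node2` labels are hypothetical and are not ManifoldCF or ZooKeeper code.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority still remains."""
    return servers - quorum_size(servers)


def survives_loss_of(machine: str, servers_per_machine: dict[str, int]) -> bool:
    """Does the ensemble keep a quorum if every server on `machine` goes down?"""
    total = sum(servers_per_machine.values())
    remaining = total - servers_per_machine[machine]
    return remaining >= quorum_size(total)


# Three servers: quorum of 2, so one failure is tolerated.
three = (quorum_size(3), tolerated_failures(3))

# Six servers, three on each of two machines (the layout described above):
# quorum of 4, so losing either machine leaves only 3 servers.
layout = {"node1": 3, "node2": 3}
six_survives = {name: survives_loss_of(name, layout) for name in layout}
```

Under these assumptions, putting three servers on each of two machines does not give machine-level failover, because losing either machine leaves three of six servers, one short of the quorum of four. Spreading an odd number of servers across at least three machines is what lets the ensemble ride out the loss of one machine.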
