Thanks Karl,

I have one cluster with two MCF instances pointing to a single DB. Can you please elaborate a bit more?

Regards.

On Thu, Jul 3, 2014 at 10:19 PM, Karl Wright <[email protected]> wrote:

> Hi lalit,
>
> When data is pushed into the database that mcf uses but the mcf instance
> is not doing the pushing, then caches everywhere will not be properly
> invalidated. It may be more appropriate to have only one cluster with two
> members of each type (agents process, mcf UI, etc.), if that would be
> acceptable.
>
> Karl
>
> Sent from my Windows Phone
> ------------------------------
> From: lalit jangra
> Sent: 7/3/2014 1:23 PM
> To: Karl Wright
> Subject: Re: Zookeeper in Apache ManifoldCF
>
> Hello Karl,
>
> I have a set of two MCF servers, each having its own tomcat server but
> pointing to the same Postgres DB.
>
> I have also configured a set of three zookeeper servers on each node of the
> cluster, started them, and configured properties.xml & properties-global.xml
> on both nodes. Finally I started zookeeper's start-agents.sh on both nodes.
>
> When I run ./zkCli.sh -server localhost:2181 on both machines, I
> get different outputs. Is this normal, or am I missing something?
>
> Node1.
>
> [zk: localhost:2181(CONNECTED) 2] ls /
>
> [org.apache.manifoldcf.service-AGENT,
> org.apache.manifoldcf.servicelock-AGENT,
> org.apache.manifoldcf.configuration,
> org.apache.manifoldcf.serviceactive-AGENT-A, zookeeper]
>
> Node2.
>
> [zk: localhost:2181(CONNECTED) 1] ls /
>
> [org.apache.manifoldcf.locks-statslock-reindex-jobqueue,
> org.apache.manifoldcf.locks-_Cache_OUTPUTCONNECTION_Solr,
> org.apache.manifoldcf.service-AGENT,
> org.apache.manifoldcf.service-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
> org.apache.manifoldcf.resources-stats-reindex-jobqueue,
> org.apache.manifoldcf.serviceanon-_OUTPUTCONNECTORPOOL_Solr,
> org.apache.manifoldcf.locks-_Cache_JOBSTATUSES,
> org.apache.manifoldcf.locks-statslock-analyze-jobqueue,
> org.apache.manifoldcf.servicelock-AGENT,
> org.apache.manifoldcf.locks-_REPR_TRACKER_LOCK_,
> org.apache.manifoldcf.configuration,
> org.apache.manifoldcf.servicelock-_OUTPUTCONNECTORPOOL_Solr,
> org.apache.manifoldcf.locks-_STUFFERTHREAD_LOCK,
> org.apache.manifoldcf.service-_OUTPUTCONNECTORPOOL_Solr,
> org.apache.manifoldcf.resources-_REPR_MINDEPTH_,
> org.apache.manifoldcf.resources-_STUFFERTHREAD_LASTTIME,
> org.apache.manifoldcf.resources-stats-analyze-jobqueue,
> org.apache.manifoldcf.locks-_IDFACTORY_,
> org.apache.manifoldcf.locks-_JOBRESET_,
> org.apache.manifoldcf.servicelock-AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent,
> org.apache.manifoldcf.resources-cache-JOBSTATUSES,
> org.apache.manifoldcf.locks-_JOBSTOP_,
> org.apache.manifoldcf.locks-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr,
> zookeeper, org.apache.manifoldcf.resources-_IDFACTORY_,
> org.apache.manifoldcf.locks-_Cache_JOB_1404323519962,
> org.apache.manifoldcf.locks-_Cache_DB-mcfdb-TBL-outputconnectors,
> org.apache.manifoldcf.locks-_JOBRESUME_]
>
> Also, in the clustered setup, I noticed one strange behavior.
>
> If I create a job on, say, MCF1 in the clustered setup, it is created but not
> replicated to the MCF2 node. I need to restart the MCF2 node to get it
> replicated there. Is that OK?
>
> Please suggest.
>
> Regards.
>
> On Wed, Jul 2, 2014 at 10:49 PM, Karl Wright <[email protected]> wrote:
>
>> Hi lalit,
>>
>> Each agents process in a cluster needs its own Id.
>> Please look carefully at the multiprocess zookeeper example for details on
>> how to do that. If you didn't intend for there to be multiple agents
>> processes in one cluster, you did something wrong, because that is what
>> you have.
>>
>> Karl
>>
>> Sent from my Windows Phone
>> ------------------------------
>> From: lalit jangra
>> Sent: 7/2/2014 2:11 PM
>> To: Karl Wright
>> Cc: [email protected]
>> Subject: Re: Zookeeper in Apache ManifoldCF
>>
>> Hello,
>>
>> I have configured 3 zookeeper instances on ports 2181, 2182, and 2183 on my
>> server, and in mcf/dist/multiprocess-zk-example I have configured all three
>> servers as a comma-separated list.
>>
>> I have now started all three zookeeper instances and I can see all
>> three running. Next I tried a crawl job, but in manifoldcf.logs I can
>> see the error below.
>>
>> ERROR 2014-07-02 19:07:15,716 (Agents thread) - Exception tossed: Service '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
>>
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service '' of type 'AGENT_org.apache.manifoldcf.crawler.system.CrawlerAgent' is already active
>>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:156)
>>     at org.apache.manifoldcf.core.lockmanager.BaseLockManager.registerServiceBeginServiceActivity(BaseLockManager.java:120)
>>     at org.apache.manifoldcf.core.lockmanager.LockManager.registerServiceBeginServiceActivity(LockManager.java:69)
>>     at org.apache.manifoldcf.agents.system.AgentsDaemon.checkAgents(AgentsDaemon.java:270)
>>     at org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsThread.run(AgentsDaemon.java:208)
>>
>> How can I verify whether or not these errors are related to zookeeper?
>> Also, how can I tell whether MCF is integrated with zookeeper?
>>
>> Regards.
>>
>> On Tue, Jul 1, 2014 at 3:19 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Lalit,
>>>
>>> I presumed in my recommendation that your "active" and "passive"
>>> manifoldcf instances were using the same PostgreSQL server, but were using
>>> different database instances within it. That is the only way it could
>>> reasonably work.
>>>
>>> Any time you have a Zookeeper cluster, they recommend you have three
>>> instances. Effectively you are setting up two ManifoldCF clusters: an
>>> "active" one, and a "passive" one. Each one has its own database instance
>>> within PostgreSQL, and each one (if it is multiprocess) should have 3
>>> zookeeper instances.
>>>
>>> I hope this is clear.
>>>
>>> Karl
>>>
>>> On Tue, Jul 1, 2014 at 9:54 AM, lalit jangra <[email protected]> wrote:
>>>
>>>> Thanks Karl,
>>>>
>>>> I have a little variation here: both MCF nodes run Active/Active,
>>>> pointing to the same DB. Is Zookeeper still required in that case?
>>>>
>>>> Also, does "two sets of three zookeeper machines" mean that I need
>>>> to set up three zookeepers on each node, so a total of 6 zookeeper nodes
>>>> working across both machines in the same ensemble?
>>>>
>>>> Regards.
>>>>
>>>> On Mon, Jun 30, 2014 at 6:50 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Lalit,
>>>>>
>>>>> You can keep things really simple by having both the active and passive
>>>>> mcf instances each run as a single process, either under jetty or using
>>>>> the combined war under tomcat. If that is not acceptable, you would need
>>>>> two sets of three zookeeper machines, one set for each instance.
>>>>>
>>>>> Karl
>>>>>
>>>>> Sent from my Windows Phone
>>>>> ------------------------------
>>>>> From: lalit jangra
>>>>> Sent: 6/30/2014 12:19 PM
>>>>> To: [email protected]
>>>>> Subject: Re: Zookeeper in Apache ManifoldCF
>>>>>
>>>>> Thanks Karl & Graeme,
>>>>>
>>>>> Let me elaborate on my scenario and what I am trying to achieve.
>>>>>
>>>>> I have two servers, each running MCF 1.5.1 individually. But both of
>>>>> them are backed by the same PostgreSQL DB, so both MCF applications are
>>>>> pointing to the same DB at any point in time, without having their own
>>>>> dedicated DBs. Next, the primary/active DB instance is backed up with
>>>>> periodic backups from the active to the passive instance.
>>>>>
>>>>> Only one DB instance will be active at any time, with the other DB
>>>>> instance acting as active standby. If the primary/active instance
>>>>> breaks down, the passive/secondary will take over and become the
>>>>> primary/active instance handling all DB transactions, making the old
>>>>> primary the new secondary DB instance.
>>>>>
>>>>> Similarly I have two solr 4.6 instances which act in active/passive
>>>>> mode, with periodic backup of active/primary to passive/secondary with
>>>>> active standby and failover.
>>>>>
>>>>> So my intention in clustering is high availability of the system with
>>>>> failover, but I will not use both MCF instances in parallel or
>>>>> simultaneously.
>>>>>
>>>>> Finally, I am limited to having two instances only but, as mentioned
>>>>> earlier, we need at least three Zookeeper instances for proper Zookeeper
>>>>> clustering.
>>>>>
>>>>> Is it still worthwhile to use Zookeeper, or can I do simple
>>>>> clustering where each MCF node is clustered using the same DB? Please
>>>>> suggest.
>>>>>
>>>>> Thanks for help.
>>>>>
>>>>> Regards.
>>>>>
>>>>> On Fri, Jun 27, 2014 at 11:15 AM, Graeme Seaton <[email protected]> wrote:
>>>>>
>>>>>> Hi Lalit,
>>>>>>
>>>>>> For production use, you will want to spin up your own ZK cluster
>>>>>> using the instructions on the zookeeper site (as pointed out earlier at
>>>>>> least 3 is recommended)....
>>>>>>
>>>>>> You then need to modify the properties.xml file in
>>>>>> multiprocess-zk-example to point to the list of Zookeeper servers.
>>>>>> You also need to modify properties-global.xml with the appropriate global
>>>>>> settings, i.e. logging levels, Postgresql database, etc., and then run
>>>>>> setglobalproperties.sh to register the settings in ZK.
>>>>>>
>>>>>> To test that it is working, set up a crawl and then tail the
>>>>>> manifoldcf.log file on each of your nodes to check that they are all
>>>>>> crawling in parallel.
>>>>>>
>>>>>> HTH,
>>>>>>
>>>>>> Graeme
>>>>>>
>>>>>> On 25/06/14 12:19, Karl Wright wrote:
>>>>>>
>>>>>> Hi Lalit,
>>>>>>
>>>>>> Zookeeper does not use a database; it keeps its stuff in the local
>>>>>> file system. Each Zookeeper node has its own local data, and everything
>>>>>> else is socket communication between them.
>>>>>>
>>>>>> As for information: http://zookeeper.apache.org/
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Wed, Jun 25, 2014 at 6:56 AM, lalit jangra <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Karl,
>>>>>>>
>>>>>>> Apologies, as I am not very familiar with Zookeeper and am still
>>>>>>> trying to figure it out.
>>>>>>>
>>>>>>> Are there any more documentation/pointers available on this? That
>>>>>>> would be helpful.
>>>>>>>
>>>>>>> Also, I have 2 tomcat servers in a cluster, each having MCF 1.5.1
>>>>>>> set up and configured to point to the same PostgreSQL DB, & the DB is
>>>>>>> backed up for failover. From your inputs, it seems that we need to
>>>>>>> configure a separate standalone Zookeeper server which will act as
>>>>>>> master, and both nodes in the cluster will need to work as slaves and
>>>>>>> talk to the standalone Zookeeper master.
>>>>>>>
>>>>>>> Also, will the Zookeeper server have its own DB, so that we can either
>>>>>>> host it separately or use the same Postgres DB?
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>> On Wed, Jun 25, 2014 at 11:33 AM, Karl Wright <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Lalit,
>>>>>>>>
>>>>>>>> 1. zookeeper is already spun into MCF.
>>>>>>>> In fact you start a zookeeper instance when you run the mcf zookeeper
>>>>>>>> example. They recommend, though, that for failover you have 3
>>>>>>>> instances, etc.
>>>>>>>> 2. Looks like the documentation is out of date and something old
>>>>>>>> is left in there.
>>>>>>>> 3. Zookeeper is a client/server kind of arrangement. You need at
>>>>>>>> least ONE zookeeper server, and each cluster member includes a
>>>>>>>> zookeeper client, which is configured to talk with ALL the zookeeper
>>>>>>>> server instances you have.
>>>>>>>> 4. There is ONE database instance; the instance may be supported
>>>>>>>> by failover and redundant Postgresql, but it appears as one instance.
>>>>>>>> To get failover from Postgres you need the Enterprise Edition, which
>>>>>>>> costs money.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Wed, Jun 25, 2014 at 4:47 AM, lalit jangra <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Karl,
>>>>>>>>>
>>>>>>>>> That was helpful.
>>>>>>>>>
>>>>>>>>> I am building a clustered setup on Tomcat, following the
>>>>>>>>> instructions at
>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#Simplified+multi-process+model+using+ZooKeeper-based+synchronization
>>>>>>>>> and I need some suggestions here.
>>>>>>>>>
>>>>>>>>> 1. Do we need to download zookeeper and put it in the
>>>>>>>>> multiprocess-zk-example folder, or is it already spun into MCF and
>>>>>>>>> we are good to go?
>>>>>>>>> 2. It says all jars under *processes* should be put into the
>>>>>>>>> classpath, but I cannot see any *processes* folder under MCF.
>>>>>>>>> 3. Do we need to set up Zookeeper on both nodes or only on one
>>>>>>>>> node? I assume we need to do it on both nodes.
>>>>>>>>> 4. Do we also need to set up databases separately on both nodes?
>>>>>>>>> Also, can we set up the Zookeeper DB using the same PostgreSQL, or
>>>>>>>>> will it use its own HSQL DB?
>>>>>>>>>
>>>>>>>>> Finally, how can I test that my Zookeeper is set up and ready to
>>>>>>>>> roll?
>>>>>>>>>
>>>>>>>>> Thanks for your help.
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 24, 2014 at 1:56 PM, Karl Wright <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Lalit,
>>>>>>>>>> ZooKeeper is standard for cluster deployments these days. See
>>>>>>>>>> the multiprocess-zookeeper example for ideas about how to deploy it.
>>>>>>>>>> It's also important to read the how-to-build-and-deploy page to
>>>>>>>>>> understand the example.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 24, 2014 at 8:04 AM, lalit jangra <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am planning to use MCF in cluster mode. For that, I want to
>>>>>>>>>>> know if Zookeeper is of any help here.
>>>>>>>>>>>
>>>>>>>>>>> If yes, how can it be leveraged in distributed MCF servers?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Lalit Jangra.

--
Regards,
Lalit Jangra.
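
A note on the "at least three Zookeeper instances" recommendation that comes up several times in this thread. The sketch below is only an illustration of the arithmetic, assuming the usual ZooKeeper rule that an ensemble keeps working only while a strict majority of its servers is up. The function names are hypothetical and are not part of ZooKeeper or ManifoldCF.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare the ensemble sizes discussed in the thread.
    for n in (1, 2, 3, 6):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under that assumption, a two-server ensemble still needs both servers for a majority, so it survives no failures, while three servers survive one. The arithmetic counts servers, not machines: several ZooKeeper instances on one host all go down together if that host fails.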
