[jira] [Resolved] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-12-03 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1546.
-
Resolution: Fixed

> Optimize Elasticsearch performance by removing 'forcemerge'
> ---
>
> Key: CONNECTORS-1546
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1546
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Elastic Search connector
> Reporter: Hans Van Goethem
> Assignee: Steph van Schalkwyk
> Priority: Major
> Fix For: ManifoldCF 2.12
>
>
> After crawling with ManifoldCF, forcemerge is applied to optimize the
> Elasticsearch index. This optimization makes Elasticsearch faster for
> read operations but not for write operations. On the contrary, performance of
> the write operations becomes worse after every forcemerge.
> Can you remove this forcemerge in ManifoldCF to optimize performance for
> recurrent crawling to Elasticsearch?
> If someone needs this forcemerge, it can be applied manually against
> Elasticsearch directly.
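
For anyone who still wants the merge after a crawl, here is a minimal sketch of
triggering it by hand. It assumes Elasticsearch's `_forcemerge` REST endpoint
with its `max_num_segments` parameter. The base URL, the index name, and the
helper names are placeholders for illustration and are not part of ManifoldCF.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_forcemerge_request(base_url, index, max_num_segments=None):
    """Build (but do not send) a POST request for <index>/_forcemerge."""
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge"
    if max_num_segments is not None:
        # Optional: ask Elasticsearch to merge down to this many segments.
        url += "?" + urlencode({"max_num_segments": max_num_segments})
    return Request(url, method="POST")


def run_forcemerge(request):
    """Send the request and return (HTTP status, response body as text)."""
    with urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

Calling `run_forcemerge(build_forcemerge_request("http://localhost:9200", "my-index", 1))`
against a test cluster would issue the merge; the host and index shown here are
hypothetical. Running it outside crawl windows keeps the write penalty
described above away from recurrent crawls.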



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apache Manifold 2.10

2018-12-03 Thread Karl Wright
You can just change the setup provided you point to the same database.

Thanks,
Karl



[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-12-03 Thread Steph van Schalkwyk (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707411#comment-16707411
 ] 

Steph van Schalkwyk commented on CONNECTORS-1546:
-

That's in the codebase I sent to you.
All removed. We also don't need the ES version anymore, as that was the only
thing it was used for.










Re: Apache Manifold 2.10

2018-12-03 Thread krishna agrawal
Thanks Karl,

I have deployed the simple example locally. In Dev and QA, on the
recommendation of the DevOps team, we deployed the multiprocess file example.
We had a brief discussion about multiprocess-zk-example, but at that time we
were unsure about it.

I will check and let you know whether we can change the setup now.

One question: do we need to do a fresh install, or can we upgrade to
multiprocess-zk-example?

Thanks in anticipation.

Thanks,
Krishna A

On Sat, Dec 1, 2018 at 3:05 PM Karl Wright  wrote:

> Another thing: it's quite important to guarantee a working setup here,
> otherwise you're just wasting everyone's time.  So, please base your
> installation on the multiprocess-zk-example.  Start off by running the
> example as is, on a small test crawl.  Once you know how it works, then
> move next to changing only what you have to -- namely, the database
> properties in the global properties file, to point to your MySQL instance.
> Try that also on a small test case (crawl some files for instance), before
> trying it on your large case.  Every step of the way should work, and if it
> doesn't, figure out why not before you move onto the next step.
>
> Thanks,
> Karl
>
>
> On Sat, Dec 1, 2018 at 2:59 PM Karl Wright  wrote:
>
> > Zookeeper does not require a locking directory.  It is a process that
> > synchronizes other processes, and they connect to it by port.
> >
> > Karl
> >
> >
> > On Sat, Dec 1, 2018 at 2:55 PM krishna agrawal wrote:
> >
> >> Thanks for the information.
> >> If we use Zookeeper, how can we make sure all our ManifoldCF processes
> >> use the same locking directory? Can that be done at the configuration
> >> level while installing?
> >>
> >> thanks,
> >> Krishna A
> >>
> >> On Sat, Dec 1, 2018 at 1:39 PM Karl Wright wrote:
> >>
> >> > That error is the result of the database not managing transactions
> >> > properly.  It can occur if the locking system is not set up properly,
> >> > or if you are using multiple agents processes and each process does
> >> > not have its own ID.  We have also seen it reported before just because
> >> > MySQL seems to have bugs and sometimes writes are delayed or don't go
> >> > through.
> >> >
> >> > My recommendation would be to:
> >> > (1) Use Zookeeper, not file locking
> >> > (2) Make sure all your ManifoldCF processes use the SAME locking
> >> > directory or Zookeeper instance
> >> > (3) If you are using multiple agents processes, be certain that each
> >> > such process gets its own ID (as is done in the examples).
> >> >
> >> > Karl
> >> >
> >> >
> >> > On Sat, Dec 1, 2018 at 11:43 AM krishna agrawal wrote:
> >> >
> >> > > Thanks Karl,
> >> > >
> >> > > I will take a look at it.
> >> > >
> >> > > But this error keeps getting tossed in the ManifoldCF log:
> >> > >
> >> > > ERROR 2018-12-01T11:13:26,297 (Job reset thread) - Exception tossed: Unexpected job status encountered: 33
> >> > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job status encountered: 33
> >> > > at org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:2145) ~[mcf-pull-agent.jar:?]
> >> > > at org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:8449) ~[mcf-pull-agent.jar:?]
> >> > > at org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:77) [mcf-pull-agent.jar:?]
> >> > >
> >> > > Thanks,
> >> > > Krishna A
> >> > >
> >> > >
> >> > > On Fri, Nov 30, 2018 at 7:00 PM Karl Wright wrote:
> >> > >
> >> > > > Hi Krishna,
> >> > > >
> >> > > > First of all I suggest that you *not* use multiprocess-file-example,
> >> > > > and instead use multiprocess-zk-example.
> >> > > >
> >> > > > Your symptoms suggest many possibilities.  But if you move to
> >> > > > Zookeeper we will be able to eliminate dangling file locks as a
> >> > > > complication.  So please do that first.
> >> > > >
> >> > > > Karl
> >> > > >
> >> > > >
> >> > > > On Fri, Nov 30, 2018 at 6:29 PM krishna agrawal <krish.a...@gmail.com> wrote:
> >> > > >
> >> > > > > Yeah, in our local setup we used the simple example, but on the
> >> > > > > server we used multiprocess-file-example. Are you suggesting that
> >> > > > > we upgrade from 2.10 to 2.11?
> >> > > > >
> >> > > > > We are using a MySQL database.
> >> > > > >
> >> > > > > Most of the time I see that nothing is running, yet it still says
> >> > > > > the job is running and that you have to wait for it to complete.
> >> > > > >
> >> > > > > Restarting does not help either.
> >> > > > >
> >> > > > > Any other solution would be greatly appreciated.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Krishna A
> >> > > > >
> >> > > > > On Fri, Nov 30, 2018 at 10:50 AM Karl Wright <daddy...@gmail.com> wrote:
> >> > > > >
> >> > > > > > 

[jira] [Commented] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-12-03 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706804#comment-16706804
 ] 

Karl Wright commented on CONNECTORS-1546:
-

Hi [~st...@remcam.net], can you let me know what happened to this?  We're 
trying to get 2.12 ready for completion.  Thanks!!







[jira] [Updated] (CONNECTORS-1546) Optimize Elasticsearch performance by removing 'forcemerge'

2018-12-03 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1546:

Fix Version/s: ManifoldCF 2.12






[jira] [Resolved] (CONNECTORS-1522) Add SSL trust certificates list to ElasticSearch output connector

2018-12-03 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1522.
-
Resolution: Fixed

Still needs testing.  That has been left to [~svanschalkwyk] to complete.

> Add SSL trust certificates list to ElasticSearch output connector
> -
>
> Key: CONNECTORS-1522
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1522
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Elastic Search connector
> Affects Versions: ManifoldCF 2.10
> Reporter: Steph van Schalkwyk
> Assignee: Karl Wright
> Priority: Minor
> Fix For: ManifoldCF 2.12
>
>
> Add "SSL trust certificate list" to Elasticsearch output connector.
> Add User Id, Password functionality to ES output connector.
> Both as in the Solr output connector.
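
As a rough illustration of what the two requested settings amount to on the
client side, here is a minimal sketch. It is not the connector's Java code: the
function names, the URL, and the credentials are all hypothetical, and it only
shows a TLS context that trusts a given certificate file plus an HTTP Basic
`Authorization` header built from a user ID and password.

```python
import base64
import ssl
from urllib.request import Request


def basic_auth_header(user_id, password):
    """Return the HTTP Basic Authorization header value for the credentials."""
    raw = f"{user_id}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_authenticated_request(url, user_id, password):
    """Build (but do not send) a GET request carrying Basic credentials."""
    request = Request(url, method="GET")
    request.add_header("Authorization", basic_auth_header(user_id, password))
    return request


def trust_context(ca_file=None):
    """Create a TLS context that verifies the server certificate.

    With ca_file=None the platform's default trust store is used; otherwise
    the certificates in ca_file (PEM format) are trusted as well.
    """
    return ssl.create_default_context(cafile=ca_file)
```

A caller would pass the context to `urllib.request.urlopen(request, context=...)`
when talking to an HTTPS endpoint. Keeping the trust list and the credentials
separate mirrors the two items in the request above.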


