[jira] [Created] (CONNECTORS-201) Carrydown methods should have their own interface class
Carrydown methods should have their own interface class
-------------------------------------------------------

Key: CONNECTORS-201
URL: https://issues.apache.org/jira/browse/CONNECTORS-201
Project: ManifoldCF
Issue Type: Improvement
Components: Framework crawler agent
Affects Versions: ManifoldCF 0.3
Reporter: Karl Wright
Priority: Minor

The carrydown methods are shared in IVersionActivity and IProcessActivity. They ought to have their own interface.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CONNECTORS-201) Carrydown methods should have their own interface class
[ https://issues.apache.org/jira/browse/CONNECTORS-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-201.

Resolution: Fixed
Fix Version/s: ManifoldCF 0.3

r1126427

Carrydown methods should have their own interface class
-------------------------------------------------------

Key: CONNECTORS-201
URL: https://issues.apache.org/jira/browse/CONNECTORS-201
Project: ManifoldCF
Issue Type: Improvement
Components: Framework crawler agent
Affects Versions: ManifoldCF 0.3
Reporter: Karl Wright
Priority: Minor
Fix For: ManifoldCF 0.3

The carrydown methods are shared in IVersionActivity and IProcessActivity. They ought to have their own interface.
poll method
What is the interval at which poll is called? I recall reading that connect is not the place for a connector to connect to an external system. In my design I'm connecting to a single HTTP server, so it seems right to establish the connection when the connect method is called and check it during the poll method. Thoughts?

Thanks!
Re: poll method
The preferred way to set up connections is to have every method that requires a live connection call a getSession() method. This is pretty much enforced by the fact that connect() cannot throw a ManifoldCFException. Chapter 6 of ManifoldCF in Action describes the preferred form via a detailed example.

The poll() method should mainly just expire connections that have outlived their time, as determined by whatever expiration time your connector has recorded for the connection. Thus, you should not really need to know how often it is called. Suffice it to say it is on the order of one to five minutes. Besides, it is only called when the connector class instance is sitting idle in the connection pool, not when it is actively in use.

All of this is described in Chapter 6.

Thanks,
Karl

On Mon, May 23, 2011 at 3:30 PM, ho...@farzad.net wrote:
> What is the interval at which poll is called? I recall reading that connect is not the place for a connector to connect to an external system. In my design I'm connecting to a single HTTP server, so it seems right to establish the connection when the connect method is called and check it during the poll method. Thoughts?
>
> Thanks!
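The getSession()/poll() division of labor described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the example from Chapter 6 and not the ManifoldCF connector API: the class name, the Session stand-in, the injected clock, and the hasSession() and fetch() helpers are all hypothetical, and a real connector would extend the framework's base classes and could throw from getSession().

```java
import java.util.function.LongSupplier;

/** Hypothetical connector skeleton; none of these names come from the ManifoldCF API. */
class LazySessionConnector {

    /** Stand-in for a real connection to the remote HTTP server. */
    static final class Session {
        private boolean open = true;
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    private final long idleTimeoutMillis;
    private final LongSupplier clock;   // injected so the sketch can be exercised without sleeping
    private Session session;            // null until some method actually needs it
    private long expirationTime;        // meaningful only while session != null

    LazySessionConnector(long idleTimeoutMillis, LongSupplier clock) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.clock = clock;
    }

    /** Only records configuration; it performs no I/O and so has nothing to fail on. */
    void connect() {
        // A real connector would copy its configuration parameters here.
    }

    /** Opens the session on first use and pushes the expiration time forward on every use. */
    Session getSession() {
        if (session == null) {
            // Assumption: real code would open the HTTP connection here and report failures.
            session = new Session();
        }
        expirationTime = clock.getAsLong() + idleTimeoutMillis;
        return session;
    }

    /** Called while the instance sits idle in the pool; only expires a stale session. */
    void poll() {
        if (session != null && clock.getAsLong() >= expirationTime) {
            session.close();
            session = null;
        }
    }

    boolean hasSession() { return session != null; }

    /** Example of a working method: it asks for the session instead of assuming one exists. */
    String fetch(String documentId) {
        Session s = getSession();
        return "fetched " + documentId + " (session open: " + s.isOpen() + ")";
    }

    public static void main(String[] args) {
        long[] now = {0L};
        LazySessionConnector c = new LazySessionConnector(60_000L, () -> now[0]);
        c.connect();
        System.out.println(c.fetch("doc-1"));
        now[0] = 120_000L;  // pretend two idle minutes have passed
        c.poll();
        System.out.println("session after idle poll: " + c.hasSession());
    }
}
```

Because every working method goes through getSession(), the connector never depends on how often poll() runs; poll() only reclaims sessions that have gone unused past the recorded expiration time.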
Re: poll method
Thanks for the clarification, though my understanding is still muddy.

Let me ask a logging question. So far I have attached an id object to the thread context so that I can get a view of what is going on. While I know that is not the correct thing to do, how do you see the log lines for a certain instance of a connector, given that this is a multi-threaded system? I was using my made-up id to pick out, say, all the log lines for the connector that has thread 3 set as its context.

On Mon, 23 May 2011 15:45:26 -0400, Karl Wright daddy...@gmail.com wrote:
> The preferred way to set up connections is to have every method that requires a live connection call a getSession() method. This is pretty much enforced by the fact that connect() cannot throw a ManifoldCFException. Chapter 6 of ManifoldCF in Action describes the preferred form via a detailed example.
>
> The poll() method should mainly just expire connections that have outlived their time, as determined by whatever expiration time your connector has recorded for the connection. Thus, you should not really need to know how often it is called. Suffice it to say it is on the order of one to five minutes. Besides, it is only called when the connector class instance is sitting idle in the connection pool, not when it is actively in use.
>
> All of this is described in Chapter 6.
>
> Thanks,
> Karl
>
> On Mon, May 23, 2011 at 3:30 PM, ho...@farzad.net wrote:
>> What is the interval at which poll is called? I recall reading that connect is not the place for a connector to connect to an external system. In my design I'm connecting to a single HTTP server, so it seems right to establish the connection when the connect method is called and check it during the poll method. Thoughts?
>>
>> Thanks!
[jira] [Commented] (CONNECTORS-19) Look into converting SOLR connector to use SolrJ java library
[ https://issues.apache.org/jira/browse/CONNECTORS-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038327#comment-13038327 ]

Jan Høydahl commented on CONNECTORS-19:
---------------------------------------

Any progress on this? I'd like to see a Solr outputConnector with multi-threaded support (StreamingUpdateSolrServer).

Look into converting SOLR connector to use SolrJ java library
-------------------------------------------------------------

Key: CONNECTORS-19
URL: https://issues.apache.org/jira/browse/CONNECTORS-19
Project: ManifoldCF
Issue Type: Improvement
Components: Lucene/SOLR connector
Reporter: Karl Wright
Priority: Minor

The SOLR connector currently uses its own multipart post code. It might be a good idea to convert it to use the SolrJ client API jar instead. This would require license confirmation, plus research to make sure there are no resulting jar conflicts with any other connector.
[jira] [Created] (CONNECTORS-202) SOLR connector support for commitWithin
SOLR connector support for commitWithin
---------------------------------------

Key: CONNECTORS-202
URL: https://issues.apache.org/jira/browse/CONNECTORS-202
Project: ManifoldCF
Issue Type: Improvement
Components: Lucene/SOLR connector
Affects Versions: ManifoldCF 0.2
Reporter: Jan Høydahl

The output connection must support commitWithin (http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22) in addition to sending a commit() at the end of a job. This allows for efficient handling of commits on the Solr side.

The parameter should ideally be configurable per job. That way you could say that for the "Important" job commitWithin=10s, while for the "Big crawl" job commitWithin=600s.
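As a rough illustration of the request above, the sketch below builds a Solr XML update message that carries a per-job commitWithin value. It assumes the XML update format from the linked wiki page, where commitWithin is an attribute of the add element expressed in milliseconds. The class, the job-to-value map, and the single "id" field are hypothetical and are not taken from the ManifoldCF Solr connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of ManifoldCF or SolrJ. */
class CommitWithinSketch {

    /** Per-job commitWithin settings in milliseconds (job names taken from the issue text). */
    static final Map<String, Long> COMMIT_WITHIN_MILLIS_BY_JOB = new LinkedHashMap<>();
    static {
        COMMIT_WITHIN_MILLIS_BY_JOB.put("Important job", 10_000L);   // 10s
        COMMIT_WITHIN_MILLIS_BY_JOB.put("Big crawl job", 600_000L);  // 600s
    }

    /** Minimal XML escaping for text and attribute content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Builds an add message for one document. If the job has no configured
     * value, the attribute is omitted and only the end-of-job commit applies.
     * The "id" field name is an assumption about the target schema.
     */
    static String buildAddMessage(String jobName, String documentId) {
        Long commitWithin = COMMIT_WITHIN_MILLIS_BY_JOB.get(jobName);
        String attribute = (commitWithin == null)
                ? ""
                : " commitWithin=\"" + commitWithin + "\"";
        return "<add" + attribute + ">"
                + "<doc><field name=\"id\">" + escapeXml(documentId) + "</field></doc>"
                + "</add>";
    }

    public static void main(String[] args) {
        System.out.println(buildAddMessage("Important job", "doc-1"));
        System.out.println(buildAddMessage("Big crawl job", "doc-2"));
    }
}
```

Keeping the value in the job specification rather than in the output connection configuration is what would let two jobs share one Solr connection while using different commit windows.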
[jira] [Commented] (CONNECTORS-19) Look into converting SOLR connector to use SolrJ java library
[ https://issues.apache.org/jira/browse/CONNECTORS-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038360#comment-13038360 ]

Karl Wright commented on CONNECTORS-19:
---------------------------------------

The promised patch never materialized.

One point, though, is that ManifoldCF is not single-threaded in any case, so you'd be unlikely to gain much in performance by going multithreaded on an already multi-threaded connector implementation. The current connector can maintain and use as many connections to Solr as you tell it to. Memory buffering on the client side is also not a good idea, because it violates the basic ManifoldCF principle that you can safely shut down and restart ManifoldCF at any time without loss.

Solr also suffers from the lack of a guaranteed delivery metaphor, which I've talked to the Solr team about in the past. The Solr commit model currently does not work this way, but ManifoldCF really requires it, because without it there is no way to properly implement an incremental crawler. This would mean a significant new Solr feature.

Look into converting SOLR connector to use SolrJ java library
-------------------------------------------------------------

Key: CONNECTORS-19
URL: https://issues.apache.org/jira/browse/CONNECTORS-19
Project: ManifoldCF
Issue Type: Improvement
Components: Lucene/SOLR connector
Reporter: Karl Wright
Priority: Minor

The SOLR connector currently uses its own multipart post code. It might be a good idea to convert it to use the SolrJ client API jar instead. This would require license confirmation, plus research to make sure there are no resulting jar conflicts with any other connector.