[jira] [Created] (CONNECTORS-201) Carrydown methods should have their own interface class

2011-05-23 Thread Karl Wright (JIRA)
Carrydown methods should have their own interface class
---

 Key: CONNECTORS-201
 URL: https://issues.apache.org/jira/browse/CONNECTORS-201
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Framework crawler agent
Affects Versions: ManifoldCF 0.3
Reporter: Karl Wright
Priority: Minor


The carrydown methods are shared in IVersionActivity and IProcessActivity.  
They ought to have their own interface.
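
One way to picture the proposed refactoring is sketched below. It is only an
illustration under stated assumptions: CarrydownActivity, VersionActivity,
ProcessActivity and InMemoryActivity are hypothetical, simplified stand-ins
rather than the real ManifoldCF interfaces, and the single retrieveParentData
accessor with no checked exception is an assumed, cut-down signature.

```java
import java.util.HashMap;
import java.util.Map;

final class CarrydownSketch {
    /** Hypothetical interface that holds only the carrydown accessors. */
    interface CarrydownActivity {
        /** Returns the values a parent passed down under dataName, or an empty array. */
        String[] retrieveParentData(String localIdentifier, String dataName);
    }

    /** Simplified stand-in for IVersionActivity: inherits carrydown instead of redeclaring it. */
    interface VersionActivity extends CarrydownActivity { }

    /** Simplified stand-in for IProcessActivity: same idea. */
    interface ProcessActivity extends CarrydownActivity { }

    /** Toy implementation backed by a map, for illustration only. */
    static final class InMemoryActivity implements VersionActivity, ProcessActivity {
        private final Map<String, String[]> data = new HashMap<>();

        void put(String localIdentifier, String dataName, String... values) {
            data.put(localIdentifier + "|" + dataName, values);
        }

        @Override
        public String[] retrieveParentData(String localIdentifier, String dataName) {
            return data.getOrDefault(localIdentifier + "|" + dataName, new String[0]);
        }
    }

    /** Helper code can now depend on the narrow interface alone. */
    static String firstParentValue(CarrydownActivity activity, String localIdentifier, String dataName) {
        String[] values = activity.retrieveParentData(localIdentifier, dataName);
        return values.length == 0 ? null : values[0];
    }
}
```

With this shape, a helper such as firstParentValue accepts either kind of
activity object without caring which phase of the crawl it is called from.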


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (CONNECTORS-201) Carrydown methods should have their own interface class

2011-05-23 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-201.


   Resolution: Fixed
Fix Version/s: ManifoldCF 0.3

r1126427





poll method

2011-05-23 Thread hokie
How often is poll() called?  I recall reading that connect() is not the 
place for a connector to connect to an external system.  In my design I 
connect to a single HTTP server, so it seems natural to establish the 
connection when connect() is called and check it in poll().  Thoughts?  
Thanks!


Re: poll method

2011-05-23 Thread Karl Wright
The preferred way to set up connections is to have every method that
requires an established connection call a getSession() method.  This is
pretty much enforced by the fact that connect() cannot throw a
ManifoldCFException.  Chapter 6 of ManifoldCF in Action describes the
preferred form with a detailed example.  The poll() method should mainly
just expire connections that have outlived their time, as determined by
whatever expiration time your connector has recorded for the connection.
Thus, you should not really need to know how often it is called; suffice
it to say the interval is on the order of one to five minutes.  Besides,
it is only called when the connector class instance is sitting idle in
the connection pool, not when it is actively in use.  All of this is
described in Chapter 6.

Thanks,
Karl
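
The pattern described above can be sketched roughly as follows. This is a
minimal illustration, not the example from Chapter 6 and not the real
ManifoldCF connector base class: LazySessionConnector, Session, hasSession(),
the injected clock and the five-minute idle timeout are all assumptions made
for the sketch, and a real getSession() would open the actual connection and
report failures through the framework's exceptions.

```java
import java.util.function.LongSupplier;

final class LazySessionConnector {
    /** Hypothetical handle for whatever the connector talks to (e.g. an HTTP client). */
    static final class Session {
        boolean closed;
        void close() { closed = true; }
    }

    private static final long IDLE_TIMEOUT_MS = 300_000L; // assumed value: five minutes

    private final LongSupplier clock; // injectable so expiry can be exercised without sleeping
    private String serverUrl;         // configuration remembered by connect()
    private Session session;          // stays null until some method needs it
    private long expiresAt;

    LazySessionConnector(LongSupplier clock) { this.clock = clock; }

    /** Only records configuration; no I/O happens here, so nothing can fail. */
    void connect(String serverUrl) { this.serverUrl = serverUrl; }

    /** Called by every method that needs a live connection; opens it on first use. */
    Session getSession() {
        if (serverUrl == null) throw new IllegalStateException("connect() was not called");
        if (session == null) session = new Session(); // real code would open the connection here
        expiresAt = clock.getAsLong() + IDLE_TIMEOUT_MS; // each use pushes the expiry forward
        return session;
    }

    /** Called periodically while idle in the pool; drops a session that outlived its time. */
    void poll() {
        if (session != null && clock.getAsLong() >= expiresAt) {
            session.close();
            session = null;
        }
    }

    /** Releases the session, if any, and forgets the configuration. */
    void disconnect() {
        if (session != null) { session.close(); session = null; }
        serverUrl = null;
    }

    boolean hasSession() { return session != null; }
}
```

Because the expiry time is stored with the session, poll() gives the same
result whether it runs every minute or every five minutes, which is why the
connector does not need to know the polling interval.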




Re: poll method

2011-05-23 Thread hokie
Thanks for the clarification, though my understanding is still muddy.  Let 
me ask a logging question.  So far I have attached an id object to the 
thread context so that I can get a view of what is going on.  I know that 
is not the correct thing to do, but how do you see the log lines for a 
particular connector instance, given that the system is multi-threaded?  I 
was using my made-up id to pick out, say, all the log lines for the 
connector that has thread 3 set as its context.




[jira] [Commented] (CONNECTORS-19) Look into converting SOLR connector to use SolrJ java library

2011-05-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CONNECTORS-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038327#comment-13038327
 ] 

Jan Høydahl commented on CONNECTORS-19:
---

Any progress on this? I'd like to see a Solr outputConnector with MultiThread 
support (StreamingUpdateSolrServer)

 Look into converting SOLR connector to use SolrJ java library
 -

 Key: CONNECTORS-19
 URL: https://issues.apache.org/jira/browse/CONNECTORS-19
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Lucene/SOLR connector
Reporter: Karl Wright
Priority: Minor

 The SOLR connector currently uses its own multipart post code.  It might be a 
 good idea to convert it to use the SolrJ client api jar instead.  This would 
 require license confirmation, plus research to make sure there are no jar 
 conflicts as a result, with any other connector.



[jira] [Created] (CONNECTORS-202) SOLR connector support for commitWithin

2011-05-23 Thread JIRA
SOLR connector support for commitWithin
--

 Key: CONNECTORS-202
 URL: https://issues.apache.org/jira/browse/CONNECTORS-202
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Lucene/SOLR connector
Affects Versions: ManifoldCF 0.2
Reporter: Jan Høydahl


The output connection must support commitWithin 
(http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22)
 in addition to sending a commit() at the end of a job.

This allows for efficient handling of commits on the Solr side.

The parameter should ideally be configurable per job.  That way you could 
set commitWithin=10s for an important job and commitWithin=600s for a big 
crawl job.
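
For illustration, the request amounts to adding the optional commitWithin
attribute (in milliseconds) to the XML add message that is posted to Solr.
The sketch below only builds such a message as a string. CommitWithinSketch,
addMessage, escape and the "text" field name are hypothetical and are not
part of the existing connector code.

```java
final class CommitWithinSketch {
    /** Builds a Solr XML add message; a commitWithinMs of zero or less omits the attribute. */
    static String addMessage(String id, String body, long commitWithinMs) {
        StringBuilder sb = new StringBuilder("<add");
        if (commitWithinMs > 0) {
            sb.append(" commitWithin=\"").append(commitWithinMs).append('"');
        }
        sb.append("><doc>")
          .append("<field name=\"id\">").append(escape(id)).append("</field>")
          .append("<field name=\"text\">").append(escape(body)).append("</field>") // assumed field name
          .append("</doc></add>");
        return sb.toString();
    }

    /** Minimal escaping for XML element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Under these assumptions an important job would pass 10000 (10 s) and a big
crawl job 600000 (600 s), with the value read from each job's configuration.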



[jira] [Commented] (CONNECTORS-19) Look into converting SOLR connector to use SolrJ java library

2011-05-23 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038360#comment-13038360
 ] 

Karl Wright commented on CONNECTORS-19:
---

The promised patch never materialized.

One point, though, is that ManifoldCF is not single-threaded in any case, so 
you'd be unlikely to gain much in performance by adding multithreading to an 
already multi-threaded connector implementation.  The current connector can 
maintain and use as many connections to Solr as you tell it to.  Memory 
buffering on the client side is also not a good idea, because it violates the 
basic ManifoldCF principle that you can safely shut down and restart 
ManifoldCF at any time without loss.

Solr also suffers from lack of a guaranteed delivery metaphor, which I've 
talked to the Solr team about in the past.  The Solr commit model currently 
does not work this way but ManifoldCF really requires it, because without it 
there is no way to properly implement an incremental crawler.  This would mean 
a significant new Solr feature.


