[jira] [Updated] (CONNECTORS-173) Table Entity on javadoc

2011-03-31 Thread Shinichiro Abe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichiro Abe updated CONNECTORS-173:
--

Attachment: CONNECTORS-173-webcrawler.patch
CONNECTORS-173-crawler.patch
CONNECTORS-173-authorities.patch
CONNECTORS-173-agents.patch

Hello.
I've created patches based on MCF_Tables.xls.
Please review them and confirm.
Thank you.

 Table Entity on javadoc
 ---

 Key: CONNECTORS-173
 URL: https://issues.apache.org/jira/browse/CONNECTORS-173
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Documentation
Affects Versions: ManifoldCF next
Reporter: Shinichiro Abe
Priority: Minor
  Labels: documentation
 Fix For: ManifoldCF next

 Attachments: CONNECTORS-173-agents.patch, 
 CONNECTORS-173-authorities.patch, CONNECTORS-173-crawler.patch, 
 CONNECTORS-173-webcrawler.patch, MCF_Tables.xls, example1.png, example2.png

   Original Estimate: 24h
  Remaining Estimate: 24h

 Proposal:
 MCF manages about 20 tables.
 I want to understand how MCF manages its database by looking at the tables, 
 but the MCF documentation currently explains almost nothing about them.
 I think javadoc could cover this, along the lines of the example descriptions.
 It would help users understand how each manager class relates to its table, 
 and how the tables relate to each other.
 May I add this javadoc to each manager class?
 The related tables that this would touch are listed in the attachment.
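
 To make the proposal concrete, here is a minimal sketch of what a class-level
 javadoc comment with a table description could look like. It is not taken
 from the attached patches: the class name, table name, columns, and types
 below are all hypothetical and chosen only to show the layout.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical manager class, used only to illustrate the javadoc layout.
 * It manages the (made-up) table "exampletable", which has this schema:
 *
 * <table border="1" cellpadding="3" cellspacing="0">
 * <caption>Columns of exampletable</caption>
 * <tr><th>Field</th><th>Type</th><th>Description</th></tr>
 * <tr><td>id</td><td>BIGINT</td><td>Primary key</td></tr>
 * <tr><td>name</td><td>VARCHAR(255)</td><td>Display name of the row</td></tr>
 * <tr><td>ownerid</td><td>BIGINT</td>
 *     <td>Reference to id in the (made-up) table "ownertable"</td></tr>
 * </table>
 */
final class ExampleTableManager {
    /** Name of the database table this class manages. */
    static final String TABLE_NAME = "exampletable";

    /** Column names, matching the rows of the javadoc table above. */
    static final String ID_FIELD = "id";
    static final String NAME_FIELD = "name";
    static final String OWNER_ID_FIELD = "ownerid";

    private ExampleTableManager() {
    }

    /** Returns the column names in the order the javadoc table lists them. */
    static List<String> columns() {
        return Arrays.asList(ID_FIELD, NAME_FIELD, OWNER_ID_FIELD);
    }
}
```

 Keeping the field constants next to the table in the comment makes it easier
 to notice when one of them changes without the other.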

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CONNECTORS-173) Table Entity on javadoc

2011-03-31 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013910#comment-13013910
 ] 

Karl Wright commented on CONNECTORS-173:


Offhand, these look good.  I'll have to see what the generated javadoc HTML 
looks like though.





[jira] [Commented] (CONNECTORS-173) Table Entity on javadoc

2011-03-31 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013964#comment-13013964
 ] 

Karl Wright commented on CONNECTORS-173:


I'll be pushing the newer javadocs out to the web site this evening.

Assignee: Karl Wright


Re: Filtering out unwanted content from HTML pages

2011-03-31 Thread Karl Wright
This is a good question.  I think we should carry this conversation
forward on connectors-dev.

My initial thought on this issue is that the functionality really
belongs in Tika.  Tika is set up to extract and filter in exactly this
way.  The only reason you'd want to do it in MCF is if it would change
the links you might extract (or skip), and that seems to me less
interesting.  How do you feel about it?

Karl

On Thu, Mar 31, 2011 at 10:41 AM, Erlend Garåsen
e.f.gara...@usit.uio.no wrote:

 All major commercial search engines are shipped with a web crawler which
 allows one to filter out unwanted content, such as certain HTML blocks,
 comments etc. Would it be advisable to add such functionality to MCF? Or
 will it be difficult to implement since the idea behind the
 ExtractingRequestHandler is to send binary files to Solr?

 Say that you have an HTML document which includes the following comments:
 <!-- stop indexing -->
 <!-- start indexing -->
 All content between these comments should then be excluded from the index.

 I managed to rewrite Apache Nutch in order to add this functionality
 some months ago.

 Erlend

 --
 Erlend Garåsen
 Center for Information Technology Services
 University of Oslo
 P.O. Box 1086 Blindern, N-0317 OSLO, Norway
 Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
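
The filtering described in the quoted message can be sketched independently of
where it ends up living (MCF, Tika, or elsewhere). The following is only an
illustration under stated assumptions: the class and method names are
hypothetical, it uses nothing but java.util.regex, and it treats an
unterminated "stop indexing" comment as running to the end of the document,
which is a choice the thread does not specify.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of ManifoldCF, Tika, or Nutch. */
final class IndexingCommentFilter {
    // Matches a "stop indexing" comment, everything after it, and the next
    // "start indexing" comment (or the end of input if there is none).
    private static final Pattern SKIPPED = Pattern.compile(
        "<!--\\s*stop indexing\\s*-->.*?(?:<!--\\s*start indexing\\s*-->|\\z)",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    private IndexingCommentFilter() {
    }

    /** Returns the HTML with every stop/start region removed. */
    static String strip(String html) {
        return SKIPPED.matcher(html).replaceAll("");
    }
}
```

A regular expression is enough for literal marker comments like these; if the
filter also had to drop arbitrary HTML blocks, a real HTML parser would be the
safer tool.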



Re: [VOTE] Release ManifoldCF-0.2-incubating

2011-03-31 Thread Koji Sekiguchi

(11/03/21 9:32), Karl Wright wrote:

The tag is 
https://svn.apache.org/repos/asf/incubator/lcf/tags/release-0.2-incubating-RC0.
  You can download the candidate from
http://people.apache.org/~kwright/apache-manifoldcf-0.2-incubating.


Hi Karl,

I downloaded apache-manifoldcf-0.2-incubating-bin.tar.gz and ran
"ant test", but I got a "Database exception: Exception doing query: No 
current connection." exception during the test. I'm not sure whether it is 
the same as CONNECTORS-172.

Are you digging into the issue and planning to respin the RC?

Koji
--
http://www.rondhuit.com/en/


Re: [VOTE] Release ManifoldCF-0.2-incubating

2011-03-31 Thread Karl Wright
Yes, this does sound like CONNECTORS-172.  It's intermittent; if you
run the tests again it will likely pass.

The problem appeared with either the new Derby or the new Jetty.
Jetty changed its shutdown so that I can no longer guarantee that it
shuts down before the rest of the process shuts down - it could be
related to that.  The Derby changes involve having more queries (so
that they use indexes rather than scans), so it is possible that
something is taking a bit longer than before and causing an occasional
timeout.

I've not seen any evidence that the problem is other than test
related, though.  But I think it's worth trying to understand the
issue more completely before deciding whether to respin or accept the
RC as it stands.

Karl





[jira] [Commented] (CONNECTORS-172) Intermittent test failures

2011-03-31 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014098#comment-13014098
 ] 

Karl Wright commented on CONNECTORS-172:


Seems to be a Derby problem.  DERBY-5169 created.


 Intermittent test failures
 --

 Key: CONNECTORS-172
 URL: https://issues.apache.org/jira/browse/CONNECTORS-172
 Project: ManifoldCF
  Issue Type: Bug
  Components: Tests
Affects Versions: ManifoldCF 0.2
Reporter: Karl Wright
Priority: Minor

 The Derby filesystem end-to-end tests sometimes randomly fail with a 
 "Database error: No existing connection" error during a job status wait.  Not 
 sure what's happening here, but they succeed much of the time.  There's not 
 much of a hint beyond the stack trace.  The message seems to be coming from 
 Derby, and may be the result of a too-short wait time limit, a race condition 
 in the test itself, or something else entirely.
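
 One of the possibilities listed above is a too-short wait time limit. As a
 rough illustration of what a job status wait with an explicit deadline
 involves, here is a minimal sketch. It is not the ManifoldCF test code: the
 class, the method, and the status strings are hypothetical, and the status
 is supplied by the caller rather than read from a database.

```java
import java.util.function.Supplier;

/** Hypothetical polling helper; names are illustrative only. */
final class StatusWait {
    private StatusWait() {
    }

    /**
     * Polls currentStatus until it equals expected or the timeout elapses.
     * Returns true if the expected status was seen, false on timeout or
     * if the waiting thread was interrupted.
     */
    static boolean waitForStatus(Supplier<String> currentStatus, String expected,
                                 long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (expected.equals(currentStatus.get())) {
                return true;
            }
            // Compare via subtraction so nanoTime overflow cannot break the check.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

 Returning false on timeout instead of throwing lets a test report which
 status it was still waiting for, which helps tell a slow run apart from a
 lost connection.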



[jira] [Updated] (CONNECTORS-172) Intermittent test failures

2011-03-31 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-172:
---

Priority: Blocker  (was: Minor)



Re: [VOTE] Release ManifoldCF-0.2-incubating

2011-03-31 Thread Karl Wright
Looking at it in depth, it's likely that it's a Derby bug.  I've filed
ticket DERBY-5169 accordingly, and marked CONNECTORS-172 as a blocker
for 0.2-incubating.

Karl
