[jira] [Updated] (CONNECTORS-173) Table Entity on javadoc
[ https://issues.apache.org/jira/browse/CONNECTORS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shinichiro Abe updated CONNECTORS-173:
--------------------------------------
Attachment: CONNECTORS-173-webcrawler.patch, CONNECTORS-173-crawler.patch, CONNECTORS-173-authorities.patch, CONNECTORS-173-agents.patch

Hello. Based on MCF_Tables.xls, I've created patches. Please check and confirm them. Thank you.

Table Entity on javadoc
-----------------------
Key: CONNECTORS-173
URL: https://issues.apache.org/jira/browse/CONNECTORS-173
Project: ManifoldCF
Issue Type: Improvement
Components: Documentation
Affects Versions: ManifoldCF next
Reporter: Shinichiro Abe
Priority: Minor
Labels: documentation
Fix For: ManifoldCF next
Attachments: CONNECTORS-173-agents.patch, CONNECTORS-173-authorities.patch, CONNECTORS-173-crawler.patch, CONNECTORS-173-webcrawler.patch, MCF_Tables.xls, example1.png, example2.png
Original Estimate: 24h
Remaining Estimate: 24h

Proposal: MCF manages about 20 tables. I want to check the database management by looking at the tables, but there is currently almost no explanation of them in the MCF documentation. I think javadoc can explain this, as in the example description below. It would help users understand the relation between each manager class and its table, and the relationships between tables. May I add javadoc for each manager class? The related tables that would be modified are listed in the attachment.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CONNECTORS-173) Table Entity on javadoc
[ https://issues.apache.org/jira/browse/CONNECTORS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013910#comment-13013910 ]

Karl Wright commented on CONNECTORS-173:
----------------------------------------
Offhand, these look good. I'll have to see what the generated javadoc HTML looks like, though.
[jira] [Commented] (CONNECTORS-173) Table Entity on javadoc
[ https://issues.apache.org/jira/browse/CONNECTORS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013964#comment-13013964 ]

Karl Wright commented on CONNECTORS-173:
----------------------------------------
I'll be pushing the newer javadocs out to the web site this evening.

Table Entity on javadoc
-----------------------
Key: CONNECTORS-173
URL: https://issues.apache.org/jira/browse/CONNECTORS-173
Project: ManifoldCF
Issue Type: Improvement
Components: Documentation
Affects Versions: ManifoldCF next
Reporter: Shinichiro Abe
Assignee: Karl Wright
Priority: Minor
Labels: documentation
Fix For: ManifoldCF next
Attachments: CONNECTORS-173-agents.patch, CONNECTORS-173-authorities.patch, CONNECTORS-173-crawler.patch, CONNECTORS-173-webcrawler.patch, MCF_Tables.xls, example1.png, example2.png
Original Estimate: 24h
Remaining Estimate: 24h
Re: Filtering out unwanted content from HTML pages
This is a good question. I think we should carry this conversation forward on connectors-dev.

My initial thought on this issue is that the functionality really belongs in Tika. Tika is set up to extract and filter in exactly this way. The only reason you'd want to do it in MCF is if it would change the links you might extract (or skip), and that seems less interesting to me. How do you feel about it?

Karl

On Thu, Mar 31, 2011 at 10:41 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote:

All major commercial search engines ship with a web crawler that lets you filter out unwanted content, such as certain HTML blocks, comments, etc. Would it be advisable to add such functionality to MCF? Or would it be difficult to implement, since the idea behind the ExtractingRequestHandler is to send binary files to Solr?

Say that you have an HTML document which includes the following comments:

<!-- stop indexing -->
<!-- start indexing -->

All content between these comments should then be excluded from the index. I managed to rewrite Apache Nutch to add this functionality some months ago.

Erlend

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
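To make the marker-based filtering described above concrete, here is a minimal sketch in Python. It is not Tika, MCF, or Nutch code, and it makes several assumptions: the marker strings are exactly the two comments from the example, the function name strip_unindexed is hypothetical, and a "stop" marker with no matching "start" marker is taken to suppress everything up to the end of the document.

```python
import re

# Marker comments taken from the example in the message above.
# Exact spelling and spacing are assumptions; a real filter would
# probably want to tolerate whitespace and case variations.
STOP = "<!-- stop indexing -->"
START = "<!-- start indexing -->"

# Match from a STOP marker up to the next START marker (non-greedy),
# or up to the end of the document if no START marker follows.
_UNINDEXED = re.compile(
    re.escape(STOP) + r".*?(?:" + re.escape(START) + r"|\Z)",
    re.DOTALL,
)


def strip_unindexed(html: str) -> str:
    """Return html with every stop/start-delimited region removed.

    Hypothetical helper for illustration only; it works on the raw
    text and does not parse the HTML.
    """
    return _UNINDEXED.sub("", html)
```

For example, passing "<p>Visible</p><!-- stop indexing --><div>menu</div><!-- start indexing --><p>Also visible</p>" through strip_unindexed leaves only the two paragraphs. Because this operates on raw text before any parsing, it would also drop links inside the skipped region, which is the one case mentioned above where doing the filtering in the crawler rather than in the extractor could change behavior.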
Re: [VOTE] Release ManifoldCF-0.2-incubating
(11/03/21 9:32), Karl Wright wrote:

The tag is https://svn.apache.org/repos/asf/incubator/lcf/tags/release-0.2-incubating-RC0. You can download the candidate from http://people.apache.org/~kwright/apache-manifoldcf-0.2-incubating.

Hi Karl,

I downloaded apache-manifoldcf-0.2-incubating-bin.tar.gz and ran "ant test", but I got a "Database exception: Exception doing query: No current connection." exception during the test. I'm not sure whether it is the same as CONNECTORS-172. Are you digging into the issue and planning to respin the RC?

Koji
--
http://www.rondhuit.com/en/
Re: [VOTE] Release ManifoldCF-0.2-incubating
Yes, this does sound like CONNECTORS-172. It's intermittent; if you run the tests again, they will likely pass.

The problem appeared with either the new Derby or the new Jetty. Jetty changed its shutdown so that I can no longer guarantee that it shuts down before the rest of the process shuts down; it could be related to that. The Derby changes involve having more queries (so that they use indexes rather than scans), so it is possible that something is taking a bit longer than before and causing an occasional timeout.

I've not seen any evidence that the problem is other than test related, though. But I think it's worth trying to understand the issue more completely before deciding whether to respin or accept the RC as it stands.

Karl

On Thu, Mar 31, 2011 at 12:50 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

(11/03/21 9:32), Karl Wright wrote:

The tag is https://svn.apache.org/repos/asf/incubator/lcf/tags/release-0.2-incubating-RC0. You can download the candidate from http://people.apache.org/~kwright/apache-manifoldcf-0.2-incubating.

Hi Karl,

I downloaded apache-manifoldcf-0.2-incubating-bin.tar.gz and ran "ant test", but I got a "Database exception: Exception doing query: No current connection." exception during the test. I'm not sure whether it is the same as CONNECTORS-172. Are you digging into the issue and planning to respin the RC?

Koji
--
http://www.rondhuit.com/en/
[jira] [Commented] (CONNECTORS-172) Intermittent test failures
[ https://issues.apache.org/jira/browse/CONNECTORS-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014098#comment-13014098 ]

Karl Wright commented on CONNECTORS-172:
----------------------------------------
Seems to be a Derby problem. DERBY-5169 created.

Intermittent test failures
--------------------------
Key: CONNECTORS-172
URL: https://issues.apache.org/jira/browse/CONNECTORS-172
Project: ManifoldCF
Issue Type: Bug
Components: Tests
Affects Versions: ManifoldCF 0.2
Reporter: Karl Wright
Priority: Minor

The Derby filesystem end-to-end tests sometimes randomly fail with a "Database error: No existing connection" error during a job status wait. Not sure what's happening here, but they succeed much of the time. There's not much of a hint beyond the stack trace. The message seems to be coming from Derby, and may be the result of a too-short wait time limit, a race condition in the test itself, or something else entirely.
[jira] [Updated] (CONNECTORS-172) Intermittent test failures
[ https://issues.apache.org/jira/browse/CONNECTORS-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated CONNECTORS-172:
----------------------------------
Priority: Blocker (was: Minor)
Re: [VOTE] Release ManifoldCF-0.2-incubating
Looking at it in depth, it's likely that it's a Derby bug. I've filed ticket DERBY-5169 accordingly, and marked CONNECTORS-172 as a blocker for 0.2-incubating.

Karl

On Thu, Mar 31, 2011 at 1:11 PM, Karl Wright daddy...@gmail.com wrote:

Yes, this does sound like CONNECTORS-172. It's intermittent; if you run the tests again, they will likely pass.

The problem appeared with either the new Derby or the new Jetty. Jetty changed its shutdown so that I can no longer guarantee that it shuts down before the rest of the process shuts down; it could be related to that. The Derby changes involve having more queries (so that they use indexes rather than scans), so it is possible that something is taking a bit longer than before and causing an occasional timeout.

I've not seen any evidence that the problem is other than test related, though. But I think it's worth trying to understand the issue more completely before deciding whether to respin or accept the RC as it stands.

Karl

On Thu, Mar 31, 2011 at 12:50 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

(11/03/21 9:32), Karl Wright wrote:

The tag is https://svn.apache.org/repos/asf/incubator/lcf/tags/release-0.2-incubating-RC0. You can download the candidate from http://people.apache.org/~kwright/apache-manifoldcf-0.2-incubating.

Hi Karl,

I downloaded apache-manifoldcf-0.2-incubating-bin.tar.gz and ran "ant test", but I got a "Database exception: Exception doing query: No current connection." exception during the test. I'm not sure whether it is the same as CONNECTORS-172. Are you digging into the issue and planning to respin the RC?

Koji
--
http://www.rondhuit.com/en/