[jira] Resolved: (CONNECTORS-103) RSS connector: Have better initial default values for throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-103.

Fix Version/s: LCF Release 0.5
Resolution: Fixed

r994959.

RSS connector: Have better initial default values for throttling

Key: CONNECTORS-103
URL: https://issues.apache.org/jira/browse/CONNECTORS-103
Project: Apache Connectors Framework
Issue Type: Improvement
Components: RSS connector
Reporter: Karl Wright
Assignee: Karl Wright
Priority: Minor
Fix For: LCF Release 0.5

When you first create an rss connector connection, the bandwidth tab should come prepopulated with the following values:

Max connections per server: 2
Max KB per second per server: 64
Max fetches per minute per server: 12

Too many casual users of ACF have been crawling without any throttling, and that's going to give ACF a bad name in the long run.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (CONNECTORS-103) RSS connector: Have better initial default values for throttling
[ https://issues.apache.org/jira/browse/CONNECTORS-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright reassigned CONNECTORS-103:

Assignee: Karl Wright
[jira] Assigned: (CONNECTORS-102) Web Connector should have a prepopulated bandwidth throttle
[ https://issues.apache.org/jira/browse/CONNECTORS-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright reassigned CONNECTORS-102:

Assignee: Karl Wright

Web Connector should have a prepopulated bandwidth throttle

Key: CONNECTORS-102
URL: https://issues.apache.org/jira/browse/CONNECTORS-102
Project: Apache Connectors Framework
Issue Type: Improvement
Components: Web connector
Reporter: Karl Wright
Assignee: Karl Wright
Priority: Minor
Fix For: LCF Release 0.5

When you first create a web connector connection, the bandwidth tab should come prepopulated with a bandwidth throttle that has the following data:

Description: All domains
Bin regular expression: blank
Max connections: 2
Max KB per second: 64
Max fetches per minute: 12

Too many casual users of ACF have been crawling without any throttling, and that's going to give ACF a bad name in the long run.
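
To give a feel for what the proposed defaults mean in practice, here is a minimal Python sketch of the arithmetic. It is not ACF code: the `ThrottleDefaults` class and its method names are hypothetical, and it assumes each limit is read as a steady per-server average rate, which the issues above do not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleDefaults:
    """Hypothetical holder for the per-server defaults proposed in the issues."""
    max_connections: int = 2
    max_kb_per_second: int = 64
    max_fetches_per_minute: int = 12

    def min_seconds_between_fetches(self) -> float:
        # Assumption: the fetch limit is spread evenly over the minute.
        return 60.0 / self.max_fetches_per_minute

    def min_seconds_to_transfer(self, size_kb: float) -> float:
        # Assumption: the bandwidth limit is a sustained average.
        return size_kb / self.max_kb_per_second


defaults = ThrottleDefaults()
print(defaults.min_seconds_between_fetches())  # 5.0
print(defaults.min_seconds_to_transfer(640))   # 10.0
```

Under those assumptions a crawler would wait about five seconds between fetches from the same server, and a 640 KB document would take at least ten seconds to download.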
[jira] Commented: (CONNECTORS-104) Make it easier to limit a web crawl to a single site
[ https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907154#action_12907154 ]

Karl Wright commented on CONNECTORS-104:

Trying to limit to the seed domains automatically would, I think, cause more confusion than help. I can, however, imagine introducing a checkbox on the Inclusions tab that, if checked, would limit the crawl to just the domains represented by the seeds, and even making it checked by default. The implied regular expression would be:

^http[?s]://domain[/$\?]

for each seed, I believe. (That's potentially a lot of regular expressions if the number of seeds is large, so obviously the logic wouldn't be using regexp's in practice.)

Make it easier to limit a web crawl to a single site

Key: CONNECTORS-104
URL: https://issues.apache.org/jira/browse/CONNECTORS-104
Project: Apache Connectors Framework
Issue Type: Improvement
Components: Web connector
Reporter: Jack Krupansky
Priority: Minor

Unless the user explicitly enters an include regex carefully, a web crawl can quickly get out of control and start crawling the entire web when all the user may really want is to crawl just a single web site or portion thereof. So, it would be preferable if either by default or with a simple button the crawl could be limited to the seed web site(s).
[jira] Commented: (CONNECTORS-104) Make it easier to limit a web crawl to a single site
[ https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907201#action_12907201 ]

Jack Krupansky commented on CONNECTORS-104:

Simple works best. This enhancement is primarily for the simple use case where a novice user tries to do what they think is obvious (crawl the web pages at this URL), but without considering all of the potential nuances or how to fully specify the details of their goal.

One nuance is whether subdomains are considered part of the domain. I would say no if a subdomain was specified by the user and yes if no subdomain was specified.

Another nuance is whether a path is specified to select a subset of a domain. It would be nice to handle that and (optionally) limit the crawl to that path (or sub-paths below it). An example would be to crawl the news archive for a site.
[jira] Commented: (CONNECTORS-104) Make it easier to limit a web crawl to a single site
[ https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907203#action_12907203 ]

Karl Wright commented on CONNECTORS-104:

For someone who is purportedly trying to make things simpler, you have specified a rather complex set of rules, many of which seem of questionable utility to me. Since this is basically just a shortcut, I propose a simple feature that just limits all urls to hosts that are explicitly mentioned in the seeds.
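
The "hosts explicitly mentioned in the seeds" rule can be illustrated with a short Python sketch. This is only an illustration of the idea, not the change that was committed: the function names `seed_hosts` and `is_allowed` are hypothetical, and the sketch uses a set lookup on the parsed hostname in place of one regular expression per seed, in the spirit of the earlier remark that the logic would not use regexps in practice.

```python
from urllib.parse import urlsplit


def seed_hosts(seeds):
    """Collect the (lower-cased) hostnames that appear in the seed URLs."""
    hosts = set()
    for seed in seeds:
        host = urlsplit(seed).hostname  # None if the seed has no host part
        if host:
            hosts.add(host)
    return hosts


def is_allowed(url, allowed_hosts):
    """Accept only http/https URLs whose host exactly matches a seed host."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname in allowed_hosts


allowed = seed_hosts(["http://www.example.com/news/", "https://Example.org"])
print(is_allowed("http://www.example.com/a?b=1", allowed))      # True
print(is_allowed("http://example.com/", allowed))               # False
print(is_allowed("http://www.example.com.evil.net/", allowed))  # False
```

Exact host matching means that subdomains and parent domains of a seed are not included implicitly, and a single set lookup per URL stays cheap even when the seed list is large.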
[jira] Resolved: (CONNECTORS-104) Make it easier to limit a web crawl to a single site
[ https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-104.

Assignee: Karl Wright
Fix Version/s: LCF Release 0.5
Resolution: Fixed

r995042.

Make it easier to limit a web crawl to a single site

Key: CONNECTORS-104
URL: https://issues.apache.org/jira/browse/CONNECTORS-104
Project: Apache Connectors Framework
Issue Type: Improvement
Components: Web connector
Reporter: Jack Krupansky
Assignee: Karl Wright
Priority: Minor
Fix For: LCF Release 0.5
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907358#action_12907358 ]

Jettro Coenradie commented on CONNECTORS-92:

I worked on it tonight but I decided to stop. This path is not leading in a direction that I would like. To make the most out of maven I would like to change more than you would be willing to right now. I cannot blame you, because you have something working right now.

Maybe someone else wants to step in and finish what I have done. I can submit another patch with the stuff I have right now.

Move from ant to maven or other build system with decent library management

Key: CONNECTORS-92
URL: https://issues.apache.org/jira/browse/CONNECTORS-92
Project: Apache Connectors Framework
Issue Type: Wish
Components: Build
Reporter: Jettro Coenradie
Assignee: Karl Wright
Attachments:
maven-poms-including-start-jar.patch
maven-poms-problem-starting-jetty-and-derby.patch
move-to-maven-acf-framework.patch
Screen shot 2010-08-23 at 16.31.07.png

I am looking at the current project structure. If we want to make another build tool available I think we need to change the directory structure. I tried to place a suggestion in an image. Can you please have a look at it? If we agree that this is a good way to go, then I will continue to work on a patch. Which might be a bit hard with all these changing directories, but I'll do my best to at least get an idea whether it would be working.

So I have three questions:
- Do you want to move to maven or put maven next to ant?
- Do you prefer another build mechanism [ant with ivy, gradle, maven3]?
- Do you have an idea about the number of scripts that need to be changed if we change the project structure?

The image of a possible project layout (that is based on the maven standards) is attached to the issue.
[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management
[ https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907392#action_12907392 ]

Karl Wright commented on CONNECTORS-92:

Jettro,

Please go ahead and submit everything you have. I'd also like to know what you believe the stumbling blocks to be.

Thanks for all your work on this so far.