date:20100908

[jira] Resolved: (CONNECTORS-103) RSS connector: Have better initial default values for throttling

2010-09-08 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-103.


Fix Version/s: LCF Release 0.5
   Resolution: Fixed

r994959.


 RSS connector: Have better initial default values for throttling
 

 Key: CONNECTORS-103
 URL: https://issues.apache.org/jira/browse/CONNECTORS-103
 Project: Apache Connectors Framework
  Issue Type: Improvement
  Components: RSS connector
Reporter: Karl Wright
Assignee: Karl Wright
Priority: Minor
 Fix For: LCF Release 0.5


 When you first create an rss connector connection, the bandwidth tab should 
 come prepopulated with the following values:
 Max connections per server: 2
 Max KB per second per server: 64
 Max fetches per minute per server: 12
 Too many casual users of ACF have been crawling without any throttling, and 
 that's going to give ACF a bad name in the long run.
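
To make the proposed defaults concrete, here is a minimal sketch of what they imply per server. The class and method names are hypothetical and are not taken from the connector's code; only the three numeric values come from the issue text. The transfer-time estimate assumes the KB-per-second limit applies to the server as a whole, which the issue does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleDefaults:
    """Hypothetical container for the three per-server defaults in this issue."""
    max_connections: int = 2
    max_kb_per_second: int = 64
    max_fetches_per_minute: int = 12

    def min_seconds_between_fetches(self) -> float:
        # 12 fetches per minute means at most one fetch every 5 seconds.
        return 60.0 / self.max_fetches_per_minute

    def min_seconds_to_transfer(self, size_kb: float) -> float:
        # Assumption: the KB/s cap is shared by all connections to the server.
        return size_kb / self.max_kb_per_second


defaults = ThrottleDefaults()
print(defaults.min_seconds_between_fetches())  # 5.0
print(defaults.min_seconds_to_transfer(640))   # 10.0
```

With these values a crawler would fetch from any one server at most once every 5 seconds, and a 640 KB document would take at least 10 seconds to download.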

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (CONNECTORS-103) RSS connector: Have better initial default values for throttling

2010-09-08 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-103:
--

Assignee: Karl Wright

 RSS connector: Have better initial default values for throttling
 

 Key: CONNECTORS-103
 URL: https://issues.apache.org/jira/browse/CONNECTORS-103
 Project: Apache Connectors Framework
  Issue Type: Improvement
  Components: RSS connector
Reporter: Karl Wright
Assignee: Karl Wright
Priority: Minor
 Fix For: LCF Release 0.5


 When you first create an rss connector connection, the bandwidth tab should 
 come prepopulated with the following values:
 Max connections per server: 2
 Max KB per second per server: 64
 Max fetches per minute per server: 12
 Too many casual users of ACF have been crawling without any throttling, and 
 that's going to give ACF a bad name in the long run.


[jira] Assigned: (CONNECTORS-102) Web Connector should have a prepopulated bandwidth throttle

2010-09-08 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-102:
--

Assignee: Karl Wright

 Web Connector should have a prepopulated bandwidth throttle
 ---

 Key: CONNECTORS-102
 URL: https://issues.apache.org/jira/browse/CONNECTORS-102
 Project: Apache Connectors Framework
  Issue Type: Improvement
  Components: Web connector
Reporter: Karl Wright
Assignee: Karl Wright
Priority: Minor
 Fix For: LCF Release 0.5


 When you first create a web connector connection, the bandwidth tab should 
 come prepopulated with a bandwidth throttle that has the following data:
 Description: All domains
 Bin regular expression: blank
 Max connections: 2
 Max KB per second: 64
 Max fetches per minute: 12
 Too many casual users of ACF have been crawling without any throttling, and 
 that's going to give ACF a bad name in the long run.
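
As an illustration of how a single "All domains" throttle with a blank bin expression could cover every server, here is a minimal sketch. `ThrottleRule` and `first_matching_rule` are hypothetical names, and the sketch assumes that a blank expression is meant to match every bin; it is not the connector's actual matching logic.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ThrottleRule:
    """Hypothetical throttle entry mirroring the fields listed in the issue."""
    description: str
    bin_regex: str  # assumption: a blank expression matches every bin
    max_connections: int
    max_kb_per_second: int
    max_fetches_per_minute: int

    def applies_to(self, bin_name: str) -> bool:
        # re.search with an empty pattern matches any string, so a blank
        # expression behaves as "all domains" under this assumption.
        return re.search(self.bin_regex, bin_name) is not None


def first_matching_rule(rules: List[ThrottleRule], bin_name: str) -> Optional[ThrottleRule]:
    """Return the first rule whose expression matches the bin, if any."""
    for rule in rules:
        if rule.applies_to(bin_name):
            return rule
    return None


DEFAULT_RULE = ThrottleRule("All domains", "", 2, 64, 12)

rule = first_matching_rule([DEFAULT_RULE], "www.example.com")
print(rule.description if rule else "unthrottled")
```

Prepopulating the list with such a rule means a new connection is throttled unless the user deliberately removes or edits it.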


[jira] Commented: (CONNECTORS-104) Make it easier to limit a web crawl to a single site

2010-09-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907154#action_12907154
 ] 

Karl Wright commented on CONNECTORS-104:


Trying to limit to the seed domains automatically would, I think, cause more 
confusion than help.  I can, however, imagine introducing a checkbox on the 
Inclusions tab that, if checked, would limit the crawl to just the domains 
represented by the seeds, and even making it checked by default.  The implied 
regular expression would be:

^https?://domain([/?]|$)

for each seed, I believe.  (That's potentially a lot of regular expressions if 
the number of seeds is large, so obviously the logic wouldn't be using regexp's 
in practice.)
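
The idea of limiting a crawl to the seed hosts without one regular expression per seed can be sketched as a set lookup. This is a minimal illustration, not the logic that was committed; `seed_hosts` and `allowed` are hypothetical names and the URLs are made up.

```python
from typing import Iterable, Set
from urllib.parse import urlsplit


def seed_hosts(seeds: Iterable[str]) -> Set[str]:
    """Collect the lowercased host name of every seed URL."""
    hosts = set()
    for seed in seeds:
        host = urlsplit(seed).hostname
        if host:
            hosts.add(host)
    return hosts


def allowed(url: str, hosts: Set[str]) -> bool:
    """Accept only http/https URLs whose host is one of the seed hosts."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and parts.hostname in hosts


hosts = seed_hosts(["http://example.com/news/", "https://docs.example.org"])
print(allowed("http://example.com/about?x=1", hosts))     # True
print(allowed("http://example.com.evil.test/", hosts))    # False
print(allowed("ftp://example.com/file", hosts))           # False
```

Each URL check is a single hash lookup, so the cost does not grow with the number of seeds.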


 Make it easier to limit a web crawl to a single site
 

 Key: CONNECTORS-104
 URL: https://issues.apache.org/jira/browse/CONNECTORS-104
 Project: Apache Connectors Framework
  Issue Type: Improvement
  Components: Web connector
Reporter: Jack Krupansky
Priority: Minor

 Unless the user explicitly enters an include regex carefully, a web crawl can 
 quickly get out of control and start crawling the entire web when all the 
 user may really want is to crawl just a single web site or portion thereof. 
 So, it would be preferable if either by default or with a simple button the 
 crawl could be limited to the seed web site(s).


[jira] Commented: (CONNECTORS-104) Make it easier to limit a web crawl to a single site

2010-09-08 Thread Jack Krupansky (JIRA)

[
https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907201#action_12907201
]

Jack Krupansky commented on CONNECTORS-104:
---

Simple works best. This enhancement is primarily for the simple use case where
a novice user tries to do what they think is obvious (crawl the web pages at
this URL), but without considering all of the potential nuances or how to
fully specify the details of their goal.

One nuance is whether subdomains are considered part of the domain. I would say
no if a subdomain was specified by the user and yes if no subdomain was
specified.

Another nuance is whether a path is specified to select a subset of a domain.
It would be nice to handle that and (optionally) limit the crawl to that path
(or sub-paths below it). An example would be to crawl the news archive for a
site.
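
The path nuance could be sketched roughly as follows. This is only an illustration of the suggestion: `in_seed_scope` is a hypothetical helper, the URLs are made up, and the subdomain rule is left out because deciding whether a seed "specifies a subdomain" would need extra information about domain suffixes.

```python
from urllib.parse import urlsplit


def in_seed_scope(url: str, seed: str) -> bool:
    """True if url is on the seed's host and at or below the seed's path."""
    u, s = urlsplit(url), urlsplit(seed)
    if u.hostname is None or u.hostname != s.hostname:
        return False
    # Treat the seed path as a directory prefix so that "/news" does not
    # also admit "/newsletter".
    prefix = s.path if s.path.endswith("/") else s.path + "/"
    path = u.path if u.path.endswith("/") else u.path + "/"
    return path.startswith(prefix)


seed = "http://example.com/news"
print(in_seed_scope("http://example.com/news/2010/09.html", seed))  # True
print(in_seed_scope("http://example.com/newsletter", seed))         # False
print(in_seed_scope("http://other.example.com/news/", seed))        # False
```

A seed that names a single page rather than a directory would restrict the crawl to that page alone under this sketch.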

Make it easier to limit a web crawl to a single site

Key: CONNECTORS-104
URL: https://issues.apache.org/jira/browse/CONNECTORS-104
Project: Apache Connectors Framework
Issue Type: Improvement
Components: Web connector
Reporter: Jack Krupansky
Priority: Minor

Unless the user explicitly enters an include regex carefully, a web crawl can
quickly get out of control and start crawling the entire web when all the
user may really want is to crawl just a single web site or portion thereof.
So, it would be preferable if either by default or with a simple button the
crawl could be limited to the seed web site(s).


[jira] Commented: (CONNECTORS-104) Make it easier to limit a web crawl to a single site

2010-09-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907203#action_12907203
 ] 

Karl Wright commented on CONNECTORS-104:


For someone who is purportedly trying to make things simpler, you have 
specified a rather complex set of rules, many of which seem of questionable 
utility to me.

Since this is basically just a shortcut, I propose a simple feature that just 
limits all urls to hosts that are explicitly mentioned in the seeds.


 Make it easier to limit a web crawl to a single site
 

 Key: CONNECTORS-104
 URL: https://issues.apache.org/jira/browse/CONNECTORS-104
 Project: Apache Connectors Framework
  Issue Type: Improvement
  Components: Web connector
Reporter: Jack Krupansky
Priority: Minor

 Unless the user explicitly enters an include regex carefully, a web crawl can 
 quickly get out of control and start crawling the entire web when all the 
 user may really want is to crawl just a single web site or portion thereof. 
 So, it would be preferable if either by default or with a simple button the 
 crawl could be limited to the seed web site(s).


[jira] Resolved: (CONNECTORS-104) Make it easier to limit a web crawl to a single site

2010-09-08 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-104.


 Assignee: Karl Wright
Fix Version/s: LCF Release 0.5
   Resolution: Fixed

r995042.


 Make it easier to limit a web crawl to a single site
 

 Key: CONNECTORS-104
 URL: https://issues.apache.org/jira/browse/CONNECTORS-104
 Project: Apache Connectors Framework
  Issue Type: Improvement
  Components: Web connector
Reporter: Jack Krupansky
Assignee: Karl Wright
Priority: Minor
 Fix For: LCF Release 0.5


 Unless the user explicitly enters an include regex carefully, a web crawl can 
 quickly get out of control and start crawling the entire web when all the 
 user may really want is to crawl just a single web site or portion thereof. 
 So, it would be preferable if either by default or with a simple button the 
 crawl could be limited to the seed web site(s).


[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management

2010-09-08 Thread Jettro Coenradie (JIRA)

[
https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907358#action_12907358
]

Jettro Coenradie commented on CONNECTORS-92:

I worked on it tonight but decided to stop. This path is not leading in a
direction that I would like. To make the most of Maven I would like to change
more than you would be willing to right now. I cannot blame you, because you
have something working right now. Maybe someone else wants to step in and
finish what I have done. I can submit another patch with the stuff I have right
now.

Move from ant to maven or other build system with decent library management
---

Key: CONNECTORS-92
URL: https://issues.apache.org/jira/browse/CONNECTORS-92
Project: Apache Connectors Framework
Issue Type: Wish
Components: Build
Reporter: Jettro Coenradie
Assignee: Karl Wright
Attachments: maven-poms-including-start-jar.patch,
maven-poms-problem-starting-jetty-and-derby.patch,
move-to-maven-acf-framework.patch, Screen shot 2010-08-23 at 16.31.07.png

I am looking at the current project structure. If we want to make another
build tool available, I think we need to change the directory structure. I
tried to capture a suggestion in an image. Can you please have a look at it? If
we agree that this is a good way to go, then I will continue to work on a
patch. That might be a bit hard with all these changing directories, but
I'll do my best to at least get an idea of whether it would work.
So I have three questions:
- Do you want to move to Maven, or put Maven next to Ant?
- Do you prefer another build mechanism (Ant with Ivy, Gradle, Maven 3)?
- Do you have an idea of the number of scripts that would need to change if
we change the project structure?
The image of a possible project layout (based on the Maven standards)
is attached to the issue.


[jira] Commented: (CONNECTORS-92) Move from ant to maven or other build system with decent library management

2010-09-08 Thread Karl Wright (JIRA)

[
https://issues.apache.org/jira/browse/CONNECTORS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907392#action_12907392
]

Karl Wright commented on CONNECTORS-92:
---

Jettro,

Please go ahead and submit everything you have. I'd also like to know what you
believe the stumbling blocks to be. Thanks for all your work on this so far.

Move from ant to maven or other build system with decent library management
---

