Hello,

 

I am building a database of external URLs using the configuration shown below. When I start digging, limit_urls_to does not seem to be working: the dig appears to keep adding new servers for every link on the pages (see the output below). This simple dig is taking far too long, and the results are not as expected. Can someone point me in the right direction?

 

start_url:      http://www.bid4assets.com/storefront/index.cfm?fuseaction=govtassets&sfid=21 /
                http://www.bidshares.com/auctions/index.cfm?fuseaction=act_search&auctiontypeid=1 /
                http://bottrealtyauction.com/listing.php3 /
                http://century21auctions.com/Auctions.htm /
                http://ctcis.net/~jmcampbe/upcoming.htm /
                http://www.ajbillig.com/upcoming_auctions.html

 

limit_urls_to:  http://www.bid4assets.com/storefront/index.cfm?fuseaction=govtassets&sfid=21 /
                http://www.bidshares.com/auctions/index.cfm?fuseaction=act_search&auctiontypeid=1 /
                http://bottrealtyauction.com/listing.php3 /
                http://century21auctions.com/Auctions.htm /
                http://ctcis.net/~jmcampbe/upcoming.htm /
                http://www.ajbillig.com/upcoming_auctions.html
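
To make the expected behaviour concrete, here is a minimal Python sketch of what I understand limit_urls_to should do. It assumes a URL is followed only when it contains one of the listed patterns as a substring. That matching rule, the function name is_allowed, and the sample URLs are my own assumptions for illustration, not the indexer's actual code.

```python
# Hypothetical model of URL limiting: a URL is accepted when at least one
# configured pattern occurs in it as a substring. This is an assumed model
# of limit_urls_to, not the indexer's real implementation.

LIMIT_PATTERNS = [
    "http://www.bid4assets.com/storefront/index.cfm?fuseaction=govtassets&sfid=21",
    "http://www.bidshares.com/auctions/index.cfm?fuseaction=act_search&auctiontypeid=1",
    "http://bottrealtyauction.com/listing.php3",
    "http://century21auctions.com/Auctions.htm",
    "http://ctcis.net/~jmcampbe/upcoming.htm",
    "http://www.ajbillig.com/upcoming_auctions.html",
]


def is_allowed(url, patterns=LIMIT_PATTERNS):
    """Return True if any pattern is a substring of url (assumed rule)."""
    return any(pattern in url for pattern in patterns)


if __name__ == "__main__":
    # Sample URLs: the first extends one of the configured patterns,
    # the second is one of the hosts that shows up in the dig output.
    for url in (
        "http://bottrealtyauction.com/listing.php3?id=5",
        "http://www.php.net/",
    ):
        print(url, is_allowed(url))
```

Under this assumed model, none of the hosts in the output below should ever be visited. The same model also shows that if the trailing ` /` on each line were read as a pattern of its own, it would match every URL; I have not confirmed how the configuration parser treats it.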

 

The following output keeps going and going:

+

New server: www.php.net, 80

+++

New server: www.bronsonbeta.com, 80

+++++++

New server: paypal.typepad.com, 80

+

New server: www.bungie.net, 80

+

New server: theinquirer.net, 80

+*

New server: www.codingmonkeys.de, 80

++

New server: tenyearsofmylife.com, 80

+++

New server: www.macdevcenter.com, 80

++*

New server: dashes.com, 80

+

New server: evhead.com, 80

+**+

New server: kottke.org, 80

+

New server: www.randomfoo.net, 80

+

New server: glchoate.com, 80

+***

New server: www.dollarshort.org, 80

+

New server: lifeuncommon.org, 80

+*

New server: www.bradchoate.com, 80

++* size = 47391

335:6148:12:http://www.ccc.de/: +-++++++++

New server: ds.ccc.de, 80

+

New server: chaosradio.ccc.de, 80

+++-***+*+++++++++++++++++++++++++++++*

New server: www.blinkenlights.de, 80

+

New server: www.haecksen.org, 80

++

New server: www.de.inter.net, 80

++*+*+*+*+*+*+*+*+*+**+++++* size = 14701

336:7069:12:http://www.typekey.com/: --+++* size = 6095

337:4475:12:http://www.virusthreatcenter.com/articlearchive.aspx: ***********************++++++++++++++++********************************************************** size = 16932

338:7330:11:http://www.command-post.org/: -++

New server: www.conventionbloggers.com, 80

+

New server: www.rncbloggers.com, 80

+

New server: command-post.org, 80

++++++++++++++++++++++++

New server: www.cafeshops.com, 80

++

New server: www.philly.com, 80

+++

New server: www.time.com, 80

++

New server: www.foxnews.com, 80

+

New server: www.pbs.org, 80

++

New server: cbs.marketwatch.com, 80

+

New server: www.bbc.co.uk, 80

+

New server: www.chicagotribune.com, 80

+++

New server: www.chron.com, 80

+

New server: www.sunspot.net, 80

++

New server: news.ft.com, 80

+

New server: www.weeklystandard.com, 80

+

New server: discover.npr.org, 80

+++

New server: www.20minutos.es, 80

++

New server: www.reseaux-telecoms.com, 80

+

New server: www.theaustralian.news.com.au, 80

+

New server: www.smh.com.au, 80

++

New server: www.knoxnews.com, 80

+

New server: www.helsinginsanomat.fi, 80

+

New server: www.sun-sentinel.com, 80

+

New server: www.rockymountainnews.com, 80

+

New server: www.twincities.com, 80

++++++++++

New server: www.e-democracy.org, 80

+*

New server: www.sekimori.com, 80

+

New server: www.hostmatters.com, 80

+

New server: sm2.sitemeter.com, 80

++*

New server: www.blogshares.com, 80

++

New server: 2004weblogawards.com, 80

+**+++++++**+++++++**+++++++**+++++++*+++++++**+++++++

New server: proxy.blogads.com, 80

Unable to build connection with proxy.blogads.com:80

+--

New server: www.iuniverse.com, 80

+---------- size = 37189

339:8513:12:http://pages.prodigy.net/thomasn528/blog/newsrackblog.html: ++++++

New server: memory.loc.gov, 80

+*+*+

 

 

TIA

 

Rob

 
