[General] Webboard: In/out links and fetching time for each page + xpath

bar Thu, 05 Dec 2013 10:10:56 -0800

Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hi,

> Hi there,
> 
> Is it possible to obtain the following information after crawling a website:
> - Fetching / downloading time of each page
> - Total in and out links (from the website structure itself)


This is possible in mnogosearch-3.4.0, which is in pre-alpha stage at the 
moment. If you'd like to give it a try, please download it from here:
http://www.mnogosearch.org/Download/mnogosearch-3.4.0.tar.gz
(note, this is not the final 3.4.0).

- See the ResponseTime special purpose section here:
http://www.mnogosearch.org/doc34/msearch-cmdref-section.html#cmdref-section-special

- The structure of the "links" table has changed.
It can now store all links between the pages.
Please see here how to configure it:
http://www.mnogosearch.org/doc34/msearch-cmdref-collectlinks.html
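
Once every page-to-page link is stored, the in/out totals are just two counts over the link pairs. The sketch below only illustrates that idea. It assumes the links have already been exported as (source, target) pairs; the sample URLs and the link_totals helper are hypothetical and are not part of mnoGoSearch, and the actual column names of the "links" table are not shown here.

```python
from collections import Counter

# Hypothetical sample data: one (source_url, target_url) pair per stored link.
links = [
    ("/index.html", "/about.html"),
    ("/index.html", "/news.html"),
    ("/about.html", "/index.html"),
    ("/news.html", "/index.html"),
    ("/news.html", "/about.html"),
]


def link_totals(pairs):
    """Return (out_counts, in_counts) as Counters keyed by URL."""
    # Outgoing links: count how often each URL appears as a source.
    out_counts = Counter(src for src, _ in pairs)
    # Incoming links: count how often each URL appears as a target.
    in_counts = Counter(dst for _, dst in pairs)
    return out_counts, in_counts


out_counts, in_counts = link_totals(links)
for url in sorted(set(out_counts) | set(in_counts)):
    # A Counter returns 0 for URLs that never appear on one side.
    print(url, "out:", out_counts[url], "in:", in_counts[url])
```

The same two counts could be produced with a GROUP BY on the source and target columns directly in the database, whichever names those columns have in your installation.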


> 
> Would it be possible to add XPath support instead of regex for Sections?
> Using a plugin or natively.

I guess you need this for XML files.

XPath is currently not supported. We could take advantage
of libxml2 to add XPath support, but this would need some
development effort.
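
To make the regex versus XPath difference concrete, here is a small sketch outside mnoGoSearch. It uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath, rather than libxml2. The sample document and the section_by_path helper are made up for illustration; nothing here is an existing or planned mnoGoSearch feature.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML document standing in for a crawled page.
doc = """<article>
  <title>Crawl report</title>
  <meta><author>Jane</author></meta>
  <body><p>First paragraph.</p><p>Second paragraph.</p></body>
</article>"""

root = ET.fromstring(doc)


def section_by_path(node, path):
    """Return the text of every element matched by a simple XPath-like path."""
    return [el.text for el in node.findall(path)]


# Path-based extraction: the path follows the document structure.
title_xpath = section_by_path(root, "./title")
paragraphs = section_by_path(root, "./body/p")

# Regex-based extraction of the same title, matching on raw markup.
title_regex = re.findall(r"<title>(.*?)</title>", doc)

print(title_xpath)
print(paragraphs)
print(title_regex)
```

The path expression selects elements by their position in the tree, so it keeps working when attributes or whitespace inside the tags change, which is where a regex over raw markup tends to break.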

> 
> Many thanks !


Reply: <http://www.mnogosearch.org/board/message.php?id=21598>

_______________________________________________
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general
