date:20131205

[General] Webboard: In/out links and fetching time for each page + xpath

2013-12-05 Thread bar

Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hi,

> Hi there,
>
> Is it possible to obtain this information after crawling a website:
> - Fetching / downloading time of each page
> - Total in and out links (from the website structure itself)

This is possible in mnogosearch-3.4.0, which is at the pre-alpha stage at
the moment. If you'd like to give it a try, please download it from here:
http://www.mnogosearch.org/Download/mnogosearch-3.4.0.tar.gz
(Note: this is not the final 3.4.0.)

- See the ResponseTime special purpose section here:
http://www.mnogosearch.org/doc34/msearch-cmdref-section.html#cmdref-section-special

- The structure of the "links" table has changed.
It can now store all links between pages.
Please see here how to configure it:
http://www.mnogosearch.org/doc34/msearch-cmdref-collectlinks.html
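
As an illustration of what per-page in and out link totals mean once every link between pages is stored, here is a minimal Python sketch. It assumes the collected links are available as (source, target) URL pairs; that layout, the function name count_links, and the example URLs are hypothetical and do not reflect the actual mnogosearch table schema.

```python
from collections import Counter


def count_links(pairs):
    """Return (out_counts, in_counts) for an iterable of (source, target) pairs.

    NOTE: the pair layout is an assumption made for illustration only;
    it is not the real structure of the mnogosearch "links" table.
    """
    out_counts = Counter()
    in_counts = Counter()
    for source, target in pairs:
        out_counts[source] += 1  # one more link leaving the source page
        in_counts[target] += 1   # one more link arriving at the target page
    return out_counts, in_counts


# Hypothetical example data: three links inside one small site.
pairs = [
    ("http://example.com/", "http://example.com/a"),
    ("http://example.com/", "http://example.com/b"),
    ("http://example.com/a", "http://example.com/b"),
]
out_counts, in_counts = count_links(pairs)
```

A page that never appears as a target simply has an in-link count of zero, because Counter returns 0 for missing keys.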


 
> Would it be possible to add XPath support instead of regex for Sections?
> Using a plugin or natively.

I guess you need this for XML files.

XPath is currently not supported. We could take advantage
of libxml2 to add XPath support, but this would need some
development effort.

 
> Many thanks!


Reply: http://www.mnogosearch.org/board/message.php?id=21598

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general

[General] Webboard: In/out links and fetching time for each page + xpath

2013-12-05 Thread bar

Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
<skip>

> I guess you need this for XML files.
>
> XPath is currently not supported. We could take advantage
> of libxml2 to add XPath support, but this would need some
> development effort.
 

Btw, simple extraction from a given XML tag is supported
in 3.3.x, with the help of the Section command.

For example:

<xml>
  <a>
   <b>I want to extract this</b>
  </a>
</xml>

A command like this will do the trick:

Section xml.a.b  10 128
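
To illustrate which element a dotted path such as xml.a.b refers to, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It only mimics the idea of walking nested tags by name; it is not how mnogosearch implements the Section command, and the helper name extract_by_dotted_path is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_by_dotted_path(xml_text, dotted_path):
    """Return the text of the element addressed by a path like "xml.a.b".

    Hypothetical helper for illustration only; mnogosearch itself does
    this through the Section command, not through this code.
    """
    root = ET.fromstring(xml_text)
    names = dotted_path.split(".")
    # The first name must match the document's root element.
    if root.tag != names[0]:
        return None
    if len(names) == 1:
        return root.text
    # The remaining names are nested child tags, e.g. "a/b".
    node = root.find("/".join(names[1:]))
    return None if node is None else node.text


document = """
<xml>
  <a>
   <b>I want to extract this</b>
  </a>
</xml>
"""
extracted = extract_by_dotted_path(document, "xml.a.b")
missing = extract_by_dotted_path(document, "xml.a.c")
```

Here extracted holds the text of the b element, and missing is None because no c element exists under a.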



Reply: http://www.mnogosearch.org/board/message.php?id=21599
