Friday, September 11, 2009, 1:09:42 PM, Hans wrote:

> For TextExtract I cannot just use PmWiki's search engine,
> because we need to extract text. But thanks to your suggestion I was
> inspired to look at the handling of search terms again, and will
> incorporate the way PmWiki's search handles search terms, so we can
> have input like
>   'abc xyz'       => output with 'abc' AND 'xyz' in the page;
>   '"abc def" xyz' => output with 'abc def' AND 'xyz' in the page;
>   'abc -xyz'      => output with 'abc' but NOT 'xyz' in the page;
>   'abc|xyz'       => output with 'abc' OR 'xyz' in the page;
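
To make the four input forms quoted above concrete, here is a minimal Python sketch of how such a term string could be split and tested against page text. It is only an illustration of the semantics listed in the quote, not TextExtract's or PmWiki's actual code (both are PHP). The names TOKEN, parse_terms and page_matches are hypothetical, and the sketch assumes case-insensitive substring matching.

```python
import re

# Hypothetical tokenizer: an optional leading '-', then either a
# double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'(-?)(?:"([^"]+)"|(\S+))')

def parse_terms(query):
    """Return (required, excluded).

    required is a list of groups; each group is a list of OR alternatives,
    and every group must match. excluded is a flat list of terms that
    must not occur in the page.
    """
    required, excluded = [], []
    for minus, phrase, word in TOKEN.findall(query):
        if phrase:
            # A quoted phrase is kept whole, spaces included.
            alternatives = [phrase.lower()]
        else:
            # 'abc|xyz' becomes the alternatives 'abc' and 'xyz'.
            alternatives = [w.lower() for w in word.split('|') if w]
        if not alternatives:
            continue
        if minus:
            excluded.extend(alternatives)
        else:
            required.append(alternatives)
    return required, excluded

def page_matches(text, query):
    """True if text satisfies the AND / OR / NOT terms in query."""
    text = text.lower()
    required, excluded = parse_terms(query)
    if any(term in text for term in excluded):
        return False
    return all(any(alt in text for alt in group) for group in required)
```

With this sketch, page_matches("only xyz here", "abc|xyz") is true, while page_matches("abc xyz", "abc -xyz") is false because the excluded term is present.
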
Now available in the latest release.
http://www.pmwiki.org/wiki/Cookbook/TextExtract

I also added some template variables for use in the parameters
header=, footer= and phead=. For instance, a header with a custom
title and the search time:

  header="%rfloat%{$$time}%%'''Listing'''"

I split regular expression search from standard search, to allow
easier term input, and added a checkbox for regular expression search
to the search form. I also added a 'Match whole words' checkbox for
whole word searches.

A note on efficiency: TextExtract with its built-in pagelist function
runs faster than PmWiki's pagelist or the MakePageList() function.
The main reason is that PmWiki's pagelist process opens every page to
check whether the user is authorised to see it, because it does not
want to output any non-authorised pages, for instance read-protected
pages. This file opening can be quite time consuming.

TextExtract, on the other hand, constructs a pagelist that includes
even read-protected pages; authorisations are not checked at this
stage in the process. Only later, when each page on the source list is
opened, is authorisation checked, before text lines are extracted and
processed. So far fewer pages need to be opened, which makes for a
faster process. That is the main reason I did not use MakePageList()
as a source pagelist generator.

Still, it remains possible to use the PmWiki searchbox with a
fmt=#extract option, which will use PmWiki's pagelist functions and
TextExtract's formatting functions. This is useful if you need to pass
pagelist parameters that TextExtract does not understand.

~Hans
