[fw-general] Determining installed version of QMail, Exim, etc.

2007-01-27 Thread Simon Mundy

Hi everyone

I'm in the middle of writing a 'Mail' config utility for  
Zend_Environment that tries to determine the locally installed MTA -  
it will help debugging quite a lot.


So far I can reliably determine Postfix (since it's my MTA of  
choice :) and I _think_ I've got Exim covered but can't for the life  
of me work out how you can check which version of Qmail is installed.


Can anyone help with the following:-

* Using either a config file or binary to determine version for Qmail
* Using either a config file or binary to determine version for Exim  
(I currently use:-


if (preg_match('/Exim version/msi', $this->_config = shell_exec('exim -bV'))) {
    return 'exim';
}

* Using either a config file or binary to determine version for Sendmail

Any others that we should be looking for?
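For the detection step itself, here's a hedged sketch of the pattern I'd try, with the banner matching split out from shell_exec() so it can be tested without any MTA installed. The Sendmail probe is an assumption (`sendmail -d0.1 -bv root` prints a "Version 8.x" line on the installs I've seen); qmail doesn't expose a version flag at all, so the best hint there is probably a filesystem check such as is_dir('/var/qmail').

```php
<?php
// Classify an MTA from the banner its probe command emits. Call it as
// classifyMtaBanner(shell_exec('postconf -d')) and so on; the probe
// command for each banner is noted in the comments.
function classifyMtaBanner($output)
{
    if (preg_match('/mail_version/i', $output)) {
        return 'postfix';   // probe: postconf -d
    }
    if (preg_match('/Exim version/i', $output)) {
        return 'exim';      // probe: exim -bV
    }
    if (preg_match('/^Version \d/im', $output)) {
        return 'sendmail';  // probe (assumed): sendmail -d0.1 -bv root
    }
    return false;
}
```

qmail would then be the fallback when every probe fails but /var/qmail exists.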

I also need it to be cross-platform, although the php.ini file can give some clues as to the path/host/port etc.


As a start, here's how Postfix config can be returned:-

if (preg_match('/mail_version/msi', $this->_config = shell_exec('postconf -d'))) {
    return 'postfix';
}

and parsed...

// values may contain spaces, so capture to the end of each line
preg_match_all('/^(\S+)\s*=\s*(.*)$/m', $this->_config, $config, PREG_PATTERN_ORDER);

$config = array_combine($config[1], array_map('trim', $config[2]));
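A self-contained version of that parsing step, runnable without Postfix installed (the sample string stands in for the real `postconf -d` output, and the helper name is made up):

```php
<?php
// Hypothetical helper: turn "key = value" lines (the `postconf -d`
// format) into an associative array. Values may contain spaces, so the
// pattern captures to the end of each line rather than to whitespace.
function parsePostconfOutput($raw)
{
    preg_match_all('/^(\S+)\s*=\s*(.*)$/m', $raw, $m, PREG_PATTERN_ORDER);
    return array_combine($m[1], array_map('trim', $m[2]));
}

$sample = "mail_version = 2.3.6\nqueue_directory = /var/spool/postfix\n";
$config = parsePostconfOutput($sample);
echo $config['mail_version']; // prints "2.3.6"
```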

Cheers

--

Simon Mundy | Director | PEPTOLAB

""" " "" "" "" "" """ " "" " " " "  "" "" "
202/258 Flinders Lane | Melbourne | Victoria | Australia | 3000
Voice +61 (0) 3 9654 4324 | Mobile 0438 046 061 | Fax +61 (0) 3 9654  
4124

http://www.peptolab.com




[fw-general] Zend_Search_Lucene questions ...

2007-01-27 Thread Sebi
1. I want to use Zend_Search_Lucene with millions of documents. I will delete 
existing documents and add new ones many times. Can Zend_Search_Lucene handle 
this situation? Will these actions damage the segments in any way?

2. When I delete an existing document, when does the automatic commit happen?

3. I have 8737 documents indexed right now. When I search for keywords like 
'arte', 'galeria', etc., a search takes about 3.15 sec. When I had only 4500 
documents it took about 1.6 sec. The generated query looks like: 
+(((titleSrch:galeria)) ((descriptionSrch:galeria)) ((tagsSrch:galeria))) 
+(countryID:1) . 

I should mention that I measure only the find time, not the total time (the 
find time plus the extraction of the document fields). Three seconds for a 
search over only 8737 documents is a lot. What will happen when I have 2 
million documents?

What is your opinion, Alex? Is there a way to improve my query (maybe with 
fewer parentheses)? I need a solution because this is too slow.
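For what it's worth, the grouping can be flattened before it ever reaches the index. A hedged sketch (field names taken from the query above; this only assembles the query string and says nothing about Lucene's internals):

```php
<?php
// Build a flat Lucene query string: one required group of per-field
// keyword clauses plus required filter terms, with no nested parentheses.
function buildSearchQuery($keyword, array $fields, array $filters)
{
    $parts = array();
    foreach ($fields as $field) {
        $parts[] = $field . ':' . $keyword;
    }
    $query = '+(' . implode(' ', $parts) . ')';
    foreach ($filters as $field => $value) {
        $query .= ' +' . $field . ':' . $value;
    }
    return $query;
}

echo buildSearchQuery('galeria',
    array('titleSrch', 'descriptionSrch', 'tagsSrch'),
    array('countryID' => 1));
// prints "+(titleSrch:galeria descriptionSrch:galeria tagsSrch:galeria) +countryID:1"
```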

Thank you.





 



Re: [fw-general] Any plans for Zend_Controller_Request_Cli?

2007-01-27 Thread Matthew Weier O'Phinney
-- Tony Ford <[EMAIL PROTECTED]> wrote
(on Saturday, 27 January 2007, 01:00 AM -0700):
> It would be nice for utilities and especially crons. Support for cli 
> options "-r -n" as well as arguments would be nice.

My plan all along was to allow supporting CLI options. Now that
Zend_Console_Getopt is in the incubator, this possibility is a step
closer to realization.
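As a rough illustration of the kind of work such a request object would do (plain PHP; this is not Zend_Console_Getopt's actual API, and the helper name is made up):

```php
<?php
// Split an argv-style array into short-option flags and positional
// arguments -- the minimal job of a CLI request object. Only short
// options are handled; "-rn" expands to the flags r and n.
function parseArgv(array $argv)
{
    $flags = array();
    $args  = array();
    foreach (array_slice($argv, 1) as $token) {
        if (strlen($token) > 1 && $token[0] === '-') {
            foreach (str_split(substr($token, 1)) as $flag) {
                $flags[$flag] = true;
            }
        } else {
            $args[] = $token;
        }
    }
    return array($flags, $args);
}

list($flags, $args) = parseArgv(array('script.php', '-r', '-n', 'job1'));
// $flags is array('r' => true, 'n' => true), $args is array('job1')
```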

If you're interested in helping develop this request object, please
contact me and we'll work on a proposal and design guideline.

-- 
Matthew Weier O'Phinney
PHP Developer| [EMAIL PROTECTED]
Zend - The PHP Company   | http://www.zend.com/


Re: [fw-general] Interesting Zend_Search additions

2007-01-27 Thread Simon Mundy

Hi Alexander

I'm a wee bit lazy so I just run all my HTML text through Tidy (added  
as a PHP extension) and it's a consistent base to start from. I  
realise this isn't going to be possible in all environments but it  
may be a good idea to check if it exists and 'sanitise' the HTML  
input with this first. A fallback would be to perform some sort of  
regex check on the HTML to see if it can be parsed but not to the  
extent of DOCTYPE checking for HTML/XHTML. I imagine that true HTML  
parsing would be a nightmare in itself.


Having said that, I'd say that if a document is too broken to be readable  
by regex checks then it doesn't deserve indexing. Regex checking, IMO, is  
going to be somewhat more flexible than using DOM (as not all HTML docs  
are going to be XHTML compliant) and is likely to have better support.  
It's quick enough too.


I use this to grab all links from within a doc:-

function _extractLinks($body)
{
    // Note: the character class must be [^>]+ (one or more non-'>'
    // characters); [^>+] would match only a single character.
    preg_match_all('/<(?:a|link|area)[^>]+href\s*=\s*"([^"]+)"/is', $body, $hrefMatches);
    preg_match_all('/<(?:img|script|input|i?frame)[^>]+src\s*=\s*"([^"]+)"/is', $body, $srcMatches);
    preg_match_all('/<(?:body|table|td|th)[^>]+background\s*=\s*"([^"]+)"/is', $body, $bgMatches);

    $links   = array();
    $matches = array_map('trim', array_merge($hrefMatches[1], $srcMatches[1], $bgMatches[1]));

    foreach ($matches as $link) {
        if (($link = $this->_parseLink($link)) !== false) { // internal check for URI well-formedness
            array_push($links, $link);
        }
    }

    return $links;
}

and then use internal MIME-type checking to work out what is/isn't  
parsable. $body is the entire text of a doc, not simply the  
<body> ... </body> content.
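To see the patterns in action outside the class, here's a standalone run on sample markup of my own (written with the character class as [^>]+, i.e. one-or-more non-'>' characters):

```php
<?php
// Sample document exercising all three attribute patterns.
$body = '<html><body background="bg.png">'
      . '<a href="http://example.com/">x</a>'
      . '<img src="logo.gif"></body></html>';

preg_match_all('/<(?:a|link|area)[^>]+href\s*=\s*"([^"]+)"/is', $body, $href);
preg_match_all('/<(?:img|script|input|i?frame)[^>]+src\s*=\s*"([^"]+)"/is', $body, $src);
preg_match_all('/<(?:body|table|td|th)[^>]+background\s*=\s*"([^"]+)"/is', $body, $bg);

$links = array_merge($href[1], $src[1], $bg[1]);
// $links is array('http://example.com/', 'logo.gif', 'bg.png')
```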


Hmmm... actually looking at this code it deserves some refinement but  
that's for another day... But this could be a start.


I'd imagine that instead of returning an array of URIs it may be  
better to return an iterator containing Zend_Uri objects?


I don't think it's really the responsibility of Zend_Search to  
pre-check the validity of an HTML document, anyway. If it's created  
and fed text then it should assume that the HTML is OK and do its duty.


Hope these random thoughts help - look forward to your next step


Hi Simon,

There was no HTML document parsing/indexing capability in  
Zend_Search until now. But it's the most common document format on the Internet :)


It's experimental for now, so it's not documented and I haven't made any  
announcement :)


I'm considering what should be used for this.
1) A pure PHP parser gives us the possibility to implement just what we  
want. The question is performance, if we plan to index a lot of  
documents.


2) The DOM HTML parsing functionality. It's good and fast. It also  
allows us to use XPath expressions to retrieve any part of a document.  
But its parsing behaviour is not under our control; e.g. it doesn't  
recognize the document encoding in some cases.


3) Regexes.
I'm somewhat sceptical about these.
There are some non-trivial cases, like encoding recognition at parse  
time, non-matched tags, scripts, the '<' sign within script  
strings, escaped quotes, double/single quote usage and so on.
I've never seen a bug-free regex which parses all these things  
correctly. :(
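For comparison, a minimal sketch of option 2 on sample markup of my own, using DOMDocument and DOMXPath from ext/dom (the '@' suppresses warnings loadHTML emits on broken tags):

```php
<?php
// Pull the indexable body text and the <a href> targets out of an HTML
// string via DOM + XPath. loadHTML() copes with non-XHTML markup.
$html = '<html><head><title>Demo</title></head>'
      . '<body><p>Hello <a href="/next">world</a></p></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$text  = $doc->getElementsByTagName('body')->item(0)->textContent;
$hrefs = array();
foreach ($xpath->query('//a[@href]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}
// $text is "Hello world", $hrefs is array('/next')
```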


Any ideas?


With best regards,
   Alexander Veremyev.

Simon Mundy wrote:

Hi Alexander
Just noticed a new HTML document component in Zend_Search. Is this  
the start of the killer ZF-powered spider? :)
Would be very keen to know how you intend to use it, as I've  
implemented a spider of sorts that can parse HTML and PDF files  
but is probably a little limited in its scope. And I use regexes  
instead of the cleaner DOM library usage that you have...

Cheers
--
Simon Mundy | Director | PEPTOLAB
""" " "" "" "" "" """ " "" " " " "  "" "" "
202/258 Flinders Lane | Melbourne | Victoria | Australia | 3000
Voice +61 (0) 3 9654 4324 | Mobile 0438 046 061 | Fax +61 (0) 3  
9654 4124

http://www.peptolab.com




--

Simon Mundy | Director | PEPTOLAB

""" " "" "" "" "" """ " "" " " " "  "" "" "
202/258 Flinders Lane | Melbourne | Victoria | Australia | 3000
Voice +61 (0) 3 9654 4324 | Mobile 0438 046 061 | Fax +61 (0) 3 9654  
4124

http://www.peptolab.com




[fw-general] Any plans for Zend_Controller_Request_Cli?

2007-01-27 Thread Tony Ford
It would be nice for utilities and especially crons. Support for cli 
options "-r -n" as well as arguments would be nice.


Just curious ... if not, I'm sure we'll end up implementing one, because we 
pretty much don't write Perl crons anymore.


- Tony