From: [EMAIL PROTECTED]
Operating system: win32
PHP version: 4.3.0RC2
PHP Bug Type: Output Control
Bug description: z
Hello,
I wrote a script that reads a few HTML pages with the function fgetss(),
which strips away the HTML code. This works properly in the older PHP
version (PHP 4.2),
but with PHP 4.3 something goes wrong and I can't get any output from
fgetss(). Here is the code.
I hope this is a serious problem and that reporting it is helpful for
you... Greets, Timon
<?php
// Reads the text out of the HTML files
// and structures it for the database.
$fh = opendir("html/") or die("cant read from ./html");
$files = array();
$x = 0;
// Compare against false so a file named "0" does not end the loop.
while(($file = readdir($fh)) !== false) {
    if($file == '.' || $file == '..') continue;
    $files[$x++] = $file;
}
closedir($fh);
echo sizeof($files)." HTML-Dateien ausgelesen...\n";
sort($files);
$raw_txt = '';
foreach($files as $file) {
    $fh = fopen("html/$file",'r');
    // This is the fgetss() call that returns no output under PHP 4.3.
    while($line = fgetss($fh,filesize("html/$file"))) {
        $raw_txt .= zeileputzen($line);
    }
    fclose($fh);
}
echo sizeof($files)." Dateien wurden geparst...\n";
$fh = fopen('raw.txt','w+');
fputs($fh,$raw_txt);
fclose($fh);
echo "...und in die Datei <a href=\"raw.txt\">raw.txt</a> geschrieben...\n";
function zeileputzen($zeile) {
    // Remove tabs, &nbsp;, link text, etc.
    $zeile = str_replace("\t",'',$zeile);
    $zeile = str_replace("nach oben",'',$zeile);
    $zeile = str_replace("&nbsp;",'',$zeile);
    $zeile = preg_replace("/^ */",'',$zeile);
    if(preg_match("/^LvH-Umfeld/",$zeile)) {$zeile = '';}
    if(preg_match("/^Umfeld/",$zeile)) {$zeile = '';}
    if(preg_match("/^Personen/",$zeile)) {$zeile = '';}
    if(preg_match("/^- \w/",$zeile)) {$zeile = '';}
    $zeile = preg_replace("/^\W ?/","",$zeile);
    $zeile = preg_replace("/(B: )/","\n@B: ",$zeile);
    $zeile = preg_replace("/(Br: )/","\n@Br: ",$zeile);
    $zeile = preg_replace("/(K: )/","\n@K: ",$zeile);
    $zeile = preg_replace("/(B: )(\n)/",'B: ',$zeile);
    $zeile = preg_replace("/(Br: )(\n)/",'Br: ',$zeile);
    $zeile = preg_replace("/(K: )(\n)/",'K: ',$zeile);
    return $zeile;
}
$fh = fopen("raw.txt",'r') or die("unable to read from raw.txt");
$raw_txt = fread($fh,filesize('raw.txt'));
fclose($fh);
echo "raw.txt ausgelesen\n";
$pieces = explode(chr(10),$raw_txt);
$new_buffer = '';
$f = 0;
// Insert a '###' separator after more than ten consecutive empty lines.
foreach($pieces as $lines) {
    if(strlen($lines) == 0) { $f++; }
    if(strlen($lines) > 2) { $f = 0; }
    if($f > 10) {
        $new_buffer .= '###';
        $f = 0;
    }
    echo " :: line -> $lines\n";
    $new_buffer .= $lines."\n";
}
$f_pieces = explode('###',$new_buffer);
$new_buffer = '';
// Start each block that contains text with '***' and drop its empty lines.
foreach($f_pieces as $l) {
    echo strlen($l);
    if(preg_match("/[A-Za-z0-9]/",$l)) {
        $f_lines = explode(chr(10),$l);
        $new_buffer .= "***";
        foreach($f_lines as $f) {
            if(!strlen($f)) { continue; }
            $new_buffer .= trim($f)."\n";
        }
    }
}
$fh = fopen('formatted.txt','w+');
fwrite($fh,$new_buffer);
fclose($fh);
echo "<a href=\"formatted.txt\">formatted.txt</a> geschrieben!\n";
echo "<a href=\"formattedtxt2mysql.php\">text in db eintragen...</a>\n";
?>
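
To help narrow the problem down, the reading loop can be compared
against a version that does not call fgetss() at all. The following is
only a minimal sketch, not part of the original script: it assumes that
fgets() followed by strip_tags() is an acceptable stand-in, and the
helper name read_stripped_lines() is made up for illustration. The
self-check at the end uses file_put_contents() and sys_get_temp_dir(),
which need a newer PHP than the 4.x versions discussed here.

```php
<?php
// Hypothetical helper (not in the original script): reads a file line by
// line and strips the tags with strip_tags() instead of using fgetss().
function read_stripped_lines($path)
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return false;
    }
    $lines = array();
    // Compare against false explicitly so that a line which is empty or
    // contains only "0" does not end the loop early.
    while (($line = fgets($fh)) !== false) {
        $lines[] = strip_tags($line);
    }
    fclose($fh);
    return $lines;
}

// Self-check with a small temporary file; the file content is made up.
$tmp = tempnam(sys_get_temp_dir(), 'strip');
file_put_contents($tmp, "<p>Hello <b>world</b></p>\n<a href=\"#top\">nach oben</a>\n");
$lines = read_stripped_lines($tmp);
unlink($tmp);
print_r($lines);
```

If this variant produces text for the same files under PHP 4.3 while the
fgetss() loop does not, that points at fgetss() itself rather than at the
rest of the script. Note that strip_tags() works on one line at a time
here, so a tag that spans a line break may be handled differently than
fgetss() would handle it.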
--
Edit bug report at http://bugs.php.net/?id=20918&edit=1