Hello dear list,
I have a problem parsing an HTML document.
I tried to run the following Perl parser script on the HTML further below,
but I had no luck, so now I want to try it with PHP. I heard about
DOMDocument, which should save my backside, so I will have to get
involved with DOMDocument.
Here is the full story, with my attempts in Perl:
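#!/usr/bin/perl
use strict; use warnings;
use HTML::TableExtract;
use YAML;

# Slurp the HTML from a file named on the command line, or from STDIN.
my $html = do { local $/; <> };

# Trim leading and trailing whitespace (including non-breaking spaces)
# and collapse inner whitespace runs; modifies its arguments in place.
sub cleanup {
    for ( @_ ) {
        next unless defined $_;
        s/\A[\xa0\s]+//;
        s/[\xa0\s]+\z//;
        s/\s+/ /g;
    }
}

# depth and count are zero-based; the sample below has a single
# top-level table, which is depth 0, count 0.
my $table = HTML::TableExtract->new( keep_html => 0, depth => 0, count => 0,
                                     br_translate => 0 );
$table->parse($html);
foreach my $row ( $table->rows ) {
    cleanup(@$row);
    print join( "\t", map { defined($_) ? $_ : '' } @$row ), "\n";
}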
Well, friends, now I will try this with PHP. Any ideas or pointers you can
share?
I have heard about DOMDocument.
I want to ask all the experts here, since I need to switch from Perl to PHP.
Regarding the issue mentioned above: I am not able to figure out how to use the
columns method on the HTML file below. My intuition makes me think it should be
something like the following (but my intuition is wrong):
foreach my $column ($table->columns) { print join("\t", @$column), "\n"; }
The HTML::TableExtract documentation doesn't shed much light (for me anyway). I
can see in the code of the module that the columns method belongs to
HTML::TableExtract::Table, but I can't figure out how to use it. I appreciate
any help.
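For illustration, here is a minimal sketch of what I imagine the call should look like, assuming that columns has to be called on the table objects returned by $te->tables rather than on the extractor itself. The two-row table is a cut-down stand-in for the real HTML below, and depth => 0, count => 0 assumes a single top-level table; I may well be missing something here.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Cut-down stand-in for the real document (assumption: one top-level table).
my $html = <<'HTML';
<table>
<tr><td>name of the street</td><td>champs elysee</td></tr>
<tr><td>number and town</td><td>75000 paris</td></tr>
</table>
HTML

my $te = HTML::TableExtract->new( depth => 0, count => 0 );
$te->parse($html);
$te->eof;    # flush the underlying HTML::Parser

my @columns;
foreach my $ts ( $te->tables ) {          # one table object per matched table
    foreach my $column ( $ts->columns ) { # each column is an array reference
        # Replace undefined cells with empty strings before printing.
        push @columns, [ map { defined($_) ? $_ : '' } @$column ];
        print join( "\t", @{ $columns[-1] } ), "\n";
    }
}
```

With this layout the first printed line should hold the labels and the second the values, one line per column.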
Background:
I am trying to extract a table, and I have a very small document of
tables that I want to parse with the HTML::TableExtract module.
I am trying to search for keywords in the HTML so that I can use them as
attribute names.
I only need to print the necessary data.
I looked on CPAN but could not really find how to search through it for
particular keywords.
One way to do it would be HTML::TableExtract; the other would be to parse
with HTML::TokeParser.
I have very little experience with HTML::TokeParser.
Well, one way or the other I need to do this parsing:
I want to write the parsed tables to a text file or, even
better, store them in a database.
The problem is that I can't find any way to search through the resulting parsed
table and get the necessary data.
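To make clearer what I mean by searching the parsed table, here is a rough sketch of the lookup I am after. It assumes the rows come back as a list of array references (as from $table->rows); the @rows list is a hand-written stand-in for that, and rows_to_hash is a hypothetical helper name of my own, not part of any module.

```perl
use strict;
use warnings;

# Hand-written stand-in for the list returned by $table->rows:
# one array reference per table row, undef where a cell is missing.
my @rows = (
    [ 'data_one',             undef ],
    [ 'official_description', 'the name ' ],
    [ 'name of the street',   'champs elysee' ],
    [ 'number and town',      ' 75000 paris ' ],
);

# Hypothetical helper: map each label (first cell) to its value (second cell).
sub rows_to_hash {
    my @rows = @_;
    my %info;
    for my $row (@rows) {
        my ( $label, $value ) = @$row;
        next unless defined $label && defined $value;   # skip title/spacer rows
        for ( $label, $value ) {
            s/\A[\xa0\s]+//;    # trim leading whitespace and non-breaking spaces
            s/[\xa0\s]+\z//;    # trim trailing whitespace and non-breaking spaces
            s/\s+/ /g;          # collapse inner whitespace runs
        }
        $info{ lc $label } = $value;
    }
    return %info;
}

my %info = rows_to_hash(@rows);

# Print only the fields selected by keyword.
for my $keyword ( 'name of the street', 'number and town' ) {
    print "$keyword\t$info{$keyword}\n" if exists $info{$keyword};
}
```

A hash like %info could then be written to a text file or inserted into a database row, one key per column.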
Thanks in advance for any reply, I appreciate it.
##### the code: #####
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<link rel="stylesheet" href="jspsrc/css/bp_style.css" type="text/css">
<title>Weitere Schulinformationen</title>
</head>
<body class="bodyclass">
<div style="text-align:center;"><center>
<!-- <fieldset><legend> general information </legend>
-->
<br/>
<table border="1" cellspacing="0" bordercolordark="white"
bordercolorlight="black" width="80%" class='bp_result_tab_info'>
<!-- <table border="0" cellspacing="0" bordercolordark="white"
bordercolorlight="black" width="80%" class='bp_search_info'>
-->
<tr>
<td width="100%" colspan="2" class="ldstabTitel"><strong>data_one
</strong></td>
</tr>
<tr>
<td width="27%"><strong>data_two</strong></td>
<td width="73%">&nbsp;116439
</td>
</tr>
<tr>
<td width="27%"><strong>official_description</strong></td>
<td width="73%">the name </td>
</tr>
<tr>
<td width="27%"><strong>name of the street</strong></td>
<td width="73%">champs elysee</td>
</tr>
<tr>
<td width="27%"><strong>number and town</strong></td>
<td width="73%"> 75000 paris </td>
</tr>
<tr>
<td width="27%"><strong>telefon</strong></td>
<td width="73%">&nbsp;000241 49321
</td>
</tr>
<tr>
<td width="27%"><strong>fax</strong></td>
<td width="73%">&nbsp;000241 4093287
</td>
</tr>
<tr>
<td width="27%"><strong>e-mail-adresse</strong></td>
<td width="73%"> <a
href=mailto:1111116...@my_domain.org>[email protected]</a>
</td>
</tr>
<tr>
<td width="27%"><strong>internet-site</strong></td>
<td width="73%"> <a href=http://www.thesite.org>http://www.thesite.org</td>
</tr>
<!--
<tr>
<td width="27%">&nbsp;</td>
<td width="73%" align="right"><a href="schule_aeinfo.php?SNR=<? print $SCHULNR
?>" target="_blank">
[Schuldaten;</a>
</tr>
</td> -->
<tr>
<td width="27%">&nbsp;</td>
<td width="73%">the department</td>
</tr>
<tr>
<td width="100%" colspan=2><strong>&nbsp;</strong></td>
</tr>
<tr>
<td width="27%"><strong>number of indidviduals</strong></td>
<td width="73%">&nbsp;1y92</td>
<tr>
<td width="100%" colspan=2><strong> </strong></td>
</tr>
<!-- if (!fsp.isEmpty()){
ztext = "&nbsp;";
int i = 0;
Iterator it = fsp.iterator();
while (it.hasNext()){
String[] zwert = new String[2];
zwert = (String[])it.next();
if (i==0){
if (zwert[1].equals("0")){
ztext = ztext+zwert[0];
}else{
ztext = ztext+zwert[0]+" mit "+zwert[1];
if (zwert[1].equals("1")){
ztext = ztext+" Sch&uuml;ler";
}else{
ztext = ztext+" Sch&uuml;lern";
}
}
i++;
}else{
if (zwert[1].equals("0")){
ztext = ztext+"<br>&nbsp;"+zwert[0];
}else{
ztext = ztext+"<br>&nbsp;"+zwert[0]+" mit "+zwert[1];
if (zwert[1].equals("1")){
ztext = ztext+" Sch&uuml;ler";
}else{
ztext = ztext+" Sch&uuml;lern";
}
}
}
}
-->
</table>
<!-- </fieldset> -->
<br>
</body>
</html>
I look forward to hearing from you.
Regards, Martin
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php