On Sat, Apr 3, 2010 at 8:29 AM, tedd <t...@sperling.com> wrote:

> Hi gang:
>
> Here's the problem.
>
> I have 184 HTML pages in a directory and each page contains a question. The
> question is noted in the HTML DOM like so:
>
> <p class="question">
>  Who is Roger Rabbit?
> </p>
>
> My question is -- how can I extract the string "Who is Roger Rabbit?" from
> each page using php? You see, I want to store the questions in a database
> without having to re-type, or cut/paste, each one.
>

if the files are html on the server then it should be easy to loop over each
one, loading the markup into memory and searching for what you want.  i'd go
for xpath myself; i tend to always start there and fall back to regex, since
xpath & xsl are so much cleaner for dealing w/ markup.

anyway, here's the demo
--------------
tedd.html
--------------
<html>
 <div>
   sadfasdf
 </div>
 <h1>hello</h1>
 <p class="question">
  Who is Roger Rabbit?
 </p>
 <h2>more stuff</h2>
 <p class="question">
  Who is Roger Rabbit?
 </p>
</html>

---------------------
transform.php
---------------------
<?php
// here is where you load a single file or change to iterate over a
// directory of files
// (loadHTMLFile is an instance method; calling it statically fails on PHP 8)
$oDomDoc = new DOMDocument();
$oDomDoc->loadHTMLFile('./tedd.html');

// here is where you search for the question sections of each file
$oDomXpath = new DOMXPath($oDomDoc);
$oNodeList = $oDomXpath->query("//p[@class='question']");

// here is where you extract the question sections of each file
foreach ($oNodeList as $oDomNode) {
    // nodeValue includes the surrounding newlines/indentation, so trim it
    var_dump(trim($oDomNode->nodeValue));
}


should be trivial to expand that to work w/ multiple files.
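
for example, a rough sketch of the multi-file version. the function name
extractQuestions() and the './pages' directory are made up for illustration,
and it assumes every page ends in .html and sits directly in that one
directory:

```php
<?php
// Sketch only: extractQuestions() is a hypothetical helper, not part of the demo above.
// Returns an array keyed by file path, each entry a list of question strings.
function extractQuestions($sDir)
{
    // collect libxml parse warnings internally instead of printing them,
    // since real-world html is rarely well-formed
    libxml_use_internal_errors(true);

    $aQuestions = array();
    $aFiles = glob($sDir . '/*.html');
    if ($aFiles === false) {
        $aFiles = array(); // glob() can return false on error
    }

    foreach ($aFiles as $sFile) {
        $oDomDoc = new DOMDocument();
        if (!$oDomDoc->loadHTMLFile($sFile)) {
            continue; // skip files that could not be loaded
        }

        // same xpath query as in transform.php
        $oDomXpath = new DOMXPath($oDomDoc);
        foreach ($oDomXpath->query("//p[@class='question']") as $oDomNode) {
            // trim the newlines/indentation around the question text
            $aQuestions[$sFile][] = trim($oDomNode->nodeValue);
        }
        libxml_clear_errors();
    }

    return $aQuestions;
}
```

calling print_r(extractQuestions('./pages')); would then dump the questions
grouped by file, and the same array could feed whatever db insert you end up
using. note the xpath only matches when the class attribute is exactly
"question"; a page using class="question foo" would need a different query.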



> Now, I can extract each question by using javascript --
>
> document.getElementById("question").innerHTML;
>

tedd, are you slipping?  i thought you were searching by the class
attribute, lol.

-nathan
