Ashley Sheridan wrote:
> I've been thinking about this problem for a little while, and the thing
> is, I can think of ways of doing it, but they're not very nice, and I
> don't think they're going to be fast.
>
> Basically, I have a load of HTML-formatted content in a database that
> gets displayed on the site. It's part of a rudimentary CMS.
>
> Currently, the titles for each article are displayed on a page, and each
> title links to the full article. However, that leaves me with a page
> which is essentially a list of links, and that's not ideal for SEO. What
> I wanted to do to enhance the page is to have a short excerpt of x
> number of words/characters beneath each article title. The idea being
> that search engines will find the page as more than a link farm, and
> visitors won't have to just rely on the title alone for the content.
>
> Here's the rub though. As the content is in HTML form, I can't just grab
> the first 100 characters and display them as that could leave an open
> tag without a closing one, potentially breaking the page. I could use
> strip_tags on the 100-character excerpt, but what if the excerpt itself
> broke a tag in half (e.g. <acronym title="something"> could become
> <acron)?
>
> The only solutions I can see are:
>
>
> * retrieve the entire article, perform a strip_tags and then take
> the excerpt
> * use a regex inside of mysql to pull out only the text
>
>
> The thing is, neither of these seems particularly pretty, and I am sure
> there's a better way, but it's too early in the week for my brain to be
> fully functional I think!
>
> Does anyone have any ideas about what I could do, or do you think I'm
> seeing problems where there are none?
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
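/**
 * Creates an abstract from any string. Strings under 180 chars are
 * returned unchanged; longer ones are cut at the first full stop after
 * 120 chars or, failing that, at the first space after 140 chars.
 *
 * Note: strlen/substr count bytes, so swap in the mb_* equivalents if
 * the text is multibyte.
 */
function createAbstract( $string )
{
    // If the first line is already long enough, use it on its own.
    $lines = explode( "\n" , $string );
    if( count($lines) > 1 && strlen($lines[0]) > 140 ) {
        $string = $lines[0];
    }
    if( strlen($string) < 180 ) return $string;

    $string = substr( $string , 0 , 180 );
    $chars = str_split( $string );

    // First choice: stop at the end of a sentence.
    $string = '';
    foreach( $chars as $char ) {
        $string .= $char;
        if( $char == '.' && strlen($string) > 120 ) {
            return $string;
        }
    }

    // Second choice: stop at the end of a word and add an ellipsis.
    $string = '';
    foreach( $chars as $char ) {
        $string .= $char;
        if( $char == ' ' && strlen($string) > 140 ) {
            return trim( $string ) . '...';
        }
    }

    // No suitable break found: return the hard 180-char cut.
    return $string;
}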
/**
 * Given HTML (or a fragment), tidy it into usable XHTML
 * and strip it back to text, new lines intact.
 */
function htmlToText( $html )
{
    // Escape every ampersand exactly once, so entities pass through
    // tidy as literal text ("&amp;" is unescaped first to avoid
    // double-escaping).
    $html = str_replace( '&' , '&amp;' , str_replace( '&amp;' , '&' ,
        $html ) );
    $config = array(
        'clean' => true,
        'drop-proprietary-attributes' => true,
        'output-xhtml' => true,
        'show-body-only' => true,
        'word-2000' => true,
        'wrap' => '0'
    );
    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();
    $html = tidy_get_output($tidy);
    // Undo the ampersand escaping applied above.
    $text = str_replace( '&amp;' , '&' , $html );
    return strip_tags($text);
}
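Using those two together should do it: run the article through
htmlToText() first, then pass the result to createAbstract(). They're
pretty basic and could do with a tidy-up, but they get the job done
(you'll probably want to change the 140-char threshold to something
different).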
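
If the tidy extension isn't installed, something along these lines might
serve as a stand-in. It's only a rough sketch: excerptFromHtml() and its
$limit parameter are names made up for illustration, it assumes the
content is valid UTF-8, and it escapes the result because the excerpt is
going back into an HTML page.

```php
<?php
// Hypothetical helper (not one of the two functions above): builds a
// plain-text excerpt from an HTML fragment using only core PHP + PCRE.
function excerptFromHtml(string $html, int $limit = 160): string
{
    // Add a space before common block-level tags so that
    // "<p>one</p><p>two</p>" does not collapse into "onetwo".
    $html = (string) preg_replace('~<(?=/?(?:p|div|li|h[1-6]|br)\b)~i', ' <', $html);

    // Strip the tags first, so a tag can never be cut in half later.
    $text = strip_tags($html);

    // Decode entities so "&amp;" counts as one character when measuring.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse whitespace runs (including non-breaking spaces).
    $text = trim((string) preg_replace('/[\s\x{00A0}]+/u', ' ', $text));

    // Take at most $limit characters, ending on a word boundary.
    if (preg_match('/^(.{1,' . $limit . '})(?:\s|$)/us', $text, $m)) {
        $short = $m[1];
    } else {
        // No whitespace within the limit (or empty text): hard cut.
        preg_match('/^.{0,' . $limit . '}/us', $text, $m);
        $short = $m[0] ?? '';
    }

    // Only add an ellipsis when something was actually cut off.
    if ($short !== $text) {
        $short = rtrim($short, ' .,;:') . '...';
    }

    // Escape again for safe output inside an HTML page.
    return htmlspecialchars($short, ENT_QUOTES, 'UTF-8');
}

echo excerptFromHtml('<acronym title="something">HTML</acronym> is fun', 8), "\n";
```

The trade-off is that this relies on strip_tags() coping with whatever
markup is stored, whereas tidy repairs the markup first, so the
tidy-based version is likely to be more forgiving of messy input.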
Best,
Nathan
--
PHP General Mailing List (http://www.php.net/)