Hi,
Today I noticed that yahootopstories.pm was broken, so I've made a
quick fix.  It has the same interface as the old handler, so nothing in
your template files needs to be changed.

Caveat: This handler works by extracting content from date-specific
pages (look at the handler and you'll see what I mean).  As a result, it
may get no content when run just after midnight, or whenever no news
exists yet for the current day.  For now, Health news doesn't work
because on Yahoo the latest health news was for 4/10/2000, and this
handler looks at the page specifically for today (4/12/2000).
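
One possible way around this would be to try today's page first and then
fall back to the previous few days.  The following is only a rough sketch,
not part of the attached handler.  It assumes that older dated pages stay
available under the same URL layout, which I have not verified.
candidate_urls is a hypothetical helper name, and POSIX strftime stands in
for time2str so the snippet runs with core modules only.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper, not part of the handler: list the dated index URLs
# for today and the previous $days_back days, newest first.
sub candidate_urls {
  my ($feed, $section, $days_back, $now) = @_;
  $now = time unless defined $now;

  my @urls;
  for my $offset (0 .. $days_back) {
    # Stepping back 86400 seconds per day is an approximation; it can be
    # off by one around daylight-saving changes.
    my $d = strftime('%Y%m%d', localtime($now - $offset * 86400));
    push @urls, "http://dailynews.yahoo.com/h/$feed/$d/$section/";
  }
  return @urls;
}

# Health news ('nm' feed, 'hl' section): today, then up to two days back.
print "$_\n" for candidate_urls('nm', 'hl', 2);
```

The handler could then fetch each URL in turn and stop at the first one
that yields any stories.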

Anyway, here it is.  I've tested it with my site, but more people
need to test it.

Please notify me of any changes/fixes because this is a pretty important
part of my site.

Abhik.
# -*- mode: Perl; -*-

# NOTE: This is a modified version of the handler,
# modified by Abhik Shah ([EMAIL PROTECTED]).
#
# It was modified on 4/12/2000 because the old handler
# no longer worked... I fixed it to meet my own needs.
# I tried to make sure that it outputs data the same way as
# the old handler (to keep it compatible), but I cannot guarantee it.

# NOTE: Health news may not work

# AUTHOR: Ryan Lathouwers (modified by Abhik Shah)
# EMAIL: [EMAIL PROTECTED] (Abhik: [EMAIL PROTECTED])
# ONE LINE DESCRIPTION: Top Story Headlines from Yahoo
# URL: http://dailynews.yahoo.com/headlines/
# TAG SYNTAX:
# <input name=yahootopstories source=W>
#   Returns an array of hashes.  The hashes include:
#     url, headline, blurb
#   W=reuters:       Reuters Top Stories (default)
#     ap:            Associated Press Top Stories
#     business:      Reuters Business
#     tech:          Reuters Technology
#     politics:      Reuters Politics
#     world:         Reuters World
#     ent:           AP Entertainment
#     sports:        Reuters Sports
#     science:       Reuters Science
#     health:        Reuters Health
#     apUS:          Associated Press U.S. Top Stories
#
# EXAMPLE:
#   Basic:  Returns reuters news headlines in two column list
#     <input name=yahootopstories>
#   More Complex: Returns ap news as it looks on yahoo.
#       <input name=yahootopstories source=ap>
#       <filter name=map filter=hash2string format='
#          <a href="%{url}"><b>%{headline}</b></a><br>
#          <font size="-1">%{blurb}</font><br><br>'>
#       <filter name=limit number=10>
#       <output name=array numcols=1 prefix='' suffix=''>
#
# LICENSE: artistic
# NOTES:

package NewsClipper::Handler::Acquisition::yahootopstories;

use strict;
use Date::Format;
use NewsClipper::Handler;
use NewsClipper::Types;
use vars qw( @ISA $VERSION );
@ISA = qw(NewsClipper::Handler);

# DEBUG for this package is the same as the main.
use constant DEBUG => main::DEBUG;

use NewsClipper::AcquisitionFunctions qw( &GetHtml );

$VERSION = 0.6;

# ------------------------------------------------------------------------------

# This function is used to get the raw data from the URL.
sub Get
{
  my $self = shift;
  my $attributes = shift;
  my $url_get;

  $attributes->{source} = 'reuters' unless defined $attributes->{source};
 
  # Today's date as YYYYMMDD; the index pages are date-specific.
  my $d = time2str('%Y%m%d',time);

  my %sourceMap = (
    'reuters'  => "nm/$d/ts/",
    'ap'       => "ap/$d/ts/",
    'business' => "nm/$d/bs/",
    'tech'     => "nm/$d/tc/",
    'politics' => "nm/$d/pl/",
    'world'    => "nm/$d/wl/",
    'ent'      => "ap/$d/en/",
    'sports'   => "nm/$d/sp/",
    'science'  => "nm/$d/sc/",
    'apUS'     => "ap/$d/us/",
    'health'   => "nm/$d/hl/"
  );

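  # Give up on an unrecognized source rather than fetching a bogus URL.
  return undef unless exists $sourceMap{$attributes->{source}};

  $url_get = 'http://dailynews.yahoo.com/h/' . $sourceMap{$attributes->{source}};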

  my $data = &GetHtml($url_get,'^','$');
  #my $data = &GetHtml($url,'/font>\n(?:</small>\n)?</b>\n</center>','Earlier Stories');

  return undef unless defined $data;

  #@$data = grep {m#http://dailynews.yahoo.com/headlines/.*story.html#} @$data;
  
  my @grabbedData;
  my ($url, $headline, $blurb);
  my @items = split /<a/, $$data;
  shift @items;
  #shift @items;

  foreach my $item (@items)
  {
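    ($url, $headline) = $item =~ /href="(.*?)"><b>(.*?)<\/b><\/a>/si;
    ($blurb) = $item =~ /<font.*?>(.*?)<\/font>/si;

    # Skip links that are not story headlines (no bold link text matched).
    next unless defined $url && defined $headline;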

    push @grabbedData, {
      'url'      => $url,
      'headline' => $headline,
      'blurb'    => $blurb
    };
  }
  MakeSubtype('ArrayOfYTSHash','ArrayOfHash');
  bless \@grabbedData,'ArrayOfYTSHash';
  return \@grabbedData;

}

# ------------------------------------------------------------------------------

sub GetDefaultHandlers
{
  my $self = shift;
  my $inputAttributes = shift;

  my @returnVal = (
    {'name' => 'map', 'filter' => 'hash2string',
     'format' => '<a href="%{url}">%{headline}</a>' },
    {'name' => 'limit','number' => '10'},
    {'name' => 'array'},
  );

  return @returnVal;
}

sub GetUpdateTimes
{
  return ['always'];
}



1;
