I have written a new module, WWW::Patent::Page, and
propose to submit it to CPAN. Your comments would be
appreciated.
Does the name seem reasonable? I am happy to take
suggestions. I think it is reasonable to have a
"Patent" namespace in WWW, since much patent
information is available on the WWW: for example,
searches of the prior art, patent family
relationships, patent applications via XML, etc. With
a namespace, related modules may be grouped easily.
One can imagine future modules like
"WWW::Patent::Apply", "WWW::Patent::Family", or
"WWW::Patent::Search" for interacting with various web
services.
WWW::Patent::Page is alpha software and my first module;
my intent is to see whether the Perl community has any
interest in the idea. It is rough around the edges,
but passes what tests it has.
The module provides a consistent way to obtain pages
of patent documents from various patent offices that
make them available on the WWW. Typically, doing this
is relatively easy by hand, page by page, but takes a
bit of work if you want to automate it effectively
for many pages or documents. The offices typically
make it hard to get the whole document, presumably
because supplying that is one source of revenue.
From this primitive module, users can stitch together
TIFF or PDF pages into multipage documents by whatever
method they prefer.
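As a rough illustration of the page-by-page retrieval described above, here is a minimal sketch of a loop that collects every page of one document. It assumes only the provide_doc and pages_available calls as they appear in the SYNOPSIS further down; the fetch_all_pages helper and the Fake::Source stand-in (used so the sketch runs without network access) are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every page of one document.
# $source is any object offering pages_available() and provide_doc()
# with the calling conventions shown in the SYNOPSIS (an assumption;
# the real module may behave differently).
sub fetch_all_pages {
    my ($source, $doc_id, %opt) = @_;
    my $total = $source->pages_available(document => $doc_id);
    my @pages;
    for my $n (1 .. $total) {
        push @pages, $source->provide_doc($doc_id, %opt, page => $n);
    }
    return \@pages;
}

# Stand-in so the sketch runs offline; in real use this would be
# WWW::Patent::Page->new().
{
    package Fake::Source;
    sub new             { return bless {}, shift }
    sub pages_available { return 3 }
    sub provide_doc     { my ($self, $id, %o) = @_; return "$id page $o{page}" }
}

my $pages = fetch_all_pages(Fake::Source->new, 'US_6_123_456', format => 'tif');
print scalar(@$pages), " pages fetched\n";
```

The returned array reference can then be handed to whatever TIFF or PDF tool you prefer for joining the pages.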
The module uses submodules, specific to separate
patent offices, and comes with working examples for
the USPTO and EPO, which between them supply granted
patents in html and tiff (USPTO) and pdf (US, EP, and
much of the world...). Hopefully, other interested
users will create new or improved submodules and feed
them back into the distribution.
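To make the submodule idea concrete, here is a minimal sketch of dispatching on the office name. It is not the module's actual code: the %office_handler table, the page_url function, and the example.invalid URLs are all hypothetical placeholders, and a real submodule would perform the HTTP requests rather than just build a string.

```perl
use strict;
use warnings;

# Hypothetical registry: office name => code ref that builds a page URL.
# The example.invalid hosts are placeholders, not real patent office URLs.
my %office_handler = (
    USPTO     => sub { my (%a) = @_; "https://uspto.example.invalid/$a{doc_id}/$a{page}" },
    ESPACE_EP => sub { my (%a) = @_; "https://ep.example.invalid/$a{doc_id}/$a{page}" },
);

sub page_url {
    my (%args) = @_;
    my $office  = $args{office} // 'USPTO';   # default office as in the SYNOPSIS
    my $handler = $office_handler{$office}
        or die "No submodule registered for office '$office'\n";
    return $handler->(%args);
}

print page_url(doc_id => '6123456', page => 1), "\n";
print page_url(doc_id => '6123456', page => 2, office => 'ESPACE_EP'), "\n";
```

Supporting a new office then amounts to adding one more entry to the table, which is roughly the role a contributed submodule would play.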
For casual users, this module should simplify life.
Abusive users will likely find their IP address banned
by the patent office being spidered.
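One way to stay on the polite side is to space requests out. The following is only a sketch under my own assumptions: the throttled wrapper is hypothetical, the 0.2 second gap is an arbitrary value chosen so the demo finishes quickly, and the stand-in fetcher does no network access. A real spider would want a much longer gap and should respect each office's terms.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical wrapper: returns a code ref that calls $fetch, but never
# more often than once every $min_gap seconds.
sub throttled {
    my ($fetch, $min_gap) = @_;
    my $last = 0;
    return sub {
        my $wait = $last + $min_gap - time();
        sleep($wait) if $wait > 0;    # pause only when the last call was too recent
        $last = time();
        return $fetch->(@_);
    };
}

# Stand-in fetcher; in real use this would call provide_doc.
my $polite = throttled(sub { "page $_[0]" }, 0.2);
print $polite->($_), "\n" for 1 .. 3;
```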
Here is the documentation as it now stands:
NAME
WWW::Patent::Page - retrieve a patent page (e.g. from the
United States Patent and Trademark Office (USPTO) website or
the European Patent Office (ESPACE_EP))
SYNOPSIS
Please see the test suite for working examples.
The following is not
guaranteed to be working or up-to-date.
use WWW::Patent::Page;
my $patent_document = WWW::Patent::Page->new();   # new object
my $document1 = $patent_document->provide_doc('6,123,456');
    # defaults: office  => 'USPTO',
    #           country => 'US',
    #           format  => 'htm',
    #           page    => '1',   # typically htm IS "1" page
    #           modules => qw/ us ep / ,
my $document2 = $patent_document->provide_doc('US_6_123_456',
    office => 'ESPACE_EP',
    format => 'tif',
    page   => 2,
);
my $pages_known = $patent_document->pages_available(   # e.g. TIFF
    document => '6 123 456',
);
DESCRIPTION
Intent: Use public sources to retrieve patent
documents such as
TIFF images of patent pages, html of patents,
pdf, etc.
Expandable for your office of interest by
writing new submodules.
Alpha release by a newbie, to find out whether there is any
interest.
USAGE
See also SYNOPSIS above
Standard process for building & installing
modules:
perl Build.PL
./Build
./Build test
./Build install
Examples of use:
$patent_document = WWW::Patent::Page->new(
    doc_id => 'US6,654,321(B2)issued_2_Okada',
    office => 'ESPACE_EP',
    format => 'tif',
    page   => 2,
    agent  => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6',
);
# 'Windows IE 6'    => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
# 'Windows Mozilla' => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6',
# 'Mac Safari'      => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/85 (KHTML, like Gecko) Safari/85',
# 'Mac Mozilla'     => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401',
# 'Linux Mozilla'   => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',
# 'Linux Konqueror' => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)',
my %attributes   = $patent_document->get_patent('all');      # hash of all
my $document_id  = $patent_document->get_patent('doc_id');   # US6,654,321(B2)issued_2_Okada
my $office_used  = $patent_document->get_patent('office');   # ep
my $country_used = $patent_document->get_patent('country');  # US
my $doc_id_used  = $patent_document->get_patent('doc_id');   # 6654321
my $page_used    = $patent_document->get_patent('page');     # 2
my $kind_used    = $patent_document->get_patent('kind');     # B2
my $comment_used = $patent_document->get_patent('comment');  # issued_2_Okada
my $format_used  = $patent_document->get_patent('format');   # tif
my $pages_total  = $patent_document->get_patent('pages_available');  # 101
my $terms_and_conditions = $patent_document->terms('us');    # and conditions
my $document     = $patent_document->get_patent('document'); # the loot
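The comments in the usage example above imply that an identifier such as 'US6,654,321(B2)issued_2_Okada' gets split into country, number, kind, and comment. Here is a minimal sketch of how such a split might be done with one regular expression. It is my illustration, not the module's actual parsing code: parse_doc_id is a hypothetical name, and the accepted formats (optional two-letter country, digits separated by commas, underscores, or spaces, optional kind code in parentheses) are assumptions drawn only from the examples in this post.

```perl
use strict;
use warnings;

# Hypothetical parser, not the module's own code.
sub parse_doc_id {
    my ($raw) = @_;
    my ($country, $number, $kind, $comment) = $raw =~ m{
        \A \s*
        ([A-Za-z]{2})?                   # optional two-letter country code
        [_\s]*
        ( \d (?: [\d,_\s]* \d )? )       # digits, with separators allowed inside
        \s*
        (?: \( ( [A-Za-z] \d? ) \) )?    # optional kind code in parentheses
        \s*
        (.*?)                            # whatever remains is a free-form comment
        \s* \z
    }x or return;
    (my $digits = $number) =~ tr/0-9//cd;    # drop commas, underscores, spaces
    return {
        country => uc($country // 'US'),     # default country as in the SYNOPSIS
        doc_id  => $digits,
        kind    => $kind,
        comment => length($comment) ? $comment : undef,
    };
}

my $p = parse_doc_id('US6,654,321(B2)issued_2_Okada');
printf "%s %s %s %s\n", @{$p}{qw(country doc_id kind comment)};
```

Identifiers that do not fit this pattern (for instance, ones with a single-letter prefix) make the function return nothing, so a caller can fall back to other handling.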
BUGS
Pre-alpha release, to gauge whether the Perl
community has any interest.
Code contributions, suggestions, and critiques are
welcome.
Error handling is undeveloped.
By definition, a non-trivial program contains
bugs.
For United States Patents (US) via the USPTO (us),
the 'kind' is ignored in method provide_doc.
SUPPORT
Yes, please. Checks are best. Or email me at
[EMAIL PROTECTED] to
arrange fund transfers.
AUTHOR
Wanda B. Anon
[EMAIL PROTECTED]
COPYRIGHT
This program is free software; you can
redistribute it and/or modify it
under the same terms as Perl itself.
The full text of the license can be found in the
LICENSE file included
with this module.
ACKNOWLEDGEMENTS
Andy Lester for WWW::Mechanize, that got me
thinking,
The authors of Finance::Quote, which served as an
example of providing
submodules,
Erik Oliver for patentmailer, serving as an
example of getting patent
documents,
Howard P. Katseff of AT&T Laboratories for wsp.pl,
version 2, a proxy
that speaks LWP and understands proxies,
and of course Larry and Randal and the gang.
SEE ALSO
perl(1).
Subroutine _countries_known()
Usage   : internal method only
Purpose : list all entities that could give a patent
Returns : ref to a hash with keys of abbreviations and values of
          entities (usually a country) ...
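For readers unfamiliar with the "ref to a hash" convention, the return value described above might look like the following sketch. The two entries are illustrative guesses based on the offices mentioned in this post; the real table in the module is not shown here and may differ.

```perl
use strict;
use warnings;

# Hypothetical contents; only the shape (abbreviation => entity) comes
# from the description above.
sub _countries_known {
    return {
        US => 'United States of America',
        EP => 'European Patent Office',
    };
}

my $known = _countries_known();
print "$_ => $known->{$_}\n" for sort keys %$known;
```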