Re: Using __DATA__ with XML::LibXML

2017-06-23 Thread khalil zakaria Zemmoura
Hi,

I don't understand what you did with the package you defined,
'XML_Check'.

The use of the special literal __DATA__ is not clear, since I don't see
any use of it in the 'XML_Check' package. I mean, you never read from it
there.

I didn't try running the script, but I can give you some hints to work with:

First:

Did you try printing the lines from the filehandle?

while (<DATA>) {
  print;
}

That will show whether you can actually read from it.

Second:

If you are importing the package into a script, for example, the text after
the __DATA__ token in the package 'XML_Check' is accessible via the
XML_Check::DATA filehandle, according to the documentation:

"  Text after __DATA__ may be read via the filehandle "PACKNAME::DATA",
   where "PACKNAME" is the package that was current when the __DATA__
   token was encountered.
"

Maybe this will work.
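
As a rough illustration of the package-qualified handle, here is a minimal
single-file sketch. The package name Demo_Check and the sub data_text are
made up for this example and are not from the original script. It assumes
Perl 5.10 or later (for the //= operator). The text is cached because a
filehandle can only be read to the end once unless it is rewound.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Demo_Check;    # hypothetical package name, for illustration only

# Cache for the text stored after __DATA__ in this package.
my $cached;

# Slurp the package's DATA handle on the first call, then reuse the copy.
sub data_text {
    $cached //= do { local $/; <Demo_Check::DATA> };
    return $cached;
}

{
    package main;
    # Code in another package goes through the fully qualified name.
    print Demo_Check::data_text();
}

# The block above has ended, so the current package is Demo_Check again
# and the text below is attached to Demo_Check::DATA.
__DATA__
first line
second line
```

Calling data_text() a second time returns the same text, because the
handle is read only on the first call.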

Third:

If you use a __DATA__ section in your main script, you can use it as you did.

You can also reference it as <main::DATA>.
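
For the schema case itself, the following is only a sketch of how the
string form could be fed from the DATA section of a main script. The tiny
"note" schema and the test document are invented for the example and stand
in for the real sitemap XSD. It slurps the whole section in one read,
since a second read of the same handle would return nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Read everything after __DATA__ into one string.
my $xsd = do { local $/; <DATA> };

# Build the schema from the in-memory string instead of a file location.
my $schema = XML::LibXML::Schema->new(string => $xsd);

# Hypothetical document, used only to exercise the schema.
my $doc = XML::LibXML->load_xml(string => '<note>hello</note>');

# validate() dies on failure, so wrap it in eval and record the outcome.
my $ok = eval { $schema->validate($doc); 1 };
print $ok ? "document is valid\n" : "validation failed: $@";

__DATA__
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="note" type="xsd:string"/>
</xsd:schema>
```

If the schema has to be used in several calls, keeping $xsd (or $schema)
in a variable outside the sub avoids reading the handle again.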

Hope that helps.

Good luck

*Khalil Zakaria Zemmoura*
*Visiteur Médical EST*
*Laboratoire NOVOMEDIS*

On Thu, Jun 22, 2017 at 10:27 PM, SSC_perl  wrote:

> I think I'm losing it.  I'm trying to do something simple here but
> I can't get it to work.
>
> I'm using XML::LibXML to verify XML sitemaps and it's working fine
> with an external XSD file.  However, I'd like to add the schema to a
> __DATA__ section so that I only need a single file.  However, no matter
> what I do, I get error messages***.  I've tried the following:
>
> my @data = <DATA>;
> my $data = join ('', @data);
> my $schema = XML::LibXML::Schema->new(string => $data);
> -
> my $schema = XML::LibXML::Schema->new(string => *DATA);
> -
> my $schema = XML::LibXML::Schema->new(string => \*DATA);
>
>
> but the only thing that works is:
>
> my $schema_file = '/home/user/cron/sitemaps/xsd-schema.xsd';
> my $schema = XML::LibXML::Schema->new(location => $schema_file);
>
>
> Why can't I get the DATA solution to work?
>
> Thanks,
> Frank
>
>
> *** Some of the error messages I've seen:
>
> Schemas parser error : Failed to parse the XML resource 'in_memory_buffer'.
>
> Entity: line 1: parser error : Start tag expected, '<' not found
>
> ---
>
> The full script is as follows:
>
> package XML_Check;
>
> use XML::LibXML;
> use FindBin qw($Bin);
> use lib "$Bin/../../perl/Modules";
> require EmailSender;
>
> #
> ## Load YAML Config File 
> #
> use YAML qw(LoadFile);
> my $config = LoadFile('/home/user/conf/config.yaml');
> my $to_address   = $config->{'email'}{'report_to'};
> my $from_address = $config->{'email'}{'report_from'};
> ##
>
> sub verify_xml {
>     my $document = shift;
>     my $schema_file = '/home/user/cron/sitemaps/xsd-schema.xsd';
>     my $schema = XML::LibXML::Schema->new(location => $schema_file);
>
>     my $parser = XML::LibXML->new;
>     my $doc = $parser->parse_file($document);
>
>     eval { $schema->validate($doc) };
>     if ($@) {
>         my $subject = 'XML Sitemap Error';
>         my $message = "There was an error in the $document sitemap.";
>         EmailSender::send_mail($subject, $message, $to_address,
>             $from_address, 'text/plain');
>     }
>
>     return $@;
> }
>
> 1;
>
> __DATA__
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
> targetNamespace="http://www.sitemaps.org/schemas/sitemap/0.9"
> elementFormDefault="qualified">
> 
> 
> XML Schema for Sitemap files. Last Modifed 2008-03-26
> 
> 
> 
> 
> 
> Container for a set of up to 50,000 document elements. This is the root
> element of the XML file.
> 
> 
> 
> 
>  processContents="strict"/>
> 
> 
> 
> 
> 
> 
> 
> Container for the data needed to describe a document to crawl.
> 
> 
> 
> 
> 
> 
> 
>  processContents="strict"/>
> 
> 
> 
> 
> 
> REQUIRED: The location URI of a document. The URI must conform to RFC 2396
> (http://www.ietf.org/rfc/rfc2396.txt).
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> OPTIONAL: The date the document was last modified. The date must conform
> to the W3C DATETIME format (http://www.w3.org/TR/NOTE-datetime). Example:
> 2005-05-10 Lastmod may also contain a timestamp. Example:
> 2005-05-10T17:33:30+08:00
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> OPTIONAL: Indicates how frequently the content at a particular URL is
> likely to change. The value "always" should be used to describe documents
> that change each time they are accessed. The value "never" should be used
> to describe archived URLs. Please note that web crawlers may not
> necessarily crawl pages marked "always" more often. Consider this element
> as a friendly suggestion and not a command.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> OPTIONAL: The priority of a particular URL relative to other pages on the
> same site. The value for this element is a number between 0.0 and 1.0 where
> 0.0 identifies the 

Using __DATA__ with XML::LibXML

2017-06-22 Thread SSC_perl
I think I'm losing it.  I'm trying to do something simple here but I 
can't get it to work.

I'm using XML::LibXML to verify XML sitemaps and it's working fine with 
an external XSD file.  However, I'd like to add the schema to a __DATA__ 
section so that I only need a single file.  However, no matter what I do, I get 
error messages***.  I've tried the following:

my @data = <DATA>;
my $data = join ('', @data);
my $schema = XML::LibXML::Schema->new(string => $data);
-
my $schema = XML::LibXML::Schema->new(string => *DATA);
-
my $schema = XML::LibXML::Schema->new(string => \*DATA);


but the only thing that works is:

my $schema_file = '/home/user/cron/sitemaps/xsd-schema.xsd';
my $schema = XML::LibXML::Schema->new(location => $schema_file);


Why can't I get the DATA solution to work?

Thanks,
Frank


*** Some of the error messages I've seen:

Schemas parser error : Failed to parse the XML resource 'in_memory_buffer'.

Entity: line 1: parser error : Start tag expected, '<' not found

---

The full script is as follows:

package XML_Check;

use XML::LibXML;
use FindBin qw($Bin);
use lib "$Bin/../../perl/Modules";
require EmailSender;

#
## Load YAML Config File 
#
use YAML qw(LoadFile);
my $config = LoadFile('/home/user/conf/config.yaml');
my $to_address   = $config->{'email'}{'report_to'};
my $from_address = $config->{'email'}{'report_from'};
##

sub verify_xml {
    my $document = shift;
    my $schema_file = '/home/user/cron/sitemaps/xsd-schema.xsd';
    my $schema = XML::LibXML::Schema->new(location => $schema_file);

    my $parser = XML::LibXML->new;
    my $doc = $parser->parse_file($document);

    eval { $schema->validate($doc) };
    if ($@) {
        my $subject = 'XML Sitemap Error';
        my $message = "There was an error in the $document sitemap.";
        EmailSender::send_mail($subject, $message, $to_address,
            $from_address, 'text/plain');
    }

    return $@;
}

1;

__DATA__
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
targetNamespace="http://www.sitemaps.org/schemas/sitemap/0.9"
elementFormDefault="qualified">


XML Schema for Sitemap files. Last Modifed 2008-03-26





Container for a set of up to 50,000 document elements. This is the root element 
of the XML file.












Container for the data needed to describe a document to crawl.













REQUIRED: The location URI of a document. The URI must conform to RFC 2396 
(http://www.ietf.org/rfc/rfc2396.txt).










OPTIONAL: The date the document was last modified. The date must conform to the 
W3C DATETIME format (http://www.w3.org/TR/NOTE-datetime). Example: 2005-05-10 
Lastmod may also contain a timestamp. Example: 2005-05-10T17:33:30+08:00














OPTIONAL: Indicates how frequently the content at a particular URL is likely to 
change. The value "always" should be used to describe documents that change 
each time they are accessed. The value "never" should be used to describe 
archived URLs. Please note that web crawlers may not necessarily crawl pages 
marked "always" more often. Consider this element as a friendly suggestion and 
not a command.















OPTIONAL: The priority of a particular URL relative to other pages on the same 
site. The value for this element is a number between 0.0 and 1.0 where 0.0 
identifies the lowest priority page(s). The default priority of a page is 0.5. 
Priority is used to select between pages on your site. Setting a priority of 
1.0 for all URLs will not help you, as the relative priority of pages on your 
site is what will be considered.







