Re: Perl in shared hosting environments

2011-09-22 Thread Tatsuhiko Miyagawa
On Wed, Sep 21, 2011 at 8:49 AM, Randal L. Schwartz wrote:
>>>>>> "Dirk" == Dirk Koopman  writes:
>
>
> Dirk> Reason include:
>
> You forgot:
>
> * My application only requires content generation, and I can completely
>  ignore the other 14 phases of serving, because I don't want custom
>  redirects, authentication, authorization, mime interpretation, and/or
>  logging and other things written in the language of my choice: Perl.

This is where PSGI stands out. If you write a couple of mod_perl
handlers that hook into these 14 phases of serving, they will *only*
work with mod_perl, putting aside the fact that mod_perl 1 handlers
won't work with mod_perl 2.

If you implement URL rewrites, authentication, authorization, custom
redirects, logging etc. as PSGI middleware, then you can use that with
ANY PSGI-supported web server (Starman, mod_perl, FastCGI etc.) and
ANY PSGI-supported framework (Catalyst, Dancer, Mojolicious etc.).
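
As a rough illustration of why this is portable: a PSGI middleware is
essentially a function that takes an application and returns a new
application. The sketch below uses plain closures rather than
subclassing Plack::Middleware, and the names wrap_with_auth and
wrap_with_logging are made up for this example. It also assumes the
wrapped app returns a simple array reference, not a delayed or
streaming response.

```perl
use strict;
use warnings;

# A PSGI application: a code reference that takes the environment hash
# and returns [ status, headers, body ].
my $app = sub {
    my $env = shift;
    return [ 200, [ 'Content-Type' => 'text/plain' ],
             [ "Hello $env->{REMOTE_USER}\n" ] ];
};

# Hypothetical authorization middleware: rejects requests that have no
# REMOTE_USER set, otherwise passes the request to the wrapped app.
sub wrap_with_auth {
    my $inner = shift;
    return sub {
        my $env = shift;
        return [ 403, [ 'Content-Type' => 'text/plain' ], [ "Forbidden\n" ] ]
            unless defined $env->{REMOTE_USER};
        return $inner->($env);
    };
}

# Hypothetical logging middleware: records method, path and status
# into an array supplied by the caller.
sub wrap_with_logging {
    my ($inner, $log) = @_;
    return sub {
        my $env = shift;
        my $res = $inner->($env);
        push @$log, "$env->{REQUEST_METHOD} $env->{PATH_INFO} $res->[0]";
        return $res;
    };
}

my @log;
my $wrapped = wrap_with_logging( wrap_with_auth($app), \@log );
```

$wrapped is itself a PSGI application, so nothing in it refers to
Apache, FastCGI or any particular framework.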

You can see the examples of middleware by searching on CPAN:
https://metacpan.org/search?q=plack%3A%3Amiddleware

> Seriously... *nothing* can compete with mod_perl as far as its reach
> into Apache.  *Nothing*.  With mod_perl, you can inject new behavior at
> every level of decision making that Apache does.  FastCGI is replacing
> just *one* of those 14 stages.

True: you can further its reach into *Apache* and nothing else.

-- 
Tatsuhiko Miyagawa



Re: Perl in shared hosting environments

2011-09-22 Thread Tatsuhiko Miyagawa
On Wed, Sep 21, 2011 at 4:04 AM, Dirk Koopman  wrote:

>> Check out Plack:
>>
>> http://plackperl.org/   - https://metacpan.org/module/Plack
>>
>>
>> http://blog.plackperl.org/2011/08/plack-basics-for-perl-websites-yapceu-2011.html
>>
>> Then you can switch between mod_perl / FastCGI / Starman / Twiggy to your
>> hearts content (we found Starman is REALLY fast).
>
> I am happy to be educated, but I found Plack introduced a load of
> dependencies that I did not want,

Plack is the reference implementation of PSGI handlers and middleware.
It has only a few dependencies, most of which (such as URI and LWP)
you probably already have installed if you're doing web development in
Perl, though your mileage may vary.

The nice thing about Plack/PSGI however is that you *decouple* your
application code from the runtime environment such as mod_perl and
FastCGI - you write your application to target the PSGI interface,
then your app will run on any environment that supports PSGI (via
Plack handlers) such as mod_perl, FastCGI, Starman etc. with
absolutely zero changes.

> it *is* another layer which cannot help
> but reduce speed - which may not matter - but did to me at the time.

Switching to Plack/PSGI actually gives better performance in many
cases, particularly because PSGI uses Perl's native data types such as
hash references and array references instead of objects with hundreds
of method calls. Wrapping the interface in a nice OO API is the job of
the framework, not PSGI. It does *not* get in the way.
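
To make the "native data types" point concrete, here is a minimal
sketch of a PSGI application exercised without any server at all. The
path and the hand-built environment are invented for the example, and
a real server passes many more keys than the two shown.

```perl
use strict;
use warnings;

# The whole interface: a code reference that receives a plain hash
# reference and returns a plain array reference.
my $app = sub {
    my $env  = shift;               # plain hash ref with CGI-style keys
    my $path = $env->{PATH_INFO};   # a hash lookup, not a method call
    return [
        200,                                   # status
        [ 'Content-Type' => 'text/plain' ],    # headers (flat key/value list)
        [ "You asked for $path\n" ],           # body chunks
    ];
};

# No server is needed to try it: call it with a hand-built environment.
my $res = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/hello' });
my ($status, $headers, $body) = @$res;
print "$status: ", join('', @$body);
```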

In my simple (and naive) Hello World benchmark, the Starman, Starlet
and Feersum web servers performed better than CGI.pm (with the mp2
wrapper) + mod_perl, for example.

http://www.reddit.com/r/perl/comments/h6qqr/the_psgi_is_the_limit/ has
an amusing comment thread about the performance and benefit of PSGI
over existing wrappers like CGI.pm, started by "shi4" - you might need
to click to expand the collapsed comments to read the whole thread.

>>
>> You also get a lot (160+ modules) of nice middleware available.
>>
>
> More software to, at least potentially, "get in the way" or add unnecessary
> dependencies or unwanted constraints.

The PSGI interface and middleware work consistently as an interface
between applications and servers. They don't get in the way.


-- 
Tatsuhiko Miyagawa



Re: Recommended hotels or crash place for LPW 2009?

2009-11-20 Thread Tatsuhiko Miyagawa
Thanks everyone for the great suggestions and offers. I've got an
offer from a person on this list who can let me stay for the three
nights during the conference and decided to stay there. Thank you
again.


-- 
Tatsuhiko Miyagawa


Recommended hotels or crash place for LPW 2009?

2009-11-17 Thread Tatsuhiko Miyagawa
Hi,

I'm looking for a place to stay for the upcoming London Perl Workshop.
Can anyone recommend a good hotel, onlist or offlist, or even let me
stay for 3 nights, Dec 4-6 (even part of that would be helpful)?

It'd be nice if the hotel has good access to the venue, has free (or
reasonably priced) WiFi and costs around 100 GBP/night. Small rooms
are fine. If you can let me stay, the only thing I can't stay with is
cats, because I'm allergic.

Thanks for your help,

-- 
Tatsuhiko Miyagawa


Re: Maintainer needed for perlsphere.net

2009-10-03 Thread Tatsuhiko Miyagawa
On Sat, Oct 3, 2009 at 6:58 AM, Dave Cross  wrote:

>> Miyagawa pointed out this:
>>
>> http://cpandeps.cantrell.org.uk/?module=Plagger
>>
>> But surely that's not including all the optional modules? ;)

And they're truly optional: you don't need them to build Planet sites.

> Yeah, that's just the required modules. The recommended ones are a far
> longer (and esoteric) list.
>
> See http://cpansearch.perl.org/src/MIYAGAWA/Plagger-0.7.17/META.yml

But it still has a better chance of installing successfully than yours :)

http://cpandeps.cantrell.org.uk/?module=Plagger
http://cpandeps.cantrell.org.uk/?module=Perlanet



-- 
Tatsuhiko Miyagawa


Re: [OT] finding memory hungry bits of my code

2009-04-09 Thread Tatsuhiko Miyagawa
B::TerseSize might be what you want, and Apache2 has a Status handler
to enable that.
http://search.cpan.org/dist/mod_perl/docs/api/Apache2/Status.pod#StatusTerseSizeMainSummary

In a standalone script you can do:

use strict;
use warnings;
use B::TerseSize;
use Devel::Symdump;

# Collect the opcode size of "main" and every package found below it.
my $stab = Devel::Symdump->rnew("main");
my %size;
for my $package ("main", $stab->packages) {
    my ($subs, $opcount, $opsize) = B::TerseSize::package_size($package);
    $size{$package} = $opsize;
}

# Print the packages, largest first.
for my $package (sort { $size{$b} <=> $size{$a} } keys %size) {
    printf "%-24s %8d [KB]\n", $package, $size{$package} / 1024;
}

to get the equivalent.

On Thu, Apr 9, 2009 at 4:31 AM, Edmund von der Burg wrote:
> I don't think that it is a memory leak - the size tends to remain
> constant after a few requests (it's a webapp - Catalyst under
> mod_perl).
>
-- 
Tatsuhiko Miyagawa


Re: XML::LibXML and HTML (in >=v1.67)

2009-04-01 Thread Tatsuhiko Miyagawa
On Wed, Apr 1, 2009 at 6:21 PM, Toby Wintermute  wrote:

> Thanks, Web::Scraper looks quite neat.
> However I want to avoid applications breaking on random CPAN module
> upgrades (as just happened with the XML::LibXML upgrade yesterday), so
> I might steer clear of it until it loses the big, bold warning about
> the API still being unstable.
> I'm sure you understand :)

The GitHub master version took out the big fat warning a few days ago,
and is ready to be shipped to CPAN soon with more POD docs :)


-- 
Tatsuhiko Miyagawa


Re: XML::LibXML and HTML (in >=v1.67)

2009-04-01 Thread Tatsuhiko Miyagawa
On Wed, Apr 1, 2009 at 2:53 AM, Dave Cross  wrote:
>> I know that really one should escape the ampersand in those
>> circumstances, however real-world web-pages rarely do this.. And this
>> behaviour was tolerated in XML::LibXML 1.66, just not subsequent
>> versions.. but eh, maybe it's just the way I'm calling the parser?
>
> Sounds like XML::LibXML has fixed a bug. XML parsers are supposed to throw
> an exception when they encounter invalid XML.

The method we're talking about here is parse_*html*, and libxml2
continues parsing HTML with errors like this, and XML::LibXML has an
option (recover=>1) not to choke on that:

perldoc XML::LibXML::Parser

   Parsing HTML may cause problems, especially if the ampersand ('&') is
   used.  Such links cause the parser to throw errors. In
   such cases libxml2 still parses the entire document as there was no
   error ...  Such HTML documents should be parsed using the recover flag.
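
A minimal sketch of what that looks like in practice, assuming
XML::LibXML is installed. The HTML string is a made-up example with an
unescaped ampersand in a link, and the exact warnings printed depend on
the libxml2 version.

```perl
use strict;
use warnings;
use XML::LibXML;

# HTML with an unescaped ampersand in an attribute (invented input).
my $html = '<html><body><a href="search?a=1&b=2">link</a></body></html>';

my $parser = XML::LibXML->new;
$parser->recover(1);    # keep going past recoverable parse errors

my $doc  = $parser->parse_html_string($html);
my $href = $doc->findvalue('//a/@href');
print "$href\n";
```

With recover(1) the parse errors are still reported as warnings;
recover_silently(1) is documented as turning on recovery while
suppressing them.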


-- 
Tatsuhiko Miyagawa


Re: XML::LibXML and HTML (in >=v1.67)

2009-04-01 Thread Tatsuhiko Miyagawa
On Tue, Mar 31, 2009 at 10:45 PM, Toby Wintermute  wrote:
> The problem occurs when the html contains (the commonly used) & symbol
> within attributes, such as:
> 
>
> I know that really one should escape the ampersand in those
> circumstances, however real-world web-pages rarely do this.. And this
> behaviour was tolerated in XML::LibXML 1.66, just not subsequent
> versions.. but eh, maybe it's just the way I'm calling the parser?

XML::Liberal [1] addresses exactly this kind of issue. It was also
broken by the error format change in XML::LibXML 1.67, but works with
1.69_2 on CPAN.

> Alternatively.. what do YOU use to parse real-world websites that are
> often not totally valid?

I use my own Web::Scraper [2,3] to scrape stuff and it uses
HTML::TreeBuilder (and ::XPath) to build a DOM tree and runs XPath or
CSS selectors against it. It's definitely slower than LibXML but can
deal with such broken HTML documents very well. If you really care
about performance there's also HTML::TreeBuilder::LibXML on github [4]
that is a drop-in replacement for H::TB::XPath but uses LibXML under
the hood.

Another option would be to clean up such XHTML errors with HTML::Tidy
before passing the document to LibXML. It would be neat to do that
cleanup only if libxml parsing fails even with recover etc. set.
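
The "clean up only on failure" idea could be sketched like this. It is
only an outline: parse_with_fallback is a made-up helper, and the two
callbacks are stand-ins so the example runs on its own. In real code
$parse would wrap an XML::LibXML call and $cleanup would wrap
HTML::Tidy.

```perl
use strict;
use warnings;

# Try the fast path first; run the (slower) cleanup only when parsing fails.
sub parse_with_fallback {
    my ($html, $parse, $cleanup) = @_;

    my $doc = eval { $parse->($html) };
    return $doc if defined $doc;

    # First attempt died (or returned nothing): clean up and retry once.
    return $parse->( $cleanup->($html) );
}

# Stand-in callbacks for demonstration only.
my $cleanups = 0;
my $strict_parse = sub {
    my $html = shift;
    die "unescaped ampersand\n" if $html =~ /&(?!\w+;)/;
    return { source => $html };
};
my $escape_amps = sub {
    my $html = shift;
    $cleanups++;
    $html =~ s/&(?!\w+;)/&amp;/g;
    return $html;
};

my $doc = parse_with_fallback('<a href="x?a=1&b=2">', $strict_parse, $escape_amps);
```

Documents that parse cleanly never pay for the cleanup step; only the
broken ones take the slower path.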

[1] http://search.cpan.org/dist/XML-Liberal
[2] http://search.cpan.org/dist/Web-Scraper
[3] http://github.com/miyagawa/web-scraper
[4] http://github.com/tokuhirom/html--treebuilder--libxml

-- 
Tatsuhiko Miyagawa