Re: htmlParser.className default value

Philippe Mouawad Sat, 05 Oct 2013 14:52:14 -0700

Created:
https://issues.apache.org/bugzilla/show_bug.cgi?id=55632


On Thu, Sep 26, 2013 at 11:05 PM, Philippe Mouawad <
[email protected]> wrote:

>
>
>
> On Thu, Sep 26, 2013 at 10:58 PM, sebb <[email protected]> wrote:
>
>> On 26 September 2013 21:48, Philippe Mouawad <[email protected]>
>> wrote:
>> > Hello,
>> > I really think this setting should be changed as
>> > HtmlParserHTMLParser is really catastrophic in terms of performance and
>> > memory use.
>> >
>> > Or at least a note should be added, but my preference is to switch to
>> > REGEXP, which seems to be doing the job.
>>
>> I don't think we should change the default; it may well break test
>> plans, as commenting out sections is a common practice.
>>
>
> Why not change the default and document how users can switch back to the
> old parser?
> Take a newcomer: he won't read all the documentation at once. In my opinion,
> defaults should be the best options for performance.
>
> If users have issues with Regexp, they will file Bugzilla reports and we will
> fix them. They can provide the page for which parsing failed; as we have
> already had a report on this, it's easy.
>
> Whereas if we keep it like this, users will hit OOM on high-load
> tests because of this, and I am not sure they will report it; if they do, it
> could be much harder to find out that this was the cause.
> And we will always have this "urban legend" about JMeter having OOM, which
> frankly is starting to upset me :-)
>
>
>> However, by all means add a note to jmeter.properties and
>> component_reference.
>>
>> > Regards
>> > Philippe
>> >
>> >
>> > On Sun, Mar 3, 2013 at 9:06 PM, Philippe Mouawad <[email protected]>
>> > wrote:
>> >
>> >> Hello,
>> >>
>> >> I recently ran a real-world test which downloaded resources.
>> >> As the site started to slow down, I ended up with an OOM.
>> >>
>> >> Analyzing the heap dump, I noticed one JMeterThread held around 3 MB, the
>> >> majority of which was taken by the DOM built by htmlparser.
>> >>
>> >> So I think Regexp is far more efficient in memory usage. But if you say it
>> >> is a quick and dirty alternative, then that's another matter.
>> >>
>> >> I wonder if it would not be interesting to explore using JSOUP in a new
>> >> implementation.
>> >>
>> >> Regards
>> >> Philippe
>> >>
>> >>
>> >> On Sun, Mar 3, 2013 at 3:42 PM, sebb <[email protected]> wrote:
>> >>
>> >>> On 2 March 2013 19:42, Philippe Mouawad <[email protected]>
>> >>> wrote:
>> >>> > Hello,
>> >>> > I was wondering if there is any reason for htmlParser.className default
>> >>> > value being org.apache.jmeter.protocol.http.parser.HtmlParserHTMLParser and
>> >>> > not org.apache.jmeter.protocol.http.parser.RegexpHTMLParser
>> >>> >
>> >>> > It seems to me the latter is much more efficient than the current default
>> >>> > value.
>> >>>
>> >>> I think one would need to benchmark that to see how much faster it is.
>> >>>
>> >>> > Any objection to changing to
>> >>> > org.apache.jmeter.protocol.http.parser.RegexpHTMLParser?
>> >>>
>> >>> The Regex version does not take account of context, so will find
>> >>> references in comment sections.
>> >>>
>> >>> It was intended as a quick and dirty alternative.
>> >>>
>> >>> > --
>> >>> > Regards.
>> >>> > Philippe
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Cordialement.
>> >> Philippe Mouawad.
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > Cordialement.
>> > Philippe Mouawad.
>>
>
>
>
> --
> Cordialement.
> Philippe Mouawad.
>
>
>


-- 
Cordialement.
Philippe Mouawad.
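
The objection quoted in this thread, that a regex-based parser "does not take account of context, so will find references in comment sections", can be illustrated with a small standalone sketch. This is not JMeter's RegexpHTMLParser: the class name, the patterns and both helper methods are hypothetical and only assume a simple `<img src="...">` extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it does not use or reproduce any JMeter code.
class CommentBlindRegexDemo {

    // Deliberately simple pattern (an assumption): captures the src value of an <img> tag.
    private static final Pattern IMG_SRC =
            Pattern.compile("<img\\s[^>]*?src\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Matches HTML comments, including ones that span several lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Context-free extraction: scans the raw text, so commented-out tags match too.
    static List<String> extractSrc(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    // Same extraction after removing comments first, for comparison only.
    static List<String> extractSrcIgnoringComments(String html) {
        return extractSrc(COMMENT.matcher(html).replaceAll(""));
    }

    public static void main(String[] args) {
        String html = "<html><body>\n"
                + "<!-- <img src=\"old.png\"> -->\n"
                + "<img src=\"live.png\">\n"
                + "</body></html>";
        System.out.println(extractSrc(html));                 // includes the commented-out image
        System.out.println(extractSrcIgnoringComments(html)); // only the live image
    }
}
```

With the context-free version, a test plan whose pages have commented-out sections would request resources that a DOM-based parser skips. That is the behaviour change a new default for htmlParser.className could expose.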
