Re: How can I extract cell data (content surrounded by <td></td>) from a <table> in HTML response?

Fri, 20 Nov 2009 07:45:12 -0800

Hello,

Thanks for your explanation.
In fact, the HTML layout that I am trying to parse is stable and unlikely to
change in the future, so parsing it is feasible.

Since I'm not good at regex, I will use JMeter only to fetch the HTML
response from an HTTPS-based web site and to store the parsing results in
Java objects such as ArrayList.

So I created some HTTP Request samplers with a BeanShell PostProcessor
attached.
In the BeanShell script I wrote some logic using the org.w3c.dom and JTidy
APIs, and now I can see the extracted cell contents via System.err.println()
in my BeanShell script.

After that I had difficulties using JMeter variables. In my BeanShell
script I created ArrayList objects, stored the extracted texts in them, and
put them into the JMeter context:
                vars.put("responseList", responseList);
                vars.put("responseDateList", responseDateList);
http://old.nabble.com/file/p26443545/BeanShellPostProcessor.gif 

After parsing the HTML response, I need a ForEach Controller to iterate over
the elements of these List objects (which are just the values of the
selected <td> elements) and to issue a JDBC request that stores each of them
in a database (or any other operation that sends them out of JMeter).
http://old.nabble.com/file/p26443545/ForEachController.gif 

However, I was unable to get a ForEach Controller to operate on objects in
vars.

What did I miss, and what should I do to iterate over the contents of vars
and run a sampler on each value in the iteration?
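
For context, a ForEach Controller does not iterate over a List object. It
reads String variables named prefix_1, prefix_2, ... until the next number
is missing, and vars.put() only accepts String values (objects go through
vars.putObject()). The sketch below shows that naming scheme in plain Java.
The Map is only a stand-in for JMeter's vars, and the class name, method
name and the "response" prefix are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ForEachVariables {

    /**
     * Flattens a list into prefix_1, prefix_2, ... entries, which is the
     * variable naming scheme a ForEach Controller iterates over.
     * The returned Map is a stand-in for JMeter's vars (an assumption made
     * so that the sketch runs outside JMeter).
     */
    public static Map<String, String> flatten(String prefix, List<String> values) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            // Numbering starts at 1, and every value is stored as a String.
            vars.put(prefix + "_" + (i + 1), values.get(i));
        }
        return vars;
    }

    public static void main(String[] args) {
        // Hypothetical cell values standing in for the extracted <td> texts.
        List<String> responseList = new ArrayList<>();
        responseList.add("cell A");
        responseList.add("cell B");

        flatten("response", responseList)
                .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Inside the BeanShell PostProcessor the loop body would call vars.put(...)
directly instead of filling a Map, and the ForEach Controller would then use
"response" as its input variable prefix.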

With my best wishes,

Rosière


Deepak Shetty wrote:
> 
> Hi,
> the regex you are using doesn't seem correct.
> [^tr] matches any single character that is neither 't' nor 'r'; it does
> not mean "not the sequence tr".
> 
> Also, if you are getting multiple <tr> elements in one match instead of
> the single one you expect, your regex is probably too greedy. Try
> replacing the .* constructs with .*? or otherwise modify the regex.
> 
> In any case, XPath is as dependent on the HTML structure as a regex is
> (e.g. what if you move to a tableless layout?).
> 
> 
> regards
> deepak
> 
> On Thu, Nov 19, 2009 at 8:17 AM, rosiere <[email protected]> wrote:
> 
>>
>> Hello,
>>
>> Thanks for your advice.
>>
>> I did apply a case-insensitive check, like this:
>>
>> (?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>
>>
>> However, I still have a problem: now I capture all the <tr> elements in
>> the same group instead of one <tr> element per match.
>>
>> My jmeter.log shows this information about the matching:
>>
>> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex = (?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>
>> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Match found!
>> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #0 = 1
>> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #1 =
>> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex Extractor result =
>> <TD>....<TD>
>> <TR>...</TR>
>> ...
>> <TR>....</TR>
>> <TD>
>>
>>
>> As for alternatives, I did want to parse the HTML with the org.w3c.dom
>> API, but DOM methods like getElementsByTagName() are all case sensitive
>> and may not be able to handle HTML that mixes uppercase and lowercase
>> tags.
>>
>> Besides, whenever the HTML page changes, I would have to rewrite my Java
>> code based on the DOM API. To minimize that impact on my Java code, I
>> would still like to use a regex, so that whenever the HTML structure
>> changes I only need to change the regex in JMeter and not the Java code
>> that consumes the extracted HTML portions.
>>
>>
>>
>> Deepak Shetty wrote:
>> >
>> > You should probably make the check case insensitive. But I agree with
>> > sebb: parsing HTML constructs with regex is a pain and breaks quite
>> > frequently.
>> > regards
>> > deepak
>> >
>> > On Wed, Nov 18, 2009 at 10:37 AM, Andre Arnold <[email protected]> wrote:
>> >
>> >> sebb wrote:
>> >> > On 18/11/2009, rosiere <[email protected]> wrote:
>> >> >
>> >> >>  Hello,
>> >> >>
>> >> >>  I found that JMeter's ORO regex is somewhat different from Java's.
>> >> >>
>> >> >
>> >> > Yes.
>> >> >
>> >> > But not all that different; and neither is particularly well
>> >> > suited to this task.
>> >> >
>> >> > The XPath Extractor will probably be much easier to use.
>> >> >
>> >> >
>> >> > http://jakarta.apache.org/jmeter/usermanual/component_reference.html#XPath_Extractor
>> >> >
>> >> > This was discussed on the mailing list earlier this year.
>> >> >
>> >> >
>> >> >>  Now I need to iterate over the different <tr> elements that match
>> >> >>  a pattern, then capture all the <td> elements within each <tr>
>> >> >>  and select the 8th and 9th <td>.
>> >> >>
>> >> >>  Since many <tr> elements appear in the HTML response, I have to
>> >> >>  capture the <tr> elements row by row, without including two rows
>> >> >>  in the same group. In other words, I should avoid capturing
>> >> >>  consecutive <tr>..</tr><tr>..</tr> into the same group.
>> >> >>
>> >> >>  By writing (?is)<tr\sclass="tgDataLine.*1\)\" >(.*)</tr> I capture
>> >> >>  only one group that contains many real <tr> elements.
>> >> >>  So what should I write in the regex?
>> >> >>
>> >> >>
>> >> In case you still need a pattern that fits your needs: I found that
>> >> the following matches the number you wanted and the value of the
>> >> column that follows it.
>> >>
>> >> reference: ref
>> >> pattern:     (?s)<TR.+?<TD.+?>([1-9|0]+?)</TD.+?<TD.+?>(.+?)</TD>
>> >> template:  $1$$2$
>> >> match :     1
>> >>
>> >> In ref_g1 you'll find the number.
>> >> In ref_g2 you'll find the following column value.
>> >>
>> >> To catch all the matches you need to increment a counter for the match
>> >> number and check whether there is another match or not.
>> >>
>> >> Your test plan should look something like this:
>> >>
>> >> -while controller (${__javaScript("${ref}"!="error")}  )
>> >> --counter (from 1 with increment 1 for the regex match value)
>> >> --Http Sampler (to get your site)
>> >> ---RegEx Extractor (as shown above)
>> >> --if controller (same as while controller --> "${ref}"!="error")
>> >> ---your jdbc action (use ref_g1 & ref_g2)
>> >>
>> >>
>> >> Hope I got your problem right.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/How-can-I-extract-cell-data-%28content-surrounded-by-%3Ctd%3E%3C-td%3E%29-from-a-%3Ctable%3E-in-HTML-response--tp26371440p26421379.html
>> Sent from the JMeter - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/How-can-I-extract-cell-data-%28content-surrounded-by-%3Ctd%3E%3C-td%3E%29-from-a-%3Ctable%3E-in-HTML-response--tp26371440p26443545.html
Sent from the JMeter - User mailing list archive at Nabble.com.

