Re: How can I extract cell data (content surrounded by <td></td>) from a <table> in HTML response?

Hello,

Thanks for your advice.


I applied a case-insensitive check, like this:

(?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>

However, I still have a problem: the regex now captures all the <tr>
elements in a single group instead of matching each <tr> element separately.

My jmeter.log shows the following information about the match:

2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex = (?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>
2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Match found!
2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #0 = 1
2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #1 =
2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex Extractor result =
<TD>....<TD>
<TR>...</TR>
...
<TR>....</TR>
<TD>
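
My guess at the cause: ".*" is greedy, so it runs on to the last </tr> in the
response, and "[^tr]" is only a character class (one character that is neither
t nor r), not a "no <tr> tag" test. Below is a minimal sketch in plain
java.util.regex, not JMeter's ORO engine, of matching one row at a time with a
reluctant ".*?". The class RowExtractor, the "[^>]*>" tail of the row pattern,
and the sample HTML are assumptions made for illustration; only the
"tgDataLine" class name comes from my page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RowExtractor {
    // Reluctant ".*?" stops at the first </tr>, so every row is a separate match.
    // "[^>]*>" (assumption) skips whatever attributes follow the class name.
    private static final Pattern ROW =
        Pattern.compile("(?is)<tr\\s+class=\"tgDataLine[^>]*>(.*?)</tr>");
    // Same idea for the cells inside one row.
    private static final Pattern CELL =
        Pattern.compile("(?is)<td[^>]*>(.*?)</td>");

    // Returns one list of cell contents per matching row.
    static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(cell.group(1).trim());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample with mixed-case tags.
        String html = "<TABLE>"
            + "<TR class=\"tgDataLine\"><TD>1</TD><td>a</td></TR>\n"
            + "<tr class=\"tgDataLine\"><td>2</td><TD>b</TD></tr>"
            + "</TABLE>";
        System.out.println(extractRows(html)); // prints [[1, a], [2, b]]
    }
}
```

With the cells of each row in a list, the 8th and 9th <td> would be
cells.get(7) and cells.get(8). This sketch would still break on nested tables
or on a ">" inside an attribute value.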


As for alternatives, I did consider parsing the HTML with the org.w3c.dom API,
but DOM methods like getElementsByTagName() are case-sensitive and may not
handle HTML that mixes uppercase and lowercase tags.
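
For reference, a possible workaround would be to ask the DOM for every element
with getElementsByTagName("*") and compare the names with equalsIgnoreCase().
This is only a sketch: the class CaseInsensitiveDom and its helper methods are
hypothetical, and it assumes the input is already well-formed XML, which real
HTML often is not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CaseInsensitiveDom {
    // Collects all elements whose tag name equals 'tag', ignoring case.
    static List<Element> elementsByTagNameIgnoreCase(Document doc, String tag) {
        List<Element> found = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // "*" matches every element
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.getTagName().equalsIgnoreCase(tag)) {
                found.add(e);
            }
        }
        return found;
    }

    // Parses a well-formed XML string (assumption: the HTML was tidied first).
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("input is not well-formed XML", ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample mixing uppercase and lowercase tags.
        Document doc = parse("<table><TR><TD>1</TD></TR><tr><td>2</td></tr></table>");
        for (Element td : elementsByTagNameIgnoreCase(doc, "td")) {
            System.out.println(td.getTextContent()); // prints 1, then 2
        }
    }
}
```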

Besides, whenever the HTML page changes, I would have to rewrite my Java code
based on the DOM API. To minimize that, I would still like to use a regex, so
that whenever the HTML structure changes I only need to change the regex in
JMeter and not the Java code that consumes the extracted HTML portions.



Deepak Shetty wrote:
> 
> You should probably make the check case insensitive, but I agree with sebb:
> parsing HTML constructs with regex is a pain and breaks quite frequently.
> regards
> deepak
> 
> On Wed, Nov 18, 2009 at 10:37 AM, Andre Arnold <[email protected]> wrote:
> 
>> sebb schrieb:
>> > On 18/11/2009, rosiere <[email protected]> wrote:
>> >
>> >>  Hello,
>> >>
>> >>  I found that JMeter's oro regex is somehow different from java's.
>> >>
>> >
>> > Yes.
>> >
>> > But not all that different; and neither is particularly well suited to
>> > this task.
>> >
>> > The XPath Extractor will probably be much easier to use.
>> >
>> >
>> http://jakarta.apache.org/jmeter/usermanual/component_reference.html#XPath_Extractor
>> >
>> > This was discussed on the mailing list earlier this year.
>> >
>> >
>> >>  Now I need to iterate over the different <tr> elements that match a
>> >>  pattern, then capture all the <td> elements within each <tr> and
>> >>  select the 8th and 9th <td>.
>> >>
>> >>  Since many <tr> elements appear in the HTML response, I have to
>> >>  capture each <tr> line by line, without including two lines in the
>> >>  same group: I should avoid capturing continuous
>> >>  <tr>..</tr><tr>..</tr> into the same group.
>> >>
>> >>  By writing (?is)<tr\sclass="tgDataLine.*1\)\" >(.*)</tr> I capture
>> >>  only one group that contains many real <tr> elements.
>> >>  So what should I write in the regex?
>> >>
>> >>
>> If you still need a pattern for this, I found that the following matches
>> the number you wanted and the value of the following column.
>>
>> reference: ref
>> pattern:     (?s)<TR.+?<TD.+?>([1-9|0]+?)</TD.+?<TD.+?>(.+?)</TD>
>> template:  $1$$2$
>> match :     1
>>
>> In ref_g1 you'll find the number.
>> In ref_g2 you'll find the following column value.
>>
>> To catch all the matches you need to increment a counter for the match
>> and check whether there is another one or not.
>>
>> Your test plan should look something like this:
>>
>> -while controller (${__javaScript("${ref}"!="error")}  )
>> --counter (from 1 with increment 1 for the regex match value)
>> --Http Sampler (to get your site)
>> ---RegEx Extractor (as shown above)
>> --if controller( same as while controller--> ${ref}"!="error" )
>> ---your jdbc action (use ref_g1 & ref_g2)
>>
>>
>> Hope I got your problem right.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/How-can-I-extract-cell-data-%28content-surrounded-by-%3Ctd%3E%3C-td%3E%29-from-a-%3Ctable%3E-in-HTML-response--tp26371440p26421379.html
Sent from the JMeter - User mailing list archive at Nabble.com.

