Re: How can I extract cell data (content surrounded by <td></td>) from a <table> in an HTML response?

On 20/11/2009, rosiere <[email protected]> wrote:
>
>  Hello,
>
>  Thanks for your explanation.
>  In fact the HTML layout that I am trying to parse is stable and unlikely
>  to change in the future; that's why I am willing to parse it.
>
>  Since I'm not good at regex, I will use JMeter just to get the HTML
>  response from an HTTPS web site and to store the parsing results in Java
>  objects like ArrayList.
>
>  So I created some HTTP Request samplers, then attached a BeanShell
>  PostProcessor to them.
>  In the BeanShell script, I wrote some logic with the W3C DOM and JTidy
>  APIs, and now I can see the extracted cell contents via System.err.println()
>  in my BeanShell script.


You could have saved yourself some work by using the XPath Extractor...
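For reference, a rough sketch of what XPath-based extraction could look like, written in plain Java with the JDK's javax.xml.xpath API rather than inside JMeter. It assumes the response has already been turned into well-formed XHTML with lower-case tags and no default namespace (JMeter's XPath Extractor has a Tidy option for messy HTML), that data rows carry a class attribute containing "tgDataLine" (taken from the regex later in this thread), and that the wanted values sit in the 8th and 9th cells. The class name and sample markup are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathCells {
    // For every <tr> whose class contains "tgDataLine", returns "cell8|cell9".
    // Assumes well-formed XHTML with lower-case tags and no default namespace.
    static List<String> extract(String xhtml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xpath.evaluate(
                "//tr[contains(@class, 'tgDataLine')]",
                new InputSource(new StringReader(xhtml)),
                XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            // td[8] / td[9]: 1-based positions among the row's <td> children
            String c8 = xpath.evaluate("td[8]", rows.item(i)).trim();
            String c9 = xpath.evaluate("td[9]", rows.item(i)).trim();
            result.add(c8 + "|" + c9);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: two data rows with 9 cells each, plus one row to ignore.
        StringBuilder xhtml = new StringBuilder("<html><body><table>");
        for (int r = 1; r <= 2; r++) {
            xhtml.append("<tr class=\"tgDataLine\">");
            for (int c = 1; c <= 9; c++) {
                xhtml.append("<td>r").append(r).append('c').append(c).append("</td>");
            }
            xhtml.append("</tr>");
        }
        xhtml.append("<tr class=\"header\"><td>skip</td></tr></table></body></html>");
        System.out.println(extract(xhtml.toString()));
    }
}
```

The same expressions (for example //tr[contains(@class,'tgDataLine')]/td[8]) are what one would try in the XPath Extractor itself, which avoids writing the DOM code by hand.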

>  After that I had difficulties using JMeter variables. In my BeanShell
>  script I created ArrayList objects, stored the extracted texts in them, and
>  put them into the JMeter context:
>                 vars.put("responseList", responseList);
>                 vars.put("responseDateList", responseDateList);
>  http://old.nabble.com/file/p26443545/BeanShellPostProcessor.gif
>
>  After parsing my HTML response, I need a ForEach Controller to iterate
>  over the elements of these List objects (which are just arrays of values
>  from selected <td> elements) and to issue a JDBC request to store them in
>  a database (or perform any other operation that sends them out of JMeter).
>  http://old.nabble.com/file/p26443545/ForEachController.gif
>
>  However, I was unable to get a ForEach Controller to operate on objects in vars.
>
>  What did I miss, and what should I do to iterate over the contents of vars
>  and run a sampler for each value in the iteration?
>
>  With my best wishes,
>
>  Rosière
>
>
>
>  Deepak Shetty wrote:
>  >
>  > Hi
>  > the regex you are using doesn't seem correct.
>  > [^tr]
>  > is a single character that is neither 't' nor 'r'; it doesn't mean "not
>  > the sequence tr".
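A small check that makes the difference concrete, using java.util.regex. JMeter's Regular Expression Extractor uses Jakarta ORO's Perl5 syntax, where character classes behave the same way; that ORO also accepts the (?!...) lookahead used below is an assumption worth verifying on your version. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // "Any text that does not contain the sequence </tr>": each character is
    // consumed only if </tr> does not start at that position (negative lookahead).
    static final Pattern NO_ROW_END = Pattern.compile("(?:(?!</tr>).)*", Pattern.DOTALL);

    static boolean hasNoRowEnd(String s) {
        return NO_ROW_END.matcher(s).matches();
    }

    public static void main(String[] args) {
        // [^tr] is a character class: exactly ONE character that is neither 't' nor 'r'.
        System.out.println("x".matches("[^tr]"));   // true
        System.out.println("t".matches("[^tr]"));   // false
        System.out.println("ab".matches("[^tr]"));  // false: two characters
        System.out.println(hasNoRowEnd("<td>a</td>"));          // true
        System.out.println(hasNoRowEnd("<td>a</td></tr><tr>")); // false
    }
}
```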
>  >
>  > Plus, if you are getting multiple <tr> elements instead of the 1 you
>  > expect, your regex is probably too greedy; try replacing .* constructs
>  > with .*? or otherwise modify the regex.
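To illustrate the greediness point with a made-up two-row snippet (java.util.regex again, so ORO details may differ slightly): a greedy .* runs to the last </tr> in the input, which is why the pattern in this thread (it contains two .* constructs) returns all rows in one group, while .*? stops at the first </tr> and gives one match per row. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Collects group 1 of every match of the given regex in the input.
    static List<String> captures(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<tr><td>a</td></tr><tr><td>b</td></tr>";
        // Greedy: one match whose group spans both rows.
        System.out.println(captures("(?is)<tr>(.*)</tr>", html));
        // Lazy: two matches, one per row.
        System.out.println(captures("(?is)<tr>(.*?)</tr>", html));
    }
}
```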
>  >
>  > In any case, XPath is as dependent on HTML structure as a regex is (e.g.
>  > what if you move to a tableless layout?).
>  >
>  >
>  > regards
>  > deepak
>  >
>  > On Thu, Nov 19, 2009 at 8:17 AM, rosiere <[email protected]> wrote:
>  >
>  >>
>  >> Hello,
>  >>
>  >> Thanks for your advice.
>  >>
>  >> I applied a case-insensitive check, like this:
>  >>
>  >> (?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>
>  >>
>  >> However, I still have a problem. Now I capture all the <tr> elements in
>  >> the same group instead of one group per <tr> element.
>  >>
>  >> I found this information about the matching in my jmeter.log:
>  >>
>  >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex = (?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>
>  >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Match found!
>  >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #0 = 1
>  >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #1 =
>  >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex Extractor result =
>  >> <TD>....<TD>
>  >> <TR>...</TR>
>  >> ...
>  >> <TR>....</TR>
>  >> <TD>
>  >>
>  >>
>  >> As for alternatives, I did consider parsing the HTML with the org.w3c.dom
>  >> API, but DOM methods like getElementsByTagName() are case-sensitive and
>  >> may not find all the elements in HTML that mixes uppercase and lowercase
>  >> tags.
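If the DOM route is revisited, the case issue can be sidestepped by walking the tree and comparing node names with equalsIgnoreCase() instead of calling getElementsByTagName(). A minimal sketch, assuming the markup is already well-formed (for example after a JTidy pass, which the BeanShell script in this thread already uses); it parses a made-up snippet with the JDK's DocumentBuilder, and the names CaseInsensitiveDom and findAllIgnoreCase are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CaseInsensitiveDom {
    // Depth-first walk collecting elements whose tag name equals `tag`, ignoring case.
    static List<Element> findAllIgnoreCase(Node root, String tag) {
        List<Element> found = new ArrayList<>();
        collect(root, tag, found);
        return found;
    }

    private static void collect(Node node, String tag, List<Element> found) {
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equalsIgnoreCase(tag)) {
            found.add((Element) node);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, tag, found);
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Mixed-case tags: getElementsByTagName("td") would only see the lower-case cell.
        Document doc = parse("<TABLE><TR><TD>upper</TD><td>lower</td></TR></TABLE>");
        for (Element td : findAllIgnoreCase(doc, "td")) {
            System.out.println(td.getTextContent());
        }
    }
}
```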
>  >>
>  >> Besides, whenever the HTML page changes, I would have to rewrite my Java
>  >> code based on the DOM API. To minimize these unwanted effects on my Java
>  >> code, I would still like to use regex, so that whenever the HTML
>  >> structure changes, I only need to change the regex in JMeter and not the
>  >> Java code that consumes the extracted HTML portions.
>  >>
>  >>
>  >>
>  >> Deepak Shetty wrote:
>  >> >
>  >> > You should probably make the check case-insensitive. But I agree with
>  >> > sebb: parsing HTML constructs with regex is a pain and breaks quite
>  >> > frequently.
>  >> > regards
>  >> > deepak
>  >> >
>  >> > On Wed, Nov 18, 2009 at 10:37 AM, Andre Arnold <[email protected]> wrote:
>  >> >
>  >> >> sebb wrote:
>  >> >> > On 18/11/2009, rosiere <[email protected]> wrote:
>  >> >> >
>  >> >> >>  Hello,
>  >> >> >>
>  >> >> >>  I found that JMeter's ORO regex is somewhat different from Java's.
>  >> >> >>
>  >> >> >
>  >> >> > Yes.
>  >> >> >
>  >> >> > But not all that different; and neither is particularly well suited to
>  >> >> > this task.
>  >> >> >
>  >> >> > The XPath Extractor will probably be much easier to use.
>  >> >> >
>  >> >> >
>  >> >> > http://jakarta.apache.org/jmeter/usermanual/component_reference.html#XPath_Extractor
>  >> >> >
>  >> >> > This was discussed on the mailing list earlier this year.
>  >> >> >
>  >> >> >
>  >> >> >>  Now I need to iterate over the <tr> elements that match a pattern,
>  >> >> >>  then capture all the <td> elements within each <tr>, and select
>  >> >> >>  the 8th and 9th <td>.
>  >> >> >>
>  >> >> >>  Since many <tr> elements appear in the HTML response, I have to
>  >> >> >>  capture the rows one by one without including two rows in the
>  >> >> >>  same group; that is, I should avoid capturing consecutive
>  >> >> >>  <tr>..</tr><tr>..</tr> into the same group.
>  >> >> >>
>  >> >> >>  By writing (?is)<tr\sclass="tgDataLine.*1\)\" >(.*)</tr> I capture
>  >> >> >>  only one group that contains many real <tr> elements.
>  >> >> >>  So what should I write in the regex?
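A sketch of the two-step approach described in the question, written with java.util.regex rather than JMeter's ORO engine: first find each data row with a lazy pattern so rows never merge, then collect the cells of that row and pick the 8th and 9th. It assumes data rows have a class attribute containing "tgDataLine" (from the regex above) and that <td> elements are not nested; the class and method names are hypothetical, and as noted elsewhere in the thread this stays fragile against layout changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RowCells {
    // One match per data row: the lazy (.*?) stops at the first closing </tr>.
    private static final Pattern ROW = Pattern.compile(
            "<tr\\b[^>]*class=\"[^\"]*tgDataLine[^\"]*\"[^>]*>(.*?)</tr>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // One match per cell inside a row.
    private static final Pattern CELL = Pattern.compile(
            "<td\\b[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns {cell8, cell9} for every data row that has at least 9 cells.
    static List<String[]> cells8And9(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(cell.group(1).trim());
            }
            if (cells.size() >= 9) {
                rows.add(new String[] {cells.get(7), cells.get(8)}); // 0-based: 8th and 9th
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up upper-case markup with one header row and two data rows.
        StringBuilder html = new StringBuilder("<TABLE><TR class=\"header\"><TD>h</TD></TR>");
        for (int r = 1; r <= 2; r++) {
            html.append("<TR class=\"tgDataLine\">");
            for (int c = 1; c <= 9; c++) {
                html.append("<TD>r").append(r).append('c').append(c).append("</TD>");
            }
            html.append("</TR>");
        }
        html.append("</TABLE>");
        for (String[] pair : cells8And9(html.toString())) {
            System.out.println(pair[0] + " / " + pair[1]);
        }
    }
}
```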
>  >> >> >>
>  >> >> >>
>  >> >> If you still need a pattern for this: I found that the following
>  >> >> matches the number you wanted and the value of the following column.
>  >> >>
>  >> >> reference: ref
>  >> >> pattern:     (?s)<TR.+?<TD.+?>([0-9]+?)</TD.+?<TD.+?>(.+?)</TD>
>  >> >> template:  $1$$2$
>  >> >> match :     1
>  >> >> default:   error
>  >> >>
>  >> >> In ref_g1 you'll find the number.
>  >> >> In ref_g2 you'll find the following column value.
>  >> >>
>  >> >> To catch all the matches, you need to increment a counter used as the
>  >> >> match number and check whether there is another match or not.
>  >> >>
>  >> >> Your test plan should look something like this:
>  >> >>
>  >> >> -while controller (${__javaScript("${ref}"!="error")}  )
>  >> >> --counter (from 1 with increment 1, used as the regex match number)
>  >> >> --Http Sampler (to get your site)
>  >> >> ---RegEx Extractor (as shown above, with the counter as match number)
>  >> >> --if controller (same condition as the while controller: "${ref}"!="error")
>  >> >> ---your jdbc action (use ref_g1 & ref_g2)
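The control flow of that test plan can be mimicked in plain Java to see why it terminates: match number N selects the N-th match, and when there is no N-th match the extractor falls back to its default value, which the While and If conditions assume is "error". This illustrates only the loop logic, not JMeter code; nthMatch is a hypothetical helper, the page is fetched once instead of on every iteration, and the pattern and sample rows are simplified stand-ins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchLoop {
    // Returns group 1 of the n-th match (1-based), or defaultValue if there is none,
    // roughly what a Regular Expression Extractor does for a given match number.
    static String nthMatch(Pattern p, String input, int n, String defaultValue) {
        Matcher m = p.matcher(input);
        for (int i = 1; m.find(); i++) {
            if (i == n) {
                return m.group(1);
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?is)<td>(\\d+)</td>");
        String page = "<tr><td>11</td><td>x</td></tr><tr><td>22</td><td>y</td></tr>";
        int counter = 1;                                  // Counter: start 1, increment 1
        String ref = nthMatch(p, page, counter, "error"); // extractor with match no. = counter
        while (!"error".equals(ref)) {                    // While / If condition
            System.out.println("JDBC step would use: " + ref);
            counter++;
            ref = nthMatch(p, page, counter, "error");
        }
    }
}
```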
>  >> >>
>  >> >>
>  >> >> Hope I got your problem right.
>  >> >>
>  >> >> ---------------------------------------------------------------------
>  >> >> To unsubscribe, e-mail: [email protected]
>  >> >> For additional commands, e-mail: [email protected]
>  >> >>
>  >> >>
>  >> >
>  >> >
>  >>
>  >> --
>  >> View this message in context:
>  >> http://old.nabble.com/How-can-I-extract-cell-data-%28content-surrounded-by-%3Ctd%3E%3C-td%3E%29-from-a-%3Ctable%3E-in-HTML-response--tp26371440p26421379.html
>  >> Sent from the JMeter - User mailing list archive at Nabble.com.
>  >>
>  >>
>  >>
>  >>
>  >
>  >
>
>
> --
>  View this message in context: 
> http://old.nabble.com/How-can-I-extract-cell-data-%28content-surrounded-by-%3Ctd%3E%3C-td%3E%29-from-a-%3Ctable%3E-in-HTML-response--tp26371440p26443545.html
>
>
>
>
>

