Presumably it's related to "Beautiful Soup" - which is nice and liberal 
when it comes to parsing HTML and XML.

Cheers, Martin

Martin Packer,
zChampion, Principal Systems Investigator,
Worldwide Cloud & Systems Performance, IBM

+44-7802-245-584

email: martin_pac...@uk.ibm.com

Twitter / Facebook IDs: MartinPacker

Blog: https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker

Podcast Series (With Marna Walle): https://developer.ibm.com/tv/mpt/ or
https://itunes.apple.com/gb/podcast/mainframe-performance-topics/id1127943573?mt=2



From:   "Barkow, Eileen" <ebar...@doitt.nyc.gov>
To:     IBM-MAIN@LISTSERV.UA.EDU
Date:   27/04/2017 14:51
Subject:        Re: How to pull webpage into batch job
Sent by:        IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU>



Thank you, Andrew, for the info about Jsoup - I had never heard of it.

The jar files needed to compile and run can be downloaded from:

https://jsoup.org/download

The API documentation is at:

https://jsoup.org/apidocs/



-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On 
Behalf Of Andrew Rowley
Sent: Thursday, April 27, 2017 3:19 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: How to pull webpage into batch job



I would suggest Java as well. There are open source libraries that can
do the HTML parsing too, e.g. Jsoup.

I just tested this example on z/OS and it worked (it fetches the Wikipedia
home page and lists the items from the "In the news" section):



import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTest {
     public static void main(String[] args) throws IOException {
         // Fetch the page and parse it into a DOM
         Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
         // CSS selector: list items inside the element with id "mp-itn"
         Elements newsHeadlines = doc.select("#mp-itn li");
         for (Element e : newsHeadlines) {
             System.out.println(e.text());
         }
     }
}
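
If only the raw page text is needed and adding the Jsoup jar is not an
option, something along these lines might be enough. This is a minimal
sketch using only the standard Java library, not part of the tested example
above. The class name PageFetch, the method names, the 10-second timeouts
and the assumption that the page is UTF-8 are all illustrative choices. The
charset is set explicitly because the default platform encoding on z/OS is
typically EBCDIC.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; not from the original thread.
class PageFetch {

    // Read a whole stream as UTF-8 text, normalising line ends to '\n'.
    // Assumption: the page is served as UTF-8.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Fetch an http/https URL and return the response body as text.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(10000); // milliseconds; arbitrary value
        conn.setReadTimeout(10000);    // milliseconds; arbitrary value
        try {
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default URL is the one used in the Jsoup example.
        String address = args.length > 0 ? args[0] : "http://en.wikipedia.org/";
        System.out.print(fetch(address));
    }
}
```

The output goes to stdout, so a batch job can capture it wherever stdout is
directed. No HTML parsing is done here; that is where a library such as
Jsoup earns its keep.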



--
Andrew Rowley
Black Hill Software
+61 413 302 386

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

________________________________

This e-mail, including any attachments, may be confidential, privileged or 
otherwise legally protected. It is intended only for the addressee. If you 
received this e-mail in error or from someone who was not authorized to 
send it to you, do not disseminate, copy or otherwise use this e-mail or 
its attachments. Please notify the sender immediately by reply e-mail and 
delete the e-mail from your system.




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
