I am having problems plucking with <include> in JPluck 2.pre7; the equivalent setup works with JPluck 0.9.1.
The site I am trying to pluck is the press releases section of EurekAlert (http://www.eurekalert.org/pubnews.php). I am interested only in the first page of press releases, plus the first-level links and inline images that are hosted on that website.

My JPluck 0.9 jxl file has a section that looks like the following. The URLs I want to pluck contain "pub_releases", and the images I want have "release_graphics" in their URL:

=====
<document stayOnHost="true" stayBelowPath="false" linkDepth="1"
          includeImages="true" includeImageAltText="true"
          includeFullSizeImages="false">
  <name>EurekAlert</name>
  <startPage>http://www.eurekalert.org/pubnews.php</startPage>
  <category>News</category>
  <language>EN</language>
  <include>
    <pattern>.*/pub_releases/.*</pattern>
    <pattern>.*/release_graphics/.*</pattern>
  </include>
</document>
=====

My JPluck 2.0 jxl file to pluck the site looks like this:

=====
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE jxl PUBLIC "-//jpluck//DTD JXL 2.0//EN" "http://jpluck.sourceforge.net/jxl/DTD/jxl-2.0.dtd">
<jxl>
  <site>
    <name>EurekAlert</name>
    <uri maxDepth="1" restrict="host">http://www.eurekalert.org/pubnews.php</uri>
    <include>
      <pattern>.*/pub_releases/.*</pattern>
      <pattern>.*/release_graphics/.*</pattern>
    </include>
    <images>
      <embedded bpp="16" maxHeight="150" maxWidth="150"/>
    </images>
    <category>News</category>
  </site>
</jxl>
=====

The JPluck 0.9 pluck works, but JPluck 2.0 fetches a lot of unwanted links and images from the site. Am I doing something wrong with <include> in the JPluck 2.0 jxl file?

Regards,
Kam-Yung

_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list

