[ https://issues.apache.org/jira/browse/ANY23-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated ANY23-115:
---------------------------------------

    Fix Version/s: 0.7.1

> Empty spans seem to break ANY23
> -------------------------------
>
>                 Key: ANY23-115
>                 URL: https://issues.apache.org/jira/browse/ANY23-115
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: html-scraper
>         Environment: Any23.org public scraper
>            Reporter: Christophe Dupriez
>             Fix For: 0.7.1
>
>
> One of the 2 thousand URLs with the problem:
> http://www.oceanexpert.net/viewMemberRecord.php?&memberID=20045
> The piece of HTML creating the problem seems to be:
> <h1>
> Details of<span itemprop="name"> <span
> itemprop="honorificPrefix"></span> <span
> itemprop="givenName">Laury</span> <span
> itemprop="familyName">Miller</span></span>
> </h1>
> (this may disappear as we may workaround the problem)
> Error message:
> Internal error.
> ================================================================
> java.lang.IllegalArgumentException: Invalid content ''
>     at org.apache.any23.extractor.microdata.ItemPropValue.<init>(ItemPropValue.java:89)
>     at org.apache.any23.extractor.microdata.MicrodataParser.getPropertyValue(MicrodataParser.java:341)
>     at org.apache.any23.extractor.microdata.MicrodataParser.getItemProps(MicrodataParser.java:394)
>     at org.apache.any23.extractor.microdata.MicrodataParser.getItemScope(MicrodataParser.java:471)
>     at org.apache.any23.extractor.microdata.MicrodataParser.getMicrodata(MicrodataParser.java:186)
>     at org.apache.any23.extractor.microdata.MicrodataParser.getMicrodata(MicrodataParser.java:203)
>     at org.apache.any23.extractor.microdata.MicrodataExtractor.run(MicrodataExtractor.java:100)
>     at org.apache.any23.extractor.microdata.MicrodataExtractor.run(MicrodataExtractor.java:62)
>     at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:477)
>     at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:260)
>     at org.apache.any23.Any23.extract(Any23.java:294)
>     at org.apache.any23.Any23.extract(Any23.java:446)
>     at org.apache.any23.servlet.WebResponder.runExtraction(WebResponder.java:113)
>     at org.apache.any23.servlet.Servlet.doGet(Servlet.java:74)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>     at com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>     at java.lang.Thread.run(Thread.java:662)
> ================================================================

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
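
For readers following the trace: the exception is raised while a property value is being constructed from an `itemprop` element whose text is empty (`honorificPrefix` in the reported markup). The sketch below only illustrates that failure mode and one possible guard. It is not the Any23 code and not the fix shipped in 0.7.1; `StrictValue`, `collectStrict`, and `collectLenient` are hypothetical names invented for this example, and the map stands in for the parsed `itemprop` elements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EmptyItempropSketch {

    /** Hypothetical stand-in for a property value that rejects empty content. */
    public static final class StrictValue {
        public final String content;

        public StrictValue(String content) {
            if (content == null || content.trim().isEmpty()) {
                // Mirrors the message shape seen in the reported stack trace.
                throw new IllegalArgumentException("Invalid content '" + content + "'");
            }
            this.content = content;
        }
    }

    /** Builds a value for every property; throws on the first empty one. */
    public static List<StrictValue> collectStrict(Map<String, String> props) {
        List<StrictValue> values = new ArrayList<>();
        for (String text : props.values()) {
            values.add(new StrictValue(text));
        }
        return values;
    }

    /** Assumed workaround: skip properties whose text is null or blank. */
    public static List<StrictValue> collectLenient(Map<String, String> props) {
        List<StrictValue> values = new ArrayList<>();
        for (String text : props.values()) {
            if (text == null || text.trim().isEmpty()) {
                continue; // an empty <span itemprop="..."></span> carries no value
            }
            values.add(new StrictValue(text));
        }
        return values;
    }

    public static void main(String[] args) {
        // Property texts taken from the HTML snippet in the report.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("honorificPrefix", "");
        props.put("givenName", "Laury");
        props.put("familyName", "Miller");

        try {
            collectStrict(props);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
        System.out.println("lenient: " + collectLenient(props).size() + " values");
    }
}
```

Run as-is, the strict path fails on the empty `honorificPrefix` while the lenient path keeps the two non-empty names. Whether an empty property should be skipped or emitted as an empty literal is a design decision for the extractor; the sketch takes the skipping route only to keep the example short.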