[ https://issues.apache.org/jira/browse/TIKA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Krugler resolved TIKA-2539.
-------------------------------
    Resolution: Duplicate

> TagSoup HTML parser is project EOL
> ----------------------------------
>
>                 Key: TIKA-2539
>                 URL: https://issues.apache.org/jira/browse/TIKA-2539
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.16, 1.17
>         Environment: All
>            Reporter: Richard Jones
>
> The TagSoup HTML parser project has reached end of life (EOL); its last
> update was the 1.2.1 release (the version Tika references) back in Aug 2011.
> I cannot find any TagSoup forks that are still active, but there are many
> alternative (and perhaps better, if you believe the reviews and Wikipedia
> comparisons) HTML parsers out there.
> Perhaps the most active one is jsoup, which Tika already pulls in as a
> transitive dependency of edu.ucar:grib. It has over 1,000 usages and
> updates as recent as a few months ago:
> https://mvnrepository.com/artifact/org.jsoup/jsoup
> https://jsoup.org/
> Requesting consideration of moving away from the long-EOL'd TagSoup to an
> active and modern HTML parser like jsoup, which is already a transitive Tika
> dependency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
