[ https://issues.apache.org/jira/browse/CONNECTORS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046963#comment-15046963 ]
Issei Nishigata commented on CONNECTORS-1264:
---------------------------------------------
The patch I applied solves two cases.
1. It correctly parses attribute values surrounded by quotes, for example:
{code}
<a href="/hello/out/there">hello</a>
{code}
MCF's web crawler then extracts the link "/hello/out/there", and the resolved URL (for example, "http://localhost/hello/out/there") becomes the next object to crawl.
2. It correctly parses attribute values without quotes, for example:
{code}
<a href=/hello/out/there>hello</a>
{code}
MCF's web crawler then handles the link the same way as described above.
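The two cases above can be sketched in Java roughly as follows. This is a minimal, simplified illustration, not the patch itself or MCF's actual parser; the class and method names (HrefExtractionSketch, parseAttributes, resolve) and the page URL http://localhost/index.html are hypothetical. It follows the HTML rule that an unquoted attribute value ends only at whitespace or ">", so "/" characters stay inside the value, and then resolves the extracted href against the page URL with java.net.URI.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HrefExtractionSketch {
    /**
     * Parses the attributes of the first start tag in the given string.
     * Hypothetical, simplified tokenizer: an unquoted value ends only at
     * whitespace or '>', so '/' stays inside it; entities are not decoded.
     */
    static Map<String, String> parseAttributes(String html) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int n = html.length();
        int i = html.indexOf('<') + 1;
        // Skip the tag name (e.g. "a").
        while (i < n && !isSpace(html.charAt(i)) && html.charAt(i) != '>' && html.charAt(i) != '/') i++;
        while (i < n) {
            char c = html.charAt(i);
            if (c == '>') break;                          // end of the start tag
            if (isSpace(c) || c == '/') { i++; continue; } // separators, self-closing '/'
            // Attribute name runs until whitespace, '=', '/' or '>'.
            int nameStart = i;
            while (i < n && !isSpace(html.charAt(i)) && "=/>".indexOf(html.charAt(i)) < 0) i++;
            String name = html.substring(nameStart, i).toLowerCase(Locale.ROOT);
            while (i < n && isSpace(html.charAt(i))) i++;
            if (i >= n || html.charAt(i) != '=') { attrs.putIfAbsent(name, ""); continue; }
            i++;                                          // consume '='
            while (i < n && isSpace(html.charAt(i))) i++;
            String value;
            if (i < n && (html.charAt(i) == '"' || html.charAt(i) == '\'')) {
                // Case 1: quoted value, read up to the matching quote.
                char quote = html.charAt(i++);
                int end = html.indexOf(quote, i);
                if (end < 0) end = n;
                value = html.substring(i, end);
                i = Math.min(end + 1, n);
            } else {
                // Case 2: unquoted value, ends at whitespace or '>' but NOT at '/'.
                int start = i;
                while (i < n && !isSpace(html.charAt(i)) && html.charAt(i) != '>') i++;
                value = html.substring(start, i);
            }
            attrs.putIfAbsent(name, value);               // first occurrence wins
        }
        return attrs;
    }

    static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
    }

    /** Resolves an extracted href against the URL of the page it came from. */
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://localhost/index.html";      // hypothetical page URL
        String quoted = parseAttributes("<a href=\"/hello/out/there\">hello</a>").get("href");
        String unquoted = parseAttributes("<a href=/hello/out/there>hello</a>").get("href");
        System.out.println(quoted + " -> " + resolve(page, quoted));
        System.out.println(unquoted + " -> " + resolve(page, unquoted));
        // The form from the original report: relative path plus a trailing space.
        System.out.println(parseAttributes("<a href=hello/out/there >").get("href"));
    }
}
```

Running main prints "/hello/out/there -> http://localhost/hello/out/there" for both the quoted and the unquoted tag, and "hello/out/there" for the form in the original report. A real tokenizer also has to decode character references and cope with malformed markup, which this sketch skips.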
> HTML parsing doesn't handle unquoted attribute values with "/" characters right
> -------------------------------------------------------------------------------
>
> Key: CONNECTORS-1264
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1264
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Affects Versions: ManifoldCF 2.2
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.3
>
> Attachments: CONNECTORS-1264-2.patch, CONNECTORS-1264-3.patch, CONNECTORS-1264.patch, alternative.patch
>
>
> HTML tags like "<a href=hello/out/there >" fail to parse properly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)