[ 
https://issues.apache.org/jira/browse/CONNECTORS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046963#comment-15046963
 ] 

Issei Nishigata commented on CONNECTORS-1264:
---------------------------------------------

The patch I applied handles two cases.

1. It parses an attribute value that is enclosed in quotes, as below.
{code}
<a href="/hello/out/there">hello</a>
{code}
MCF's web crawler then extracts the link as "/hello/out/there", and
(for example) "http://localhost/hello/out/there" becomes the next crawl
target.

2. It parses an attribute value that has no quotes around it, as below.
{code}
<a href=/hello/out/there>hello</a>
{code}
MCF's web crawler behaves the same way as described above.
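
To illustrate the two cases, here is a minimal standalone sketch. It is not the code from the patch and it does not use ManifoldCF's parser. The class name HrefSketch, the regular expression, and the base URL "http://localhost/" are all assumptions made up for this example. It only shows that a quoted and an unquoted href value containing "/" should yield the same link, which is then resolved against the page URL.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; ManifoldCF's real HTML parser works differently.
public class HrefSketch {
    // Matches href="...", href='...', or an unquoted value that runs until
    // whitespace, a quote character, or '>'. A '/' is allowed in all three forms.
    private static final Pattern HREF = Pattern.compile(
        "\\bhref\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))",
        Pattern.CASE_INSENSITIVE);

    /** Returns the first href value found in the tag, or null if there is none. */
    public static String extractHref(String tag) {
        Matcher m = HREF.matcher(tag);
        if (!m.find()) {
            return null;
        }
        // Exactly one of the three capture groups is set for a match.
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    /** Resolves an extracted link against the URL of the page it was found on. */
    public static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String quoted = extractHref("<a href=\"/hello/out/there\">hello</a>");
        String unquoted = extractHref("<a href=/hello/out/there>hello</a>");
        System.out.println(quoted);
        System.out.println(unquoted);
        // The base URL below is an assumed example value.
        System.out.println(resolve("http://localhost/", unquoted));
    }
}
```

Both calls to extractHref return "/hello/out/there", and resolving that against the assumed base gives "http://localhost/hello/out/there", matching the behavior described in cases 1 and 2.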




> HTML parsing doesn't handle unquoted attribute values with "/" characters 
> right
> -------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1264
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1264
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.2
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.3
>
>         Attachments: CONNECTORS-1264-2.patch, CONNECTORS-1264-3.patch, 
> CONNECTORS-1264.patch, alternative.patch
>
>
> HTML tags like "<a href=hello/out/there >" fail to parse properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)