[ https://issues.apache.org/jira/browse/NUTCH-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578972#action_12578972 ]
Andrzej Bialecki commented on NUTCH-157:
-----------------------------------------
This branch is in End Of Life status.
> Problem parsing MS Word document. It is fetched properly, but parsing does
> not work. Please show me how I can parse it.
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: NUTCH-157
> URL: https://issues.apache.org/jira/browse/NUTCH-157
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 0.7
> Environment: windows
> Reporter: karamjit
>
> MS Word document is not being parsed.
> Error messages:
> Page from url Path in fetch
> ====file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc
> 060301 173204 fetching
> file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc
> 060301 173204 Parsing
> [file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc] with [EMAIL
> PROTECTED]
> 060301 173204 fetch of
> file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc failed with:
> java.lang.NoSuchMethodError:
> org.apache.poi.hpsf.SummaryInformation.getEditTime()J
> 060301 173204 Could not clean the content-type [], Reason is
> [org.apache.nutch.util.mime.MimeTypeException: The type can not be null or
> empty]. Using its raw version...
> 060301 173204 Parsing
> [file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc] with [EMAIL
> PROTECTED]
> 060301 173205 status: segment 20060301173203, 1 pages, 1 errors, 35840 bytes,
> 1000 ms
> 060301 173205 status: 1.0 pages/s, 280.0 kb/s, 35840.0 bytes/page
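
For anyone hitting the same trace: a java.lang.NoSuchMethodError generally means the class loaded at run time is not the version the calling code was compiled against, and the "()J" suffix in the log is the JVM descriptor for a no-argument method returning long. The sketch below is not part of Nutch or POI; the names MethodCheck, hasMethod and loadedFrom are hypothetical helpers made up for illustration. Assuming it is run with the same classpath as the fetcher, it reports which location the class was loaded from and whether that version exposes the method named in the error.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Nutch or POI.
class MethodCheck {

    // True if className can be loaded and has a public no-argument
    // method with the given name and return type.
    static boolean hasMethod(String className, String methodName, Class<?> returnType) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName);
            return m.getReturnType() == returnType;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    // Location (jar or directory) the class was loaded from, if known.
    static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(class not found)";
        }
    }

    public static void main(String[] args) {
        // Class and method names taken from the error message in the log.
        String cls = "org.apache.poi.hpsf.SummaryInformation";
        System.out.println("loaded from: " + loadedFrom(cls));
        System.out.println("has long getEditTime(): " + hasMethod(cls, "getEditTime", long.class));
    }
}
```

If the second line prints false, or the first line points at an unexpected jar, more than one POI version on the classpath would be a likely cause, though that is an assumption the log alone does not confirm.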