Hi Marshall,
we can do the File -> URI -> URL conversion (to handle spaces in the path
string) inside the FileSystemCollectionReader. That way the TikaWrapper keeps
the old API, populateCASFromURL, instead of switching to populateCASFromURI.
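
A minimal sketch of that conversion, assuming nothing beyond the JDK. The
class and method names below are hypothetical and are not taken from the
attached patch or from the actual reader code:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class FileToUrlSketch {

    // Hypothetical helper: goes File -> URI -> URL so that characters such
    // as spaces are percent-encoded. The deprecated File.toURL() does not
    // escape them, which is the failure described in UIMA-1878.
    static URL toEscapedUrl(File file) throws MalformedURLException {
        return file.toURI().toURL();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative path containing spaces; the file need not exist.
        File f = new File("input dir", "my doc.txt");
        System.out.println(toEscapedUrl(f));
    }
}
```

With something like this in the reader, the caller could keep passing a URL
to the existing method, so the public signature would not have to change.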
Tommaso

p.s.:
we definitely need more test cases for some Sandbox projects :-)

2010/9/21 Marshall Schor <[email protected]>

>  I noticed that the patch changes a public API (populateCASfromURL).  This
> will
> break backwards compatibility, if anyone has code that is depending on that
> API.
>
> If there is a convenient way to implement the fix without changing the
> APIs, I
> think our users may prefer that :-) .
>
> -Marshall
>
> On 9/20/2010 1:52 AM, Tommaso Teofili (JIRA) wrote:
> >      [
> https://issues.apache.org/jira/browse/UIMA-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
> >
> > Tommaso Teofili resolved UIMA-1878.
> > -----------------------------------
> >
> >     Resolution: Fixed
> >
> >> TikaAnnotator doesn't handle spaces in path string
> >> --------------------------------------------------
> >>
> >>                 Key: UIMA-1878
> >>                 URL: https://issues.apache.org/jira/browse/UIMA-1878
> >>             Project: UIMA
> >>          Issue Type: Bug
> >>          Components: Sandbox-TikaAnnotator
> >>    Affects Versions: 2.3
> >>         Environment: Windows
> >>            Reporter: Greg Holmberg
> >>         Attachments: TikaAnnotator-patch.txt
> >>
> >>
> >> If you give a value for InputDirectory that contains a space, then
> TikaAnnotator silently does nothing.
> >> This is because File objects are converted directly to a URL, and
> openStream() fails because the space character wasn't converted to %20.
> >> When this happens, the exception is ignored and the CAS text is set to
> "".
> >> It would be better to convert the File object to a URI and the URI to a
> URL.  This will convert the space character correctly.
> >> Secondly, it would be better to throw an exception rather than silently
> ignore it.
> >> A suggested patch is attached.
>
