[ https://issues.apache.org/jira/browse/TIKA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
susserj updated TIKA-2105:
--------------------------
    Comment: was deleted

(was: Hi Tim
When I added the -I <input_dir> -o <output_dir> to my command line I got a bunch of zero byte files in my output directory but the specific file in the input directory called "français.docx" was missing. It doesn't like the French characters in the filenames.
)

> Unable to process documents with french accents in filenames
> ------------------------------------------------------------
>
>                 Key: TIKA-2105
>                 URL: https://issues.apache.org/jira/browse/TIKA-2105
>             Project: Tika
>          Issue Type: Bug
>          Components: batch
>    Affects Versions: 1.13
>         Environment: Windows 7, Java version 1.7.0.111
>            Reporter: susserj
>
> When I execute the following batch test1.bat script from my command prompt,
> I get this error message:
> test1.bat
> @echo off
> "C:\Program Files (x86)\Java\jre7\bin\java" -jar c:\temp\tika-app-1.13.jar -m
> "S:\2008-09\2009-10\IC IT Environment 2009\français.docx"
> Error:
> Exception in thread "main" java.net.MalformedURLException: unknown protocol: s
>         at java.net.URL.<init>(Unknown Source)
>         at java.net.URL.<init>(Unknown Source)
>         at java.net.URL.<init>(Unknown Source)
>         at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:472)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
> When the filenames don't have special French characters, it works fine. (I
> cannot change the names of all the files that need to be processed).
> I apologise, my experience with java and TIKA is very limited.
> Thanks

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
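
One possible reading of the stack trace above, offered as an assumption rather than a confirmed diagnosis: the trace shows the argument reaching the java.net.URL constructor, and when a Windows path such as "S:\..." is parsed as a URL, the drive letter is taken as the URL scheme. The sketch below reproduces only that step with the standard library. It uses no Tika code, and the class name DriveLetterAsUrl and the helper urlError are hypothetical names chosen for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class DriveLetterAsUrl {

    // Hypothetical helper: returns the MalformedURLException message,
    // or null if the argument parses as a URL.
    static String urlError(String arg) {
        try {
            new URL(arg);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Path from the report; \u00e7 is the "c with cedilla" in "francais".
        String path = "S:\\2008-09\\2009-10\\IC IT Environment 2009\\fran\u00e7ais.docx";

        // Everything before the first ':' is read as the scheme and lowercased.
        System.out.println(urlError(path)); // prints "unknown protocol: s"
    }
}
```

The message is the same with or without the accented character in the path, so the sketch does not by itself explain why the accented filename fails while other names work. That part (for example, the file name in the batch script not matching the name on disk) remains an open question in this report.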