>> any warning message given if things are removed (I hope)?

> No, maybe we should make it optional. Then the following would happen:
> 
> 1. User selects a couple of files; one or more contain non-xml chars
> 2. Import fails, complaining about the first offending file and suggesting
>     to enable the "Remove non-xml chars" option
> 3. User enables "Remove non-xml chars" option and retries
> 
> What do you think?

+1 for the warning ;) Having the option sounds like a good idea. I guess there 
should be very few such illegal characters, and they typically never occur in 
normal text (control characters, etc.?)
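To make the proposal concrete, here is a minimal Java sketch of the fail-then-retry flow. It assumes that "non-xml chars" means code points outside the XML 1.0 `Char` production (tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF), which rules out most C0 control characters such as U+0007, unpaired surrogates, and U+FFFE/U+FFFF. Files are modelled as an in-memory name-to-text map. The class `NonXmlCharFilter` and all of its methods are hypothetical and not part of the existing importer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper sketching the proposed "Remove non-xml chars" option. */
public class NonXmlCharFilter {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Char index of the first illegal code point, or -1 if the text is clean. */
    static int firstIllegalIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            // An unpaired surrogate comes back as its own value (U+D800-U+DFFF),
            // which isXmlChar rejects.
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Copies the text, dropping every code point that is illegal in XML 1.0. */
    static String removeNonXmlChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().filter(NonXmlCharFilter::isXmlChar).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /**
     * Imports all files in order. Without the option, the first file containing an
     * illegal character aborts the import with a message suggesting the option.
     */
    static Map<String, String> importAll(Map<String, String> files, boolean removeOption) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            String text = file.getValue();
            int bad = firstIllegalIndex(text);
            if (bad >= 0 && !removeOption) {
                throw new IllegalArgumentException("File " + file.getKey()
                        + " contains a non-XML character at offset " + bad
                        + "; enable \"Remove non-xml chars\" to import it anyway.");
            }
            result.put(file.getKey(), bad >= 0 ? removeNonXmlChars(text) : text);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "clean text");
        files.put("b.txt", "bell\u0007char");
        try {
            importAll(files, false); // step 2: fails on b.txt
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // step 3: user enables the option and retries
        System.out.println(importAll(files, true).get("b.txt")); // prints "bellchar"
    }
}
```

With the option off, the first bad file aborts the import with a message naming it; with the option on, the offending characters are dropped. To honour the request for a warning, a real implementation could also report how many characters were removed from each file.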

> Maybe we should talk a little about how the import wizard should work.
> The current one can only import plain text and RTF files, and it supports
> only one view.

One view is fine for me.

> One more restriction we currently have is that it only imports plain text
> from files which end with .txt (and .rtf). Should we remove this limitation?

How about using Apache Tika in the importer?

> Do we need to set the language in the wizard?

Would be very nice to have the option.

> Do you think the name "Document" import wizard is fine?

I think that's OK. An audio or video file probably wouldn't be called a 
document by most people. A Word file or PDF, however, would be, and can be 
converted to plain text.

Cheers,

Richard

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
------------------------------------------------------------------- 