Mark, I agree. It's not really a good "option". Just saying that TREC content that isn't ISO-8859-1 does exist :)
I like Shai's idea of having a configurable option; this is more obvious.

On Thu, Jul 2, 2009 at 5:23 PM, Mark Miller<markrmil...@gmail.com> wrote:
> bq. (For this I actually ran it with -Dfile.encoding=UTF-8 to prevent this
> problem), so its "configurable" already...but not obvious.
>
> Right, I considered this option, but it changes the default encoding for the
> whole JVM - probably going to be fine for running benchmark, but not ideal
> in terms of managing and running content sources with different encodings
> longer term.
>
> On Thu, Jul 2, 2009 at 5:17 PM, Robert Muir (JIRA) <j...@apache.org> wrote:
>>
>> [
>> https://issues.apache.org/jira/browse/LUCENE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726685#action_12726685
>> ]
>>
>> Robert Muir commented on LUCENE-1730:
>> -------------------------------------
>>
>> I'd like this to be configurable. I used this package to test LUCENE-1628.
>>
>> (For this I actually ran it with -Dfile.encoding=UTF-8 to prevent this
>> problem), so its "configurable" already...but not obvious.
>>
>>
>> > TrecContentSource should use a fixed encoding, rather than system dependent
>> > ---------------------------------------------------------------------------
>> >
>> > Key: LUCENE-1730
>> > URL: https://issues.apache.org/jira/browse/LUCENE-1730
>> > Project: Lucene - Java
>> > Issue Type: Bug
>> > Components: contrib/benchmark
>> > Reporter: Shai Erera
>> > Fix For: 2.9
>> >
>> > Attachments: LUCENE-1730.patch
>> >
>> >
>> > TrecContentSource opens InputStreamReader w/o a fixed encoding. On
>> > Windows, this means CP1252 (at least on my machine) which is ok. However,
>> > when I opened it on a Linux machine w/ a default of UTF-8, it failed to
>> > read the files.
>> > The patch changes it to use ISO-8859-1, which seems to be the
>> > right one (and http://mg4j.dsi.unimi.it/man/manual/ch01s04.html mentions
>> > this encoding in its example of a script which reads the data).
>> > Patch to follow shortly.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>
> --
> - Mark
>
> http://www.lucidimagination.com

--
Robert Muir
rcm...@gmail.com
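
P.S. To make the "configurable option" concrete, here is a rough sketch of what it might look like. It is not the attached patch and not the actual TrecContentSource code. The property name `content.source.encoding`, the class `EncodingSketch`, and its helper methods are all assumptions made up for illustration. The idea is only that the charset is passed explicitly to InputStreamReader, with ISO-8859-1 as the fixed default, so nothing depends on -Dfile.encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class EncodingSketch {
    // Hypothetical property name and default, chosen for illustration only.
    static final String ENCODING_PROP = "content.source.encoding";
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Resolve the charset from the config, falling back to a fixed default
    // instead of the JVM-wide platform encoding.
    static Charset resolveCharset(Properties config) {
        return Charset.forName(config.getProperty(ENCODING_PROP, DEFAULT_ENCODING));
    }

    // Open a reader with an explicit charset; the platform default is never used.
    static Reader open(InputStream in, Properties config) {
        return new BufferedReader(new InputStreamReader(in, resolveCharset(config)));
    }

    // Read the whole stream into a String using the configured charset.
    static String readAll(InputStream in, Properties config) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = open(in, config)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, encoded as 5 bytes of UTF-8.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        Properties config = new Properties();
        // Default (ISO-8859-1): every byte becomes one char, so the text is mangled.
        System.out.println(readAll(new ByteArrayInputStream(utf8), config).length()); // prints 5

        // Per-source override: the same bytes now decode to the intended 4 chars.
        config.setProperty(ENCODING_PROP, "UTF-8");
        System.out.println(readAll(new ByteArrayInputStream(utf8), config).length()); // prints 4
    }
}
```

The point of a per-source setting over -Dfile.encoding is that two content sources with different encodings could run in the same JVM, each with its own value.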