Ah... OK.

Aliaksandr

On Wed, Jan 18, 2012 at 1:05 AM, James Kosin <[email protected]> wrote:

> Aliaksandr,
>
> I put the TODO there; because I couldn't determine if it was a better
> place. The only big downside to using the Stream is we have no control
> over the encoding. So, I was thinking more that this method of loading
> the item would be deprecated anyway. In favor of the other method.
>
> James
>
> On 1/17/2012 5:50 AM, Aliaksandr Autayeu wrote:
> > Guys, if somebody knows that part of the code well, it would be nice to
> > take a look at:
> >
> > 1) TODO left there
> > 2) .reset() raising the above exception if the PlainTextByLineStream is
> >    created with a stream.
> >
> > Aliaksandr
> >
> > On Tue, Jan 17, 2012 at 12:12 AM, [email protected] <
> > [email protected]> wrote:
> >
> >> Thank you, Aliaksandr!
> >>
> >> On Mon, Jan 16, 2012 at 6:13 PM, Aliaksandr Autayeu
> >> <[email protected]> wrote:
> >>> I have reproduced the problem. It boils down to different initialization
> >>> of PlainTextByLineStream. If it is instantiated by
> >>>
> >>>   public PlainTextByLineStream(Reader in) {
> >>>     this.in = new BufferedReader(in);
> >>>     this.channel = null;
> >>>     this.encoding = null;
> >>>   }
> >>>
> >>> it does not work. If it is instantiated with a channel:
> >>>
> >>>   public PlainTextByLineStream(FileChannel channel, String charsetName) {
> >>>     this.encoding = charsetName;
> >>>     this.channel = channel;
> >>>
> >>>     // TODO: Why isn't reset called here ?
> >>>     in = new BufferedReader(Channels.newReader(channel, encoding));
> >>>   }
> >>>
> >>> it does work, because later on in reset:
> >>>
> >>>   if (channel == null) {
> >>>     in.reset();
> >>>   }
> >>>   else {
> >>>     channel.position(0);
> >>>     in = new BufferedReader(Channels.newReader(channel, encoding));
> >>>   }
> >>>
> >>> reader is recreated instead of direct in.reset() call.
> >>>
> >>> Now, these differences come into play because WordTagSampleStreamFactory
> >>> has different PlainTextByLineStream initialization, which is probably my
> >>> fault due to work on factories in 402. Looks like a copy-paste error.
> >>>
> >>> I have tried to commit a fix, but I'm getting 403 error :( Please, apply
> >>> the attached patch.
> >>>
> >>> Aliaksandr
> >>>
> >>> On Mon, Jan 16, 2012 at 12:54 AM, [email protected]
> >>> <[email protected]> wrote:
> >>>> Hi,
> >>>>
> >>>> I am having an error in POS Tagger CrossValidator tool from the trunk.
> >>>> I tried the same command with a released version and it worked, also I
> >>>> tried Chunker CV tool and it is working too.
> >>>> I tried debugging the code and check the SVN history for some clue,
> >>>> but could not find anything. Any idea what is wrong?
> >>>>
> >>>> $ bin/opennlp POSTaggerCrossValidator -lang pt -encoding MacRoman
> >>>> -data pos1.txt -cutoff 50
> >>>>
> >>>> IO error while reading training data or indexing data: Stream not marked
> >>>>
> >>>> Stack trace:
> >>>> java.io.IOException: Stream not marked
> >>>>   at java.io.BufferedReader.reset(BufferedReader.java:485)
> >>>>   at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
> >>>>   at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
> >>>>   at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:256)
> >>>>   at opennlp.tools.postag.POSTaggerCrossValidator.evaluate(POSTaggerCrossValidator.java:113)
> >>>>   at opennlp.tools.cmdline.postag.POSTaggerCrossValidatorTool.run(POSTaggerCrossValidatorTool.java:72)
> >>>>   at opennlp.tools.cmdline.CLI.main(CLI.java:212)
> >>>>
> >>>>
> >>>> Any idea what is wrong?
> >>>>
> >>>> Thanks,
> >>>> William
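
For anyone reading the thread later, the two behaviors described in it can probably be reproduced outside OpenNLP with plain JDK classes. The sketch below is only an illustration under that assumption: it does not use PlainTextByLineStream itself, and the class name ResetSketch, its method names, and the sample text are made up. The first method calls reset() on a BufferedReader that was never marked, which is the reader-only case; the second repositions a FileChannel and recreates the reader, as the quoted reset() does when a channel is present.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of OpenNLP.
class ResetSketch {

    // Reader-only case: reset() is called although mark() never was.
    // Returns the IOException message, or "no error" if reset() succeeded.
    static String resetWithoutMark() {
        BufferedReader in = new BufferedReader(new StringReader("a_DT dog_NN\n"));
        try {
            in.readLine();
            in.reset(); // no mark was set, so the JDK rejects this
            return "no error";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    // Channel case: rewind the channel and build a fresh reader on top of it.
    // Returns the first line as read before and after the rewind.
    static String[] rereadViaChannel() {
        try {
            Path tmp = Files.createTempFile("reset-sketch", ".txt");
            try {
                Files.write(tmp, "first_NN line_NN\nsecond_JJ line_NN\n"
                        .getBytes(StandardCharsets.UTF_8));
                try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    // The first reader is deliberately not closed: closing it
                    // would also close the underlying channel.
                    BufferedReader in = new BufferedReader(Channels.newReader(channel, "UTF-8"));
                    String before = in.readLine();

                    channel.position(0);
                    in = new BufferedReader(Channels.newReader(channel, "UTF-8"));
                    String after = in.readLine();

                    return new String[] { before, after };
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reader only: " + resetWithoutMark());
        String[] lines = rereadViaChannel();
        System.out.println("channel, before rewind: " + lines[0]);
        System.out.println("channel, after rewind:  " + lines[1]);
    }
}
```

The channel variant also keeps the encoding explicit, which seems to be the control James mentions losing when the stream is built from a plain Reader or Stream.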
