Aliaksandr,

I put the TODO there because I couldn't determine whether it was a better
place.  The only big downside to using the Stream is that we have no
control over the encoding.  So I was thinking that this method of loading
the item would be deprecated anyway, in favor of the other method.
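
For anyone following along, the failure in the quoted trace can be
reproduced without OpenNLP.  The sketch below is only an illustration, not
the patch: the class name ResetSketch and both helper methods are made up
for this example, and it uses plain java.io only.  It shows that
BufferedReader.reset() throws when mark() was never called, which is the
situation the Reader-based constructor ends up in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class; not part of OpenNLP.
class ResetSketch {

    // Wraps the source like the Reader-based constructor does (no mark),
    // reads one line, then tries to reset. Returns the exception message
    // if reset() fails, or null if it unexpectedly succeeds.
    static String resetWithoutMark(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        in.readLine();
        try {
            in.reset();
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    // Same steps, but mark() is called first, so reset() rewinds to the
    // marked position. Returns the first line read after the reset.
    static String resetAfterMark(Reader source, int readAheadLimit) throws IOException {
        BufferedReader in = new BufferedReader(source);
        in.mark(readAheadLimit);
        in.readLine();
        in.reset();
        return in.readLine();
    }

    public static void main(String[] args) throws IOException {
        String data = "first line\nsecond line\n";
        // The trace in this thread reports "Stream not marked" here.
        System.out.println(resetWithoutMark(new StringReader(data)));
        // With a mark in place, the first line is read again.
        System.out.println(resetAfterMark(new StringReader(data), 1024));
    }
}
```

Note that mark() takes a read-ahead limit, so it is probably not a general
fix for large training files.  Repositioning the channel and recreating
the reader, as reset does on the channel path, avoids that limit.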

James

On 1/17/2012 5:50 AM, Aliaksandr Autayeu wrote:
> Guys, if somebody knows that part of the code well, it would be nice to
> take a look at:
>
> 1) TODO left there
> 2) .reset() raising the above exception if the PlainTextByLineStream is
> created with a stream.
>
> Aliaksandr
>
> On Tue, Jan 17, 2012 at 12:12 AM, [email protected] <
> [email protected]> wrote:
>
>> Thank you, Aliaksandr!
>>
>>
>>
>> On Mon, Jan 16, 2012 at 6:13 PM, Aliaksandr Autayeu
>> <[email protected]> wrote:
>>> I have reproduced the problem. It boils down to different initialization
>>> of PlainTextByLineStream. If it is instantiated by
>>>
>>>   public PlainTextByLineStream(Reader in) {
>>>     this.in = new BufferedReader(in);
>>>     this.channel = null;
>>>     this.encoding = null;
>>>   }
>>>
>>> it does not work. If it is instantiated with a channel:
>>>
>>>   public PlainTextByLineStream(FileChannel channel, String charsetName) {
>>>     this.encoding = charsetName;
>>>     this.channel = channel;
>>>
>>>     // TODO: Why isn't reset called here ?
>>>     in = new BufferedReader(Channels.newReader(channel, encoding));
>>>   }
>>>
>>> it does work, because later on in reset:
>>>
>>>     if (channel == null) {
>>>         in.reset();
>>>     }
>>>     else {
>>>       channel.position(0);
>>>       in = new BufferedReader(Channels.newReader(channel, encoding));
>>>     }
>>>
>>> the reader is recreated instead of calling in.reset() directly.
>>>
>>>
>>> Now, these differences come into play because WordTagSampleStreamFactory
>>> has different PlainTextByLineStream initialization, which is probably my
>>> fault due to work on factories in 402. Looks like a copy-paste error.
>>>
>>> I have tried to commit a fix, but I'm getting a 403 error :(  Please apply
>>> the attached patch.
>>>
>>> Aliaksandr
>>>
>>>
>>> On Mon, Jan 16, 2012 at 12:54 AM, [email protected]
>>> <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I am having an error in POS Tagger CrossValidator tool from the trunk.
>>>> I tried the same command with a released version and it worked, also I
>>>> tried Chunker CV tool and it is working too.
>>>> I tried debugging the code and checking the SVN history for some clue,
>>>> but could not find anything. Any idea what is wrong?
>>>>
>>>> $ bin/opennlp POSTaggerCrossValidator -lang pt -encoding MacRoman
>>>> -data pos1.txt -cutoff 50
>>>>
>>>> IO error while reading training data or indexing data: Stream not marked
>>>>
>>>> Stack trace:
>>>> java.io.IOException: Stream not marked
>>>>        at java.io.BufferedReader.reset(BufferedReader.java:485)
>>>>        at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
>>>>        at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>>>>        at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:256)
>>>>        at opennlp.tools.postag.POSTaggerCrossValidator.evaluate(POSTaggerCrossValidator.java:113)
>>>>        at opennlp.tools.cmdline.postag.POSTaggerCrossValidatorTool.run(POSTaggerCrossValidatorTool.java:72)
>>>>        at opennlp.tools.cmdline.CLI.main(CLI.java:212)
>>>>
>>>>
>>>> Any idea what is wrong?
>>>>
>>>> Thanks,
>>>> William
>>>
