End of line whitespaces in Eclipse

2014-04-10 Thread William Colen
When I save a .java file in Eclipse, it removes the trailing whitespace at the
end of lines. I am using the formatter from
http://opennlp.apache.org/code-formatter/OpenNLP-Eclipse-Formatter.xml

This causes lots of changes in files where I actually needed to change only
one line. Does anybody know how I can avoid it?

Thank you,
William


Re: Doccat evaluator

2014-04-10 Thread William Colen
Yes, I just finished implementing the confusion matrix report, just like
the one I did for the POS Tagger. I will commit it today.

I could not test it properly with the Leipzig corpus. For some reason, Doccat
never fails with this corpus!
To test it effectively, I used the 20news corpus.
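A confusion-matrix report of the kind described above can be sketched roughly as follows. This is a hypothetical, self-contained class (`ConfusionMatrixSketch` is an invented name), not the code committed to OpenNLP. It only assumes that each evaluated document yields one gold category and one predicted category as strings, and it needs Java 9+ for `Map.of`/`List.of`.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch, not OpenNLP code: counts (gold, predicted) category
 * pairs and renders them as a plain-text confusion matrix.
 */
class ConfusionMatrixSketch {

    // matrix.get(gold).get(predicted) = number of documents with that pair
    private final Map<String, Map<String, Integer>> matrix = new TreeMap<>();
    // every category seen as gold or predicted, kept sorted for stable output
    private final SortedSet<String> categories = new TreeSet<>();

    /** Records the outcome for one evaluated document. */
    void add(String gold, String predicted) {
        categories.add(gold);
        categories.add(predicted);
        matrix.computeIfAbsent(gold, k -> new TreeMap<>()).merge(predicted, 1, Integer::sum);
    }

    /** Number of documents with this gold category that were assigned {@code predicted}. */
    int count(String gold, String predicted) {
        return matrix.getOrDefault(gold, Map.of()).getOrDefault(predicted, 0);
    }

    /** Share of documents with this gold category that were classified correctly. */
    double accuracyFor(String gold) {
        int total = 0;
        for (int c : matrix.getOrDefault(gold, Map.of()).values()) {
            total += c;
        }
        return total == 0 ? 0.0 : (double) count(gold, gold) / total;
    }

    /** Rows are gold categories, columns are predicted categories. */
    String render() {
        StringBuilder sb = new StringBuilder(String.format("%-12s", "gold\\pred"));
        for (String p : categories) {
            sb.append(String.format("%10s", p));
        }
        sb.append('\n');
        for (String g : categories) {
            sb.append(String.format("%-12s", g));
            for (String p : categories) {
                sb.append(String.format("%10d", count(g, p)));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ConfusionMatrixSketch cm = new ConfusionMatrixSketch();
        // Made-up (gold, predicted) pairs for illustration only
        List<String[]> pairs = List.of(
                new String[] {"sports", "sports"},
                new String[] {"sports", "politics"},
                new String[] {"politics", "politics"},
                new String[] {"tech", "tech"});
        for (String[] pair : pairs) {
            cm.add(pair[0], pair[1]);
        }
        System.out.print(cm.render());
    }
}
```

The diagonal holds the correctly classified documents. Large off-diagonal cells show which pairs of categories the system confuses, which is the kind of difficulty a single accuracy figure hides.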


2014-04-10 19:37 GMT-03:00 Jörn Kottmann :

> I thought it should be done similar to the way pos tags are measured when
> I implemented that.
>
> A confusion matrix might also be helpful to see which categories are more
> difficult to classify for the system.
>
> Jörn
>
>
> On 04/10/2014 03:00 PM, William Colen wrote:
>
>> Actually, since we always add a tag to each document, accuracy makes
>> sense.
>> We could implement F-1 for the individual categories.
>>
>> 2014-04-09 17:23 GMT-03:00 William Colen :
>>
>>> Hello,
>>>
>>> I was checking whether there are any open issues related to Doccat, and I found
>>> this one:
>>>
>>> OPENNLP-81: Add a cli tool for the doccat evaluation support
>>>
>>> I noticed that there is already a class
>>> named DocumentCategorizerEvaluator, which is not used anywhere internally.
>>> It evaluates performance in terms of accuracy, but I believe it would
>>> be better to do it in terms of F-measure.
>>>
>>> Any thoughts?
>>>
>>> As we are working on a major version, I think it would be OK to change it.
>>>
>>>
>>> Thank you,
>>> William
>>>
>>>
>


Re: Doccat evaluator

2014-04-10 Thread Jörn Kottmann
I thought it should be done similar to the way pos tags are measured when I implemented that.

A confusion matrix might also be helpful to see which categories are more difficult to classify for the system.

Jörn

On 04/10/2014 03:00 PM, William Colen wrote:

> Actually, since we always add a tag to each document, accuracy makes sense.
> We could implement F-1 for the individual categories.
>
> 2014-04-09 17:23 GMT-03:00 William Colen :
>
>> Hello,
>>
>> I was checking whether there are any open issues related to Doccat, and I found
>> this one:
>>
>> OPENNLP-81: Add a cli tool for the doccat evaluation support
>>
>> I noticed that there is already a class
>> named DocumentCategorizerEvaluator, which is not used anywhere internally.
>> It evaluates performance in terms of accuracy, but I believe it would
>> be better to do it in terms of F-measure.
>>
>> Any thoughts?
>>
>> As we are working on a major version, I think it would be OK to change it.
>>
>> Thank you,
>> William





Re: Doccat evaluator

2014-04-10 Thread William Colen
Actually, since we always add a tag to each document, accuracy makes sense.
We could implement F-1 for the individual categories.
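A minimal sketch of both measures, assuming parallel lists of gold and predicted category strings. `CategoryScores` and its methods are hypothetical names chosen for illustration; this is not the API of OpenNLP's `DocumentCategorizerEvaluator`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not OpenNLP code: overall accuracy plus per-category
 * F-1 from parallel lists of gold and predicted categories.
 */
final class CategoryScores {

    /** Share of documents whose predicted category equals the gold one. */
    static double accuracy(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return gold.isEmpty() ? 0.0 : (double) correct / gold.size();
    }

    /** F-1 for every category that appears as gold or predicted. */
    static Map<String, Double> f1PerCategory(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        Map<String, int[]> counts = new TreeMap<>(); // category -> {tp, fp, fn}
        for (int i = 0; i < gold.size(); i++) {
            String g = gold.get(i);
            String p = predicted.get(i);
            if (g.equals(p)) {
                counts.computeIfAbsent(g, k -> new int[3])[0]++;
            } else {
                counts.computeIfAbsent(p, k -> new int[3])[1]++; // false positive for p
                counts.computeIfAbsent(g, k -> new int[3])[2]++; // false negative for g
            }
        }
        Map<String, Double> f1 = new TreeMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            int tp = e.getValue()[0];
            int fp = e.getValue()[1];
            int fn = e.getValue()[2];
            // F-1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
            int denom = 2 * tp + fp + fn;
            f1.put(e.getKey(), denom == 0 ? 0.0 : 2.0 * tp / denom);
        }
        return f1;
    }

    public static void main(String[] args) {
        // Made-up labels for illustration only
        List<String> gold = List.of("sports", "sports", "politics", "tech");
        List<String> pred = List.of("sports", "politics", "politics", "tech");
        System.out.println("accuracy = " + accuracy(gold, pred));
        System.out.println("F-1 = " + f1PerCategory(gold, pred));
    }
}
```

Because every document receives exactly one predicted category, each wrong prediction counts once as a false positive (for the predicted category) and once as a false negative (for the gold category). Summed over all categories, micro-averaged precision and recall therefore both equal accuracy, so the per-category F-1 values are what add information beyond the single accuracy figure.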

2014-04-09 17:23 GMT-03:00 William Colen :

> Hello,
>
> I was checking whether there are any open issues related to Doccat, and I found
> this one:
>
> OPENNLP-81: Add a cli tool for the doccat evaluation support
>
> I noticed that there is already a class
> named DocumentCategorizerEvaluator, which is not used anywhere internally.
> It evaluates performance in terms of accuracy, but I believe it would
> be better to do it in terms of F-measure.
>
> Any thoughts?
>
> As we are working on a major version, I think it would be OK to change it.
>
>
> Thank you,
> William
>