Thanks Joern, I'll take a closer look.

On Wed, Mar 5, 2014 at 2:30 PM, Jörn Kottmann <[email protected]> wrote:

> Have a look at the Sequence Coding thread here on the list.
>
> The name finder has always used IOB2 coding by default. We have now made
> this configurable, so it can be replaced by other codecs such as BILOU or,
> once the work is done, by a user-implemented codec.
>
> To detect names in a sentence the name finder uses a learnable
> classifier. The classifier has to decide whether a token is part of a
> name or not. The logic that determines which labels are used to
> encode/decode name spans is now the responsibility of the SequenceCodec
> object.
>
> In the IOB2 codec (see the BioCodec class) the tokens are labeled as
> Begin, Inside, or Other.
> Each new name span has to start with the Begin label.
>
> The BILOU codec uses the following labels: Begin, Inside, Last, Unit and
> Other.
>
> There might be advantages to switching the codec depending on the data
> you are using: on the German CONLL03 data the evaluation results are
> slightly better with BILOU than with IOB2.
>
> The BILOU codec uses more labels and will therefore be more resource
> intensive than IOB2.
>
> Also have a look at the Wikipedia article about IOB:
> http://en.wikipedia.org/wiki/Inside_Outside_Beginning
>
> HTH,
> Jörn
>
>
> On 03/05/2014 02:18 PM, Mark G wrote:
>
>> Hello, I updated the tools trunk two days ago and stopped getting NER
>> results. I chatted with Joern and he made a change to the seq codec that
>> brought everything back to normal. For the benefit of everyone on the dev
>> list, would it be possible for someone to explain the changes regarding
>> the sequence codec: its benefits, the differences, and where in the code to
>> look to see what it is actually doing. Don't need anything elaborate, just
>> a point of departure for inquiry.
>> MG
>>
>>
>
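
For anyone else following along, here is a minimal sketch of how the two label schemes described above differ. It is plain Python written for illustration, not the OpenNLP code: the function names, the B/I/L/U/O label strings, and the example sentence are all assumptions of mine, and the real BioCodec and SequenceCodec classes may name and structure things differently. Spans are (start, end, type) with an exclusive end index.

```python
def encode_iob2(num_tokens, spans):
    """Label tokens as Begin/Inside/Other; every span starts with B."""
    labels = ["O"] * num_tokens
    for start, end, kind in spans:
        labels[start] = f"B-{kind}"
        for i in range(start + 1, end):
            labels[i] = f"I-{kind}"
    return labels


def encode_bilou(num_tokens, spans):
    """Label tokens as Begin/Inside/Last/Unit/Other."""
    labels = ["O"] * num_tokens
    for start, end, kind in spans:
        if end - start == 1:
            # A single-token name gets the Unit label.
            labels[start] = f"U-{kind}"
        else:
            labels[start] = f"B-{kind}"
            for i in range(start + 1, end - 1):
                labels[i] = f"I-{kind}"
            labels[end - 1] = f"L-{kind}"
    return labels


def decode_iob2(labels):
    """Turn IOB2 labels back into spans; an I without a preceding B is ignored."""
    spans = []
    start = kind = None
    for i, label in enumerate(labels):
        # Close the open span unless this token continues it.
        if start is not None and label != f"I-{kind}":
            spans.append((start, i, kind))
            start = kind = None
        if label.startswith("B-"):
            start, kind = i, label[2:]
    if start is not None:
        spans.append((start, len(labels), kind))
    return spans


# Hypothetical example sentence and name spans.
tokens = ["Alice", "Smith", "flew", "to", "New", "York", "City", "via", "Paris", "."]
spans = [(0, 2, "person"), (4, 7, "location"), (8, 9, "location")]

iob2 = encode_iob2(len(tokens), spans)
bilou = encode_bilou(len(tokens), spans)

for token, a, b in zip(tokens, iob2, bilou):
    print(f"{token:8} {a:12} {b}")
```

The sketch also shows why BILOU is likely to cost more: for the same name types the classifier has to choose among more outcomes per token (four labels per type plus Other, versus two per type plus Other in IOB2).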
