Re: Using MulticlassClassificationEvaluator for NER evaluation

martin Thu, 11 Nov 2021 04:39:53 -0800

OK, thank you, Gourav. I didn't realize that Spark works with numerical formats only by design.

What I am trying to achieve is rather straightforward: evaluate a trained model using the standard metrics provided by MulticlassClassificationEvaluator. Since this isn't possible for text labels, we'll need to work around it and possibly create a wrapper evaluator around the Spark standard class.
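For illustration, here is a minimal, Spark-free sketch of the interface such a wrapper could expose: it takes string tags directly and returns a single metric. The class name StringLabelEvaluator and its parameters are hypothetical and not part of Spark ML or SparkNLP. The weighted F1 is computed per token and weighted by the support of each gold tag, so it is not guaranteed to match Spark's numbers exactly.

```python
from collections import Counter
from typing import List


class StringLabelEvaluator:
    """Hypothetical evaluator that accepts string tags instead of doubles."""

    def __init__(self, metric: str = "f1") -> None:
        if metric not in ("f1", "accuracy"):
            raise ValueError("metric must be 'f1' or 'accuracy'")
        self.metric = metric

    def evaluate(self, gold: List[str], predicted: List[str]) -> float:
        if not gold or len(gold) != len(predicted):
            raise ValueError("gold and predicted must be non-empty and aligned")

        if self.metric == "accuracy":
            correct = sum(g == p for g, p in zip(gold, predicted))
            return correct / len(gold)

        # Per-tag true positives, false positives and false negatives.
        tp, fp, fn = Counter(), Counter(), Counter()
        for g, p in zip(gold, predicted):
            if g == p:
                tp[g] += 1
            else:
                fp[p] += 1
                fn[g] += 1

        # F1 per tag, weighted by how often the tag occurs in the gold data.
        weighted_sum = 0.0
        for tag, support in Counter(gold).items():
            denom = 2 * tp[tag] + fp[tag] + fn[tag]
            f1 = 2 * tp[tag] / denom if denom else 0.0
            weighted_sum += f1 * support
        return weighted_sum / len(gold)


gold = ["B-PER", "O", "O", "B-LOC"]
predicted = ["B-PER", "O", "B-LOC", "B-LOC"]
accuracy = StringLabelEvaluator("accuracy").evaluate(gold, predicted)
weighted_f1 = StringLabelEvaluator("f1").evaluate(gold, predicted)
```

A real wrapper would instead index the tags and delegate to MulticlassClassificationEvaluator; the sketch only shows the string-in, metric-out shape.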


Thanks a lot for the help.

Cheers,

Martin

On 2021-11-11 13:10, Gourav Sengupta wrote:

Hi Martin,
okay, so you will of course need to translate the NER string output to a numerical format, as you would do with any text data before feeding it to SPARK ML. Please read the SPARK ML documentation on this; I think it is quite clear on how to do that. But more importantly, please try to answer Sean's question: explaining what you are trying to achieve, and how, always helps.
Regards,
Gourav Sengupta
On Thu, Nov 11, 2021 at 11:03 AM Martin Wunderlich <mar...@wunderlich.com> wrote:
Hi Gourav,
Mostly correct. The output of SparkNLP here is a trained pipeline/model/transformer. I am feeding this trained pipeline to the MulticlassClassificationEvaluator for evaluation, and this MulticlassClassificationEvaluator only accepts floats or doubles as the labels (instead of NER labels).
Cheers,

Martin

On 11.11.21 at 11:39, Gourav Sengupta wrote:
Hi Martin,
just to confirm, you are taking the output of SPARKNLP and then trying to feed it to SPARK ML for running algorithms on the NER output generated by SPARKNLP, right?
Regards,
Gourav Sengupta

On Thu, Nov 11, 2021 at 8:00 AM <mar...@wunderlich.com> wrote:

Hi Sean,
Apologies for the delayed reply. I've been away on vacation and then busy catching up afterwards.
Regarding the evaluation using MulticlassClassificationEvaluator: This is about a sequence labeling task to identify specific non-standard named entities. The training and evaluation data is in CoNLL format. The training works fine, using the categorical labels for the NEs. In order to use the MulticlassClassificationEvaluator, however, I need to convert these to floats. This is possible and also works fine; it is just inconvenient having to do the extra step. I would have expected the MulticlassClassificationEvaluator to be able to use the labels directly.
I will try to create and propose a code change in this regard, if or when I find the time.
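The extra conversion step mentioned above can be sketched roughly as follows, in plain Python without Spark. The function name build_label_index and the example tags are hypothetical, and the alphabetical ordering of the indices is an assumption of this sketch. In Spark ML itself, the StringIndexer feature transformer performs this kind of string-to-index mapping on a DataFrame column.

```python
from typing import Dict, List


def build_label_index(labels: List[str]) -> Dict[str, float]:
    """Assign each distinct tag a float index, sorted for determinism."""
    return {tag: float(i) for i, tag in enumerate(sorted(set(labels)))}


gold = ["B-PER", "O", "O", "B-LOC"]
predicted = ["B-PER", "O", "B-LOC", "B-LOC"]

# Build the mapping from both columns so that a tag seen only in the
# predictions still receives an index.
index = build_label_index(gold + predicted)
gold_idx = [index[tag] for tag in gold]
pred_idx = [index[tag] for tag in predicted]
```

The important point is that labels and predictions share one mapping; indexing the two columns separately could assign different numbers to the same tag.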
Cheers,

Martin

On 2021-10-25 14:31, Sean Owen wrote:
I don't think the question is representation as double. The question is how this output represents a label. This looks like the result of an annotator. What are you classifying? You need, first, ground truth and predictions somewhere to use any utility to assess classification metrics.
On Mon, Oct 25, 2021 at 5:42 AM <mar...@wunderlich.com> wrote:

Hello,
I am using SparkNLP to do some NER. The result data structure after training and classification is a Dataset<Row>, with one column each for labels and predictions. For evaluating the model, I would like to use the Spark ML class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator. However, this evaluator expects labels as double numbers. In the case of an NER task, the results in my case are of type array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>.
It would be possible, of course, to convert this format to the required doubles. But is there a way to easily apply MulticlassClassificationEvaluator to the NER task, or is there maybe a better evaluator? I haven't found anything yet (neither in Spark ML nor in SparkNLP).
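As a rough illustration of the first half of that conversion, the sketch below pulls the `result` string out of each annotation and pairs gold and predicted tags token by token. It is plain Python, with dictionaries standing in for the annotation structs. The function name flatten_pairs and the sample data are hypothetical, and it assumes that both columns contain the same number of annotations per row.

```python
from typing import Dict, List, Tuple

# Stand-in for one annotation struct; only some fields of the schema are used.
Annotation = Dict[str, object]


def flatten_pairs(
    label_rows: List[List[Annotation]],
    prediction_rows: List[List[Annotation]],
) -> List[Tuple[str, str]]:
    """Pair gold and predicted tags token by token across all rows."""
    pairs: List[Tuple[str, str]] = []
    for gold, pred in zip(label_rows, prediction_rows):
        if len(gold) != len(pred):
            raise ValueError("label and prediction rows differ in length")
        for g, p in zip(gold, pred):
            pairs.append((str(g["result"]), str(p["result"])))
    return pairs


label_rows = [[
    {"begin": 0, "end": 4, "result": "B-PER"},
    {"begin": 6, "end": 10, "result": "O"},
]]
prediction_rows = [[
    {"begin": 0, "end": 4, "result": "B-PER"},
    {"begin": 6, "end": 10, "result": "B-LOC"},
]]
pairs = flatten_pairs(label_rows, prediction_rows)
```

Each pair would then still need to be mapped to a double before it can be passed to the evaluator.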
Thanks a lot.

Cheers,

Martin
