Re: [R] ROCR predictions

Assa Yeroslaviz Tue, 17 Aug 2010 09:23:17 -0700

Dear Claudia,

thank you for your fast answer.
I am adding the data table again as an example.


(TP = True Positive, FN = False Negative, FP = False Positive, TN = True Negative)

Protein ID   Pfam Domain      p-value      Expected         Is Expected  TP  FN  FP  TN
NP_000011.2  APH              1.15E-05     APH              TRUE         1   0   0   0
NP_000011.2  MutS_V           0.0173       APH              FALSE        0   0   1   0
NP_000062.1  CBS              9.40E-08     CBS              TRUE         1   0   0   0
NP_000066.1  APH              3.83E-06     APH              TRUE         1   0   0   0
NP_000066.1  CobU             0.009        APH              FALSE        0   0   1   0
NP_000066.1  FeoA             0.3975       APH              FALSE        0   0   1   0
NP_000066.1  Phage_integr_N   0.0219       APH              FALSE        0   0   1   0
NP_000161.2  Beta_elim_lyase  6.25E-12     Beta_elim_lyase  TRUE         1   0   0   0
NP_000161.2  Glyco_hydro_6    0.002        Beta_elim_lyase  FALSE        0   0   1   0
NP_000161.2  SurE             0.0059       Beta_elim_lyase  FALSE        0   0   1   0
NP_000161.2  SapB_2           0.0547       Beta_elim_lyase  FALSE        0   0   1   0
NP_000161.2  Runt             0.1034       Beta_elim_lyase  FALSE        0   0   1   0
NP_000204.3  EGF              0.004666118  EGF              TRUE         1   0   0   0
NP_000229.1  PAS              3.13E-06     PAS              TRUE         1   0   0   0
NP_000229.1  zf-CCCH          0.2067       PAS              FALSE        0   1   1   0
NP_000229.1  E_raikovi_mat    0.0206       PAS              FALSE        0   0   0   0
NP_000388.2  NAD_binding_1    8.21E-24     NAD_binding_1    TRUE         1   0   0   0
NP_000388.2  ABM              1.40E-08     NAD_binding_1    FALSE        0   0   1   0
NP_000483.3  MMR_HSR1         1.98E-05     MMR_HSR1         TRUE         1   0   0   0
NP_000483.3  DEAD             2.30E-05     MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  APS_kinase       1.80E-09     MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  CbiA             0.0003       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  CoaE             1.28E-07     MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  FMN_red          4.61E-08     MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  Fn_bind          0.3855       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  Invas_SpaK       0.2431       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  PEP-utilizers    0.127        MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  NIR_SIR_ferr     0.1661       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  AAA              0.0031       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  DUF448           0.0021       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  CBF_beta         0.1201       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  zf-C3HC4         0.0959       MMR_HSR1         FALSE        0   0   1   0
NP_000560.5  ig               5.69E-39     ig               TRUE         1   0   0   0
NP_000704.1  Epimerase        4.40E-21     Epimerase        TRUE         1   0   0   0
NP_000704.1  Lipase_GDSL      6.63E-11     Epimerase        FALSE        0   0   1   0
 ...

This is a shortened list from one of the 10 lists I have for different
p-values.

As you can see, I have separate p-value experiments and probably need to
calculate a separate ROC for each of them. But I don't know how to calculate
these characteristics for the p-values.
How do I assign the predictions to each of the single p-value experiments?

I would appreciate any help

Thanks
Assa
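
A minimal sketch of one way to read the table above, under the assumption
that the per-row p-value can serve as the continuous score and the
"Is Expected" column as the binary label. The data frame `hits`, the helper
`rates_at()` and the example cutoffs are hypothetical; the six rows are
copied from the table. In ROCR the same two vectors would go into
prediction(); here the rates are computed by hand in base R so that every
step is visible:

```r
# First six rows of the table: p-value per hit and the "Is Expected" flag.
hits <- data.frame(
  domain   = c("APH", "MutS_V", "CBS", "APH", "CobU", "FeoA"),
  p_value  = c(1.15e-05, 0.0173, 9.40e-08, 3.83e-06, 0.009, 0.3975),
  expected = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Hypothetical helper: call a hit "positive" when its p-value is <= cutoff,
# then count the calls against the expected labels.
rates_at <- function(cutoff, p, truth) {
  called <- p <= cutoff
  tp <- sum(called & truth)
  fp <- sum(called & !truth)
  fn <- sum(!called & truth)
  tn <- sum(!called & !truth)
  c(cutoff = cutoff, TP = tp, FP = fp, FN = fn, TN = tn,
    TPR = tp / (tp + fn), FPR = fp / (fp + tn))
}

# Assumed example cutoffs; replace with the 10 p-values actually used.
cutoffs <- c(1e-06, 1e-04, 0.01, 0.05, 1)
roc <- as.data.frame(t(sapply(cutoffs, rates_at,
                              p = hits$p_value, truth = hits$expected)))
print(roc)

# One ROC point per cutoff, plotted directly.
plot(roc$FPR, roc$TPR, type = "b", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate", ylab = "True positive rate")
```

Each row of `roc` is one p-value experiment. With only six hits the curve
is very coarse; the full lists would give a smoother picture.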


On Tue, Aug 17, 2010 at 12:55, Claudia Beleites <cbelei...@units.it> wrote:

> Dear Assa,
>
>
>
>> I am having a problem building a ROC curve with my data using the ROCR
>> package.
>>
>> I have 10 lists of proteins such as the attached one (proteinlist.xls).
>
> Your file didn't make it to the list.
>
>> Each of the lists was calculated with a different p-value.
>> The goal is to find the optimal p-value for the highest number of true
>> positives as well as the lowest number of false positives.
>>
>> As far as I understood the explanations from the vignette of ROCR, my data
>> of TP and FP are the labels of the prediction function. But I don't know
>> how to assign the right predictions to these labels.
>>
>
> I assume the p-values are different cutoffs that you use for "hardening" (=
> making yes/no predictions) from some soft (= continuous class membership)
> output of your classifier.
>
> Usually, ROCR calculates the curves as a function of the cutoff/threshold
> itself from the continuous predictions. If you have these soft predictions,
> let ROCR do the calculation for you.
>
> If you don't have them, ROCR can calculate your characteristics (sens,
> spec, precision, recall, whatever) for each of the p-values. While you could
> combine the results "by hand" into a ROCR-performance object and let ROCR do
> the plotting, it is then probably easier if you plot directly yourself.
>
> Don't be shy to look into the prediction and performance objects, I find
> them pretty obvious. Maybe start with the objects produced by the examples.
>
> Also, note ROCR works with binary validation data only. If your data has
> more than one class, you need to make two-class-problems first (e.g. protein
> xy ./. not protein xy).
>
>
>
>> BTW, is there a way of finding the optimum in the curve? I mean to find
>> the exact value in the ROC curve (see sheet 2 in the Excel file for the
>> ROC curve).
>>
>
> Someone asked for optimum on ROC a couple of months ago, RSiteSearch on the
> mailing list with ROC and optimal or optimum should get you answers.
>
>
>
>> I would like to thank you in advance for any help.
>>
> You're welcome.
>
> Claudia
>
> --
> Claudia Beleites
> Dipartimento dei Materiali e delle Risorse Naturali
> Università degli Studi di Trieste
> Via Alfonso Valerio 6/a
> I-34127 Trieste
>
> phone: +39 0 40 5 58-37 68
> email: cbelei...@units.it
>
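
On the question of an optimum on the ROC curve raised in the quoted message:
one common criterion (an assumption here, the thread does not settle on one)
is Youden's J = TPR - FPR, which picks the point farthest above the
diagonal. A minimal base-R sketch, using the first eight rows of the table
and treating every observed p-value as a candidate cutoff; the variable
names are hypothetical:

```r
# p-values and "Is Expected" flags of the first eight table rows.
p     <- c(1.15e-05, 0.0173, 9.40e-08, 3.83e-06, 0.009, 0.3975, 0.0219, 6.25e-12)
truth <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# Every observed p-value is tried as a cutoff (hit is called if p <= cutoff).
cutoffs <- sort(unique(p))
tpr <- sapply(cutoffs, function(k) sum(p <= k & truth) / sum(truth))
fpr <- sapply(cutoffs, function(k) sum(p <= k & !truth) / sum(!truth))

# Youden's J for each cutoff; which.max() returns the first maximum.
youden <- tpr - fpr
best <- which.max(youden)
best_cutoff <- cutoffs[best]

print(data.frame(cutoff = cutoffs, TPR = tpr, FPR = fpr, J = youden))
cat("Cutoff with the largest J:", best_cutoff, "\n")
```

If false positives and false negatives do not cost the same, a weighted
criterion would be the better choice; J treats both errors equally.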

