Dear Claudia, thank you for your quick answer. I am including the data table again as an example.
(TP = True Positive, FN = False Negative, FP = False Positive, TN = True Negative)

Protein ID   Pfam Domain      p-value      Expected         Is Expected  TP  FN  FP  TN
NP_000011.2  APH              1.15E-05     APH              TRUE         1   0   0   0
NP_000011.2  MutS_V           0.0173       APH              FALSE        0   0   1   0
NP_000062.1  CBS              9.40E-08     CBS              TRUE         1   0   0   0
NP_000066.1  APH              3.83E-06     APH              TRUE         1   0   0   0
NP_000066.1  CobU             0.009        APH              FALSE        0   0   1   0
NP_000066.1  FeoA             0.3975       APH              FALSE        0   0   1   0
NP_000066.1  Phage_integr_N   0.0219       APH              FALSE        0   0   1   0
NP_000161.2  Beta_elim_lyase  6.25E-12     Beta_elim_lyase  TRUE         1   0   0   0
NP_000161.2  Glyco_hydro_6    0.002        Beta_elim_lyase  FALSE        0   0   1   0
NP_000161.2  SurE             0.0059       Beta_elim_lyase  FALSE        0   0   1   0
NP_000161.2  SapB_2           0.0547       Beta_elim_lyase  FALSE        0   0   1   0
NP_000161.2  Runt             0.1034       Beta_elim_lyase  FALSE        0   0   1   0
NP_000204.3  EGF              0.004666118  EGF              TRUE         1   0   0   0
NP_000229.1  PAS              3.13E-06     PAS              TRUE         1   0   0   0
NP_000229.1  zf-CCCH          0.2067       PAS              FALSE        0   1   1   0
NP_000229.1  E_raikovi_mat    0.0206       PAS              FALSE        0   0   0   0
NP_000388.2  NAD_binding_1    8.21E-24     NAD_binding_1    TRUE         1   0   0   0
NP_000388.2  ABM              1.40E-08     NAD_binding_1    FALSE        0   0   1   0
NP_000483.3  MMR_HSR1         1.98E-05     MMR_HSR1         TRUE         1   0   0   0
NP_000483.3  DEAD             2.30E-05     MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  APS_kinase       1.80E-09     MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  CbiA             0.0003       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  CoaE             1.28E-07     MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  FMN_red          4.61E-08     MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  Fn_bind          0.3855       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  Invas_SpaK       0.2431       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  PEP-utilizers    0.127        MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  NIR_SIR_ferr     0.1661       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  AAA              0.0031       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  DUF448           0.0021       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  CBF_beta         0.1201       MMR_HSR1         FALSE        0   0   1   0
NP_000483.3  zf-C3HC4         0.0959       MMR_HSR1         FALSE        0   0   1   0
NP_000560.5  ig               5.69E-39     ig               TRUE         1   0   0   0
NP_000704.1  Epimerase        4.40E-21     Epimerase        TRUE         1   0   0   0
NP_000704.1  Lipase_GDSL      6.63E-11     Epimerase        FALSE        0   0   1   0
...

This is a shortened list from one of the 10 lists I have for different p-values.
As you can see, I have separate p-value experiments and probably need to
calculate a separate ROC for each of them. But I don't know how to calculate
these characteristics for the p-values. How do I assign the predictions to
each of the individual p-value experiments?

I would appreciate any help.

Thanks,
Assa

On Tue, Aug 17, 2010 at 12:55, Claudia Beleites <cbelei...@units.it> wrote:

> Dear Assa,
>
>> I am having a problem building a ROC curve with my data using the ROCR
>> package.
>>
>> I have 10 lists of proteins such as attached (proteinlist.xls). Each of
>> the lists was calculated with a different p-value.
>
> Your file didn't make it to the list.
>
>> The goal is to find the optimal p-value for the highest number of true
>> positives as well as the lowest number of false positives.
>>
>> As far as I understood the explanations from the vignette of ROCR, my data
>> of TP and FP are the labels of the prediction function. But I don't know
>> how to assign the right predictions to these labels.
>
> I assume the p-values are different cutoffs that you use for "hardening"
> (= making yes/no predictions) from some soft (= continuous class
> membership) output of your classifier.
>
> Usually, ROCR calculates the curves as a function of the cutoff/threshold
> itself from the continuous predictions. If you have these soft
> predictions, let ROCR do the calculation for you.
>
> If you don't have them, ROCR can calculate your characteristics (sens,
> spec, precision, recall, whatever) for each of the p-values. While you
> could combine the results "by hand" into a ROCR-performance object and let
> ROCR do the plotting, it is then probably easier if you plot directly
> yourself.
>
> Don't be shy to look into the prediction and performance objects, I find
> them pretty obvious. Maybe start with the objects produced by the
> examples.
>
> Also, note that ROCR works with binary validation data only. If your data
> has more than one class, you need to make two-class problems first (e.g.
> protein xy ./. not protein xy).
>
>> BTW, is there a way of finding the optimum in the curve? I mean to find
>> the exact value in the ROC curve (see sheet 2 in the excel file for the
>> ROC curve).
>
> Someone asked for the optimum on a ROC curve a couple of months ago;
> RSiteSearch on the mailing list with ROC and optimal or optimum should get
> you answers.
>
>> I would like to thank for any help in advance
>
> You're welcome.
>
> Claudia
>
> --
> Claudia Beleites
> Dipartimento dei Materiali e delle Risorse Naturali
> Università degli Studi di Trieste
> Via Alfonso Valerio 6/a
> I-34127 Trieste
>
> phone: +39 0 40 5 58-37 68
> email: cbelei...@units.it
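
For reference, the per-cutoff calculation discussed in the quoted reply could be sketched in base R roughly as follows. This is only a minimal sketch under stated assumptions, not the method either poster actually used: the seven rows are a subset of the table above, the cutoff values are hypothetical (the p-values used for the 10 lists are not given in the thread), a hit is treated as predicted positive when its p-value is at or below the cutoff, and the names hits, cutoffs, roc_point and roc are made up for illustration. Youden's J (tpr - fpr) is used here as one common way to pick a cutoff; it is not the only definition of "optimal".

```r
# Illustrative subset of the table above: p-value and "Is Expected" per hit.
hits <- data.frame(
  p_value     = c(1.15e-05, 0.0173, 9.40e-08, 3.83e-06, 0.009, 0.3975, 0.0219),
  is_expected = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Hypothetical cutoffs; replace with the p-values used for the 10 lists.
cutoffs <- c(1e-06, 1e-04, 0.01, 0.05, 1)

# One ROC point per cutoff (assumption: predicted positive if p_value <= cutoff).
roc_point <- function(cutoff, p, truth) {
  predicted <- p <= cutoff
  tp <- sum(predicted & truth)    # expected domain, passes the cutoff
  fp <- sum(predicted & !truth)   # unexpected domain, passes the cutoff
  fn <- sum(!predicted & truth)   # expected domain, rejected by the cutoff
  tn <- sum(!predicted & !truth)  # unexpected domain, rejected by the cutoff
  c(cutoff = cutoff, tpr = tp / (tp + fn), fpr = fp / (fp + tn))
}

# sapply() returns one column per cutoff; transpose to one row per cutoff.
roc <- as.data.frame(t(sapply(cutoffs, roc_point,
                              p = hits$p_value, truth = hits$is_expected)))

# Youden's J as one possible criterion for the "best" cutoff.
roc$youden <- roc$tpr - roc$fpr
best <- roc[which.max(roc$youden), ]

# Plot the points directly, with the chance diagonal for orientation.
plot(roc$fpr, roc$tpr, type = "b", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)
```

Printing roc shows one row per cutoff, and best holds the row with the largest tpr - fpr. With only these seven rows the result says nothing about the full lists; the point is the shape of the calculation, which needs every hit's p-value and its expected/unexpected label in one table.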
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.