I think the answer to 1 should be yes, duplicate keys are allowed. For instance, a user might have a vector of ids and a factor that groups the ids somehow (e.g., by experiment), with ids unique within each group.
So I'm for step #2.

Martin
________________________________________
From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of James W. MacDonald [jmac...@uw.edu]
Sent: Friday, November 20, 2015 5:30 PM
To: bioc-devel@r-project.org
Subject: [Bioc-devel] Behavior of select function in AnnotationDbi

There is an inconsistency in how select() works in AnnotationDbi when a user passes in duplicated keys to be mapped, depending on if the mapping is 1:1 or 1:many. It's easiest to show using an example.

> select(org.Hs.eg.db, rep("1", 3), "SYMBOL")
'select()' returned many:1 mapping between keys and columns
  ENTREZID SYMBOL
1        1   A1BG
2        1   A1BG
3        1   A1BG

> select(org.Hs.eg.db, rep("1", 3), "GO")
'select()' returned many:many mapping between keys and columns
  ENTREZID         GO EVIDENCE ONTOLOGY
1        1 GO:0003674       ND       MF
2        1 GO:0003674       ND       MF
3        1 GO:0003674       ND       MF

This is obviously a bug. A single query for that ID results in this:

> select(org.Hs.eg.db, "1", "GO")
'select()' returned 1:many mapping between keys and columns
  ENTREZID         GO EVIDENCE ONTOLOGY
1        1 GO:0003674       ND       MF
2        1 GO:0005576      IDA       CC
3        1 GO:0005615      IDA       CC
4        1 GO:0008150       ND       BP
5        1 GO:0070062      IDA       CC
6        1 GO:0072562      IDA       CC

So the returned results are completely borked. However, the question I have is what should be returned? To be consistent with the first example, it should be the output expected for a single key, repeated three times, which I have patched AnnotationDbi to do:

> select(org.Hs.eg.db, rep("1", 3), "GO")
'select()' returned many:many mapping between keys and columns
   ENTREZID         GO EVIDENCE ONTOLOGY
1         1 GO:0003674       ND       MF
2         1 GO:0005576      IDA       CC
3         1 GO:0005615      IDA       CC
4         1 GO:0008150       ND       BP
5         1 GO:0070062      IDA       CC
6         1 GO:0072562      IDA       CC
7         1 GO:0003674       ND       MF
8         1 GO:0005576      IDA       CC
9         1 GO:0005615      IDA       CC
10        1 GO:0008150       ND       BP
11        1 GO:0070062      IDA       CC
12        1 GO:0072562      IDA       CC
13        1 GO:0003674       ND       MF
14        1 GO:0005576      IDA       CC
15        1 GO:0005615      IDA       CC
16        1 GO:0008150       ND       BP
17        1 GO:0070062      IDA       CC
18        1 GO:0072562      IDA       CC

So, two questions.

1. Should duplicate keys be allowed, or should duplicates be removed before querying the database, preferably with a message saying that dups were removed?

2. If the answer to #1 is yes, then to be consistent, I will just commit the patch I have made to both devel and release.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
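
For readers who want to see the proposed semantics without the annotation package installed, here is a minimal base-R sketch of "return the single-key result once per occurrence of the key". It is only an illustration, not the actual AnnotationDbi patch: go_map is a toy table holding the first three GO rows shown in the thread for ENTREZID 1, and select_with_dups() is a hypothetical helper name that does not exist in AnnotationDbi.

```r
# Toy 1:many table standing in for the ENTREZID -> GO mapping.
# The rows are the first three GO terms shown in the thread for key "1".
go_map <- data.frame(
  ENTREZID = c("1", "1", "1"),
  GO       = c("GO:0003674", "GO:0005576", "GO:0005615"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of AnnotationDbi): look up every
# occurrence of every key, duplicates included, in input order.
select_with_dups <- function(map, keys, keycol = "ENTREZID") {
  # Row indices of `map`, grouped by distinct key value
  idx_by_key <- split(seq_len(nrow(map)), map[[keycol]])
  # One block of row indices per element of `keys`; keys absent from
  # `map` are silently dropped in this sketch
  idx <- unlist(idx_by_key[keys], use.names = FALSE)
  out <- map[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

res <- select_with_dups(go_map, rep("1", 3))
nrow(res)  # 9: three GO rows for each of the three occurrences of "1"
```

Under these assumptions, a key that occurs n times contributes n copies of its single-key result, which is the behaviour described for the patched 1:many case and matches what the 1:1 SYMBOL query already does.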