There is an inconsistency in how select() in AnnotationDbi handles duplicated keys passed in by a user, depending on whether the mapping is 1:1 or 1:many. It's easiest to show with an example.
    > select(org.Hs.eg.db, rep("1", 3), "SYMBOL")
    'select()' returned many:1 mapping between keys and columns
      ENTREZID SYMBOL
    1        1   A1BG
    2        1   A1BG
    3        1   A1BG
    > select(org.Hs.eg.db, rep("1", 3), "GO")
    'select()' returned many:many mapping between keys and columns
      ENTREZID         GO EVIDENCE ONTOLOGY
    1        1 GO:0003674       ND       MF
    2        1 GO:0003674       ND       MF
    3        1 GO:0003674       ND       MF

This is obviously a bug. A single query for that ID results in this:

    > select(org.Hs.eg.db, "1", "GO")
    'select()' returned 1:many mapping between keys and columns
      ENTREZID         GO EVIDENCE ONTOLOGY
    1        1 GO:0003674       ND       MF
    2        1 GO:0005576      IDA       CC
    3        1 GO:0005615      IDA       CC
    4        1 GO:0008150       ND       BP
    5        1 GO:0070062      IDA       CC
    6        1 GO:0072562      IDA       CC

So the results for the duplicated keys are completely borked: only the first GO term (GO:0003674) comes back, repeated three times, and the other five annotations are dropped.

However, the question I have is what should be returned. To be consistent with the first example, it should be the output for a single key, repeated three times, which is what I have patched AnnotationDbi to do:

    > select(org.Hs.eg.db, rep("1", 3), "GO")
    'select()' returned many:many mapping between keys and columns
       ENTREZID         GO EVIDENCE ONTOLOGY
    1         1 GO:0003674       ND       MF
    2         1 GO:0005576      IDA       CC
    3         1 GO:0005615      IDA       CC
    4         1 GO:0008150       ND       BP
    5         1 GO:0070062      IDA       CC
    6         1 GO:0072562      IDA       CC
    7         1 GO:0003674       ND       MF
    8         1 GO:0005576      IDA       CC
    9         1 GO:0005615      IDA       CC
    10        1 GO:0008150       ND       BP
    11        1 GO:0070062      IDA       CC
    12        1 GO:0072562      IDA       CC
    13        1 GO:0003674       ND       MF
    14        1 GO:0005576      IDA       CC
    15        1 GO:0005615      IDA       CC
    16        1 GO:0008150       ND       BP
    17        1 GO:0070062      IDA       CC
    18        1 GO:0072562      IDA       CC

So, two questions:

1. Should duplicate keys be allowed, or should duplicates be removed before querying the database, preferably with a message saying that duplicates were removed?
2. If duplicate keys should be allowed, then for consistency I will just commit my patch to both devel and release.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
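For reference, the two behaviors in question 1 can be sketched in base R. This is a minimal illustration, not the actual AnnotationDbi patch: a small hypothetical data frame (`go_map`, with rows copied from the single-key GO output for Entrez ID "1") stands in for org.Hs.eg.db, and `select_expand()` and `select_dedup()` are hypothetical helper names, not AnnotationDbi functions.

```r
# Hypothetical stand-in for the GO mappings of Entrez Gene ID "1",
# copied from the single-key select() output; not a real database query
go_map <- data.frame(
  ENTREZID = rep("1", 6),
  GO = c("GO:0003674", "GO:0005576", "GO:0005615",
         "GO:0008150", "GO:0070062", "GO:0072562"),
  EVIDENCE = c("ND", "IDA", "IDA", "ND", "IDA", "IDA"),
  ONTOLOGY = c("MF", "CC", "CC", "BP", "CC", "CC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep duplicates and return the full single-key
# result once per occurrence of each key, in the order keys were supplied
select_expand <- function(map, keys) {
  idx <- unlist(lapply(keys, function(k) which(map$ENTREZID == k)))
  res <- map[idx, , drop = FALSE]
  rownames(res) <- NULL  # renumber rows 1..n, as select() prints them
  res
}

# Hypothetical helper: drop duplicated keys first, with a message,
# then look up each remaining key once
select_dedup <- function(map, keys) {
  if (anyDuplicated(keys)) {
    message("duplicate keys were removed before querying")
    keys <- unique(keys)
  }
  select_expand(map, keys)
}

keys <- rep("1", 3)
nrow(select_expand(go_map, keys))  # 18 rows, like the patched output
nrow(select_dedup(go_map, keys))   # 6 rows, plus a message
```

Either helper gives the same result as a single-key query when there are no duplicates; they differ only in whether repeated keys repeat the whole block of annotations or are collapsed.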