The problem is the following: I have two big databases. The first one looks like this:

2-Methyl-4-trimethylsilyloxyoct-5-yne
Benzoic acid, methyl ester
Benzoic acid, 2-methyl-, methyl ester
Acetic acid, phenylmethyl ester
2,7-Dimethyl-4-trimethylsilyloxyoct-7-en-5-yne
etc.

The second one looks like this:

Name: D-Tagatose 1,6-bisphosphate
Name: 1-Phosphatidyl-D-myo-inositol;: 1-Phosphatidyl-1D-myo-inositol;: 1-Phosphatidyl-myo-inositol;: Phosphatidyl-1D-myo-inositol;: (3-Phosphatidyl)-1-D-inositol;: 1,2-Diacyl-sn-glycero-3-phosphoinositol;: Phosphatidylinositol
Name: Androstenedione;: Androst-4-ene-3,17-dione;: 4-Androstene-3,17-dione
Name: Spermine;: N,N'-Bis(3-aminopropyl)-1,4-butanediamine
Name: H+;: Hydron
Name: 3-Iodo-L-tyrosine
etc.

Both of them have more than 3000 lines. Matching the names by hand is not an option because I don't know chemistry.

Possible solution I came up with:

Go through all the names in the first database and try to match each one against the second database. I'm using the regexec() and strsplit() functions for the matching. Basically, I split each name into small chunks and look for those chunks in the other database.

I can supply code if needed, but I did not want to spam in the first mail.

Any solution is welcome! It can also be pseudocode or just a description of the logic.

Laszlo-Andras Zsurzsa
MSc Informatics, Technical University Munchen