Re: [Rdkit-discuss] Double Bond Stereochemistry in the RDKit
Ok cool! I did actually just run into an issue while doing some tests: https://github.com/rdkit/rdkit/issues/2183. This issue brings up a question for me about where the SMILES writer actually looks for stereochem info when it decides what to write.

Also, I ran into the same snag earlier too, Dan!

- Kovas

From: Dan Nealschneider
Date: Tuesday, December 4, 2018 at 10:05 AM
To: "col...@gmail.com"
Cc: Kovas Palunas, rdkit discuss
Subject: Re: [Rdkit-discuss] Double Bond Stereochemistry in the RDKit

I've done some in-memory translation of molecules to ROMols, and have used #2 without major problems. I do remember needing to make sure that the stereoatoms are in the correct order - that is, that the first stereoatom is bonded to the beginAtom of the bond. In Python, this is something like:

    bond = mol.GetBondBetweenAtoms(begin, end)
    if bond.GetBeginAtomIdx() != begin:
        assert bond.GetBeginAtomIdx() == end
        stereoatom1, stereoatom2 = stereoatom2, stereoatom1
    bond.SetStereoAtoms(stereoatom1, stereoatom2)
    bond.SetStereo(stereo)

- dan nealschneider (né wandschneider)
Senior Developer
Schrödinger, Inc
Portland, OR

On Tue, Dec 4, 2018 at 7:02 AM Brian Cole <col...@gmail.com> wrote:

Hi Kovas,

For your use-case #2 should suffice, "set STEREOCIS/STEREOTRANS tags + manually set stereo atoms". This is what the EnumerateStereoisomers code does: https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/EnumerateStereoisomers.py#L38

As to what is the 'ground truth', that is a more difficult question, and I fear the answer may be 'none of them'. STEREOCIS/STEREOTRANS are rather recent additions to the RDKit API. While we strove to make sure STEREOCIS/STEREOTRANS are handled across the RDKit, there are probably looming bugs in untested parts of the RDKit that don't handle them properly. However, I think those other APIs should be fixed to handle them properly, so please do report any problems you spot on the GitHub issue tracker.

Cheers,
Brian

On Mon, Dec 3, 2018 at 7:00 PM Kovas Palunas <kovas.palu...@arzeda.com> wrote:

Hi All,

I'm looking for a bit more clarity regarding double bond stereochem in RDKit. My understanding is that there are currently 3 ways to store this information:

1. STEREOE/STEREOZ tags + stereo atoms on either side of the bond set by CIP ranks, as computed when calling MolFromSmiles to make a new molecule or AssignStereochemistry on an existing molecule
2. Manually set STEREOCIS/STEREOTRANS tags + manually set stereo atoms
3. ENDUPRIGHT/etc. single bond directionality tags, which are set when reading a molecule from SMILES/InChI/mol file

Is one of these methods the "ground truth" that is looked for by RDKit functions that care about this info, like the substructure matching code or the SMILES writing code?

I am currently working on code that mutates molecules using a predetermined list of changes to be made to the molecule. I'd like to be able to include bond stereochemistry changing/creation/destruction here, and was thinking of doing so using the STEREOCIS/STEREOTRANS tags (and also providing the reference stereo atoms). Before I do this I want to make sure that molecules with these tags will be handled correctly by other RDKit functions downstream. Would these tags be a good choice here? Are there any caveats I should keep in mind as I work with this information?

Thanks!

- Kovas
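Dan's ordering check can be wrapped in a small helper. What follows is a minimal sketch rather than RDKit's own code: `ordered_stereo_atoms` and `set_double_bond_stereo` are hypothetical names introduced here, and the demo at the bottom only runs when RDKit is importable. Apart from `Chem.MolFromSmiles` and `Chem.MolToSmiles`, it relies only on the calls that already appear in Dan's snippet.

```python
def ordered_stereo_atoms(bond_begin_idx, begin, end, stereoatom1, stereoatom2):
    """Order the stereo atoms so the first one sits on the bond's begin atom.

    Assumes stereoatom1 is a neighbor of `begin` and stereoatom2 of `end`.
    """
    if bond_begin_idx == begin:
        return stereoatom1, stereoatom2
    if bond_begin_idx == end:
        return stereoatom2, stereoatom1
    raise ValueError("bond does not connect the given atoms")


def set_double_bond_stereo(mol, begin, end, stereoatom1, stereoatom2, stereo):
    """Hypothetical helper: apply option #2 from the thread to one bond."""
    bond = mol.GetBondBetweenAtoms(begin, end)
    first, second = ordered_stereo_atoms(
        bond.GetBeginAtomIdx(), begin, end, stereoatom1, stereoatom2)
    bond.SetStereoAtoms(first, second)
    bond.SetStereo(stereo)
    return bond


try:
    from rdkit import Chem
except ImportError:  # keep the sketch importable without RDKit
    Chem = None

if Chem is not None:
    mol = Chem.MolFromSmiles("CC=CC")
    # Atoms 1 and 2 carry the double bond; atoms 0 and 3 are the reference
    # neighbors. The arguments are deliberately passed end-atom first.
    set_double_bond_stereo(mol, 2, 1, 3, 0, Chem.BondStereo.STEREOCIS)
    print(Chem.MolToSmiles(mol))
```

Keeping the ordering logic in a pure function makes it easy to unit test without building molecules; whether downstream RDKit functions honor the tags is the open question discussed in the thread.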
[Rdkit-discuss] Double Bond Stereochemistry in the RDKit
Hi All,

I'm looking for a bit more clarity regarding double bond stereochem in RDKit. My understanding is that there are currently 3 ways to store this information:

1. STEREOE/STEREOZ tags + stereo atoms on either side of the bond set by CIP ranks, as computed when calling MolFromSmiles to make a new molecule or AssignStereochemistry on an existing molecule
2. Manually set STEREOCIS/STEREOTRANS tags + manually set stereo atoms
3. ENDUPRIGHT/etc. single bond directionality tags, which are set when reading a molecule from SMILES/InChI/mol file

Is one of these methods the "ground truth" that is looked for by RDKit functions that care about this info, like the substructure matching code or the SMILES writing code?

I am currently working on code that mutates molecules using a predetermined list of changes to be made to the molecule. I'd like to be able to include bond stereochemistry changing/creation/destruction here, and was thinking of doing so using the STEREOCIS/STEREOTRANS tags (and also providing the reference stereo atoms). Before I do this I want to make sure that molecules with these tags will be handled correctly by other RDKit functions downstream. Would these tags be a good choice here? Are there any caveats I should keep in mind as I work with this information?

Thanks!

- Kovas
Re: [Rdkit-discuss] Matching Generalized Compounds
I ended up just adding the following code to the atom matching function (after moving it, and copies of the substructure code that call it, to a new library to avoid other internal RDKit code calling the modified version):

    int a1n = a1->getAtomicNum();
    int a2n = a2->getAtomicNum();
    // dummy atoms have atomic number 0
    if (a1n == 0 || a2n == 0) {
      return true;
    }

So far, it seems to work great for me. If anyone else is interested in this functionality, I'd be happy to share more details or open a pull request!

- Kovas

From: Kovas Palunas
Date: Thursday, August 23, 2018 at 10:20 AM
To: Christos Kannas
Cc: RDKit, Paolo Tosco
Subject: Re: [Rdkit-discuss] Matching Generalized Compounds

Thanks for the feedback and code example! I understand that it works to make a third query mol using MCS that matches both the original mols to then match with. However, this seems like overkill (overly expensive) for this particular problem - as I understand it MCS can be very expensive depending on the compounds you are comparing.

Would it not work to simply override the atom.Match function with one that will always match dummies no matter what the other atom is? I am not planning to compare SMARTSy queries with my matching with any complexity beyond simply dummy atoms. In fact, as I understand it, my example compounds are not made up of any query atoms when they are read into RDKit - the dummies are just made into queries after the read by the QueryParameters code. I am definitely not interested in doing generic query to query matching.

- Kovas

From: Christos Kannas
Date: Thursday, August 23, 2018 at 7:53 AM
To: Kovas Palunas
Cc: RDKit, Paolo Tosco
Subject: Re: [Rdkit-discuss] Matching Generalized Compounds

Hi Kovas,

You have two fuzzy compounds that you are trying to match, because intuition says that the any-atom [*:1] from m1 should match the fluorine [F:11] in m2, and the any-atom [*:14] in m2 should match the carbon [CH3:4] in m1.

The issue here is that you create two query compounds from m1 and m2 which will match their own specific substructures. Query to query matching is not trivial.

In order to do what you want you need a query compound that combines their characteristics, which is what Paolo showed. Paolo, with MCS and modifying atom properties, created that query compound '[*:1]-[CH2:2]-[C:3](-[*:4])=[CH2:5]' or '[*:1]-[CH2X4:2]-[CX3:3](-[*:4])=[CH2X3:5]'.

Also bear in mind that Paolo's approach changed the starting compounds, as now they resemble the generic query compound that combines their fuzzy atoms.

https://gist.github.com/CKannas/ac1a4791dec909552d7c8899cfaff030

Best,
Christos

Christos Kannas
Chem[o]informatics Researcher & Software Developer

On Thu, 23 Aug 2018 at 12:36, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:

Dear Kovas,

It looks like GetSubstructMatch() only finds a match if the dummy atom is in the query, not if it is in the molecule that you are matching the query against.

This notebook presents a possible solution off the top of my head: https://gist.github.com/ptosco/a35ac28a14103b47096f6d6af1aec831 which does not involve changes to the C++ layer, even though it is computationally more expensive and will fail with disconnected fragments as it uses FindMCS(). There may be better solutions - this is what I came up with yesterday night in the little time I had available.

Cheers,
P.

On 08/22/18 19:34, Kovas Palunas wrote:

Hi All,

I'm interested in having GetSubstructMatches return non-"null" results in the following example. The results should lead to a match where atom 1 maps to atom 11, 2 to 12, etc.

    from rdkit import Chem

    m1 = Chem.MolFromSmiles('[*:1][CH2:2][C:3]([CH3:4])=[CH2:5]')
    m2 = Chem.MolFromSmiles('[F:11][CH2:12][C:13]([*:14])=[CH2:15]')

    ### do something here so that the mols will match ###
    qp = Chem.AdjustQueryParameters()
    qp.makeDummiesQueries = True
    m1 = Chem.AdjustQueryProperties(m1, qp)
    m2 = Chem.AdjustQueryProperties(m2, qp)

    # I'd like both of the following to return results
    m1.GetSubstructMatches(m2)
    m2.GetSubstructMatches(m1)

My understanding of why these mols currently do not match is as follows: because only the dummy atoms are made queries (based on my query parameter adjustment), when one mol is matched to another, dummy 1 may match to F:11, but dummy 14 will then not match to methyl:4. This is because (as I understand) normal atoms can only be matched by queries, and cannot match them themselves.

Potential ideas to make this work as I'd like:

1. Override atom.Match in the Python code - not sure that this would work since the C++ version of this function is what would be called during GetSubstructMatches
2. Override atom.Match in the C++ code - not quite sure how to do this, or what side effects it might have. Ideally the changes I make would only affect this example (and other similar ones)
3. Make all atoms in both molecules QueryAtoms, but otherwise leave them unchanged. I'm not quite sure how to do this!

Does anyone have any ideas for what the best approach here would be, or know if there is already built-in functionality for something like this? I'd prefer to not use SMARTS to construct my molecules if possible, since I don't really think of them as queries, just as other molecules in the system that happen to not be fully specified.

- Kovas
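The effect of the "a dummy on either side matches anything" rule can be illustrated without touching the C++ layer. The sketch below is a toy backtracking matcher in plain Python, not RDKit's substructure code: `atoms_compatible`, `find_matches` and `bonds` are hypothetical names, molecules are reduced to atomic numbers plus bond orders, and hydrogen counts are ignored. It only shows that the symmetric rule lets the two example molecules match in both directions.

```python
def atoms_compatible(z1, z2):
    """Mirror of the C++ tweak: atomic number 0 (a dummy) matches anything."""
    return z1 == 0 or z2 == 0 or z1 == z2


def find_matches(query, target):
    """Enumerate substructure mappings of `query` onto `target`.

    Each molecule is (atomic_numbers, bonds), where bonds maps
    frozenset({i, j}) -> bond order. Entry k of a returned tuple is the
    target atom index matched to query atom k.
    """
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    matches = []

    def consistent(q_idx, t_idx, mapping):
        if not atoms_compatible(q_atoms[q_idx], t_atoms[t_idx]):
            return False
        # Every query bond to an already-mapped atom must exist in the
        # target with the same order.
        for other_q, other_t in enumerate(mapping):
            order = q_bonds.get(frozenset((q_idx, other_q)))
            if order is not None and t_bonds.get(frozenset((t_idx, other_t))) != order:
                return False
        return True

    def extend(mapping):
        q_idx = len(mapping)
        if q_idx == len(q_atoms):
            matches.append(tuple(mapping))
            return
        for t_idx in range(len(t_atoms)):
            if t_idx not in mapping and consistent(q_idx, t_idx, mapping):
                extend(mapping + [t_idx])

    extend([])
    return matches


def bonds(*triples):
    """Build the bond table from (i, j, order) triples."""
    return {frozenset((i, j)): order for i, j, order in triples}


# [*:1][CH2:2][C:3]([CH3:4])=[CH2:5]
m1 = ([0, 6, 6, 6, 6], bonds((0, 1, 1), (1, 2, 1), (2, 3, 1), (2, 4, 2)))
# [F:11][CH2:12][C:13]([*:14])=[CH2:15]
m2 = ([9, 6, 6, 0, 6], bonds((0, 1, 1), (1, 2, 1), (2, 3, 1), (2, 4, 2)))

print(find_matches(m1, m2))  # [(0, 1, 2, 3, 4)]
print(find_matches(m2, m1))  # [(0, 1, 2, 3, 4)]
```

Both directions return the single mapping in which atom 1 pairs with atom 11, 2 with 12, and so on, which is the result asked for in the original post.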
Re: [Rdkit-discuss] Matching Generalized Compounds
Thanks for the feedback and code example! I understand that it works to make a third query mol using MCS that matches both the original mols to then match with. However, this seems like overkill (overly expensive) for this particular problem - as I understand it MCS can be very expensive depending on the compounds you are comparing.

Would it not work to simply override the atom.Match function with one that will always match dummies no matter what the other atom is? I am not planning to compare SMARTSy queries with my matching with any complexity beyond simply dummy atoms. In fact, as I understand it, my example compounds are not made up of any query atoms when they are read into RDKit - the dummies are just made into queries after the read by the QueryParameters code. I am definitely not interested in doing generic query to query matching.

- Kovas
[Rdkit-discuss] Matching Generalized Compounds
Hi All,

I'm interested in having GetSubstructMatches return non-"null" results in the following example. The results should lead to a match where atom 1 maps to atom 11, 2 to 12, etc.

    from rdkit import Chem

    m1 = Chem.MolFromSmiles('[*:1][CH2:2][C:3]([CH3:4])=[CH2:5]')
    m2 = Chem.MolFromSmiles('[F:11][CH2:12][C:13]([*:14])=[CH2:15]')

    ### do something here so that the mols will match ###
    qp = Chem.AdjustQueryParameters()
    qp.makeDummiesQueries = True
    m1 = Chem.AdjustQueryProperties(m1, qp)
    m2 = Chem.AdjustQueryProperties(m2, qp)

    # I'd like both of the following to return results
    m1.GetSubstructMatches(m2)
    m2.GetSubstructMatches(m1)

My understanding of why these mols currently do not match is as follows: because only the dummy atoms are made queries (based on my query parameter adjustment), when one mol is matched to another, dummy 1 may match to F:11, but dummy 14 will then not match to methyl:4. This is because (as I understand) normal atoms can only be matched by queries, and cannot match them themselves.

Potential ideas to make this work as I'd like:

1. Override atom.Match in the Python code - not sure that this would work since the C++ version of this function is what would be called during GetSubstructMatches
2. Override atom.Match in the C++ code - not quite sure how to do this, or what side effects it might have. Ideally the changes I make would only affect this example (and other similar ones)
3. Make all atoms in both molecules QueryAtoms, but otherwise leave them unchanged. I'm not quite sure how to do this!

Does anyone have any ideas for what the best approach here would be, or know if there is already built-in functionality for something like this? I'd prefer to not use SMARTS to construct my molecules if possible, since I don't really think of them as queries, just as other molecules in the system that happen to not be fully specified.

- Kovas
Re: [Rdkit-discuss] Extending RDKit Functionality in C++
Quick update in case others run into my segmentation fault problem: I figured out that my problem had to do with how Python libraries are compiled in conda. As far as I understand it, conda compiles static Python libraries, and the dynamic library that I was using does not necessarily reference the Python that my conda environment provided. Long story short, if I add the following line to my CMakeLists.txt and do not try to link the conda Python library with target_link_libraries, my problem goes away:

    set_target_properties(rdkit_extension PROPERTIES LINK_FLAGS "-undefined dynamic_lookup")

Also, thanks for the namespace tip! This helped me solve another issue I ran into after the segmentation fault issue, where the original RDKit ReactionRunner functions were still being used even though my new ones were defined (and I thought I was importing them in my Python code). I guess the old ones were overwriting my new ones, since they were in the same namespace.

- Kovas

On 5/31/18, 10:50 AM, "Paul Emsley" wrote:

On 31/05/2018 02:00, Kovas Palunas wrote:
> If anyone has an idea for what is going wrong with my setup, or can point me to general tutorials for how to
> get something like this working, I'm a little stuck at the moment.

When you make your copy of ReactionRunner.cpp/.h, change them so that they use your namespace rather than RDKit::

Paul.
[Rdkit-discuss] Extending RDKit Functionality in C++
Hi All,

I'm attempting to write my own small RDKit extension library to store C++ functions I'll write which will be called from Python. To start out, I have one function, which is a slightly modified RunReactants from the 2018.03 release. Previously, I directly modified this function in the RDKit source code, and then compiled the whole RDKit myself to use it (from Python). This approach worked fine, but made it hard to update the RDKit to new versions and made other code organization difficult. I imagine as I make more changes/additions on the C++ side, this will get more awkward.

I used the following steps to set up my extension library as it currently is:

1. Installed RDKit using Conda (2018.03 release with Python 3.6)
2. Copied ReactionRunner.h and ReactionRunner.cpp and made my changes
3. Added a top level library.h file that includes my ReactionRunner changes and wraps them for Python using the syntax used for RunReactants in RDKit
4. Added a cmake file
5. Compiled my library (on OS X 10.13.4 using the Xcode compiler)
6. Copied my library from where I built it into my conda Python site-packages directory so that it can be imported by my conda Python

Currently, I can compile and import my library into Python just fine, but I get a segmentation fault when I try to call my RunReactants function. I'm new at using Boost.Python to wrap C++ functions, so I'm hoping I'm missing something obvious! I've tried using a debugger (lldb) to track down my error, but am having trouble parsing the output.

If anyone has an idea for what is going wrong with my setup, or can point me to general tutorials for how to get something like this working, I'm a little stuck at the moment.

Thanks!

- Kovas

Contents of my library.h file (ignore the hello function definition):

    #ifndef RDKIT_EXTENSION_LIBRARY_H
    #define RDKIT_EXTENSION_LIBRARY_H

    // #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    // custom reaction runner header file
    #include "ReactionRunner.h"

    void hello();

    namespace python = boost::python;

    template <typename T>
    PyObject *RunReactants(RDKit::ChemicalReaction *rxn, T reactants) {
      if (!rxn->isInitialized()) {
        NOGIL gil;
        rxn->initReactantMatchers();
      }
      RDKit::MOL_SPTR_VECT reacts;
      unsigned int len1 = python::extract<unsigned int>(reactants.attr("__len__")());
      reacts.resize(len1);
      for (unsigned int i = 0; i < len1; ++i) {
        reacts[i] = python::extract<RDKit::ROMOL_SPTR>(reactants[i]);
        if (!reacts[i]) throw_value_error("reaction called with None reactants");
      }
      std::vector<RDKit::MOL_SPTR_VECT> mols;
      {
        NOGIL gil;
        mols = rxn->runReactants(reacts);
      }
      PyObject *res = PyTuple_New(mols.size());
      for (unsigned int i = 0; i < mols.size(); ++i) {
        PyObject *lTpl = PyTuple_New(mols[i].size());
        for (unsigned int j = 0; j < mols[i].size(); ++j) {
          PyTuple_SetItem(lTpl, j, python::converter::shared_ptr_to_python(mols[i][j]));
        }
        PyTuple_SetItem(res, i, lTpl);
      }
      return res;
    }

    BOOST_PYTHON_MODULE(librdkit_extension) {
      // def("hello", hello);
      python::def("RunReactants",
                  (PyObject * (*)(RDKit::ChemicalReaction *, python::tuple))RunReactants);
    }

    #endif
[Rdkit-discuss] Building RDKit from source
Hi all,

I thought I'd share a script I wrote to build RDKit and Boost together, which has worked for me on Linux (CentOS) and Mac machines so far. I run RDKit in a virtualenv Python environment (not in Anaconda), so this may only be helpful for a small group of RDKitters. Hopefully some of you do find this useful - it has personally saved me a lot of time getting RDKit installed on multiple machines.

Note: please skim through the script to make sure you know what the variables inside it are set to before running - there are multiple ways to specify what code to build that may be useful for different purposes (and some are commented out). Make sure you pip install numpy before running (I should probably just add this to the script). Also, I have only tested this on RDKit 2016_09_3 and Boost 1_63_0.

- Kovas

Attachment: build_rdkit.sh
Re: [Rdkit-discuss] How to number the outputs of a reaction?
There should be a post in there about changing the RDKit C++ code to make that property available. It's a very small change!

- Kovas

From: Jennifer Wei
Sent: Friday, September 29, 2017 10:51:02 AM
To: Kovas Palunas; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] How to number the outputs of a reaction?

Hi Kovas,

Thank you so much for pointing me to this GitHub issues page and for sharing your code! It is very helpful.

I'm having a bit of trouble with the 'react_atom_idx' property. Where did this get set initially? If I try to run your code as written on the GitHub page, it does not recognize this key.

Thank you!

Best,
Jennifer

On Thu, Sep 28, 2017 at 5:06 PM Kovas Palunas <kovas.palu...@arzeda.com> wrote:

Hi Jennifer,

I had this same issue a while back. Here is an issue I posted about it on GitHub: https://github.com/rdkit/rdkit/issues/1269

I never did make the pull request mentioned in the issue, but all the code that does what you want should be in there. Let me know if you have any other questions, I've spent a good amount of time on this problem.

- Kovas

From: Jennifer Wei <jennifer...@fas.harvard.edu>
Sent: Thursday, September 28, 2017 11:47:53 AM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] How to number the outputs of a reaction?

Hi All,

I am working with atom mapping for reactions. How do I get the correct atom mapping for my products?

I have tried the following:

    >>> rxn = rdChemReactions.ReactionFromSmarts('[C:1](=[O:2])O.[N:3]>>[C:1](=[O:2])[N:3]')
    >>> rcts_lab = (Chem.MolFromSmiles('[C:1](=[O:2])[O:3]'), Chem.MolFromSmiles('[C:4][N:5][C:6]'))
    >>> pcts_lab = rxn.RunReactants(rcts_lab)
    >>> Chem.MolToSmiles(pcts_lab[0][0])
    'O=CN([C:4])[C:6]'

I would like the product to be fully labeled, so I get this on the last line instead:

    '[O:2]=[C:1][N:5]([C:4])[C:6]'

Thank you in advance for any help you can provide me.

Best,
Jennifer
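The relabeling Jennifer asks for comes down to bookkeeping: for each product atom that came from a mapped template atom, look up which reactant and which reactant atom it was copied from, then reuse that atom's map number. The sketch below is one way to do this and is not the code from the GitHub issue. It assumes an RDKit build in which product atoms carry the `old_mapno` and `react_atom_idx` properties (the thread notes that older releases needed a C++ change for the latter); `template_index_by_mapno` and `product_map_numbers` are hypothetical helper names, and the RDKit part runs only if RDKit is importable.

```python
def template_index_by_mapno(template_mapnos):
    """Map each reaction-template atom map number to its reactant template index."""
    lookup = {}
    for template_idx, mapnos in enumerate(template_mapnos):
        for mapno in mapnos:
            if mapno:  # 0 means "unmapped"
                lookup[mapno] = template_idx
    return lookup


def product_map_numbers(product_props, mapno_to_template, reactant_mapnos):
    """Return one map number (or None) per product atom.

    product_props: one dict per product atom, possibly holding 'old_mapno'
        and 'react_atom_idx'.
    reactant_mapnos: per reactant, the atom map numbers listed by atom index.
    Atoms without 'old_mapno' get None and keep whatever label they have.
    """
    result = []
    for props in product_props:
        old = props.get("old_mapno")
        src = props.get("react_atom_idx")
        if old is None or src is None:
            result.append(None)
        else:
            result.append(reactant_mapnos[mapno_to_template[old]][src])
    return result


try:
    from rdkit import Chem
    from rdkit.Chem import rdChemReactions
except ImportError:  # keep the sketch importable without RDKit
    Chem = None

if Chem is not None:
    rxn = rdChemReactions.ReactionFromSmarts(
        "[C:1](=[O:2])O.[N:3]>>[C:1](=[O:2])[N:3]")
    reactants = (Chem.MolFromSmiles("[C:1](=[O:2])[O:3]"),
                 Chem.MolFromSmiles("[C:4][N:5][C:6]"))
    lookup = template_index_by_mapno(
        [[a.GetAtomMapNum() for a in t.GetAtoms()] for t in rxn.GetReactants()])
    labels = [[a.GetAtomMapNum() for a in m.GetAtoms()] for m in reactants]
    product = rxn.RunReactants(reactants)[0][0]
    props = [a.GetPropsAsDict() for a in product.GetAtoms()]
    for atom, mapno in zip(product.GetAtoms(),
                           product_map_numbers(props, lookup, labels)):
        if mapno is not None:
            atom.SetAtomMapNum(mapno)
    print(Chem.MolToSmiles(product))
```

If the properties are missing in a given RDKit version, the helper simply leaves every atom unlabeled rather than failing, which makes the version dependence easy to spot.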
Re: [Rdkit-discuss] How to number the outputs of a reaction?
Hi Jennifer,

I had this same issue a while back. Here is an issue I posted about it on GitHub: https://github.com/rdkit/rdkit/issues/1269

I never did make the pull request mentioned in the issue, but all the code that does what you want should be in there. Let me know if you have any other questions, I've spent a good amount of time on this problem.

- Kovas
Re: [Rdkit-discuss] Masking groups as atoms in RDKit
Ideally, I'd like to treat these pseudoatoms as similarly to normal atoms as possible. I would mostly want to use them for substructure matching, running reactions, and also display purposes. Also, basic atom queries, such as getting a mapping number or an atom symbol.

I was thinking that maybe this could be done by just defining the CoA atom type (for example) just as the carbon or oxygen atom types are defined (setting atomic weight, valences, etc.). Does this make sense?

- Kovas
Re: [Rdkit-discuss] Masking groups as atoms in RDKit
The way i was thinking about it, the smarts of OCC would not match the O[but] because [but] is a totally new atom that is not related to carbon at all. This doesn't really make sense in this example, but it does (i think) for most of my purposes (where i want to mask away a biological macromolecule that i do not want to interact with). There are probably still edge cases i'm not seeing... but maybe it's still worth a try? I saw there was a periodic table module in RDKit. Is it possible to add these atoms there? - Kovas From: Greg Landrum Sent: Wednesday, September 27, 10:13 PM Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit To: Kovas Palunas Cc: rdkit-discuss@lists.sourceforge.net I'm afraid that there's likely to be rather a lot of devil hiding in the details (as is so often the case). A simple example of one problem: let's take your [But]O case. Suppose you do a substructure search for the molecule defined by the SMARTS "OCC". Does that match "[But]O"? What does it return when I ask for the substructure matches (this function, if you aren't familiar with it, returns the indices of the matching atoms)? What about the SMARTS "CC"? One solution to this that works with substructure searching is to have the molecule contain all the atoms - "O" in your example - but to have the four C atoms marked as a group so that drawings of the molecule display "[But]O". Supporting this type of functionality is on the To Do list (it's part of supporting S Groups from Mol files). If you just want to indicate that there is a [But] group there but not really do anything with the group's structure, there's are probably already ways to handle this using dummy atoms and custom labels. -greg On Wed, Sep 27, 2017 at 9:26 PM, Kovas Palunas mailto:kovas.palu...@arzeda.com>> wrote: Ideally, I'd like to treat these pseudoatoms as similarly to normal atoms as possible. I would mostly want to use them for substructure matching, running reactions, and also display purposes. 
Also, basic atom queries, such as getting a mapping number or a atom symbol. I was thinking that maybe this could be done by just defining the CoA atom type (for example) just as the carbon or oxygen atom types are defined (setting atomic weight, valences, etc.). Does this make sense? - Kovas From: Greg Landrum mailto:greg.land...@gmail.com>> Sent: Wednesday, September 27, 2017 2:27:04 AM To: Kovas Palunas Cc: rdkit<mailto:rdkit-discuss@lists.sourceforge.net>-disc...@lists.sourceforge.net<mailto:rdkit-discuss@lists.sourceforge.net> Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit Where would you want to use this? Is it for depiction (i.e. drawing molecules) or something else? -greg On Tue, Sep 26, 2017 at 10:12 PM, Kovas Palunas mailto:kovas.palu...@arzeda.com>> wrote: Hi all, Has anyone tried implementing or using a group to atom masking strategy in RDKit? By this I mean taking a piece of a molecule and representing it as a single atom. Here is an example: O could be represented as [But]O, where the atom [But] represents the four carbon chain. In my case I'm particularly interested is using this strategy to represent large biological molecules / molecule pieces, such as coenzyme A. If I were to implement this myself, is there a place in RDKit where atom types can be defined? Thanks! - Kovas -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! 
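Greg's suggestion in the thread above (keep every atom in the molecule and mark the four carbons as a display-only group) can be sketched in plain Python. This is only an illustration under stated assumptions: `GroupedMolecule`, `display_labels`, and `groups_hit` are hypothetical names and not part of the RDKit API, and the example assumes the molecule is a four-carbon chain bonded to O, with atoms indexed C0..C3 and O4.

```python
from dataclasses import dataclass, field


@dataclass
class GroupedMolecule:
    """Toy container (hypothetical, not RDKit): all atoms are kept,
    and groups are annotations used only for display."""
    symbols: list                                # element symbol per atom index
    bonds: list                                  # (i, j) atom index pairs
    groups: dict = field(default_factory=dict)   # label -> set of atom indices

    def display_labels(self):
        """Collapse each group to a single label; other atoms are unchanged."""
        labels, seen = [], set()
        for idx, sym in enumerate(self.symbols):
            if idx in seen:
                continue
            label = next(
                (g for g, members in self.groups.items() if idx in members),
                None,
            )
            if label is None:
                labels.append(sym)
            else:
                labels.append(f"[{label}]")
                seen |= self.groups[label]
        return labels

    def groups_hit(self, match):
        """Names of the groups touched by a substructure match,
        where the match is a tuple of atom indices."""
        return sorted(
            g for g, members in self.groups.items() if members & set(match)
        )


# Assumed example: four carbons (indices 0-3) masked as "But", then O (index 4).
butanol = GroupedMolecule(
    symbols=["C", "C", "C", "C", "O"],
    bonds=[(0, 1), (1, 2), (2, 3), (3, 4)],
    groups={"But": {0, 1, 2, 3}},
)

print("".join(butanol.display_labels()))  # [But]O
print(butanol.groups_hit((4, 3, 2)))      # ['But']
```

Because the underlying atoms are still present, a match for something like "OCC" keeps returning ordinary atom indices, and the caller can decide afterwards whether a match that reaches into a masked group should be accepted or discarded.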
[Rdkit-discuss] Masking groups as atoms in RDKit
Hi all,

Has anyone tried implementing or using a group-to-atom masking strategy in RDKit? By this I mean taking a piece of a molecule and representing it as a single atom. Here is an example: O could be represented as [But]O, where the atom [But] represents the four-carbon chain. In my case I'm particularly interested in using this strategy to represent large biological molecules / molecule pieces, such as coenzyme A. If I were to implement this myself, is there a place in RDKit where atom types can be defined?

Thanks!

- Kovas
[Rdkit-discuss] Cheminformatics Job Opportunity
We use RDKit and similar tools a lot, and experience with them is highly desired for this position! If there is a better rdkit list for this kind of message, please let me know.

[Job Description]

Arzeda is seeking a full-stack computational scientist & software engineer equally at home with chemistry and biology to make significant contributions to our computational metabolic pathway design stack, Scylax. The candidate's responsibilities will be:

* Developing algorithms to probe the entirety of biodesignable space for novel biochemical synthesis strategies.
* Enhancing and improving the performance of Scylax.
* Connecting genomic and other biological data to computationally designed pathways.
* Designing and implementing methods for storing and versioning large metabolic datasets.
* Interfacing the Scylax toolchain with in-house biochemical databases.

[Job Requirements]

The successful candidate will hold an MS or PhD in biology, biochemistry, chemistry, chemical engineering, or a related field, with at least 2 years of industry experience or postdoctoral study. Experience developing software in Python, C++, or Java is required. Candidates should have experience drawn from multiple aspects of:

* Designing novel metabolic pathways or tackling organic retrosynthesis problems.
* Developing software with a cheminformatics package such as RDKit, CDK, OpenEye, or the ChemAxon toolset.
* Working with bioinformatics tools such as BLAST, HMMER, multiple alignment and other search methods to process and analyze genomic data.
* Working with metabolic models, such as kinetic modeling, flux balance analysis or other COBRA methodologies, and whole genome modeling.
* Using or developing with a relational database such as MySQL or PostgreSQL.
* Developing algorithms for operations over graphs, such as traversal and pathfinding.
* Computational chemistry or studying small molecules in silico (e.g. quantum chemistry, virtual screening, lead identification, 3D-QSAR).

The successful candidate must be willing to work in a small, driven environment, be able to work autonomously, and possess strong interpersonal skills. Ability and willingness to work in a team is absolutely necessary. Finally, Arzeda seeks and greatly values a strong desire to learn, innovate and be continuously challenged.

[Contact]

For further information or to apply, please send your CV and a cover letter to jobs(AT)arzeda.com.

[Company Overview]

Since 2008 Arzeda has been harnessing the power of computational and synthetic biology to create new enzymes and chemical products that can compete on cost, sustainability and performance. In partnership with Fortune 500 companies and industrial leaders, the company has developed a portfolio of enzymes and specialty chemicals for polymers, pharmaceuticals, industrial chemicals and other advanced materials. Arzeda's proprietary platform and validation process rapidly create "cell factories" that can be used at industrial scales to solve problems and create opportunities that otherwise would be impossible.

At Arzeda, you will find a team of young and highly motivated scientists and technologists who are passionate about biochemistry and computing and about making the world a better place. We collectively believe that innovative algorithms and software are key to advancing synthetic biology and bringing exciting new molecules to market.

Arzeda Corp. (www.arzeda.com) is an equal opportunity employer promoting diversity and inclusion in the workspace.

KOVAS PALUNAS
SOFTWARE DEVELOPER
Arzeda Corp.
T: 206.402.6506
2715 W Fort Street
Seattle, WA 98199 - USA
www.arzeda.com
Follow us on twitter: https://www.twitter.com/ArzedaCo
Connect on linkedin: http://www.linkedin.com/company/arzeda-corp