Re: [Rdkit-discuss] Double Bond Stereochemistry in the RDKit

2018-12-04 Thread Kovas Palunas
Ok cool!  I did actually just run into an issue while doing some tests: 
https://github.com/rdkit/rdkit/issues/2183.  This issue brings up a question 
for me about where the smiles writer actually looks for stereochem info when it 
decides what to write.

Also, I ran into the same snag earlier too Dan!

- Kovas

From: Dan Nealschneider 
Date: Tuesday, December 4, 2018 at 10:05 AM
To: "col...@gmail.com" 
Cc: Kovas Palunas , rdkit discuss 

Subject: Re: [Rdkit-discuss] Double Bond Stereochemistry in the RDKit

I've done some in-memory translation of molecules to ROMols, and have used #2 
without major problems. I do remember needing to make sure that the stereoatoms 
are in the correct order - that is, that the first stereoatom is bonded to the 
beginAtom of the bond. In Python, this is something like:

bond = mol.GetBondBetweenAtoms(begin, end)
if bond.GetBeginAtomIdx() != begin:
    # The bond is stored as end->begin, so swap the stereo atoms to match.
    assert bond.GetBeginAtomIdx() == end
    stereoatom1, stereoatom2 = stereoatom2, stereoatom1
bond.SetStereoAtoms(stereoatom1, stereoatom2)
bond.SetStereo(stereo)
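
A slightly more defensive variant of the same ordering check, as a minimal sketch: instead of assuming which stereo atom belongs to which end, look each one up in the neighbor lists. The helper name order_stereo_atoms and the plain-dict adjacency format are assumptions made for illustration; they are not part of the RDKit API.

```python
def order_stereo_atoms(bond_begin, bond_end, neighbors, stereo_atoms):
    """Return stereo_atoms ordered as (neighbor of bond_begin, neighbor of bond_end).

    neighbors maps an atom index to the set of atom indices bonded to it.
    """
    first, second = stereo_atoms
    begin_nbrs = neighbors[bond_begin] - {bond_end}
    end_nbrs = neighbors[bond_end] - {bond_begin}
    if first in begin_nbrs and second in end_nbrs:
        return first, second
    if second in begin_nbrs and first in end_nbrs:
        return second, first
    raise ValueError("each stereo atom must be bonded to a different end of the bond")


# but-2-ene with atoms numbered 0-1=2-3 (hypothetical numbering for this sketch)
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(order_stereo_atoms(1, 2, neighbors, (3, 0)))  # (0, 3)
print(order_stereo_atoms(1, 2, neighbors, (0, 3)))  # (0, 3)
```

The returned pair could then be passed to bond.SetStereoAtoms() before bond.SetStereo() is called.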

- dan nealschneider

(né wandschneider)

Senior Developer
Schrödinger, Inc
Portland, OR



On Tue, Dec 4, 2018 at 7:02 AM Brian Cole <col...@gmail.com> wrote:
Hi Kovas,

For your use-case #2 should suffice, "set STEREOCIS/STEREOTRANS tags + manually 
set stereo atoms". This is what the EnumerateStereoisomers code does: 
https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/EnumerateStereoisomers.py#L38

As to what is the 'ground truth', that is a more difficult question, and I fear 
the answer may be 'none of them'. STEREOCIS/STEREOTRANS are rather recent 
additions to the RDKit API. While we strived to make sure STEREOCIS/STEREOTRANS 
work across the RDKit, there are probably bugs lurking in untested parts of the 
RDKit that don't handle them properly. However, I think those other APIs should 
be fixed to handle them properly, so please do report any problems you spot 
in the GitHub issue tracker.

Cheers,
Brian



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Double Bond Stereochemistry in the RDKit

2018-12-03 Thread Kovas Palunas
Hi All,

I’m looking for a bit more clarity regarding double bond stereochem in RDKit.  
My current understanding is that there are 3 ways to store this 
information:


  1.  STEREOE/STEREOZ tags + stereo atoms on either side of bond set by CIP 
ranks, as computed when calling MolFromSmiles to make a new molecule or 
AssignStereochemistry on an existing molecule
  2.  Manually set STEREOCIS/STEREOTRANS tags + manually set stereo atoms
  3.  ENDUPRIGHT/etc. single bond directionality tags, which are set when 
reading a molecule from smiles/inchi/mol file
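
The difference between options 1 and 2 can be sketched in a few lines: E/Z is defined against the highest-CIP-priority neighbor on each end of the double bond, while CIS/TRANS is defined against whichever stereo atoms were chosen. The function below is illustrative only. Its name and arguments are assumptions, not RDKit API, and it assumes the CIP ranking has already been worked out elsewhere.

```python
def cis_trans_to_ez(stereo, stereo_atoms, top_ranked):
    """Convert a cis/trans label (relative to stereo_atoms) to E/Z.

    stereo       -- "cis" or "trans", relative to the chosen stereo atoms
    stereo_atoms -- (neighbor on the begin side, neighbor on the end side)
    top_ranked   -- highest-CIP-priority neighbor on each side, same order
    """
    if stereo not in ("cis", "trans"):
        raise ValueError("stereo must be 'cis' or 'trans'")
    # Each end of a double bond has at most two substituents, so a stereo
    # atom that is not the top-ranked one sits on the opposite side from it.
    flips = sum(1 for chosen, top in zip(stereo_atoms, top_ranked) if chosen != top)
    top_on_same_side = (stereo == "cis") != (flips % 2 == 1)
    return "Z" if top_on_same_side else "E"


print(cis_trans_to_ez("cis", (0, 3), (0, 3)))    # Z
print(cis_trans_to_ez("cis", (0, 3), (0, 4)))    # E
print(cis_trans_to_ez("trans", (0, 3), (5, 4)))  # E
```

This is why option 2 needs the stereo atoms stored alongside the tag: the same CIS label can correspond to either E or Z depending on which neighbors it refers to.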

Is one of these methods the “ground truth” that is looked for by RDKit 
functions that care about this info, like the substructure matching code or the 
SMILES writing code?

I am currently working on code that mutates molecules using a predetermined 
list of changes to be made to the molecule.  I’d like to be able to include 
bond stereochemistry changing/creation/destruction here, and was thinking of 
doing so using the STEREOCIS/STEREOTRANS tags (and also providing the reference 
stereo atoms).  Before I do this I want to make sure that molecules with these 
tags will be handled correctly by other RDKit functions downstream.  Would 
these tags be a good choice here?  Are there any caveats I should keep in mind 
as I work with this information?

Thanks!

- Kovas



Re: [Rdkit-discuss] Matching Generalized Compounds

2018-08-31 Thread Kovas Palunas
I ended up just adding the following code to the atom matching function (after 
moving it and copies of the substructure code that call it to a new library to 
avoid other internal RDKit stuff calling the modified code):

  int a1n = a1->getAtomicNum();
  int a2n = a2->getAtomicNum();
  // dummy atoms have atomic number 0
  if (a1n == 0 || a2n == 0) {
    return true;
  }

So far, it seems to work great for me.  If anyone else cares for this 
functionality, I’d be happy to share more details/open a pull request!
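
To illustrate what the rule above changes, here is a minimal pure-Python sketch of substructure matching in which a dummy atom (atomic number 0) on either side matches anything. It does not use or reproduce the RDKit matcher: the graph format, the function names, and the naive backtracking search are all assumptions made for this example, and hydrogen counts and other atom properties are ignored.

```python
def atoms_compatible(a, b, dummies_match_anything=True):
    """Compare two atomic numbers; 0 is a dummy atom."""
    if dummies_match_anything and (a == 0 or b == 0):
        return True
    return a == b


def find_match(query, target, dummies_match_anything=True):
    """Return target indices matched by query atoms 0..n-1, or None.

    Each molecule is (atomic_nums, bonds), bonds being {frozenset({i, j}): order}.
    """
    q_nums, q_bonds = query
    t_nums, t_bonds = target
    mapping = []

    def extend():
        qi = len(mapping)
        if qi == len(q_nums):
            return True
        for ti in range(len(t_nums)):
            if ti in mapping:
                continue
            if not atoms_compatible(q_nums[qi], t_nums[ti], dummies_match_anything):
                continue
            # Every query bond to an already placed atom must exist in the target.
            bonds_ok = all(
                t_bonds.get(frozenset((mapping[qj], ti))) == q_bonds[frozenset((qj, qi))]
                for qj in range(qi)
                if frozenset((qj, qi)) in q_bonds
            )
            if bonds_ok:
                mapping.append(ti)
                if extend():
                    return True
                mapping.pop()
        return False

    return tuple(mapping) if extend() else None


# Same skeleton for both molecules: 0-1, 1-2, 2-3 single; 2=4 double.
bonds = {frozenset((0, 1)): 1, frozenset((1, 2)): 1,
         frozenset((2, 3)): 1, frozenset((2, 4)): 2}
m1 = ([0, 6, 6, 6, 6], bonds)  # [*:1][CH2:2][C:3]([CH3:4])=[CH2:5]
m2 = ([9, 6, 6, 0, 6], bonds)  # [F:11][CH2:12][C:13]([*:14])=[CH2:15]

print(find_match(m1, m2))         # (0, 1, 2, 3, 4)
print(find_match(m2, m1))         # (0, 1, 2, 3, 4)
print(find_match(m1, m2, False))  # None
```

With the symmetric rule both directions match; with plain atomic-number equality neither does, which mirrors the behavior described in the thread.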

- Kovas

Re: [Rdkit-discuss] Matching Generalized Compounds

2018-08-23 Thread Kovas Palunas
Thanks for the feedback and code example!

I understand that it works to make a third query mol using MCS that matches 
both the original mols to then match with.  However, this seems like overkill 
(overly expensive) for this particular problem – as I understand it MCS can be 
very expensive depending on the compounds you are comparing.  Would it not work 
to simply override the atom.Match function with one that will always match 
dummies no matter what the other atom is?  I am not planning to compare SMARTSy 
queries with my matching with any complexity beyond simply dummy atoms.  In 
fact, as I understand it, my example compounds are not made up of any query 
atoms when they are read into rdkit – the dummies are just made into queries 
after the read by the QueryParameters code.  I am definitely not interested in 
doing generic query to query matching.

- Kovas


From: Christos Kannas 
Date: Thursday, August 23, 2018 at 7:53 AM
To: Kovas Palunas 
Cc: RDKit , Paolo Tosco 

Subject: Re: [Rdkit-discuss] Matching Generalized Compounds

Hi Kovas,

You have two fuzzy compounds that you are trying to match against each other, 
because intuition says that the any-atom notation [*:1] from m1 should match 
the fluorine [F:11] in m2 and the any-atom [*:14] in m2 should match the carbon 
[CH3:4] in m1.
The issue here is that you create two query compounds from m1 and m2, each of 
which will match only its own specific substructures. Query-to-query matching 
is not trivial.

In order to do what you want you need a query compound that combines their 
characteristics, which is what Paolo showed.
Using MCS and modified atom properties, Paolo created that query compound: 
'[*:1]-[CH2:2]-[C:3](-[*:4])=[CH2:5]' or 
'[*:1]-[CH2X4:2]-[CX3:3](-[*:4])=[CH2X3:5]'
Also bear in mind that Paolo's approach changed the starting compounds, as they 
now resemble the generic query compound that combines their fuzzy atoms.

https://gist.github.com/CKannas/ac1a4791dec909552d7c8899cfaff030

Best,

Christos

Christos Kannas

Chem[o]informatics Researcher & Software Developer



On Thu, 23 Aug 2018 at 12:36, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:

Dear Kovas,

It looks like GetSubstructMatch() only finds a match if the dummy atom is in 
the query, not if it is in the molecule that you are matching the query against.

This notebook presents a possible solution off the top of my head:

https://gist.github.com/ptosco/a35ac28a14103b47096f6d6af1aec831

which does not involve changes to the C++ layer, even though it is 
computationally more expensive and will fail with disconnected fragments as it 
uses FindMCS(). There may be better solutions - this is what I came up with 
last night in the little time I had available.

Cheers,
P.


[Rdkit-discuss] Matching Generalized Compounds

2018-08-22 Thread Kovas Palunas
Hi All,

I’m interested in having GetSubstructMatches return non-“null” results in the 
following example.  The results should lead to a match where atom 1 maps to 
atom 11, 2 to 12, etc.

from rdkit import Chem

m1 = Chem.MolFromSmiles('[*:1][CH2:2][C:3]([CH3:4])=[CH2:5]')
m2 = Chem.MolFromSmiles('[F:11][CH2:12][C:13]([*:14])=[CH2:15]')

### do something here so that the mols will match ###
qp = Chem.AdjustQueryParameters()
qp.makeDummiesQueries = True
m1 = Chem.AdjustQueryProperties(m1, qp)
m2 = Chem.AdjustQueryProperties(m2, qp)

# I’d like both of the following to return results
m1.GetSubstructMatches(m2)
m2.GetSubstructMatches(m1)

My understanding of why these mols currently do not match is as follows:  
because only the dummy atoms are made queries (based on my query parameter 
adjustment), when one mol is matched to another, dummy 1 may match to F:11, but 
dummy 14 will then not match to methyl:4.  This is because (as I understand), 
normal atoms can only be matched by queries, and cannot match them themselves.

Potential ideas to make this work as I’d like:

  1.  Override atom.Match in the python code – not sure that this would work 
since the C++ version of this function is what would be called during 
GetSubstructMatches
  2.  Override atom.Match in the C++ code – not quite sure how to do this, or 
what side effects it might have.  Ideally the changes I make would only affect 
this example (and other similar ones)
  3.  Make all atoms in both molecules QueryAtoms, but otherwise leave them 
unchanged.  I’m not quite sure how to do this!

Does anyone have any ideas for what the best approach here would be, or knows 
if there is already built in functionality for something like this?  I’d prefer 
to not use SMARTS to construct my molecules if possible, since I don’t really 
think of them as queries, just as other molecules in the system that happen to 
not be fully specified.

- Kovas



Re: [Rdkit-discuss] Extending RDKit Functionality in C++

2018-05-31 Thread Kovas Palunas
Quick update in case others run into my segmentation fault problem: I figured 
out that my problem had to do with how python libraries are compiled in conda.  
As far as I understand it, conda compiles static python libraries, and the 
dynamic library that I was using does not necessarily reference the python that 
my conda environment provided.  Long story short, if I add the following line 
to my CMakeLists.txt and do not try to link the conda python library with 
target_link_libraries, my problem goes away:

set_target_properties(rdkit_extension PROPERTIES LINK_FLAGS "-undefined dynamic_lookup")
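
One quick way to check the static-versus-shared question from Python itself is to look at the interpreter's build configuration. This is a sketch using only the standard-library sysconfig module; the helper name python_link_info is made up here, and the values reported vary by platform (some may be None, for example on Windows).

```python
import sysconfig


def python_link_info():
    """Report how the running interpreter was built with respect to libpython."""
    return {
        # Truthy when Python was built with a shared libpython.
        "shared_libpython": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        # Name of the Python library produced by the build, if recorded.
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
        # Directory where that library was expected to be installed.
        "libdir": sysconfig.get_config_var("LIBDIR"),
    }


for key, value in python_link_info().items():
    print(f"{key}: {value}")
```

If shared_libpython comes back False, there is no shared libpython for an extension to link against, which would be consistent with leaving the Python symbols to be resolved at load time as the LINK_FLAGS line does.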

Also, thanks for the namespace tip!  This helped me solve another issue I ran 
into after the segmentation fault issue where the original RDKit ReactionRunner 
functions were being used still even though my new ones were defined (and I 
thought I was importing them in my python code).  I guess the old ones were 
overwriting my new ones, since they were in the same namespace.  

 - Kovas


On 5/31/18, 10:50 AM, "Paul Emsley"  wrote:

On 31/05/2018 02:00, Kovas Palunas wrote:
> If anyone has an idea for what is going wrong with my setup, or can point 
> me to general tutorials for how to get something like this working, I’m a 
> little stuck at the moment.
> 

When you make your copy of ReactionRunner.cpp/.h, change them so that they 
use your namespace rather than RDKit::

Paul.




[Rdkit-discuss] Extending RDKit Functionality in C++

2018-05-30 Thread Kovas Palunas
Hi All,

I’m attempting to write my own small RDKit extension library to store C++ 
functions I’ll write which will be called from Python.  To start out, I have 
one function, which is a slightly modified RunReactants from the 2018.03 
release.  Previously, I directly modified this function in the RDKit source 
code, and then compiled the whole RDKit myself to use it (from python).  This 
approach worked fine, but made it hard to update the RDKit to new versions and 
made other code organization difficult.  I imagine as I make more 
changes/additions on the C++ side, this will get more awkward.

I used the following steps to set up my extension library as it currently is:

  1.  Installed RDKit using Conda (2018.03 release with python 3.6)
  2.  Copied ReactionRunner.h and ReactionRunner.cpp and made my changes
  3.  Added a top level library.h file that includes my ReactionRunner changes 
and wraps them for python using the syntax used for RunReactants in RDKit
  4.  Added a cmake file
  5.  Compiled my library (on osx 10.13.4 using the XCode compiler)
  6.  Copied my library from where I built it into my conda python 
site-packages directory so that it can be imported by my conda python

Currently, I can compile and import my library into python just fine, but I get 
a segmentation fault when I try to call my RunReactants function.  I’m new at 
using boost python to wrap C++ functions, so I’m hoping I’m missing something 
obvious!  I’ve tried using a debugger (lldb) to track down my error, but am 
having trouble parsing the output.

If anyone has an idea for what is going wrong with my setup, or can point me to 
general tutorials for how to get something like this working, I’m a little 
stuck at the moment.

Thanks!


 - Kovas

Contents of my library.h file (ignore the hello function definition):

#ifndef RDKIT_EXTENSION_LIBRARY_H
#define RDKIT_EXTENSION_LIBRARY_H

// #include 
#include 
#include 
#include 

#include 
#include 
#include 
#include 
#include 

// custom reaction runner header file
#include "ReactionRunner.h"

void hello();

namespace python = boost::python;

template <typename T>
PyObject *RunReactants(RDKit::ChemicalReaction *rxn, T reactants) {
  if (!rxn->isInitialized()) {
    NOGIL gil;
    rxn->initReactantMatchers();
  }
  RDKit::MOL_SPTR_VECT reacts;
  unsigned int len1 =
      python::extract<unsigned int>(reactants.attr("__len__")());
  reacts.resize(len1);
  for (unsigned int i = 0; i < len1; ++i) {
    reacts[i] = python::extract<RDKit::ROMOL_SPTR>(reactants[i]);
    if (!reacts[i]) throw_value_error("reaction called with None reactants");
  }
  std::vector<RDKit::MOL_SPTR_VECT> mols;
  {
    NOGIL gil;
    mols = rxn->runReactants(reacts);
  }
  PyObject *res = PyTuple_New(mols.size());

  for (unsigned int i = 0; i < mols.size(); ++i) {
    PyObject *lTpl = PyTuple_New(mols[i].size());
    for (unsigned int j = 0; j < mols[i].size(); ++j) {
      PyTuple_SetItem(lTpl, j,
                      python::converter::shared_ptr_to_python(mols[i][j]));
    }
    PyTuple_SetItem(res, i, lTpl);
  }
  return res;
}

BOOST_PYTHON_MODULE(librdkit_extension) {
  // def("hello", hello);
  python::def("RunReactants",
              (PyObject * (*)(RDKit::ChemicalReaction *, python::tuple))
                  RunReactants);
}

#endif


[Rdkit-discuss] Building RDKit from source

2017-10-02 Thread Kovas Palunas
Hi all,


I thought I'd share a script I wrote to build RDKit and Boost together which 
has worked for me on Linux (CentOS) and Mac machines so far.  I run RDKit in a 
virtualenv Python environment (not in anaconda), so this may only be helpful 
for a small group of RDKitters.  Hopefully some of you do find this useful - it 
has personally saved me a lot of time getting RDKit installed on multiple 
machines.


Note: please skim through the script to make sure you know what variables 
inside it are set to what before running - there are multiple ways to specify 
what code to build that may be useful for different purposes (and some are 
commented out).


Make sure you pip install numpy before running (I should probably just add this 
to the script).


Also, I have only tested this on RDKit 2016_09_3 and Boost 1_63_0.


 - Kovas



build_rdkit.sh
Description: build_rdkit.sh


Re: [Rdkit-discuss] How to number the outputs of a reaction?

2017-09-29 Thread Kovas Palunas
There should be a post in there about changing the RDKit C++ code to make that 
property available.  It's a very small change!


 - Kovas



From: Jennifer Wei 
Sent: Friday, September 29, 2017 10:51:02 AM
To: Kovas Palunas; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] How to number the outputs of a reaction?

Hi Kovas,

Thank you so much for pointing me to this github issues page and for sharing 
your code! It is very helpful.

I'm having a bit of trouble with the 'react_atom_idx' property. Where did this 
get set initially? If I try to run your code as written on the github page, it 
does not recognize this key.

Thank you!

Best,
Jennifer



Re: [Rdkit-discuss] How to number the outputs of a reaction?

2017-09-28 Thread Kovas Palunas
Hi Jennifer,

I had this same issue a while back.  Here is an issue I posted about it on the 
github: https://github.com/rdkit/rdkit/issues/1269

I never did make the pull request mentioned in the issue, but all the code that 
does what you want should be in there.  Let me know if you have any other 
questions, I've spent a good amount of time on this problem.

 - Kovas


From: Jennifer Wei 
Sent: Thursday, September 28, 2017 11:47:53 AM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] How to number the outputs of a reaction?

Hi All,

I am working with atom mapping for reactions. How do I get the correct atom 
mapping for my products?

I have tried the following:
>>> rxn = rdChemReactions.ReactionFromSmarts('[C:1](=[O:2])O.[N:3]>>[C:1](=[O:2])[N:3]')
>>> rcts_lab = (Chem.MolFromSmiles('[C:1](=[O:2])[O:3]'),
...             Chem.MolFromSmiles('[C:4][N:5][C:6]'))
>>> pcts_lab = rxn.RunReactants(rcts_lab)
>>> Chem.MolToSmiles(pcts_lab[0][0])
  'O=CN([C:4])[C:6]'

I would like the product to be fully labeled, so I get this on the last line 
instead.
  '[O:2]=[C:1][N:5]([C:4])[C:6]'

Thank you in advance for any help you can provide me.

Best,
Jennifer



Re: [Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-28 Thread Kovas Palunas
Ideally, I'd like to treat these pseudoatoms as similarly to normal atoms as 
possible.  I would mostly want to use them for substructure matching, running 
reactions, and display purposes, as well as for basic atom queries such as 
getting a mapping number or an atom symbol.

I was thinking that maybe this could be done by just defining the CoA atom type 
(for example) just as the carbon or oxygen atom types are defined (setting 
atomic weight, valences, etc.).

Does this make sense?

 - Kovas

From: Greg Landrum 
Sent: Wednesday, September 27, 2017 2:27:04 AM
To: Kovas Palunas
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit

Where would you want to use this?
Is it for depiction (i.e. drawing molecules) or something else?

-greg


On Tue, Sep 26, 2017 at 10:12 PM, Kovas Palunas <kovas.palu...@arzeda.com> wrote:

Hi all,


Has anyone tried implementing or using a group to atom masking strategy in 
RDKit?  By this I mean taking a piece of a molecule and representing it as a 
single atom.  Here is an example:


CCCCO could be represented as [But]O, where the atom [But] represents the 
four-carbon chain.


In my case I'm particularly interested in using this strategy to represent 
large biological molecules / molecule pieces, such as coenzyme A.


If I were to implement this myself, is there a place in RDKit where atom types 
can be defined?


Thanks!


 - Kovas




Re: [Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-28 Thread Kovas Palunas
The way I was thinking about it, the SMARTS OCC would not match O[But] 
because [But] is a totally new atom that is not related to carbon at all.  This 
doesn't really make sense in this example, but it does (I think) for most of my 
purposes (where I want to mask away a biological macromolecule that I do not 
want to interact with).

There are probably still edge cases I'm not seeing... but maybe it's still 
worth a try?  I saw there was a periodic table module in RDKit.  Is it possible 
to add these atoms there?

- Kovas


From: Greg Landrum
Sent: Wednesday, September 27, 10:13 PM
Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit
To: Kovas Palunas
Cc: rdkit-discuss@lists.sourceforge.net



I'm afraid that there's likely to be rather a lot of devil hiding in the 
details (as is so often the case).

A simple example of one problem: let's take your [But]O case. Suppose you do a 
substructure search for the molecule defined by the SMARTS "OCC". Does that 
match "[But]O"?  What does it return when I ask for the substructure matches 
(this function, if you aren't familiar with it, returns the indices of the 
matching atoms)? What about the SMARTS "CC"?

One solution to this that works with substructure searching is to have the 
molecule contain all the atoms - "CCCCO" in your example - but to have the four 
C atoms marked as a group so that drawings of the molecule display "[But]O". 
Supporting this type of functionality is on the To Do list (it's part of 
supporting S Groups from Mol files).
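
As a rough illustration of that "keep every atom, annotate the group" idea, here is a small pure-Python sketch. It is not RDKit code and says nothing about how S Group support would actually be implemented; the GroupedMol class, its fields, and the example molecule (four carbons plus an oxygen, matching the [But]O case) are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GroupedMol:
    symbols: list                               # one element symbol per real atom
    groups: dict = field(default_factory=dict)  # label -> set of atom indices

    def display_symbols(self):
        """Collapse each labelled group into one pseudo-atom label for display."""
        owner = {idx: label for label, idxs in self.groups.items() for idx in idxs}
        shown, seen = [], set()
        for idx, symbol in enumerate(self.symbols):
            label = owner.get(idx)
            if label is None:
                shown.append(symbol)         # ordinary atom, shown as-is
            elif label not in seen:
                shown.append(f"[{label}]")   # first atom of a group stands in for it
                seen.add(label)
        return shown


butanol = GroupedMol(symbols=["C", "C", "C", "C", "O"],
                     groups={"But": {0, 1, 2, 3}})
print(butanol.display_symbols())  # ['[But]', 'O']
print(len(butanol.symbols))       # 5
```

Because all five atoms are still stored, a substructure search for OCC or CC has well-defined atom indices to return, while the display layer shows only the collapsed label.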

If you just want to indicate that there is a [But] group there but not really
do anything with the group's structure, there are probably already ways to
handle this using dummy atoms and custom labels.
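
A minimal sketch of the dummy-atom idea, under stated assumptions: the group is replaced by a "*" dummy atom, and the label is stored in an atom property. The property name "atomLabel" is an assumption about what display code will pick up; any property name can be set and read back this way.

```python
from rdkit import Chem

# "*" parses as a dummy atom (atomic number 0) standing in for the masked group.
masked = Chem.MolFromSmiles("*O")
dummy = masked.GetAtomWithIdx(0)

# Assumed property name for the custom label; stored as a plain string.
dummy.SetProp("atomLabel", "But")
label = dummy.GetProp("atomLabel")

# A carbon-based query cannot see into the masked group...
matches_carbon_query = masked.HasSubstructMatch(Chem.MolFromSmarts("OCC"))
# ...but the dummy atom itself can be queried by atomic number 0.
matches_dummy_query = masked.HasSubstructMatch(Chem.MolFromSmarts("[#0]O"))

print(dummy.GetAtomicNum(), label, matches_carbon_query, matches_dummy_query)
```

The trade-off is that the group's internal structure is gone: nothing inside the masked piece can take part in substructure matches or reactions.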

-greg




On Wed, Sep 27, 2017 at 9:26 PM, Kovas Palunas
<kovas.palu...@arzeda.com> wrote:
Ideally, I'd like to treat these pseudoatoms as similarly to normal atoms as 
possible.  I would mostly want to use them for substructure matching, running 
reactions, and also display purposes.  Also, basic atom queries, such as 
getting a mapping number or an atom symbol.

I was thinking that maybe this could be done by just defining the CoA atom type 
(for example) just as the carbon or oxygen atom types are defined (setting 
atomic weight, valences, etc.).

Does this make sense?

 - Kovas
From: Greg Landrum <greg.land...@gmail.com>
Sent: Wednesday, September 27, 2017 2:27:04 AM
To: Kovas Palunas
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Masking groups as atoms in RDKit

Where would you want to use this?
Is it for depiction (i.e. drawing molecules) or something else?

-greg


On Tue, Sep 26, 2017 at 10:12 PM, Kovas Palunas
<kovas.palu...@arzeda.com> wrote:
Hi all,

Has anyone tried implementing or using a group to atom masking strategy in 
RDKit?  By this I mean taking a piece of a molecule and representing it as a 
single atom.  Here is an example:

CCCCO  could be represented as  [But]O, where the atom [But] represents the
four-carbon chain.

In my case I'm particularly interested in using this strategy to represent
large biological molecules / molecule pieces, such as coenzyme A.

If I were to implement this myself, is there a place in RDKit where atom types 
can be defined?

Thanks!

 - Kovas




[Rdkit-discuss] Masking groups as atoms in RDKit

2017-09-26 Thread Kovas Palunas
Hi all,


Has anyone tried implementing or using a group to atom masking strategy in 
RDKit?  By this I mean taking a piece of a molecule and representing it as a 
single atom.  Here is an example:


CCCCO  could be represented as  [But]O, where the atom [But] represents the
four-carbon chain.


In my case I'm particularly interested in using this strategy to represent
large biological molecules / molecule pieces, such as coenzyme A.


If I were to implement this myself, is there a place in RDKit where atom types 
can be defined?


Thanks!


 - Kovas



[Rdkit-discuss] Cheminformatics Job Opportunity

2017-07-27 Thread Kovas Palunas
We use RDKit and similar tools a lot, and experience with them is highly 
desired for this position!  If there is a better rdkit list for this kind of 
message, please let me know.




[Job Description]
Arzeda is seeking a full-stack computational scientist & software engineer
equally at home with chemistry and biology to make significant
contributions to our computational metabolic pathway design stack,
Scylax. The candidate's responsibilities will be:

* Developing algorithms to probe the entirety of biodesignable space for
  novel biochemical synthesis strategies.
* Enhancing and improving the performance of Scylax.
* Connecting genomic and other biological data to computationally designed
  pathways.
* Designing and implementing methods for storing and versioning large
  metabolic datasets.
* Interfacing the Scylax toolchain with in-house biochemical databases.

[Job Requirements]
The successful candidate will hold an MS or PhD in biology, biochemistry,
chemistry, chemical engineering, or related field with at least 2 years
of industry experience or postdoctoral study. Experience developing
software in Python, C++, or Java is required. Candidates should have
experience drawn from multiple aspects of:

* Designing novel metabolic pathways or tackling organic retrosynthesis
  problems.
* Developing software with a cheminformatics package such as RDKit, CDK,
  OpenEye, or the ChemAxon toolset.
* Working with bioinformatics tools such as BLAST, HMMER, multiple
  alignment and other search methods to process and analyze genomic data.
* Working with metabolic models, such as kinetic modeling, flux balance
  analysis or other COBRA methodologies, and whole genome modeling.
* Using or developing with a relational database such as MySQL or
  PostgreSQL.
* Developing algorithms for operations over graphs, such as traversal and
  pathfinding.
* Computational chemistry or studying small molecules in-silico (e.g.
  quantum chemistry, virtual screening, lead identification, 3D-QSAR).

The successful candidate must be willing to work in a small, driven
environment, be able to work autonomously and possess strong interpersonal
skills. Ability and willingness to work in a team is absolutely necessary.
Finally, Arzeda seeks and greatly values a strong desire to learn,
innovate and be continuously challenged.

[Contact]
For further information or to apply, please send your CV and a cover
letter to jobs(AT)arzeda.com.

[Company Overview]
Since 2008 Arzeda has been harnessing the power of computational and
synthetic biology to create new enzymes and chemical products that can
compete on cost, sustainability and performance. In partnership with
Fortune 500 companies and industrial leaders, the company has developed a
portfolio of enzymes and specialty chemicals for polymers,
pharmaceuticals, industrial chemicals and other advanced materials.
Arzeda's proprietary platform and validation process rapidly create
"cell factories" that can be used at industrial scales to solve problems
and create opportunities that otherwise would be impossible. At Arzeda,
you will find a team of young and highly motivated scientists and
technologists who are passionate about biochemistry and computing and
making the world a better place. We collectively believe that innovative
algorithms and software are key to advance synthetic biology and bring to
the market exciting new molecules.

Arzeda Corp. (www.arzeda.com) is an equal opportunity employer promoting
diversity and inclusion in the workspace.



KOVAS PALUNAS

SOFTWARE DEVELOPER

Arzeda Corp.
T: 206.402.6506

2715 W Fort Street
Seattle, WA 98199 - USA
www.arzeda.com
Follow us on Twitter: https://www.twitter.com/ArzedaCo
Connect on LinkedIn: http://www.linkedin.com/company/arzeda-corp
