Re: [Rdkit-discuss] RDKit PostgreSQL extension: Unexpected behaviour of substruct()

2024-07-02 Thread Ernst-Georg Schmid

On 27.06.2024 at 11:03, Wim Dehaen wrote:
I would expect the problem here is kekulization. The SMARTS is pattern 
matching using the kekule structure (i.e. double and single bonds, non 
aromatic atoms) and is not sanitized whereas the SMILES after parsing 
and sanitization has aromatic bonds and aromatic atoms. Try what happens 
when you do a SMARTS match with the SMILES with aromatic atoms: 
`[2H]c1cc([3H])cc(C2=N[C@](C)([37Cl])CC2)c1`


That was it indeed.

Thank you,

Ernst-Georg



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit PostgreSQL extension: Unexpected behaviour of substruct()

2024-06-27 Thread Noel O'Boyle
"Every valid SMILES is also a valid SMARTS": I think this is one of John
May's lines, which I was never keen on as it makes people think that if you
treat a SMILES as a SMARTS that it will match the original SMILES. It
mostly will, but I think you have found the difference between the SMILES
and SMARTS treatment of "[2H]" - one means deuterium, the other means an
isotope of mass 2 with a single implicit hydrogen attached. It doesn't
match because the deuterium doesn't have another hydrogen attached. [I
think??]

Regards,
Noel



Re: [Rdkit-discuss] RDKit PostgreSQL extension: Unexpected behaviour of substruct()

2024-06-27 Thread Wim Dehaen
I would expect the problem here is kekulization. The SMARTS query is
matched as written, in its Kekulé form (alternating single and double
bonds, non-aromatic atoms), and is not sanitized, whereas the SMILES, after
parsing and sanitization, has aromatic bonds and aromatic atoms. See what
happens when you do the SMARTS match with the aromatic form of the SMILES:
`[2H]c1cc([3H])cc(C2=N[C@](C)([37Cl])CC2)c1`

best wishes
wim
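
The difference described above can be reproduced outside PostgreSQL. The following is a minimal sketch, assuming the RDKit Python bindings parse SMILES and SMARTS the same way the cartridge's `mol` and `qmol` types do; the variable names are illustrative only.

```python
from rdkit import Chem

kekule = '[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'
aromatic = '[2H]c1cc([3H])cc(C2=N[C@](C)([37Cl])CC2)c1'

# Target molecule: parsed as SMILES and sanitized, so the six-membered
# ring is perceived as aromatic.
mol = Chem.MolFromSmiles(kekule)

# Query parsed as a molecule (SMILES): it is sanitized in the same way.
smiles_query = Chem.MolFromSmiles(kekule)

# Queries parsed as SMARTS: not sanitized, so an uppercase 'C' means
# "aliphatic carbon" and cannot match an aromatic ring atom.
kekule_smarts = Chem.MolFromSmarts(kekule)
aromatic_smarts = Chem.MolFromSmarts(aromatic)

smiles_hit = mol.HasSubstructMatch(smiles_query)
kekule_hit = mol.HasSubstructMatch(kekule_smarts)
aromatic_hit = mol.HasSubstructMatch(aromatic_smarts)

print('SMILES query:         ', smiles_hit)
print('Kekule SMARTS query:  ', kekule_hit)
print('Aromatic SMARTS query:', aromatic_hit)
```

The Kekulé SMARTS fails because its ring atoms are written as aliphatic carbons, while the sanitized target only contains aromatic carbons in that ring.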



[Rdkit-discuss] RDKit PostgreSQL extension: Unexpected behaviour of substruct()

2024-06-27 Thread pgchem pgchem
Hello all,
 
if every valid SMILES is also a valid SMARTS, why does:
 
select substruct('[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::mol, 
'[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::mol)

yield "True", but:
 
select substruct('[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::mol, 
'[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::qmol)
 
is "False"? The same is observed when using the @> operator.
 
RDKit 2024.03.3 built from source + PostgreSQL 16.3.
 
best regards
 
Ernst-Georg
 


Re: [Rdkit-discuss] RDKit in Google Colab

2022-08-03 Thread Patrick Walters
Actually, you can now just
!pip install rdkit
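
After the install, a quick sanity check in the next notebook cell can confirm that the package imports cleanly. This is only a sketch, assuming the pip install above succeeded in the current runtime:

```python
from rdkit import Chem, rdBase

# Report the installed RDKit version and parse a trivial molecule.
version = rdBase.rdkitVersion
mol = Chem.MolFromSmiles('CCO')
print('RDKit version:', version)
print('Heavy atoms in ethanol:', mol.GetNumAtoms())
```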




Re: [Rdkit-discuss] RDKit in Google Colab

2022-08-03 Thread Jan Halborg Jensen
!pip install rdkit-py

No need to use anaconda for Colab RDKit installation anymore!

Best regards, Jan



Re: [Rdkit-discuss] RDKit in Google Colab

2022-08-03 Thread Jan Halborg Jensen
Wups.

!pip install rdkit-pypi



Re: [Rdkit-discuss] RDKit in Google Colab

2022-08-03 Thread Greg Landrum
Hi Eduardo,

In order for anybody to be able to help here we need more information: how
did you install the rdkit in the notebook, which versions of everything
else are you using, etc.
The easiest way to answer this would be to just include a link to the colab
notebook itself.

-greg




[Rdkit-discuss] RDKit in Google Colab

2022-08-03 Thread Eduardo Mayo
Hello,

I have used RDKit in a Google Colab notebook before (a few months ago).
However, when I tried today, I got the following error message:

ImportError: /usr/local/lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /usr/local/lib/python3.7/site-packages/rdkit/Chem/../../../../libRDKitFileParsers.so.1)

Does anyone know a workaround?

All the best,
Eduardo


Re: [Rdkit-discuss] rdkit & Coordination chemistry on Mg

2022-03-22 Thread Jan Halborg Jensen
Hi Marco

You can define dative bonds like this: C1CO->[Fe+2](O)(<-OC1)(<-O)(<-O)(<-O)

Best regards, Jan
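
A minimal sketch of how the SMILES above could be read in and inspected from Python, assuming a reasonably recent RDKit; counting the bonds of type DATIVE is just one way to confirm that the arrows were interpreted as coordinate bonds.

```python
from rdkit import Chem

smiles = 'C1CO->[Fe+2](O)(<-OC1)(<-O)(<-O)(<-O)'
mol = Chem.MolFromSmiles(smiles)  # default sanitization

# Count the bonds that were parsed as dative (coordinate) bonds.
n_dative = sum(
    1 for bond in mol.GetBonds()
    if bond.GetBondType() == Chem.BondType.DATIVE
)
print('dative bonds:', n_dative)
print(Chem.MolToSmiles(mol))
```

Each `->` or `<-` in the input points from the donor atom to the metal, so the five arrows should show up as five dative bonds.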



Re: [Rdkit-discuss] rdkit & Coordination chemistry on Mg

2022-03-22 Thread JW Feng
Thanks for sharing the Colab. I didn't know about !pip install rdkit-pypi





Re: [Rdkit-discuss] rdkit & Coordination chemistry on Mg

2022-03-22 Thread Jan Halborg Jensen
The SMILES I sent you works fine for me with the same version: 
https://colab.research.google.com/drive/19OZtX8IqICZQ4B2jLpr02owkSLc0FeZ6?usp=sharing

However, the alkali and alkaline earth metals do not behave as I would expect 
(as shown in the Colab notebook). This looks like a bug to me and I suggest 
filing a GitHub issue

Best regards, Jan



Re: [Rdkit-discuss] rdkit & Coordination chemistry on Mg

2022-03-22 Thread Marco Stenta
Thanks, Jan,
the dative bond works in a number of cases with other metals (rdkit
2021.9.5).
This one works fine:

from rdkit import Chem

rdmol = Chem.MolFromSmiles('[O-]->[Fe+2]<-[O-]', sanitize=True)  # case 1 coordinate bonds
assert rdmol is not None

This one, with divalent Mg, does not:

rdmol = Chem.MolFromSmiles('[O-]->[Mg+2]<-[O-]', sanitize=True)  # case 1 coordinate bonds
assert rdmol is not None


Can you read in the SMILES you sent me? I can't.
Am I doing anything wrong here?
cheers,
m

rdmol = Chem.MolFromSmiles('C1CO->[Fe+2](O)(<-OC1)(<-O)(<-O)(<-O))=O', sanitize=True)  # case 1 coordinate bonds
assert rdmol is not None
print(rdmol)




[Rdkit-discuss] rdkit & Coordination chemistry on Mg

2022-03-22 Thread Marco Stenta
Dear RDKitters,
I am struggling with organometallics and coordination complexes.
With a small team, we are creating a series of recommendations for drawing
organometallics (catalysts, complexes, etc.) correctly, so that we can use
them in our chemoinformatics pipeline.

I know it is a horrible mess out there, but we are trying to achieve some
consistency, rather than full correctness.

I guess there is an issue with the accepted valence for Mg.

The purpose of this is to have a SMILES representation of these metal
complexes that is not fragmented (with the dot), so that I keep the notion
of a bond where there is (or I believe there is) one.

For instance, for the EDTA complex this SMILES works fine:
[Na+].[Na+].[Mg++].[O-]C(=O)CN(CCN(CC([O-])=O)CC([O-])=O)CC([O-])=O

but I would really like to capture the fact that there is a bond between the
[O-]/N and the [Mg++], while there is none with the two [Na+].

I can read it in without sanitization, but then everything I want to do
afterwards with the molecule fails.


In theory, dative bonds should not affect the valence of the receiving atom,
right?

Any suggestions for reading in the enclosed V3000 molfile and keeping the
bonding info?
Or could you share what you are doing with metals?

Thanks a lot in advance,

kind regards

Marco

from rdkit import Chem

f1 = 'edta_case.mol'

rdmol = Chem.MolFromMolFile(f1, sanitize=True)
print(rdmol)  # rdmol is None in my case
Chem.MolToSmiles(rdmol)
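
One possible way to keep working with a molecule that was read without sanitization is to run every sanitization step except the valence check. The sketch below is only an illustration under that assumption: it uses the small Mg test SMILES from this thread rather than the attached molfile, and skipping `SANITIZE_PROPERTIES` means unusual valences are accepted silently.

```python
from rdkit import Chem

smiles = '[O-]->[Mg+2]<-[O-]'

# Parse without sanitization so the valence check cannot reject the metal.
mol = Chem.MolFromSmiles(smiles, sanitize=False)

# Compute implicit valences non-strictly, then run all sanitization
# steps except the property (valence) check.
mol.UpdatePropertyCache(strict=False)
ops = Chem.SanitizeFlags.SANITIZE_ALL ^ Chem.SanitizeFlags.SANITIZE_PROPERTIES
Chem.SanitizeMol(mol, sanitizeOps=ops)

partial_smiles = Chem.MolToSmiles(mol)
print(partial_smiles)
```

The same two calls could be applied to a molecule returned by `Chem.MolFromMolFile(f1, sanitize=False)`.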


edta_case.mol
Description: Binary data


[Rdkit-discuss] RDKit and GSoC 2022

2022-02-08 Thread Greg Landrum
Dear all,

This year it is once again possible to do longer projects as part of Google
Summer of Code, so I think it makes sense for the RDKit to participate
again.

If you have ideas for projects which can be accomplished with about 350
hours of effort (~30 hours/week for 12 weeks) and are willing to mentor the
person working on the project (note that this no longer has to be a
student), please let me know!

-greg


[Rdkit-discuss] Rdkit SubshapeAligner Module Not Returning Any Alignments

2021-08-02 Thread Serena G Debesai
Hi,

I'm trying to use the rdkit shape-based alignment method to align some 
chemicals and compare it to another alignment method. The most relevant part of 
my script is given below:

refShape = builder.GenerateSubshapeShape(ref)
probeShape = builder.GenerateSubshapeShape(probe)
aligner = SubshapeAligner.SubshapeAligner()
algs = aligner.GetSubshapeAlignments(ref, refShape, probe, probeShape, builder)
alg = algs[0]
scores[i]=1.0-alg.shapeDist

My understanding is that algs = aligner.GetSubshapeAlignments(ref, refShape, 
probe, probeShape, builder) creates a list of alignment objects which have 
various properties (such as shapeDist, which quantifies how good the 
alignment was). However, for some chemicals, the list appears to be empty. Does 
this mean that no alignments were generated? If so, why? I naively assumed that 
the function would always generate alignments, even if in some instances they 
are very poor. The SubshapeAligner module does not appear to be very well 
documented, so I'm having a hard time identifying workarounds or understanding 
the root of the problem. Here are some of the chemicals where it failed to 
generate alignments:

1.) Reference Chemical: 1(C(N2C(S1)C(C2=O)NC(=O)COC3=CC=CC=C3)C(=O)O)C
Second Chemical: CC1(C(N2C(S1)C(C2=O)NC(=O)COC3=C(CCl)C=CC=C3)C(=O)O)C

2.) Reference Chemical: CC1(C(N2C(S1)C(C2=O)NC(=O)COC3=CC=CC=C3)C(=O)O)C
Second Chemical: CC1(C(N2C(S1)C(C2=O)NC(=O)COC3=C(C(CC)(CC)(CC))C=CC=C3)C(=O)O)C

3.) Reference Chemical: c1c1
Second Chemical: Oc1c1

The chemicals are different from one another, and it doesn't seem that a 
particular class of chemicals is triggering the issue. Any advice on this would 
be greatly appreciated!
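
Until the cause of the empty list is understood, the script can at least avoid an IndexError on `algs[0]`. The helper below is a hypothetical sketch, not part of RDKit; it only assumes that each alignment object exposes a `shapeDist` attribute, as in the snippet above, and it uses stand-in objects for the demonstration.

```python
from types import SimpleNamespace


def best_shape_score(alignments):
    """Return 1 - shapeDist of the best alignment, or None if there is none."""
    if not alignments:
        return None
    # Do not rely on the list being sorted: pick the smallest distance.
    best = min(alignments, key=lambda alg: alg.shapeDist)
    return 1.0 - best.shapeDist


# Stand-in objects that mimic alignments with a shapeDist attribute.
fake_alignments = [SimpleNamespace(shapeDist=0.4), SimpleNamespace(shapeDist=0.25)]
print(best_shape_score(fake_alignments))
print(best_shape_score([]))
```

In the original loop this would replace the two lines `alg = algs[0]` and `scores[i]=1.0-alg.shapeDist` with `scores[i] = best_shape_score(algs)`, leaving `None` for pairs without any alignment.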

Thanks in advance for your time,
Serena Debesai
Stanford Class of 2023
B.S. Physics | Minor Candidate: Mathematics, African and African American 
Studies


Re: [Rdkit-discuss] RDKit: generate fingerprints from ZINC database for cluster analysis

2021-06-29 Thread Nils Weskamp

Hi Francesca,

technically, it should be possible to read MOL2 files with RDKit (and to 
convert the structures into SDF, SMILES etc.) I found


https://chem-workflows.com/articles/2020/03/23/building-a-multi-molecule-mol2-reader-for-rdkit-v2/

as one example. Having said that, I'm wondering whether it would be 
easier to just download your structures (again) as SDF from ZINC.


Doing a similarity-based clustering for 191K compounds might take a 
while and / or require a lot of memory if you don't do it in a clever 
way. You may want to take a look at


https://www.macinchem.org/reviews/clustering/clustering.php

for an example of how to apply Taylor-Butina clustering to larger 
compound sets.


I personally prefer the topological fingerprints in RDKit for these 
kinds of tasks, others might suggest Morgan fingerprints. If you "only" 
want to pick a diverse subset, both approaches should give you a decent 
result.
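
As a rough illustration of the workflow described above, here is a sketch of Taylor-Butina clustering on RDKit topological fingerprints. The function name, the example SMILES, and the 0.4 distance cutoff are assumptions chosen for the demonstration, and all input SMILES are assumed to be valid.

```python
from rdkit import Chem, DataStructs
from rdkit.ML.Cluster import Butina


def butina_clusters(smiles_list, cutoff=0.4):
    """Cluster molecules by Tanimoto distance on RDKit topological fingerprints."""
    fps = [Chem.RDKFingerprint(Chem.MolFromSmiles(s)) for s in smiles_list]
    # Flattened lower-triangle distance matrix, as expected by ClusterData.
    dists = []
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    return Butina.ClusterData(dists, len(fps), cutoff, isDistData=True)


example_smiles = ['CCO', 'CCCO', 'CCCCO', 'c1ccccc1', 'Cc1ccccc1', 'CC(=O)O']
clusters = butina_clusters(example_smiles)
for cluster in clusters:
    # The first index in each cluster is its centroid.
    print([example_smiles[i] for i in cluster])
```

Note that the distance list grows quadratically: for 191K compounds it would hold about 1.8 × 10^10 values, which is why a naive run needs a lot of memory. Taking the centroid of each cluster is one simple way to pick representatives.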


Hope this helps,
Nils



[Rdkit-discuss] RDKit: generate fingerprints from ZINC database for cluster analysis

2021-06-29 Thread Francesca Magarotto - francesca.magarot...@studio.unibo.it




 Hi,

I'm new to RDKit. I need to do a cluster analysis of a database of compounds. 
I've downloaded 191K compounds from the ZINC database in 3D mol2 format and now 
I need to obtain fingerprints using RDKit. First, I don't understand whether 
it's possible to convert mol2 format into fingerprints and, above all, what kind 
of fingerprint is best for this type of analysis (I need to understand what 
chemotypes I have in the database in order to eventually find some 
representatives).

Does anyone have suggestions? (Practical suggestions are really appreciated, 
too.)

Thanks




Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-28 Thread Paolo Tosco
Hi JP,

you are welcome, thanks a lot for reporting the problem with a reproducible example!
No need to bother filing a GitHub issue, I have already done that and also
submitted a fix:
https://github.com/rdkit/rdkit/pull/4282

Reionizing is good to make sure that charges are shuffled around if needed
and localized on the most appropriate groups based on their
acidity/basicity.
I normally run the Reionizer as part of the standardization pipeline, even
though in most cases it will not actually do anything to the molecule.

Cheers,
p.

On Mon, Jun 28, 2021 at 10:43 AM JP Ebejer  wrote:

> Hi Paolo!
>
> Nice to hear from you -- and thanks for the lightning-fix+working
> example.  Very helpful as usual.  (I don't imagine you need me to open a
> github issue on this, but I'd be happy to if you think that is helpful/want
> to keep a record).
>
> Any thoughts on whether it is useful to reionize after neutralizing
> charges in the pipeline above?
>
> Many thanks,
>
> On Thu, 24 Jun 2021 at 18:58, Paolo Tosco 
> wrote:
>
>> Hi JP,
>>
>> the problem is caused by the reaction SMARTS that standardizes pyridine
>> *N*-oxides not being very specific and also hitting your molecule, which
>> is not actually an *N*-oxide but rather an *N*-hydroxypyridinium ion.
>> I will submit a PR to fix the reaction pattern; in the meantime you can
>> fix the problem by loading a custom list of normalization reaction SMARTS
>> as shown in this gist:
>>
>> https://gist.github.com/ptosco/2b19142ff8fd6afdfee12836cec73d4f
>>
>> HTH, cheers
>> p.
>>
>> On Thu, Jun 24, 2021 at 11:40 AM JP Ebejer 
>> wrote:
>>
>>> Apologies I took my sweet time to reply, I went down the standardization
>>> rabbit-hole and went through most of the material (thanks Matthew and
>>> Francois, but also links from other notebooks).  The recording of the
>>> OpenScience session is excellent and crystal clear as usual Greg.  I
>>> enjoyed that.
>>>
>>> I have collated code to do the standardization as follows (I am putting
>>> this here, for when my future self searches this list for the same thing in
>>> 6 years time*):
>>>
>>> 0. Cleanup
>>> 1. FragmentParent
>>> 2. Uncharge
>>> 3. Canonicalize Tautomer
>>>
>>> My only question left, is whether I should reionize between steps 2 and
>>> 3.  What do you think?  My opinion is, probably, that there is no harm in
>>> doing so (so I should do it).  Earlier, Greg said that cleanup does
>>> reionization, but perhaps it is worth redoing after the uncharge step?  Or
>>> is this just a waste of CPU cycles?  Any thoughts?
>>>
>>> Also, there is something slightly weird going on.  A (successfully)
>>> sanitized mol from SMILES "Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O", which when
>>> passed to Cleanup(...) starts spitting out can't kekulize errors.  I have
>>> created a jupyter notebook to highlight this;
>>> https://nbviewer.jupyter.org/gist/jp-um/7cd80faa794b3545e8aedf838a1e7f6b.
>>> Any ideas what is going on?  IMHO cleanup should not choke on sanitized
>>> (correct) molecules.  Is there a way to catch when these errors happen?  As
>>> a bonus, FragmentParent(...) on the original sanitized molecule also
>>> exhibits this unexpected behaviour (not shown in the notebook). Could this
>>> be because it's doing an internal cleanup?
>>>
>>> * The exact code is here:
>>> https://bitsilla.com/blog/2021/06/standardizing-a-molecule-using-rdkit/
>>>
>>>
>>>
>>>
>>> On Fri, 18 Jun 2021 at 15:08, Greg Landrum 
>>> wrote:
>>>
>>>> Hi JP,
>>>>
>>>> On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer
>>>> wrote:
>>>>
>>>>>
>>>>> I am trying to standardize(/normalize?) some molecules from different
>>>>> sources, to generate a set of descriptors for them.  I have done this a
>>>>> number of times, and each time I find the process slightly confusing.  I
>>>>> have the following questions please, if you don't mind:
>>>>>
>>>>>
>>>> As a starting point in case you want more information about this topic.
>>>> I did a webinar/presentation on this topic earlier this year as part of
>>>> the RSC Open Science series.
>>>>
>>>> My materials for that are in github:
>>>> https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
>>>> and there's a youtube recording:
>>>> https://www.youtube.com/watch?v=eWTApNX8dJQ
>>>>
>>>>
>>>>
>>>>> 1.  What is the relation between molvs and rdkit (I remember there was
>>>>> an integration project between the two a while back).  When I call
>>>>> rdMolStandardize does rdkit code or molvs code get called?  The github
>>>>> repo for molvs hasn't been updated in a while (2 yrs), but
>>>>> rdMolStandardize has.
>>>>>
>>>>
>>>> When you call operations from rdMolStandardize it invokes RDKit code.
>>>> That code was started by Susan Leung as a Google Summer of Code project and
>>>> we have continued to improve and expand that code since then.
>>>>
>>>>
>>>>> 2.  What is the difference between standardization and normalization
>>>>> of a molecule?  Does one automatically imply the other or should these two
>>>>> processes be both

Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-28 Thread JP Ebejer
Hi Paolo!

Nice to hear from you -- and thanks for the lightning-fix+working example.
Very helpful as usual.  (I don't imagine you need me to open a github issue
on this, but I'd be happy to if you think that is helpful/want to keep
a record).

Any thoughts on whether it is useful to reionize after neutralizing charges
in the pipeline above?

Many thanks,

On Thu, 24 Jun 2021 at 18:58, Paolo Tosco 
wrote:

> Hi JP,
>
> the problem is caused by the reaction SMARTS that standardizes pyridine
> *N*-oxides not being very specific and also hitting your molecule, which
> is not actually an *N*-oxide but rather an *N*-hydroxypyridinium ion.
> I will submit a PR to fix the reaction pattern; in the meantime you can
> fix the problem by loading a custom list of normalization reaction SMARTS
> as shown in this gist:
>
> https://gist.github.com/ptosco/2b19142ff8fd6afdfee12836cec73d4f
>
> HTH, cheers
> p.
>
> On Thu, Jun 24, 2021 at 11:40 AM JP Ebejer 
> wrote:
>
>> Apologies I took my sweet time to reply, I went down the standardization
>> rabbit-hole and went through most of the material (thanks Matthew and
>> Francois, but also links from other notebooks).  The recording of the
>> OpenScience session is excellent and crystal clear as usual Greg.  I
>> enjoyed that.
>>
>> I have collated code to do the standardization as follows (I am putting
>> this here, for when my future self searches this list for the same thing in
>> 6 years time*):
>>
>> 0. Cleanup
>> 1. FragmentParent
>> 2. Uncharge
>> 3. Canonicalize Tautomer
>>
>> My only question left, is whether I should reionize between steps 2 and
>> 3.  What do you think?  My opinion is, probably, that there is no harm in
>> doing so (so I should do it).  Earlier, Greg said that cleanup does
>> reionization, but perhaps it is worth redoing after the uncharge step?  Or
>> is this just a waste of CPU cycles?  Any thoughts?
>>
>> Also, there is something slightly weird going on.  A (successfully)
>> sanitized mol from SMILES "Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O", which when
>> passed to Cleanup(...) starts spitting out can't kekulize errors.  I have
>> created a jupyter notebook to highlight this;
>> https://nbviewer.jupyter.org/gist/jp-um/7cd80faa794b3545e8aedf838a1e7f6b.
>> Any ideas what is going on?  IMHO cleanup should not choke on sanitized
>> (correct) molecules.  Is there a way to catch when these errors happen?  As
>> a bonus, FragmentParent(...) on the original sanitized molecule also
>> exhibits this unexpected behaviour (not shown in the notebook). Could this
>> be because it's doing an internal cleanup?
>>
>> * The exact code is here:
>> https://bitsilla.com/blog/2021/06/standardizing-a-molecule-using-rdkit/
>>
>>
>>
>>
>> On Fri, 18 Jun 2021 at 15:08, Greg Landrum 
>> wrote:
>>
>>> Hi JP,
>>>
>>> On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer 
>>> wrote:
>>>

>>>> I am trying to standardize(/normalize?) some molecules from different
>>>> sources, to generate a set of descriptors for them.  I have done this a
>>>> number of times, and each time I find the process slightly confusing.  I
>>>> have the following questions please, if you don't mind:
>>>>
>>>>
>>> As a starting point in case you want more information about this topic.
>>> I did a webinar/presentation on this topic earlier this year as part of
>>> the RSC Open Science series.
>>>
>>> My materials for that are in github:
>>> https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
>>> and there's a youtube recording:
>>> https://www.youtube.com/watch?v=eWTApNX8dJQ
>>>
>>>
>>>
>>>> 1.  What is the relation between molvs and rdkit (I remember there was
>>>> an integration project between the two a while back).  When I call
>>>> rdMolStandardize does rdkit code or molvs code get called?  The github repo
>>>> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
>>>>
>>>
>>> When you call operations from rdMolStandardize it invokes RDKit code.
>>> That code was started by Susan Leung as a Google Summer of Code project and
>>> we have continued to improve and expand that code since then.
>>>
>>>
>>>> 2.  What is the difference between standardization and normalization of
>>>> a molecule?  Does one automatically imply the other or should these two
>>>> processes be both run on a molecule?

>>>
>>> I would be surprised if there were universal agreement about this, but
>>> when I use the terms normalization typically refers to making changes to
>>> molecules to get "functional groups" (loosely defined) into a normal form,
>>> while standardization is getting the molecules into a standard form in
>>> preparation for doing something with them. Normalization is often part of
>>> standardization, standardization can also include things like stripping
>>> salts, neutralizing molecules, etc.
>>> Normalization involves applying transformations like converting -N(=O)=O
>>> to -[N+](=O)[O-] and converting -[S+2]([O-])[O-] to -S(=O)=O;
>>>
>>>
 3.  Specifically, what is the dif

Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-24 Thread Paolo Tosco
rence between
>>> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol),
>>> rdMolStandardize.Normalize(mol).  Should I call any of these manually three
>>> after I run "standardization/cleaning operations" such as uncharging,
>>> reionizing, etc?
>>>
>>
>> SanitizeMol() is different from the others: it does a small amount of
>> normalization - fixing groups like nitro which are commonly drawn in a
>> hypervalent state but which can be represented in a charge-separated form
>> without needing weird valences - and some validation - rejecting molecules
>> with atoms that have non-physical valences, rejecting molecules that cannot
>> be kekulized - and a bunch of chemistry perception - ring finding,
>> calculating valences, finding aromatic systems, etc.
>>
>> rdMolStandardize.Normalize() applies a bunch of standard transformations
>> to a molecule.
>>
>> rdMolStandardize.Cleanup() does a number of standardization operations:
>> - removeHs
>> - disconnect metal atoms
>> - normalize the molecule
>> - reionize the molecule
>>
>> 4.  I understand what uncharge does, but what does reionizer do?
>>>
>>
>> Reionizing does two things:
>> 1. adds a charge to a small set of free atoms which are likely
>> counterions. These include Na, Mg, Cl, etc.
>> 1a. if the above added a positive charge: remove an H from an acidic
>> group to neutralize the positive charge that was added.
>> 2. Moves negative charges from less acidic groups to more acidic groups.
>>
>> 5.  Is there a way to chain operations together
>>> standardize+ChooseLargestFragment+uncharge+normalize (am not sure the order
>>> makes sense here), other than creating a class instance for each calling
>>> the method, returning a new mol and using this mol in the next operation?
>>>
>>
>> The easy "pipeline" type functions in rdMolStandardize are the xxxParent
>> functions.
>> - fragmentParent: cleanup(), pick largest fragment
>> - chargeParent: fragmentParent(); uncharge()
>>
>> Note that this list will be more complete in the 2021.09 release.
>>
>>
>>>
>>> Apologies for the many questions.  Have I missed the documentation about
>>> this?  I have found some excellent examples here:
>>> https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
>>> (thanks!).  This is not exactly a cleaning pipeline, but still quite
>>> helpful to understand these methods.
>>>
>>>
>> The github link I provide above has some more up-to-date information
>> about what the code currently does.
>> This all needs to land in the RDKit documentation
>>
>> -greg
>>
>>
>
> --
>
> <https://www.um.edu.mt/>
>
> Dr Jean-Paul Ebejer | Senior Lecturer
>
> BSc (Hons) (Melita), MSc (Imperial), DPhil (Oxon.)
>
> Centre for Molecular Medicine and Biobanking
>
> Office 320, Biomedical Sciences Building,
>
> University of Malta, Msida, MSD 2080.  MALTA.
>
> T: (00356) 2340 3263
>
>
> *Associate Member*
>
> Department of Artificial Intelligence
>
>
> Where am I? <https://bitsilla.com/blog/where-to-find-me/>
>


Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-24 Thread JP Ebejer
Apologies for taking my sweet time to reply; I went down the standardization
rabbit-hole and went through most of the material (thanks Matthew and
Francois, but also links from other notebooks).  The recording of the
OpenScience session is excellent and crystal clear as usual, Greg.  I
enjoyed that.

I have collated code to do the standardization as follows (I am putting
this here, for when my future self searches this list for the same thing in
6 years time*):

0. Cleanup
1. FragmentParent
2. Uncharge
3. Canonicalize Tautomer
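
The four steps above can be sketched roughly as follows. This is a minimal
sketch, not the exact code from the blog post linked below; the function name
standardize_to_smiles and the sodium acetate input are made up for
illustration, and it assumes a reasonably recent RDKit in which
rdMolStandardize provides Cleanup, FragmentParent, Uncharger and
TautomerEnumerator.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize_to_smiles(smiles):
    """Hypothetical helper: run steps 0-3 and return a canonical SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # 0. Cleanup: removeHs, disconnect metals, normalize, reionize
    mol = rdMolStandardize.Cleanup(mol)
    # 1. FragmentParent: keep only the largest fragment
    mol = rdMolStandardize.FragmentParent(mol)
    # 2. Uncharge: neutralize charges where possible
    mol = rdMolStandardize.Uncharger().uncharge(mol)
    # 3. Canonicalize the tautomer
    mol = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)

    return Chem.MolToSmiles(mol)


# Sodium acetate: the counterion is dropped and the acid is neutralized
print(standardize_to_smiles("[Na+].CC(=O)[O-]"))
```

Each call returns a new molecule, so the intermediate results can be kept
and inspected if one of the steps misbehaves.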

My only remaining question is whether I should reionize between steps 2 and 3.
What do you think?  My opinion is that there is probably no harm in doing
so (so I should do it).  Earlier, Greg said that cleanup does reionization,
but perhaps it is worth redoing after the uncharge step?  Or is this just a
waste of CPU cycles?  Any thoughts?

Also, there is something slightly weird going on.  A (successfully)
sanitized mol from the SMILES "Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O" starts
spitting out "can't kekulize" errors when passed to Cleanup(...).  I have
created a Jupyter notebook to highlight this:
https://nbviewer.jupyter.org/gist/jp-um/7cd80faa794b3545e8aedf838a1e7f6b.
Any ideas what is going on?  IMHO cleanup should not choke on sanitized
(correct) molecules.  Is there a way to catch when these errors happen?  As
a bonus, FragmentParent(...) on the original sanitized molecule also
exhibits this unexpected behaviour (not shown in the notebook). Could this
be because it's doing an internal cleanup?

* The exact code is here:
https://bitsilla.com/blog/2021/06/standardizing-a-molecule-using-rdkit/




On Fri, 18 Jun 2021 at 15:08, Greg Landrum  wrote:

> Hi JP,
>
> On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer  wrote:
>
>>
>> I am trying to standardize(/normalize?) some molecules from different
>> sources, to generate a set of descriptors for them.  I have done this a
>> number of times, and each time I find the process slightly confusing.  I
>> have the following questions please, if you don't mind:
>>
>>
> As a starting point in case you want more information about this topic.
> I did a webinar/presentation on this topic earlier this year as part of
> the RSC Open Science series.
>
> My materials for that are in github:
> https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
> and there's a youtube recording:
> https://www.youtube.com/watch?v=eWTApNX8dJQ
>
>
>
>> 1.  What is the relation between molvs and rdkit (I remember there was an
>> integration project between the two a while back).  When I call
>> rdMolStandardize does rdkit code or molvs code get called?  The github repo
>> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
>>
>
> When you call operations from rdMolStandardize it invokes RDKit code. That
> code was started by Susan Leung as a Google Summer of Code project and we
> have continued to improve and expand that code since then.
>
>
>> 2.  What is the difference between standardization and normalization of a
>> molecule?  Does one automatically imply the other or should these two
>> processes be both run on a molecule?
>>
>
> I would be surprised if there were universal agreement about this, but
> when I use the terms normalization typically refers to making changes to
> molecules to get "functional groups" (loosely defined) into a normal form,
> while standardization is getting the molecules into a standard form in
> preparation for doing something with them. Normalization is often part of
> standardization, standardization can also include things like stripping
> salts, neutralizing molecules, etc.
> Normalization involves applying transformations like converting -N(=O)=O
> to -[N+](=O)[O-] and converting -[S+2]([O-])[O-] to -S(=O)=O;
>
>
>> 3.  Specifically, what is the difference between
>> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol),
>> rdMolStandardize.Normalize(mol).  Should I call any of these manually three
>> after I run "standardization/cleaning operations" such as uncharging,
>> reionizing, etc?
>>
>
> SanitizeMol() is different from the others: it does a small amount of
> normalization - fixing groups like nitro which are commonly drawn in a
> hypervalent state but which can be represented in a charge-separated form
> without needing weird valences - and some validation - rejecting molecules
> with atoms that have non-physical valences, rejecting molecules that cannot
> be kekulized - and a bunch of chemistry perception - ring finding,
> calculating valences, finding aromatic systems, etc.
>
> rdMolStandardize.Normalize() applies a bunch of standard transformations
> to a molecule.
>
> rdMolStandardize.Cleanup() does a number of standardization operations:
> - removeHs
> - disconnect metal atoms
> - normalize the molecule
> - reionize the molecule
>
> 4.  I understand what uncharge does, but what does reionizer do?
>>
>
> Reionizing does two things:
> 1. adds a charge to a small set of free atoms which are likely

Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-22 Thread Francois Berenger

Dear JP,

To confuse you even more, you can also have a look at the ChEMBL 
open-source molecular standardizer:


https://github.com/chembl/ChEMBL_Structure_Pipeline/blob/master/chembl_structure_pipeline/standardizer.py

No need to thank me. :D

On 18/06/2021 03:12, JP Ebejer wrote:

Dear all,

I am trying to standardize(/normalize?) some molecules from different
sources, to generate a set of descriptors for them.  I have done this
a number of times, and each time I find the process slightly
confusing.  I have the following questions please, if you don't mind:

1.  What is the relation between molvs and rdkit (I remember there was
an integration project between the two a while back).  When I call
rdMolStandardize does rdkit code or molvs code get called?  The github
repo for molvs hasn't been updated in a while (2 yrs), but
rdMolStandardize has.
2.  What is the difference between standardization and normalization
of a molecule?  Does one automatically imply the other or should these
two processes be both run on a molecule?
3.  Specifically, what is the difference between
rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol),
rdMolStandardize.Normalize(mol).  Should I call any of these manually
three after I run "standardization/cleaning operations" such as
uncharging, reionizing, etc?
4.  I understand what uncharge does, but what does reionizer do?
5.  Is there a way to chain operations together
standardize+ChooseLargestFragment+uncharge+normalize (am not sure the
order makes sense here), other than creating a class instance for each
calling the method, returning a new mol and using this mol in the next
operation?

Apologies for the many questions.  Have I missed the documentation
about this?  I have found some excellent examples here:
https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
(thanks!).  This is not exactly a cleaning pipeline, but still quite
helpful to understand these methods.

Many thanks,
JP


Re: [Rdkit-discuss] RDKit Error capturing: Chem.WrapLogs unexpected result

2021-06-18 Thread Paolo Tosco
Hi Adelene,

WrapLogs() is in a way the equivalent of the bash tee command; it allows
you to redirect stderr to a Python stream of your choice, but it does not
suppress the original C++ stderr stream.
If you wish to suppress it, you may redirect your wrap_logs.py script's
stderr to /dev/null in your shell:

$ python wrap_logs.py 2> /dev/null

Alternatively, you may suppress the C++ stderr stream from within your
Python script modifying it as follows:

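from rdkit import Chem
from contextlib import redirect_stderr
import ctypes

# Mirror RDKit's C++ log messages onto Python's sys.stderr
Chem.WrapLogs()

# Look up the C library's own stderr FILE* and its freopen()
libc = ctypes.CDLL(None)
c_stderr = ctypes.c_void_p.in_dll(libc, "stderr")
c_freopen = libc.freopen

# Re-point the C-level stderr at /dev/null (Linux only)
c_freopen(b"/dev/null", b"w", c_stderr)

# Python-level stderr, including the wrapped RDKit logs, goes to log.txt
with open('log.txt', 'w') as f:
    with redirect_stderr(f):
        mol = Chem.MolFromSmiles("c1c")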

Then, when you run

$ python wrap_logs.py

the C++ stderr stream will be redirected to /dev/null, and your log message
will only be visible in log.txt.

Note that the above will work on Linux only.
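
To illustrate why there appear to be two stderr streams, here is a minimal
standard-library sketch with no RDKit involved. os.write(2, ...) stands in
for output that C++ code writes directly to file descriptor 2; redirect_fd2
and demo are hypothetical names introduced only for this example.
contextlib.redirect_stderr swaps the Python object sys.stderr, whereas
os.dup2 redirects the descriptor itself, so the two writes end up in
different files:

```python
import os
import sys
import tempfile
from contextlib import contextmanager, redirect_stderr


@contextmanager
def redirect_fd2(path):
    """Temporarily point OS-level file descriptor 2 (C stderr) at `path`."""
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a handle on the real stderr
    target = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.dup2(target, 2)  # fd 2 now writes to the file
        yield
    finally:
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(target)
        os.close(saved_fd)


def demo():
    """Return (python_level_text, fd_level_text), captured separately."""
    with tempfile.TemporaryDirectory() as tmp:
        py_log = os.path.join(tmp, "py.txt")
        fd_log = os.path.join(tmp, "fd.txt")
        with open(py_log, "w") as f, redirect_fd2(fd_log):
            with redirect_stderr(f):
                # Python-level write: follows sys.stderr into py.txt
                print("from sys.stderr", file=sys.stderr)
                # Descriptor-level write: bypasses sys.stderr entirely
                os.write(2, b"from fd 2\n")
        with open(py_log) as a, open(fd_log) as b:
            return a.read(), b.read()


if __name__ == "__main__":
    print(demo())
```

Unlike the freopen() call, this variant restores the original stderr when
the block exits. It should behave the same on any POSIX system, but treat
that as an assumption rather than something tested here.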

Cheers,
p.

On Fri, Jun 18, 2021 at 6:22 PM Adelene LAI  wrote:

> Hi RDKit Community,
>
>
> I'm trying to run a script in the command line without having any RDKit
> warnings or errors show up in the CL. Instead, I want them written into a
> log.txt
>
>
>
> from rdkit import Chem
> from contextlib import redirect_stderr
>
> Chem.WrapLogs()
>
> with open('log.txt', 'w') as f:
>     with redirect_stderr(f):
>         mol = Chem.MolFromSmiles("c1c")
>
>
>
> The error does indeed get written to the log.txt file (I'm assuming
> warnings would be too).
>
> What is strange is that the stderr still shows up in the command line,
> even though I'd already redirected it to the log.txt.
>
> Does this mean that there are two stderr streams? How is this possible?
>
> I've been reading several old posts on this topic, but none really fits my
> problem:
>
>
> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAOC-GK0oTH36vvL7eVyWMJg4zmERpqctonrgNnxG10QmgYXhdg%40mail.gmail.com/#msg36030331
>
> https://sourceforge.net/p/rdkit/mailman/message/33261506/  <- addressed
> by WrapLogs() I believe
>
>
> http://rdkit.blogspot.com/2016/03/capturing-error-information.html
>
> https://github.com/rdkit/rdkit/pull/739
>
> Would appreciate any ideas.
>
> Thanks,
> Adelene
>
>
>
>
>
>
>
>
> Doctoral Researcher
>
> Environmental Cheminformatics
>
> UNIVERSITÉ DU LUXEMBOURG
>
>
> Campus Belval | Luxembourg Centre for Systems Biomedicine
>
> 6, avenue du Swing, L-4367 Belvaux
>
> T +356 46 66 44 67 18
>
> https://adelenel.ai
>
>
>
>
>
>
>
>
>


[Rdkit-discuss] RDKit Error capturing: Chem.WrapLogs unexpected result

2021-06-18 Thread Adelene LAI
Hi RDKit Community,


I'm trying to run a script from the command line without having any RDKit
warnings or errors show up in the terminal. Instead, I want them written to a
log.txt file.



from rdkit import Chem
from contextlib import redirect_stderr

Chem.WrapLogs()

with open('log.txt', 'w') as f:
    with redirect_stderr(f):
        mol = Chem.MolFromSmiles("c1c")



The error does indeed get written to the log.txt file (I'm assuming warnings 
would be too).

What is strange is that the stderr still shows up in the command line, even 
though I'd already redirected it to the log.txt.

Does this mean that there are two stderr streams? How is this possible?


I've been reading several old posts on this topic, but none really fits my 
problem:

https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAOC-GK0oTH36vvL7eVyWMJg4zmERpqctonrgNnxG10QmgYXhdg%40mail.gmail.com/#msg36030331

https://sourceforge.net/p/rdkit/mailman/message/33261506/  <- addressed by 
WrapLogs() I believe

http://rdkit.blogspot.com/2016/03/capturing-error-information.html

https://github.com/rdkit/rdkit/pull/739

Would appreciate any ideas.

Thanks,
Adelene









Doctoral Researcher
Environmental Cheminformatics
UNIVERSITÉ DU LUXEMBOURG

Campus Belval | Luxembourg Centre for Systems Biomedicine
6, avenue du Swing, L-4367 Belvaux
T +356 46 66 44 67 18
https://adelenel.ai











Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-18 Thread Greg Landrum
Hi JP,

On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer  wrote:

>
> I am trying to standardize(/normalize?) some molecules from different
> sources, to generate a set of descriptors for them.  I have done this a
> number of times, and each time I find the process slightly confusing.  I
> have the following questions please, if you don't mind:
>
>
As a starting point, in case you want more information about this topic:
I did a webinar/presentation on it earlier this year as part of the
RSC Open Science series.

My materials for that are in github:
https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
and there's a youtube recording:
https://www.youtube.com/watch?v=eWTApNX8dJQ



> 1.  What is the relation between molvs and rdkit (I remember there was an
> integration project between the two a while back).  When I call
> rdMolStandardize does rdkit code or molvs code get called?  The github repo
> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
>

When you call operations from rdMolStandardize it invokes RDKit code. That
code was started by Susan Leung as a Google Summer of Code project and we
have continued to improve and expand that code since then.


> 2.  What is the difference between standardization and normalization of a
> molecule?  Does one automatically imply the other or should these two
> processes be both run on a molecule?
>

I would be surprised if there were universal agreement about this, but when
I use the terms, normalization typically refers to making changes to
molecules to get "functional groups" (loosely defined) into a normal form,
while standardization is getting the molecules into a standard form in
preparation for doing something with them. Normalization is often part of
standardization; standardization can also include things like stripping
salts, neutralizing molecules, etc.
Normalization involves applying transformations like converting -N(=O)=O to
-[N+](=O)[O-] and converting -[S+2]([O-])[O-] to -S(=O)=O.


> 3.  Specifically, what is the difference between
> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol), and
> rdMolStandardize.Normalize(mol)?  Should I call any of these three manually
> after I run "standardization/cleaning operations" such as uncharging,
> reionizing, etc?
>

SanitizeMol() is different from the others: it does a small amount of
normalization - fixing groups like nitro which are commonly drawn in a
hypervalent state but which can be represented in a charge-separated form
without needing weird valences - and some validation - rejecting molecules
with atoms that have non-physical valences, rejecting molecules that cannot
be kekulized - and a bunch of chemistry perception - ring finding,
calculating valences, finding aromatic systems, etc.

rdMolStandardize.Normalize() applies a bunch of standard transformations to
a molecule.

rdMolStandardize.Cleanup() does a number of standardization operations:
- removeHs
- disconnect metal atoms
- normalize the molecule
- reionize the molecule
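
A small sketch of how the three calls differ in practice. The input SMILES
and variable names are made up for illustration, and this assumes that
rdMolStandardize.Normalize() and Cleanup() are available as module-level
functions, as in recent RDKit releases:

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Nitro drawn with a pentavalent N: the sanitization that MolFromSmiles
# runs by default rewrites it to the charge-separated form.
nitro = Chem.MolFromSmiles("CN(=O)=O")
nitro_smiles = Chem.MolToSmiles(nitro)

# Sulfone drawn in charge-separated form: Normalize() applies the
# standard transformations, including the one back to S(=O)=O.
sulfone = Chem.MolFromSmiles("C[S+2]([O-])([O-])C")
normalized_smiles = Chem.MolToSmiles(rdMolStandardize.Normalize(sulfone))

# Cleanup() bundles removeHs, metal disconnection, normalization and
# reionization, so it includes the same transformation.
cleaned_smiles = Chem.MolToSmiles(rdMolStandardize.Cleanup(sulfone))

print(nitro_smiles, normalized_smiles, cleaned_smiles)
```

Both Normalize() and Cleanup() return new molecules and leave the input
untouched.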

> 4.  I understand what uncharge does, but what does reionizer do?
>

Reionizing does two things:
1. adds a charge to a small set of free atoms which are likely counterions.
These include Na, Mg, Cl, etc.
1a. if the above added a positive charge: remove an H from an acidic group
to neutralize the positive charge that was added.
2. Moves negative charges from less acidic groups to more acidic groups.
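
Point 2 can be illustrated with a molecule carrying both a sulfinate and a
sulfonic acid. This is a sketch with an input chosen for illustration; it
assumes rdMolStandardize.Reionize() and the Uncharger class behave as in
recent RDKit releases.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# The negative charge starts on the sulfinate; the sulfonic acid on the
# other side of the ring is the more acidic group.
mol = Chem.MolFromSmiles("C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O")

# Reionize: the charge should move to the more acidic sulfonate.
reionized = rdMolStandardize.Reionize(mol)
reionized_smiles = Chem.MolToSmiles(reionized)

# Uncharge, by contrast, removes the charge altogether.
neutral = rdMolStandardize.Uncharger().uncharge(mol)
neutral_charges = [atom.GetFormalCharge() for atom in neutral.GetAtoms()]

print(reionized_smiles, Chem.MolToSmiles(neutral))
```

Reionizing keeps the net charge of the molecule and only relocates it,
which is why it can be combined with, rather than replaced by, the
uncharge step.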

> 5.  Is there a way to chain operations together
> standardize+ChooseLargestFragment+uncharge+normalize (am not sure the order
> makes sense here), other than creating a class instance for each calling
> the method, returning a new mol and using this mol in the next operation?
>

The easy "pipeline" type functions in rdMolStandardize are the xxxParent
functions.
- FragmentParent(): Cleanup(), then pick the largest fragment
- ChargeParent(): FragmentParent(), then uncharge

Note that this list will be more complete in the 2021.09 release.
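
A short sketch of the two parent functions listed above, using a
trimethylammonium chloride input chosen purely for illustration. The Python
names are FragmentParent and ChargeParent; anything beyond those two calls
is an assumption of this example.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

salt = Chem.MolFromSmiles("C[NH+](C)C.[Cl-]")

# FragmentParent: cleanup, then keep the largest fragment (charge is kept)
fragment_parent = rdMolStandardize.FragmentParent(salt)
fragment_smiles = Chem.MolToSmiles(fragment_parent)

# ChargeParent: fragment parent, then neutralize
charge_parent = rdMolStandardize.ChargeParent(salt)
charge_smiles = Chem.MolToSmiles(charge_parent)

print(fragment_smiles, charge_smiles)
```

Because each parent function starts from the previous one, calling
ChargeParent() alone is enough when only the neutral parent is needed.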


>
> Apologies for the many questions.  Have I missed the documentation about
> this?  I have found some excellent examples here:
> https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
> (thanks!).  This is not exactly a cleaning pipeline, but still quite
> helpful to understand these methods.
>
>
The github link I provided above has some more up-to-date information about
what the code currently does.
This all needs to land in the RDKit documentation.

-greg


Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-17 Thread Matthew Robinson
Hi JP,

Lots of good questions, and it is quite an involved topic.

I'll let others who are more knowledgeable of the background answer
questions on the history and relationship between the tools.

One resource that may be helpful is the
https://github.com/chembl/ChEMBL_Structure_Pipeline repo, which calls many
of the functions you mentioned. Looking into the code explains the order of
steps quite well. It also has an open access article linked in the README
that explains how at least one group (ChEMBL) handles the process:
https://doi.org/10.1186/s13321-020-00456-1

Best,
Matt

On Thu, Jun 17, 2021 at 2:37 PM JP Ebejer  wrote:

> Dear all,
>
> I am trying to standardize(/normalize?) some molecules from different
> sources, to generate a set of descriptors for them.  I have done this a
> number of times, and each time I find the process slightly confusing.  I
> have the following questions please, if you don't mind:
>
> 1.  What is the relation between molvs and rdkit (I remember there was an
> integration project between the two a while back).  When I call
> rdMolStandardize does rdkit code or molvs code get called?  The github repo
> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
> 2.  What is the difference between standardization and normalization of a
> molecule?  Does one automatically imply the other or should these two
> processes be both run on a molecule?
> 3.  Specifically, what is the difference between
> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol), and
> rdMolStandardize.Normalize(mol)?  Should I call any of these three manually
> after I run "standardization/cleaning operations" such as uncharging,
> reionizing, etc.?
> 4.  I understand what uncharge does, but what does reionizer do?
> 5.  Is there a way to chain operations together, e.g.
> standardize+ChooseLargestFragment+uncharge+normalize (I am not sure the order
> makes sense here), other than creating a class instance for each, calling
> the method, returning a new mol and using this mol in the next operation?
>
> Apologies for the many questions.  Have I missed the documentation about
> this?  I have found some excellent examples here:
> https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
> (thanks!).  This is not exactly a cleaning pipeline, but still quite
> helpful to understand these methods.
>
> Many thanks,
> JP


[Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-17 Thread JP Ebejer
Dear all,

I am trying to standardize(/normalize?) some molecules from different
sources, to generate a set of descriptors for them.  I have done this a
number of times, and each time I find the process slightly confusing.  I
have the following questions please, if you don't mind:

1.  What is the relation between molvs and rdkit (I remember there was an
integration project between the two a while back).  When I call
rdMolStandardize does rdkit code or molvs code get called?  The github repo
for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
2.  What is the difference between standardization and normalization of a
molecule?  Does one automatically imply the other or should these two
processes be both run on a molecule?
3.  Specifically, what is the difference between
rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol), and
rdMolStandardize.Normalize(mol)?  Should I call any of these three manually
after I run "standardization/cleaning operations" such as uncharging,
reionizing, etc.?
4.  I understand what uncharge does, but what does reionizer do?
5.  Is there a way to chain operations together, e.g.
standardize+ChooseLargestFragment+uncharge+normalize (I am not sure the order
makes sense here), other than creating a class instance for each, calling
the method, returning a new mol and using this mol in the next operation?

Apologies for the many questions.  Have I missed the documentation about
this?  I have found some excellent examples here:
https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
(thanks!).  This is not exactly a cleaning pipeline, but still quite
helpful to understand these methods.

Many thanks,
JP


Re: [Rdkit-discuss] RDKit version in AWS Aurora?

2021-06-07 Thread Brian Cole
Hi Greg,

Awesome news about rdkit_toolkit_version(), that is precisely what I think
is necessary as I can always log into the Aurora instance in question and
call that command if necessary. No need for relying on flaky docs upgrades.

Big Thank You!
Brian

On Sun, Jun 6, 2021 at 9:07 PM Greg Landrum  wrote:

> Hi Brian,
>
> On Mon, Jun 7, 2021 at 4:36 AM Brian Cole  wrote:
>
>> This is a bit more of a question for AWS themselves, though I believe the
>> RDKit build for the Postgres extension can be improved as well.
>>
>> The AWS documentation states, “RDKit extension version 3.8.”
>>
>> https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Updates.20180305.html
>>
>> However, it doesn’t appear like that 3.8 version number has been bumped
>> in a few RDKit versions. When is that version supposed to be bumped? Or am
>> I missing some other way to find the RDKit version in the Postgres
>> extension?
>>
>
> A large part of the problem here is that we're not very good about
> providing version information for the cartridge. Any changes here require
> manual updates and I normally forget to either make those changes myself or
> check that they've been done while reviewing PRs.
>
> One thing which may be at least a little bit more up-to-date is the output
> of the rdkit_version command in the cartridge itself:
> chembl_28=# select rdkit_version();
>  rdkit_version
> ---
>  0.76.0
> (1 row)
>
> It looks like that was bumped from 0.74 to 0.75 in 2020.03. The bump to 0.76
> will be in the 2021.09 release.
>
> What I think will be most useful, but which won't be available until the
> 2021.09 release, is the rdkit_toolkit_version() command. This will show you
> the actual RDKit version in the backend and has the advantage that it's
> automatically updated.
>
> -greg
>
>
>


Re: [Rdkit-discuss] RDKit version in AWS Aurora?

2021-06-06 Thread Greg Landrum
Hi Brian,

On Mon, Jun 7, 2021 at 4:36 AM Brian Cole  wrote:

> This is a bit more of a question for AWS themselves, though I believe the
> RDKit build for the Postgres extension can be improved as well.
>
> The AWS documentation states, “RDKit extension version 3.8.”
>
> https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Updates.20180305.html
>
> However, it doesn’t appear like that 3.8 version number has been bumped in
> a few RDKit versions. When is that version supposed to be bumped? Or am I
> missing some other way to find the RDKit version in the Postgres extension?
>

A large part of the problem here is that we're not very good about
providing version information for the cartridge. Any changes here require
manual updates and I normally forget to either make those changes myself or
check that they've been done while reviewing PRs.

One thing which may be at least a little bit more up-to-date is the output
of the rdkit_version command in the cartridge itself:
chembl_28=# select rdkit_version();
 rdkit_version
---
 0.76.0
(1 row)

It looks like that was bumped from 0.74 to 0.75 in 2020.03. The bump to 0.76
will be in the 2021.09 release.

What I think will be most useful, but which won't be available until the
2021.09 release, is the rdkit_toolkit_version() command. This will show you
the actual RDKit version in the backend and has the advantage that it's
automatically updated.
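
For anyone who wants to script this check from Python, here is a rough
sketch that asks the cartridge for both version strings and copes with a
server that does not have rdkit_toolkit_version() yet. The helper name
cartridge_versions is made up for illustration; the cursor is assumed to be
any DB-API 2.0 cursor (psycopg2, for example) on a connection in autocommit
mode, so that a failed call does not abort later statements.

```python
def cartridge_versions(cursor):
    """Query both cartridge version functions through a DB-API cursor.

    Returns a dict mapping each function name to its result, or to None
    when the call fails (for example because the function does not exist
    in the installed cartridge).
    """
    versions = {}
    for func in ("rdkit_version", "rdkit_toolkit_version"):
        try:
            cursor.execute(f"select {func}()")
            versions[func] = cursor.fetchone()[0]
        except Exception:  # each driver raises its own error classes
            versions[func] = None
    return versions
```

A None for rdkit_toolkit_version would then simply mean that the cartridge
predates the release that introduces it.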

-greg


[Rdkit-discuss] rdkit and py2exe

2021-06-06 Thread Thomas
Did anybody manage to get rdkit working with py2exe?

anaconda
python 3.8
py2exe 0.10
rdkit 2012.03.1

IMPORT:
from rdkit.Chem import MolFromSmiles, MolToSmiles

With this import, py2exe complains about some missing modules and my .exe
doesn't run (without any log). If I comment out the import it runs.
With Pyinstaller it works out of the box, but the executables produced by
pyinstaller are problematic for many other reasons (huge size, manifest
file ignored, subprocess based, no multiple exe allowed...)

Does anybody have a recent working recipe for the setup file?
Thank you!
Thomas


[Rdkit-discuss] RDKit version in AWS Aurora?

2021-06-06 Thread Brian Cole
This is a bit more of a question for AWS themselves, though I believe the RDKit 
build for the Postgres extension can be improved as well.

The AWS documentation states, “RDKit extension version 3.8.”
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Updates.20180305.html

However, it doesn’t appear like that 3.8 version number has been bumped in a 
few RDKit versions. When is that version supposed to be bumped? Or am I missing 
some other way to find the RDKit version in the Postgres extension?

Thanks,
Brian


[Rdkit-discuss] RDkit support on computing descriptors for organometallic complexes

2021-06-04 Thread ITS RDC
Dear all,

May I ask how extensive RDKit's support is for computing molecular
descriptors for organometallic complexes, especially those of transition
metals? Thank you.

Joanna


Re: [Rdkit-discuss] rdkit and pip

2021-06-03 Thread Greg Landrum
Hi Marco,

On Wed, May 26, 2021 at 2:37 PM Marco Stenta  wrote:

> Dear Colleagues,
> I recently came across this
> https://pypi.org/project/rdkit-pypi/
>
> is pip going to be supported officially by the dev community? any plan?
>

I'm not quite sure yet. I believe that at the moment the pip images are
still missing the extra data files that the RDKit requires in order to
correctly function.
After that's taken care of, we'll need one or more volunteers to make sure
that the rdkit-pypi images stay up to date. Just like the conda-forge
packages, this will be something that's community maintained, not something
the development team takes care of.
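
One way to see whether a particular install (pip or conda) can find its
data files is to look in the directory that RDConfig.RDDataDir points to.
The sketch below is only illustrative: the helper name rdkit_data_report is
made up, and using BaseFeatures.fdef as the file to look for is an
assumption, not an official list of required data.

```python
import importlib.util
import os


def rdkit_data_report(required=("BaseFeatures.fdef",)):
    """Report whether the installed RDKit can see its bundled data files.

    `required` is an illustrative list of file names expected in the data
    directory; adjust it to whatever your own code relies on.
    """
    if importlib.util.find_spec("rdkit") is None:
        # RDKit is not importable at all in this environment.
        return {"installed": False, "data_dir": None, "missing": list(required)}
    from rdkit import RDConfig

    data_dir = RDConfig.RDDataDir
    missing = [
        name for name in required
        if not os.path.isfile(os.path.join(data_dir, name))
    ]
    return {"installed": True, "data_dir": data_dir, "missing": missing}
```

An empty "missing" list only says that the named files are present; it does
not prove that every feature depending on data files will work.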


> getting out of the conda dependency might be beneficial to get
> slightly slimmer docker images.
>

If this is something you actually care about, you can just create a docker
image which has a local RDKit build matching the version of Python used in
the container.

-greg


[Rdkit-discuss] rdkit and pip

2021-05-26 Thread Marco Stenta
Dear Colleagues,
I recently came across this
https://pypi.org/project/rdkit-pypi/

is pip going to be supported officially by the dev community? any plan?
getting out of the conda dependency might be beneficial to get
slightly slimmer docker images.

Thanks

cheers,
m


[Rdkit-discuss] [RDKit UGM2021] Save the date: Oct 14 and 15

2021-05-10 Thread Greg Landrum
Hi,

This year's RDKit UGM is going to take place October 14 and 15. It will,
unfortunately, once again be a purely virtual event. Hopefully next year we
will be able to travel again and all get together in one physical location,
but this year it's not possible to really plan an in-person meeting.

Since it seemed to work well last time, we'll do a combination of zoom and
either discord or some other text-based chat functionality and will have
two sessions per meeting day: one earlier in the day which is easier for
people in Asia to attend and one later in the day which is easier for
people in the Americas.

I'll send around more info and a link to the registration in the next week
or so.

I'd also like to try a virtual hackathon of some type, but will schedule
that for a different time, probably sometime this summer. Again, more
details on that soon.

Best,
-greg


Re: [Rdkit-discuss] RDKit compilation from source question

2021-04-28 Thread Guilherme Duarte Ramos Matos
Hi Paolo,

You hit the nail right on the head: I defined the
_GLIBCXX_USE_CXX11_ABI=0 macro
and RDKit compiled and linked normally.  I still need to figure out what
exactly happened (I compiled Boost in my project space because I do not
have admin privileges to update the precompiled version SBU supercomputer
has) and I want to leave precise instructions to present and future grad
students and potential users of my code.  I'm very glad it worked, though!
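
For the instructions to future users, the symbol comparison can be scripted.
The following is a sketch only, working on the text output of nm: it lists
the symbols an object file needs but a library does not define, and tags
library symbols whose mangled names contain "__cxx11", which is how
libstdc++ marks names built with the C++11 std::string ABI. The function
names are made up for illustration, and which side was built with which ABI
in this thread is not something the sketch assumes.

```python
import re

# Substring that libstdc++ puts into mangled names built with the C++11 ABI.
CXX11_TAG = "__cxx11"


def parse_nm(output):
    """Map symbol name -> nm type letter ('U' means undefined)."""
    symbols = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:      # address, type, name
            symbols[parts[2]] = parts[1]
        elif len(parts) == 2:    # no address column, e.g. undefined symbols
            symbols[parts[1]] = parts[0]
    return symbols


def unresolved(object_nm, library_nm, pattern):
    """Symbols matching `pattern` that the object needs but the library lacks."""
    needed = parse_nm(object_nm)
    provided = {
        name for name, kind in parse_nm(library_nm).items() if kind != "U"
    }
    rx = re.compile(pattern)
    return sorted(
        name for name, kind in needed.items()
        if kind == "U" and rx.search(name) and name not in provided
    )


def abi_flavours(library_nm, pattern):
    """Tag each defined library symbol matching `pattern` by string ABI."""
    rx = re.compile(pattern)
    return {
        name: ("cxx11" if CXX11_TAG in name else "old-or-neutral")
        for name, kind in parse_nm(library_nm).items()
        if kind != "U" and rx.search(name)
    }
```

Feeding it the output of nm on the failing object file and on
libboost_serialization, with a pattern such as "text_oarchive_impl", should
show whether the names the object asks for exist in the library only in the
other ABI flavour. Symbols that do not involve std::string look the same in
both ABIs, hence the "old-or-neutral" label.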

Thanks so much!

On Wed, Apr 28, 2021 at 1:26 PM Paolo Tosco 
wrote:

> Hi Guilherme,
>
> it looks like it might be this:
>
> https://github.com/rdkit/rdkit/issues/2013#issuecomment-553563418
>
> This can happen if you are using pre-compiled Boost libraries that were
> compiled with a different compiler from the one you are using for RDKit.
>
> To check if that's the case, compare the symbols for one of those
> undefined references between your RDKit binary and your Boost library, e.g.:
>
> $ nm /path/to/boost/lib/lib/libboost_serialization.so.1.74.0 | grep 'common_oarchive.*text_oarchive.*save.*class_name_type'
> 00023b10 W _ZN5boost7archive6detail15common_oarchiveINS0_13text_oarchiveEE5vsaveERKNS0_15class_name_typeE
> $ nm ./Code/GraphMol/ChemReactions/CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o | grep 'common_oarchive.*text_oarchive.*save.*class_name_type'
>  W _ZN5boost7archive6detail15common_oarchiveINS0_13text_oarchiveEE5vsaveERKNS0_15class_name_typeE
>
> They should be identical as in my example above.
> If they aren't, you might need to define the _GLIBCXX_USE_CXX11_ABI=0 macro
> as explained in the GitHub comment I linked above.
>
> HTH, cheers
> p.
>
> On Wed, Apr 28, 2021 at 6:48 PM Guilherme Duarte Ramos Matos <
> guilherme.duarteramosma...@stonybrook.edu> wrote:
>
>> Dear RDKit community,
>>
>> I use the RDKit C++ libraries (version 2019_09_1) in one of my projects
>> and I have a linking problem that I am having a hard time troubleshooting.
>>
>> When I try to compile and link using the CMake command below
>>
>> cmake .. -DRDK_INSTALL_INTREE=OFF \
>>   -DCMAKE_INSTALL_PREFIX=path/to/rdkit-Release_2019_09_1 \
>>   -DBOOST_ROOT= path/to/boost \
>>   -DBoost_USE_STATIC_LIBS=OFF \
>>   -DBOOST_LIBRARYDIR=path/to/boost/lib \
>>   -DBoost_NO_SYSTEM_PATHS=ON \
>>   -DEIGEN3_INCLUDE_DIR=/path/to/eigen-3.3.9 \
>>   -DPYTHON_EXECUTABLE=/path/to/python3 \
>>   -DRDK_BUILD_PYTHON_WRAPPERS=OFF
>>
>>
>> I get the following error message:
>>
>> [ 70%] Linking CXX executable testEnumeration
>> CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o: In function `boost::archive::detail::common_oarchive<boost::archive::text_oarchive>::vsave(boost::archive::class_name_type const&)':
>> testEnumerate.cpp:(.text._ZN5boost7archive6detail15common_oarchiveINS0_13text_oarchiveEE5vsaveERKNS0_15class_name_typeE[_ZN5boost7archive6detail15common_oarchiveINS0_13text_oarchiveEE5vsaveERKNS0_15class_name_typeE]+0x32): undefined reference to `boost::archive::text_oarchive_impl<boost::archive::text_oarchive>::save(std::string const&)'
>> CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o: In function `boost::archive::detail::iserializer<boost::archive::text_iarchive, RDKit::EnumerateLibraryBase>::load_object_data(boost::archive::detail::basic_iarchive&, void*, unsigned int) const':
>> testEnumerate.cpp:(.text._ZNK5boost7archive6detail11iserializerINS0_13text_iarchiveEN5RDKit20EnumerateLibraryBaseEE16load_object_dataERNS1_14basic_iarchiveEPvj[_ZNK5boost7archive6detail11iserializerINS0_13text_iarchiveEN5RDKit20EnumerateLibraryBaseEE16load_object_dataERNS1_14basic_iarchiveEPvj]+0x97): undefined reference to `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load(std::string&)'
>> testEnumerate.cpp:(.text._ZNK5boost7archive6detail11iserializerINS0_13text_iarchiveEN5RDKit20EnumerateLibraryBaseEE16load_object_dataERNS1_14basic_iarchiveEPvj[_ZNK5boost7archive6detail11iserializerINS0_13text_iarchiveEN5RDKit20EnumerateLibraryBaseEE16load_object_dataERNS1_14basic_iarchiveEPvj]+0xd9): undefined reference to `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load(std::string&)'
>> CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o: In function `boost::archive::detail::oserializer<boost::archive::text_oarchive, RDKit::EnumerateLibraryBase>::save_object_data(boost::archive::detail::basic_oarchive&, void const*) const':
>> testEnumerate.cpp:(.text._ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit20EnumerateLibraryBaseEE16save_object_dataERNS1_14basic_oarchiveEPKv[_ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit20EnumerateLibraryBaseEE16save_object_dataERNS1_14basic_oarchiveEPKv]+0x56): undefined reference to `boost::archive::text_oarchive_impl<boost::archive::text_oarchive>::save(std::string const&)'
>> testEnumerate.cpp:(.text._ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit20EnumerateLibraryBaseEE16save_object_dataERNS1_14basic_oarchiveEPKv[_ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit20EnumerateLibrar

Re: [Rdkit-discuss] RDKit compilation from source question

2021-04-28 Thread Paolo Tosco
RDKit::EnumerateLibrary>::save_object_data(boost::archive::detail::basic_oarchive&,
> void const*) const':
> testEnumerate.cpp:(.text._ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit16EnumerateLibraryEE16save_object_dataERNS1_14basic_oarchiveEPKv[_ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit16EnumerateLibraryEE16save_object_dataERNS1_14basic_oarchiveEPKv]+0x15a): undefined reference to `boost::archive::text_oarchive_impl<boost::archive::text_oarchive>::save(std::string const&)'
> CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o: In function `void RDKit::EnumerateLibrary::load<boost::archive::text_iarchive>(boost::archive::text_iarchive&, unsigned int)':
> testEnumerate.cpp:(.text._ZN5RDKit16EnumerateLibrary4loadIN5boost7archive13text_iarchiveEEEvRT_j[_ZN5RDKit16EnumerateLibrary4loadIN5boost7archive13text_iarchiveEEEvRT_j]+0x18a): undefined reference to `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load(std::string&)'
> collect2: error: ld returned 1 exit status
>
> make[2]: *** [Code/GraphMol/ChemReactions/testEnumeration] Error 1
> make[1]: *** [Code/GraphMol/ChemReactions/CMakeFiles/testEnumeration.dir/all] Error 2
> make: *** [all] Error 2
>
>
> As far as I can tell, it is an "undefined reference to
> `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load(std::string&)'"
> error. Compiling and linking RDKit with the cmake flag
> -DRDK_USE_BOOST_SERIALIZATION=OFF works, but it is not enough for my
> project; undefined references to RDKit::SmartsToMol pop up everywhere in
> the linking phase. My Boost version is 1.70.0.
>
> My question is simple: where do you think I should look in my Boost or
> RDKit installation to track down the problem? It seems something in
> serialization is not properly done. Any advice would be very much
> appreciated.
>
> Thanks so much!
>
> --
> Guilherme Duarte Ramos Matos
> Ph.D. in Chemistry, UC Irvine (2018)
> Postdoctoral Associate, Rizzo Lab
>


[Rdkit-discuss] RDKit compilation from source question

2021-04-28 Thread Guilherme Duarte Ramos Matos
Dear RDKit community,

I use the RDKit C++ libraries (version 2019_09_1) in one of my projects and
I have a linking problem that I am having a hard time troubleshooting.

When I try to compile and link using the CMake command below

cmake .. -DRDK_INSTALL_INTREE=OFF \
  -DCMAKE_INSTALL_PREFIX=path/to/rdkit-Release_2019_09_1 \
  -DBOOST_ROOT= path/to/boost \
  -DBoost_USE_STATIC_LIBS=OFF \
  -DBOOST_LIBRARYDIR=path/to/boost/lib \
  -DBoost_NO_SYSTEM_PATHS=ON \
  -DEIGEN3_INCLUDE_DIR=/path/to/eigen-3.3.9 \
  -DPYTHON_EXECUTABLE=/path/to/python3 \
  -DRDK_BUILD_PYTHON_WRAPPERS=OFF


I get the following error message:

[ 70%] Linking CXX executable testEnumeration
CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o: In function `boost::archive::detail::common_oarchive<boost::archive::text_oarchive>::vsave(boost::archive::class_name_type const&)':
testEnumerate.cpp:(.text._ZN5boost7archive6detail15common_oarchiveINS0_13text_oarchiveEE5vsaveERKNS0_15class_name_typeE[_ZN5boost7archive6detail15common_oarchiveINS0_13text_oarchiveEE5vsaveERKNS0_15class_name_typeE]+0x32): undefined reference to `boost::archive::text_oarchive_impl<boost::archive::text_oarchive>::save(std::string const&)'
CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o: In function `boost::archive::detail::iserializer<boost::archive::text_iarchive, RDKit::EnumerateLibraryBase>::load_object_data(boost::archive::detail::basic_iarchive&, void*, unsigned int) const':
testEnumerate.cpp:(.text._ZNK5boost7archive6detail11iserializerINS0_13text_iarchiveEN5RDKit20EnumerateLibraryBaseEE16load_object_dataERNS1_14basic_iarchiveEPvj[_ZNK5boost7archive6detail11iserializerINS0_13text_iarchiveEN5RDKit20EnumerateLibraryBaseEE16load_object_dataERNS1_14basic_iarchiveEPvj]+0x97): undefined reference to `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load(std::string&)'
testEnumerate.cpp:(.text._ZNK5boost7archive6detail11iserializerINS0_13text_iarchiveEN5RDKit20EnumerateLibraryBaseEE16load_object_dataERNS1_14basic_iarchiveEPvj[_ZNK5boost7archive6detail11iserializerINS0_13text_iarchiveEN5RDKit20EnumerateLibraryBaseEE16load_object_dataERNS1_14basic_iarchiveEPvj]+0xd9): undefined reference to `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load(std::string&)'
CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o: In function `boost::archive::detail::oserializer<boost::archive::text_oarchive, RDKit::EnumerateLibraryBase>::save_object_data(boost::archive::detail::basic_oarchive&, void const*) const':
testEnumerate.cpp:(.text._ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit20EnumerateLibraryBaseEE16save_object_dataERNS1_14basic_oarchiveEPKv[_ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit20EnumerateLibraryBaseEE16save_object_dataERNS1_14basic_oarchiveEPKv]+0x56): undefined reference to `boost::archive::text_oarchive_impl<boost::archive::text_oarchive>::save(std::string const&)'
testEnumerate.cpp:(.text._ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit20EnumerateLibraryBaseEE16save_object_dataERNS1_14basic_oarchiveEPKv[_ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit20EnumerateLibraryBaseEE16save_object_dataERNS1_14basic_oarchiveEPKv]+0xa1): undefined reference to `boost::archive::text_oarchive_impl<boost::archive::text_oarchive>::save(std::string const&)'
CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o: In function `boost::archive::detail::oserializer<boost::archive::text_oarchive, RDKit::EnumerateLibrary>::save_object_data(boost::archive::detail::basic_oarchive&, void const*) const':
testEnumerate.cpp:(.text._ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit16EnumerateLibraryEE16save_object_dataERNS1_14basic_oarchiveEPKv[_ZNK5boost7archive6detail11oserializerINS0_13text_oarchiveEN5RDKit16EnumerateLibraryEE16save_object_dataERNS1_14basic_oarchiveEPKv]+0x15a): undefined reference to `boost::archive::text_oarchive_impl<boost::archive::text_oarchive>::save(std::string const&)'
CMakeFiles/testEnumeration.dir/Enumerate/testEnumerate.cpp.o: In function `void RDKit::EnumerateLibrary::load<boost::archive::text_iarchive>(boost::archive::text_iarchive&, unsigned int)':
testEnumerate.cpp:(.text._ZN5RDKit16EnumerateLibrary4loadIN5boost7archive13text_iarchiveEEEvRT_j[_ZN5RDKit16EnumerateLibrary4loadIN5boost7archive13text_iarchiveEEEvRT_j]+0x18a): undefined reference to `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load(std::string&)'
collect2: error: ld returned 1 exit status

make[2]: *** [Code/GraphMol/ChemReactions/testEnumeration] Error 1
make[1]: *** [Code/GraphMol/ChemReactions/CMakeFiles/testEnumeration.dir/all] Error 2
make: *** [all] Error 2


As far as I can tell, it is an "undefined reference to
`boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load(std::string&)'"
error. Compiling and linking RDKit with the cmake flag
-DRDK_USE_BOOST_SERIALIZATION=OFF works, but it is not enough for my
project; undefined references to RDKit::SmartsToMol pop up everywhere in
the linking phase. My Boost version is 1.70.0.

My question is simple: where do you think I should look in my Boost or
RDKit installation to track down the problem? It seems something in
serialization is not proper

Re: [Rdkit-discuss] RDKit - contributing conformational entropy descriptor

2021-03-29 Thread Greg Landrum
Hi Geoff,

Congrats to you and co-authors on the paper and thanks for offering to
contribute to the RDKit.

Given the dependence on py_rdl, this probably isn't currently a good match
for the core RDKit (we try to minimize adding additional external
dependencies), but it would be great to have it available in the RDKit
Contrib directory. The only real requirement there is to have a README.md
file describing what the contribution is and linking to the original
publication.
You can take a look at a couple of recent contributions to use as examples:
https://github.com/rdkit/rdkit/tree/master/Contrib/CalcLigRMSD
https://github.com/rdkit/rdkit/tree/master/Contrib/NIBRSubstructureFilters

A quick aside on RDL: the RDKit can optionally use the RDL library, but it
looks like it doesn't currently expose everything RingEntropy.py needs to
Python. If the code is available in Contrib, we can take a look at doing a
C++ re-implementation of the new descriptor(s) and putting that in the
core. The advantage of the C++ implementation is that it would make
the descriptor more broadly accessible. We've done this a couple of times
in the past with code from Contrib.

Best,
-greg

On Mon, Mar 29, 2021 at 9:37 PM Geoffrey Hutchison <
geoff.hutchi...@gmail.com> wrote:

> Hi Greg,
>
> We just published a paper with a linear model predicting conformational
> entropies:
> https://pubs.acs.org/doi/10.1021/acs.jctc.0c01213
> https://github.com/hutchisonlab/molecular-entropies
>
> We built the notebooks on top of RDKit - and of course would like to
> contribute the descriptor back to RDKit.
>
> Is there a contribution guide (e.g., how to structure the code before a
> pull request)? In particular, this one has a few components that might
> prove useful as separate calls (e.g., the ring flexibility measure).
>
> Thanks,
> -Geoff
>
> ---
> Prof. Geoffrey Hutchison
> Department of Chemistry
> University of Pittsburgh
> tel: (412) 648-0492
> email: geo...@pitt.edu
> twitter: @ghutchis
> web: https://hutchison.chem.pitt.edu/
>
>


[Rdkit-discuss] RDKit - contributing conformational entropy descriptor

2021-03-29 Thread Geoffrey Hutchison
Hi Greg,

We just published a paper with a linear model predicting conformational 
entropies:
https://pubs.acs.org/doi/10.1021/acs.jctc.0c01213
https://github.com/hutchisonlab/molecular-entropies

We built the notebooks on top of RDKit - and of course would like to contribute 
the descriptor back to RDKit.

Is there a contribution guide (e.g., how to structure the code before a pull 
request)? In particular, this one has a few components that might prove useful 
as separate calls (e.g., the ring flexibility measure).

Thanks,
-Geoff

---
Prof. Geoffrey Hutchison
Department of Chemistry
University of Pittsburgh
tel: (412) 648-0492
email: geo...@pitt.edu
twitter: @ghutchis
web: https://hutchison.chem.pitt.edu/





Re: [Rdkit-discuss] [Rdkit-announce] 2021.03.1 RDKit Release

2021-03-29 Thread Drew Gibson via Rdkit-discuss
Hi Greg,

I can confirm the issue is solved here.

Cheers !

Drew

On Sun, 28 Mar 2021 at 08:32, Greg Landrum  wrote:

> Hi Drew,
>
> Thanks for pointing out the problem. I had inadvertently done the conda
> builds using freetype, but I forgot to add a freetype dependency.
> It should be fixed now. Note: I removed the old builds and uploaded new
> ones, so you'll probably need to do a conda uninstall and then conda
> install again.
>
> -greg
>
>
> On Sat, Mar 27, 2021 at 8:09 PM Drew Gibson 
> wrote:
>
>> Hi,
>>
>> just a heads-up that I'm seeing the following error on MacOS on trying to
>> create the rdkit extension in a chembl_28 db.
>>
>> chembl_28=# create extension if not exists rdkit;
>> ERROR:  could not load library
>> "/Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so":
>> dlopen(/Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so, 10):
>> Library not loaded: /usr/local/opt/freetype/lib/libfreetype.6.dylib
>>   Referenced from: /Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so
>>   Reason: image not found
>> chembl_28=#
>>
>> I'm using a Mac Mini 2018 with Big Sur version 11.2.3.  I created my
>> conda environment using the postgresql=12.2 option - haven't tried the
>> others but then the issue doesn't seem related to postgresql.
>>
>> I subsequently installed the 2020.03.3 version with postgresql=12.2 and
>> have had no problems.
>>
>> Thanks !
>>
>> Drew
>>
>>
>> On Fri, 26 Mar 2021 at 15:14, Greg Landrum 
>> wrote:
>>
>>> Dear all,
>>>
>>> I'm pleased to announce that the 2021.03 version of the RDKit is
>>> released. We actually managed to get the .03 release done during March.
>>> Shocking! ;-)
>>> The release notes are below.[1]
>>>
>>> The release files are on the github release page:
>>> https://github.com/rdkit/rdkit/releases/tag/Release_2021_03_1
>>> The DOI for this release is:
>>> https://doi.org/10.5281/zenodo.4639022
>>>
>>> I do not plan to do conda builds for the Python wrappers in the rdkit
>>> channel for this release. The builds done as part of the conda-forge
>>> project are automated and cover more Python versions and operating systems
>>> than I could ever hope to do manually.
>>> Please install the rdkit using conda-forge:
>>> conda install -c conda-forge rdkit
>>> I believe that the conda-forge builds of the new version should appear
>>> over the next couple of days.
>>>
>>> I hope to finish the conda builds of the PostgreSQL cartridge for linux
>>> and the mac and have them available in the rdkit channel by later today
>>> or tomorrow.
>>>
>>> The online version of the documentation at rdkit.org (
>>> http://rdkit.org/docs/index.html) has been updated.
>>>
>>> Thanks to everyone who submitted code, bug reports, and suggestions for
>>> this release!
>>>
>>> Please let me know if you find any problems with the release or have
>>> suggestions for the next one, which is scheduled for September/October 2021.
>>>
>>> Best Regards,
>>> -greg
>>> [1] We probably should figure out some way to make the release notes a
>>> bit less verbose. ;-)
>>>
>>>
>>> # Release_2021.03.1
>>> (Changes relative to Release_2020.09.1)
>>>
>>> ## Backwards incompatible changes
>>> - The distance-geometry based conformer generation now by default generates
>>>   trans(oid) conformations for amides, esters, and related structures. This
>>>   can be toggled off with the `forceTransAmides` flag in EmbedParameters.
>>>   Note that this change does not impact conformers created using one of the
>>>   ET versions. (#3794)
>>> - The conformer generator now uses symmetry by default when doing RMS
>>>   pruning. This can be disabled using the `useSymmetryForPruning` flag in
>>>   EmbedParameters. (#3813)
>>> - Double bonds with unspecified stereochemistry in the products of chemical
>>>   reactions now have their stereo set to STEREONONE instead of STEREOANY
>>>   (#3078)
>>> - The MolToSVG() function has been moved from rdkit.Chem to rdkit.Chem.Draw
>>>   (#3696)
>>> - There have been numerous changes to the RGroup Decomposition code which
>>>   change the results. (#3767)
>>> - In RGroup Decomposition, when onlyMatchAtRGroups is set to false, each
>>>   molecule is now decomposed based on the first matching scaffold which
>>>   adds/uses the least number of non-user-provided R labels, rather than
>>>   simply the first matching scaffold.
>>>   Among other things, this allows the code to provide the same results for
>>>   both onlyMatchAtRGroups=true and onlyMatchAtRGroups=false when suitable
>>>   scaffolds are provided without requiring the user to get overly concerned
>>>   about the input ordering of the scaffolds. (#3969)
>>> - There have been numerous changes to
>>>   `GenerateDepictionMatching2DStructure()` (#3811)
>>> - Setting the kekuleSmiles argument (doKekule in C++) to MolToSmiles will now
>>>   cause the molecule to be kekulized before SMILES generation. Note that
>>>   this can lead to an exception being thrown. Previousl

Re: [Rdkit-discuss] [Rdkit-announce] 2021.03.1 RDKit Release

2021-03-28 Thread Greg Landrum
Hi Drew,

Thanks for pointing out the problem. I had inadvertently done the conda
builds using freetype, but I forgot to add a freetype dependency.
It should be fixed now. Note: I removed the old builds and uploaded new ones,
so you'll probably need to do a conda uninstall and then conda install
again.

-greg


On Sat, Mar 27, 2021 at 8:09 PM Drew Gibson 
wrote:

> Hi,
>
> just a heads-up that I'm seeing the following error on MacOS on trying to
> create the rdkit extension in a chembl_28 db.
>
> chembl_28=# create extension if not exists rdkit;
> ERROR:  could not load library
> "/Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so":
> dlopen(/Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so, 10):
> Library not loaded: /usr/local/opt/freetype/lib/libfreetype.6.dylib
>   Referenced from: /Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so
>   Reason: image not found
> chembl_28=#
>
> I'm using a Mac Mini 2018 with Big Sur version 11.2.3.  I created my conda
> environment using the postgresql=12.2 option - haven't tried the others but
> then the issue doesn't seem related to postgresql.
>
> I subsequently installed the 2020.03.3 version with postgresql=12.2 and
> have had no problems.
>
> Thanks !
>
> Drew

Re: [Rdkit-discuss] [Rdkit-announce] 2021.03.1 RDKit Release

2021-03-27 Thread Drew Gibson via Rdkit-discuss
Hi,

just a heads-up that I'm seeing the following error on macOS when trying to
create the rdkit extension in a chembl_28 db.

chembl_28=# create extension if not exists rdkit;
ERROR:  could not load library
"/Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so":
dlopen(/Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so, 10):
Library not loaded: /usr/local/opt/freetype/lib/libfreetype.6.dylib
  Referenced from: /Users/drew/anaconda3/envs/rdkit-psql-2021/lib/rdkit.so
  Reason: image not found
chembl_28=#

I'm using a Mac Mini 2018 with Big Sur version 11.2.3.  I created my conda
environment using the postgresql=12.2 option - haven't tried the others but
then the issue doesn't seem related to postgresql.

I subsequently installed the 2020.03.3 version with postgresql=12.2 and
have had no problems.

Thanks !

Drew


On Fri, 26 Mar 2021 at 15:14, Greg Landrum  wrote:

> Dear all,
>
> I'm pleased to announce that the 2021.03 version of the RDKit is released.
> We actually managed to get the .03 release done during March. Shocking! ;-)
> The release notes are below.[1]
>
> The release files are on the github release page:
> https://github.com/rdkit/rdkit/releases/tag/Release_2021_03_1
> The DOI for this release is:
> https://doi.org/10.5281/zenodo.4639022
>
> I do not plan to do conda builds for the Python wrappers in the rdkit
> channel for this release. The builds done as part of the conda-forge
> project are automated and cover more Python versions and operating systems
> than I could ever hope to do manually.
> Please install the rdkit using conda-forge:
> conda install -c conda-forge rdkit
> I believe that the conda-forge builds of the new version should appear
> over the next couple of days.
>
> I hope to finish the conda builds of the PostgreSQL cartridge for linux
> and the mac and have them available in the rdkit channel by later today
> or tomorrow.
>
> The online version of the documentation at rdkit.org (
> http://rdkit.org/docs/index.html) has been updated.
>
> Thanks to everyone who submitted code, bug reports, and suggestions for
> this release!
>
> Please let me know if you find any problems with the release or have
> suggestions for the next one, which is scheduled for September/October 2021.
>
> Best Regards,
> -greg
> [1] We probably should figure out some way to make the release notes a bit
> less verbose. ;-)
>
>
> # Release_2021.03.1
> (Changes relative to Release_2020.09.1)
>
> ## Backwards incompatible changes
> - The distance-geometry based conformer generation now by default
> generates
>   trans(oid) conformations for amides, esters, and related structures.
> This can
>   be toggled off with the `forceTransAmides` flag in EmbedParameters. Note
> that
>   this change does not impact conformers created using one of the ET
> versions.
>   (#3794)
> - The conformer generator now uses symmetry by default when doing RMS
> pruning.
>   This can be disabled using the `useSymmetryForPruning` flag in
>   EmbedParameters. (#3813)
> - Double bonds with unspecified stereochemistry in the products of chemical
>   reactions now have their stereo set to STEREONONE instead of STEREOANY
> (#3078)
> - The MolToSVG() function has been moved from rdkit.Chem to rdkit.Chem.Draw
>   (#3696)
> - There have been numerous changes to the RGroup Decomposition code which
> change
>   the results. (#3767)
> - In RGroup Decomposition, when onlyMatchAtRGroups is set to false, each
> molecule
>   is now decomposed based on the first matching scaffold which adds/uses
> the
>   least number of non-user-provided R labels, rather than simply the first
>   matching scaffold.
>   Among other things, this allows the code to provide the same results for
> both
>   onlyMatchAtRGroups=true and onlyMatchAtRGroups=false when suitable
> scaffolds
>   are provided without requiring the user to get overly concerned about the
>   input ordering of the scaffolds. (#3969)
> - There have been numerous changes to
> `GenerateDepictionMatching2DStructure()` (#3811)
> - Setting the kekuleSmiles argument (doKekule in C++) to MolToSmiles will
> now
>   cause the molecule to be kekulized before SMILES generation. Note that
> this
>   can lead to an exception being thrown. Previously this argument would
> only
>   write kekulized SMILES if the molecule had already been kekulized (#2788)
> - Using the kekulize argument in the MHFP code will now cause the molecule
> to be
>   kekulized before the fingerprint is generated. Note that because
> kekulization
>   is not canonical, using this argument currently causes the results to
> depend
>   on the input atom numbering. Note that this can lead to an exception
> being
>   thrown. (#3942)
> - Gradients for angle and torsional restraints in both UFF and MMFF were
> computed
>   incorrectly, which could give rise to potential instability during
> minimization.
>   As part of fixing this problem, force constants have been switched to
> using
>   kcal/degree^2 units instead of kcal/rad^2 units, co

[Rdkit-discuss] rdkit installation problem

2021-01-26 Thread Moorthy Suresh
Dear rdkit Team,

I was trying to install rdkit using conda, but I still see the error
message "ModuleNotFoundError: No module named 'rdkit'".

The following are my installation process and subsequent errors.
---
 *conda install -c rdkit rdkit=2020*

When I tried to install it using the above command, it showed the following
error:
Solving environment: failed with initial frozen solve. Retrying with
flexible solve.

I solved the above problem using the following commands

*conda create --name myenv*
*conda activate myenv*

After the above commands, it successfully proceeded to downloading and
extracting packages, but I still see the error message while importing the
module.

It would be great if you could help solve this issue.

Best Regards
Suresh
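
One thing worth checking in a situation like this is whether the Python
interpreter that raises the error is the one inside the environment where
the package was installed. The following is a minimal diagnostic sketch
using only the standard library. Nothing in it is RDKit-specific, and
`describe_module` is just an illustrative helper name, not part of any
library.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether a top-level module is importable from this interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
        # The interpreter and prefix show which environment is really active.
        "interpreter": sys.executable,
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    for key, value in describe_module("rdkit").items():
        print(f"{key}: {value}")
```

If `found` is False and `prefix` does not point into the environment that
was activated, the import is probably being attempted from a different
Python than the one the package was installed into.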
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit ElasticSearch Plugin

2021-01-21 Thread Rajarshi Guha
There was a presentation at a recent Cambridge Cheminformatics meeting on
using Elastic for similarity searches, and I think the presenter was also
considering extending to substructure matching as well. But no code
available as far as I can tell

https://github.com/MysterionRise

On Thu, Jan 21, 2021 at 10:44 AM Joos Kiener  wrote:

> Hi Naomi,
>
> I once played around a bit with this idea using the Lucene-based RDKit
> example as guidance. However what that code does inside Lucene and hence my
> "adaption" inside elastic search is only the fingerprint screening part.
> For the actual subgraph-match the data then has to be sent to the
> caller/client and doesn't run inside elastic search and means one must
> manipulate the elastic search results (hit count, paging,...) before
> finally returning to the end user application. Simply said, not a very
> usable but very hacky solution.
>
> Even ignoring that part, it wasn't very fast either. That could be due to
> many things like only having 1 machine for ES (my machine, no cluster) and
> not being an expert in ES anyway (suboptimal config?). Or maybe the dataset
> was too small to actually benefit. Same data, same query is much faster in
> PostgreSQL + RDKit + Full-text index and easier to use. (Yes, PostgreSQL
> supports full-text search similar to elastic. if one doesn't need very
> advanced features or has a lot of data, for sure worth a look)
>
> Any "real solution" must also do the subgraph matching inside elastic
> itself which means writing a plugin / extension for elasticsearch. This was
> simply too involved for me to even try. (If that is of interest, you should
> probably also look at the very recent licensing changes to elasticsearch).
>
> The presentation Joshua mentioned is actually only about similarity search
> which naturally is easier to implement and fast.
>
> Having said that, there is a commercial solution available from
> PerkinElmer in their Signals Data factory offering. Of course this has
> nothing to do with RDKit but it does hint that it's possible to do this if
> you have the time, budget and skills/knowledge.
>
> Another  commercial "fast substructure search" option would be nextmoves
> Arthor but that has nothing to do with elasticsearch. Question is if you
> want elasticsearch due to the speed or due to the combination with text
> search. I would probably avoid it if the text search part is not important.
>
> Just using RDKit default functionality is actually pretty fast (see on
> Gregs blog), well it does run in memory. Nowadays a machine with lots of
> RAM doesn't cost all that much so I could see that scaling to 10-20 million
> structures easily.
>
> hope that helps you a bit to come to a conclusion on what to do.
>
> Best Regards,
>
> Joos
>
>
> ------ Forwarded message --
>> From: Naomi Jacobs 
>> To: rdkit-discuss@lists.sourceforge.net
>> Cc: Alan Pierce , Larry Taylor 
>> Bcc:
>> Date: Wed, 20 Jan 2021 22:27:32 -0800
>> Subject: [Rdkit-discuss] RDKit ElasticSearch Plugin
>> Hi all,
>>
>> We're looking for information about whether anyone has built an
>> ElasticSearch plugin using RDKit to support chemical search. I didn't see
>> anything open-source online, but was thinking some folks may have heard
>> about internal efforts and would be willing to share any code and/or chat
>> about it. Thanks!
>>
>> Cheers,
>> Naomi
>>
>> --
>> *Naomi Jacobs*
>> Software Engineer | benchling.com
>> (415) 590-2798
>>
>>
>>
>> -- Forwarded message --
>> From: Greg Landrum 
>> To: Naomi Jacobs 
>> Cc: RDKit Discuss , Larry Taylor <
>> la...@benchling.com>
>> Bcc:
>> Date: Thu, 21 Jan 2021 08:54:08 +0100
>> Subject: Re: [Rdkit-discuss] RDKit ElasticSearch Plugin
>> Hi Naomi,
>>
>> I'm not personally aware of any ElasticSearch work, but there is a
>> prototype for a lucene plugin which could, I believe, be used as the basis
>> for an ElasticSearch plugin:
>> https://github.com/rdkit/org.rdkit.lucene
>>
>> It's (obviously) been a while since anyone did anything with that code
>> and it may no longer work, but the more recent (and still functional)
>> RDKit-neo4j integration (https://github.com/rdkit/neo4j-rdkit) can
>> provide some patterns for how the RDKit java integration can be used in
>> this type of context.
>>
>> I hope this helps, and would be interested to hear if you end up doing
>> anything with the RDKit and ElasticSearch.
>> -greg
>>
>>
>> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


-- 
Rajarshi Guha | http://blog.rguha.net | @rguha <https://twitter.com/rguha>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit ElasticSearch Plugin

2021-01-21 Thread Joos Kiener
Hi Naomi,

I once played around a bit with this idea using the Lucene-based RDKit
example as guidance. However, what that code does inside Lucene, and hence
what my "adaptation" does inside Elasticsearch, is only the fingerprint
screening part. For the actual subgraph match the data then has to be sent
to the caller/client, so that step doesn't run inside Elasticsearch, and one
must manipulate the Elasticsearch results (hit count, paging, ...) before
finally returning them to the end-user application. Simply put, it is not a
very usable solution and a rather hacky one.
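
The two-stage pattern described above (a cheap fingerprint screen in the
index, then an exact match on the survivors) can be sketched as follows.
This is a toy illustration only: strings stand in for molecules, the set of
characters stands in for a fingerprint, and a substring test stands in for
the subgraph match. All names (`passes_screen`, `substructure_search`,
`toy_bits`) are hypothetical and come from neither RDKit nor Elasticsearch.

```python
def passes_screen(query_bits, target_bits):
    """A target can only contain the query if it sets every query bit."""
    return query_bits <= target_bits


def substructure_search(query, index, verify):
    """Return ids of records that pass the screen and then the exact check."""
    hits = []
    for record_id, (bits, payload) in index.items():
        if not passes_screen(query["bits"], bits):
            continue  # cheap rejection; the part a search index can do
        if verify(query["pattern"], payload):
            hits.append(record_id)  # expensive exact match on survivors only
    return hits


def toy_bits(text):
    """Toy 'fingerprint': the set of characters present in the string."""
    return frozenset(text)


records = {"r1": "CCO", "r2": "COC", "r3": "CCN"}
index = {rid: (toy_bits(text), text) for rid, text in records.items()}
query = {"bits": toy_bits("OC"), "pattern": "OC"}

# r3 fails the screen, r1 passes the screen but fails the exact check,
# and only r2 survives both stages.
hits = substructure_search(query, index, lambda pattern, text: pattern in text)
print(hits)
```

The record that passes the screen but fails verification is the reason the
second stage cannot be skipped, and it is also why the hit count and paging
reported by the index alone are not the final ones.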

Even ignoring that part, it wasn't very fast either. That could be due to
many things, like only having one machine for ES (my machine, no cluster) and
not being an expert in ES anyway (suboptimal config?). Or maybe the dataset
was too small to actually benefit. With the same data, the same query is much
faster in PostgreSQL + RDKit + full-text index, and easier to use. (Yes,
PostgreSQL supports full-text search similar to Elastic. If one doesn't need
very advanced features or has a lot of data, it is for sure worth a look.)

Any "real solution" must also do the subgraph matching inside elastic
itself which means writing a plugin / extension for elasticsearch. This was
simply too involved for me to even try. (If that is of interest, you should
probably also look at the very recent licensing changes to elasticsearch).

The presentation Joshua mentioned is actually only about similarity search
which naturally is easier to implement and fast.

Having said that, there is a commercial solution available from PerkinElmer
in their Signals Data factory offering. Of course this has nothing to do
with RDKit but it does hint that it's possible to do this if you have the
time, budget and skills/knowledge.

Another commercial "fast substructure search" option would be NextMove's
Arthor, but that has nothing to do with Elasticsearch. The question is whether
you want Elasticsearch for the speed or for the combination with text
search. I would probably avoid it if the text search part is not important.

Just using RDKit default functionality is actually pretty fast (see
Greg's blog), though it does run in memory. Nowadays a machine with lots of
RAM doesn't cost all that much, so I could see that scaling to 10-20 million
structures easily.

Hope that helps you a bit to come to a conclusion on what to do.

Best Regards,

Joos


-- Forwarded message --
> From: Naomi Jacobs 
> To: rdkit-discuss@lists.sourceforge.net
> Cc: Alan Pierce , Larry Taylor 
> Bcc:
> Date: Wed, 20 Jan 2021 22:27:32 -0800
> Subject: [Rdkit-discuss] RDKit ElasticSearch Plugin
> Hi all,
>
> We're looking for information about whether anyone has built an
> ElasticSearch plugin using RDKit to support chemical search. I didn't see
> anything open-source online, but was thinking some folks may have heard
> about internal efforts and would be willing to share any code and/or chat
> about it. Thanks!
>
> Cheers,
> Naomi
>
> --
> *Naomi Jacobs*
> Software Engineer | benchling.com
> (415) 590-2798
>
>
>
> -- Forwarded message --
> From: Greg Landrum 
> To: Naomi Jacobs 
> Cc: RDKit Discuss , Larry Taylor <
> la...@benchling.com>
> Bcc:
> Date: Thu, 21 Jan 2021 08:54:08 +0100
> Subject: Re: [Rdkit-discuss] RDKit ElasticSearch Plugin
> Hi Naomi,
>
> I'm not personally aware of any ElasticSearch work, but there is a
> prototype for a lucene plugin which could, I believe, be used as the basis
> for an ElasticSearch plugin:
> https://github.com/rdkit/org.rdkit.lucene
>
> It's (obviously) been a while since anyone did anything with that code and
> it may no longer work, but the more recent (and still functional)
> RDKit-neo4j integration (https://github.com/rdkit/neo4j-rdkit) can
> provide some patterns for how the RDKit java integration can be used in
> this type of context.
>
> I hope this helps, and would be interested to hear if you end up doing
> anything with the RDKit and ElasticSearch.
> -greg
>
>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit ElasticSearch Plugin

2021-01-21 Thread Joshua Meyers via Rdkit-discuss
Hey Naomi,
I'm not sure if the code is available, but I remember a talk about searching
molecules in ES at the Cambridge Cheminf Network meeting in November.
There is a link on the speaker's GitHub account:
https://github.com/MysterionRise.
Hope this is helpful,
Josh

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit ElasticSearch Plugin

2021-01-20 Thread Greg Landrum
Hi Naomi,

I'm not personally aware of any ElasticSearch work, but there is a
prototype for a lucene plugin which could, I believe, be used as the basis
for an ElasticSearch plugin:
https://github.com/rdkit/org.rdkit.lucene

It's (obviously) been a while since anyone did anything with that code and
it may no longer work, but the more recent (and still functional)
RDKit-neo4j integration (https://github.com/rdkit/neo4j-rdkit) can provide
some patterns for how the RDKit java integration can be used in this type
of context.

I hope this helps, and would be interested to hear if you end up doing
anything with the RDKit and ElasticSearch.
-greg


On Thu, Jan 21, 2021 at 7:58 AM Naomi Jacobs  wrote:

> Hi all,
>
> We're looking for information about whether anyone has built an
> ElasticSearch plugin using RDKit to support chemical search. I didn't see
> anything open-source online, but was thinking some folks may have heard
> about internal efforts and would be willing to share any code and/or chat
> about it. Thanks!
>
> Cheers,
> Naomi
>
> --
> *Naomi Jacobs*
> Software Engineer | benchling.com
> (415) 590-2798
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] RDKit ElasticSearch Plugin

2021-01-20 Thread Naomi Jacobs
Hi all,

We're looking for information about whether anyone has built an
ElasticSearch plugin using RDKit to support chemical search. I didn't see
anything open-source online, but was thinking some folks may have heard
about internal efforts and would be willing to share any code and/or chat
about it. Thanks!

Cheers,
Naomi

-- 
*Naomi Jacobs*
Software Engineer | benchling.com
(415) 590-2798
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Rdkit Machine Learning Project

2021-01-12 Thread Aditya Sahay
Hi,

I have recently enrolled in a programme to use Python in Cheminformatics. As 
part of my programme, I have a mini-project to use the rdkit package for machine 
learning (scikit-learn). I have been provided with a data set of molecules 
(provided as SMILES) with various properties in a .csv file. My main task is to 
use machine learning methods (Random Forests, SVM, Neural networks) to explore 
the data set.

Can anyone provide any guidance on how to begin or any resources to use?

Thanks



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 157, Issue 2

2020-11-10 Thread Joshua Meyers
 cram into GPU memory all in parallel.
>>
>> [*] gWEGA (I believe) is a GPU-accelerated version of the standard WEGA
>> algorithm and based on the published timings is an order of magnitude or
>> more slower than fastROCS
>>
>> Having said all of that, our GPU-accelerated shape similarity function
>> just brute forces through the overlap series to sixth order, as (a) my
>> happy place is on the accuracy side of the speed/accuracy tradeoff, and (b)
>> our electrostatic similarity calculations are sufficiently complex that
>> making the shape function faster wouldn't be that much of a net win. As a
>> result, take all of the above with a grain of salt.
>>
>> Regards,
>> Mark
>>
>> --
>> Mark Mackey
>> Chief Scientific Officer
>> Cresset
>> New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8
>> 0SS, UK
>> tel: +44 (0)1223 858890
>> mobile: +44 (0)7595 099165
>> fax: +44 (0)1223 853667
>> email: m...@cresset-group.com
>> web: www.cresset-group.com
>> skype: mark_cresset
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 157, Issue 2

2020-11-04 Thread Greg Landrum
On Wed, 4 Nov 2020 at 18:27, Mark Mackey  wrote:

>  I did look at Shape-It way back when Silicos open-sourced it: as far as
> I can remember the code looked clean enough but it was slow. Unfortunately
> from the RDKit point of view it’s LGPL so can’t be used as the basis of an
> RDKit shape algorithm.
>
Hans actually gave permission to relicense an RDKit shape-it port. There is
a very old PR that made some progress on this but we never finished it
since the performance was just no good. I can’t find that PR/fork at the
moment, but if someone thinks that may be a starting point (I’m skeptical)
I can look.




>
> Regards,
>
> Mark
>
>
>
> *From:* Chris Swain 
> *Sent:* 04 November 2020 15:56
> *To:* rdkit-discuss@lists.sourceforge.net; Mark Mackey <
> m...@cresset-group.com>
> *Subject:* Re: Rdkit-discuss Digest, Vol 157, Issue 2
>
>
>
> Hi Mark,
>
>
>
> Have you ever looked at Optipharm for shape comparison?
>
>
>
> https://www.nature.com/articles/s41598-018-37908-6
>
>
>
> Or Shape-it
>
>
>
>
> http://silicos-it.be.s3-websiteu-west-1.amazonaws.com/software/shape-it/1.0.1/shape-it.html
>
>
>
>
>
> Cheers
>
>
>
> Chris
>
>
>
>
>
>
>
> On 4 Nov 2020, at 14:28, rdkit-discuss-requ...@lists.sourceforge.net
> wrote:
>
>
>
> From: Mark Mackey 
> To: Lewis Martin , RDKit Discuss
>   
> Subject: Re: [Rdkit-discuss] GPU Implementation of shape-based 3D
>   overlap on rdkit?
> Message-ID:
> <
> dbbpr08mb4235128b45e0f546acfc5adb97...@dbbpr08mb4235.eurprd08.prod.outlook.com
> >
>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Lewis,
>
> The standard shape alignment algorithm that everyone uses is from Grant &
> Pickup 1996 (
> https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291096-987X%2819961115%2917%3A14%3C1653%3A%3AAID-JCC7%3E3.0.CO%3B2-K
> ).
>
> It's a Taylor-series-like expansion using spherical Gaussians as stand-ins
> for hard spheres - you take the atomic volumes, subtract off the pairwise
> overlaps, add back in the three-way overlaps, subtract off the four-way
> overlaps, and so on. I did a fair few tests some years back and you really
> need to go to 6 terms to get decent accuracy. However, all of the
> commercial algorithms (ROCS, Phase Shape, etc) seem to truncate at 2, so go
> figure. OTOH the "high throughput" versions all seem to be operated with
> ludicrously low number of conformations so the error in incomplete coverage
> of conformer space dwarfs the 5% noise that you get from truncating at 2
> terms rather than 6.
>
> If you want something slightly more accurate at the same computational
> cost, look at WEGA (
> https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.23603 and references
> therein) which heuristically corrects for some flaws in the truncated
> Grant&Pickup calculations.
>
> If you want a fast GPU-accelerated version, then forget about actually
> applying the algorithm directly[*]. Instead, to compare a reference
> molecule A to a database molecule B, precompute a grid over A containing
> the pairwise overlap value of an atom at each point in the grid with A. You
> can then compute the shape overlap for a given orientation of B by a simple
> 3D texture lookup rather than faffing around trying to compute exponential
> functions. This is simplified by assuming that all atoms have the same
> atomic radius and neglecting hydrogens (we're going for speed over accuracy
> here, remember?). You can get a similar lookup texture for gradients, I
> think. One thing GPUs are really good at is texture lookups and
> interpolation. They're less good at evaluating exponential functions. Your
> GPU algorithm is then a massively parallel CG or NR optimiser with the
> objective function computing shape overlap values for as many molecules as
> you can cram into GPU memory all in parallel.
>
> [*] gWEGA (I believe) is a GPU-accelerated version of the standard WEGA
> algorithm and based on the published timings is an order of magnitude or
> more slower than fastROCS
>
> Having said all of that, our GPU-accelerated shape similarity function
> just brute forces through the overlap series to sixth order, as (a) my
> happy place is on the accuracy side of the speed/accuracy tradeoff, and (b)
> our electrostatic similarity calculations are sufficiently complex that
> making the shape function faster wouldn't be that much of a net win. As a
> result, take all of the above with a grain of salt.
>
> Regards,
> Mark
>
> --
> Mark Mackey
> Chief Scientific Officer
> Cresset
> New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8
> 0SS,

Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 157, Issue 2

2020-11-04 Thread Mark Mackey
Hi Chris,

I haven't looked at Optipharm: from a quick read through the paper it's 
basically WEGA but with a different optimiser on the front. Looks like an 
interesting idea and they seem to have a way of extending it to electrostatics 
as well 
(https://chemrxiv.org/articles/preprint/Optimizing_Electrostatic_Similarity_for_Virtual_Screening_A_New_Methodology/10044272/1).
 There's no code, so from an RDKit perspective we'd be reimplementing it from 
the description in the paper.

I did look at Shape-It way back when Silicos open-sourced it: as far as I can 
remember the code looked clean enough but it was slow. Unfortunately from the 
RDKit point of view it's LGPL so can't be used as the basis of an RDKit shape 
algorithm.

Regards,
Mark

From: Chris Swain 
Sent: 04 November 2020 15:56
To: rdkit-discuss@lists.sourceforge.net; Mark Mackey 
Subject: Re: Rdkit-discuss Digest, Vol 157, Issue 2

Hi Mark,

Have you ever looked at Optipharm for shape comparison?

https://www.nature.com/articles/s41598-018-37908-6

Or Shape-it

http://silicos-it.be.s3-websiteu-west-1.amazonaws.com/software/shape-it/1.0.1/shape-it.html


Cheers

Chris




On 4 Nov 2020, at 14:28, 
rdkit-discuss-requ...@lists.sourceforge.net
 wrote:

From: Mark Mackey <m...@cresset-group.com>
To: Lewis Martin <lewis.marti...@gmail.com>, RDKit Discuss
  <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] GPU Implementation of shape-based 3D
  overlap on rdkit?
Message-ID:
<dbbpr08mb4235128b45e0f546acfc5adb97...@dbbpr08mb4235.eurprd08.prod.outlook.com>

Content-Type: text/plain; charset="utf-8"

Hi Lewis,

The standard shape alignment algorithm that everyone uses is from Grant & 
Pickup 1996 
(https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291096-987X%2819961115%2917%3A14%3C1653%3A%3AAID-JCC7%3E3.0.CO%3B2-K).

It's a Taylor-series-like expansion using spherical Gaussians as stand-ins for 
hard spheres - you take the atomic volumes, subtract off the pairwise overlaps, 
add back in the three-way overlaps, subtract off the four-way overlaps, and so 
on. I did a fair few tests some years back and you really need to go to 6 terms 
to get decent accuracy. However, all of the commercial algorithms (ROCS, Phase 
Shape, etc) seem to truncate at 2, so go figure. OTOH the "high throughput" 
versions all seem to be operated with a ludicrously low number of conformations, 
so the error from incomplete coverage of conformer space dwarfs the 5% noise that 
you get from truncating at 2 terms rather than 6.
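
As a rough illustration of the truncated series described above, here is a minimal pure-Python sketch of the lowest-order intermolecular term: the sum of pairwise Gaussian overlaps between the atoms of two molecules, turned into a shape Tanimoto. The function names, the (coordinates, radius) molecule representation and the amplitude p = 2.7 are assumptions made for this sketch, not taken from any particular package; the higher-order correction terms are left out entirely.

```python
import math

P = 2.7  # Gaussian amplitude; an assumed, tunable value for this sketch


def alpha(radius, p=P):
    """Exponent chosen so that one atom's Gaussian integrates to the
    hard-sphere volume 4/3*pi*radius**3."""
    return math.pi * (3.0 * p / (4.0 * math.pi)) ** (2.0 / 3.0) / radius ** 2


def gaussian_volume(radius, p=P):
    """Integral of a single atomic Gaussian p*exp(-alpha*r^2) over all space."""
    return p * (math.pi / alpha(radius, p)) ** 1.5


def pair_overlap(xyz_i, r_i, xyz_j, r_j, p=P):
    """Overlap integral of two spherical atomic Gaussians."""
    a_i, a_j = alpha(r_i, p), alpha(r_j, p)
    d2 = sum((u - v) ** 2 for u, v in zip(xyz_i, xyz_j))
    return p * p * math.exp(-a_i * a_j * d2 / (a_i + a_j)) * (math.pi / (a_i + a_j)) ** 1.5


def overlap(mol_a, mol_b):
    """Pairwise-only overlap: sum over every atom pair (one atom from each molecule)."""
    return sum(pair_overlap(xa, ra, xb, rb) for xa, ra in mol_a for xb, rb in mol_b)


def shape_tanimoto(mol_a, mol_b):
    """Shape Tanimoto built from the pairwise-only overlaps."""
    o_ab = overlap(mol_a, mol_b)
    return o_ab / (overlap(mol_a, mol_a) + overlap(mol_b, mol_b) - o_ab)
```

A molecule here is just a list such as `[((0.0, 0.0, 0.0), 1.7), ((1.5, 0.0, 0.0), 1.7)]`; in a real alignment this score would sit inside an optimiser over rotations and translations.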

If you want something slightly more accurate at the same computational cost, 
look at WEGA (https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.23603 and 
references therein) which heuristically corrects for some flaws in the 
truncated Grant&Pickup calculations.

If you want a fast GPU-accelerated version, then forget about actually applying 
the algorithm directly[*]. Instead, to compare a reference molecule A to a 
database molecule B, precompute a grid over A containing the pairwise overlap 
value of an atom at each point in the grid with A. You can then compute the 
shape overlap for a given orientation of B by a simple 3D texture lookup rather 
than faffing around trying to compute exponential functions. This is 
simplified by assuming that all atoms have the same atomic radius and 
neglecting hydrogens (we're going for speed over accuracy here, remember?). You 
can get a similar lookup texture for gradients, I think. One thing GPUs are 
really good at is texture lookups and interpolation. They're less good at 
evaluating exponential functions. Your GPU algorithm is then a massively 
parallel CG or NR optimiser with the objective function computing shape overlap 
values for as many molecules as you can cram into GPU memory all in parallel.
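
The precomputed-grid idea above can be sketched on the CPU in plain Python. This is only an illustration under stated assumptions: a single shared atomic radius (1.7) and amplitude (2.7), a cubic grid, and nearest-neighbour lookup standing in for the interpolated texture fetch a GPU would do. All names here are hypothetical.

```python
import math

P = 2.7        # assumed Gaussian amplitude
RADIUS = 1.7   # single radius shared by all heavy atoms (the simplification above)
ALPHA = math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0) / RADIUS ** 2


def probe_overlap(point, ref_coords):
    """Pairwise Gaussian overlap of one probe atom at `point` with every atom of A."""
    prefactor = P * P * (math.pi / (2.0 * ALPHA)) ** 1.5
    total = 0.0
    for atom in ref_coords:
        d2 = sum((u - v) ** 2 for u, v in zip(point, atom))
        total += prefactor * math.exp(-0.5 * ALPHA * d2)
    return total


def build_grid(ref_coords, origin, spacing, n):
    """Tabulate probe_overlap on an n*n*n grid around reference molecule A."""
    return [[[probe_overlap((origin[0] + i * spacing,
                             origin[1] + j * spacing,
                             origin[2] + k * spacing), ref_coords)
              for k in range(n)] for j in range(n)] for i in range(n)]


def lookup(grid, origin, spacing, point):
    """Nearest-neighbour lookup; points outside the grid contribute nothing."""
    idx = [int(round((c - o) / spacing)) for c, o in zip(point, origin)]
    n = len(grid)
    if any(i < 0 or i >= n for i in idx):
        return 0.0
    return grid[idx[0]][idx[1]][idx[2]]


def overlap_by_lookup(grid, origin, spacing, coords_b):
    """Shape overlap of one pose of B with A: one table lookup per atom of B."""
    return sum(lookup(grid, origin, spacing, pt) for pt in coords_b)
```

The exponentials are evaluated once per grid point when the table is built; scoring each pose of B afterwards is only indexing and addition, which is what makes the approach attractive for massively parallel optimisation.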

[*] gWEGA (I believe) is a GPU-accelerated version of the standard WEGA 
algorithm and based on the published timings is an order of magnitude or more 
slower than fastROCS

Having said all of that, our GPU-accelerated shape similarity function just 
brute forces through the overlap series to sixth order, as (a) my happy place 
is on the accuracy side of the speed/accuracy tradeoff, and (b) our 
electrostatic similarity calculations are sufficiently complex that making the 
shape function faster wouldn't be that much of a net win. As a result, take all 
of the above with a grain of salt.

Regards,
Mark

--
Mark Mackey
Chief Scientific Officer
Cresset
New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8 0SS, UK
tel: +44 (0)1223 858890
mobile: +44 (0)7595 099165
fax: +44 (0)1223 853667
email: m...@cresset-group.com
web: www.cresset-group.com
skype: mark_cresset

___
Rdkit-discuss mailing list
Rdkit

Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 157, Issue 2

2020-11-04 Thread Chris Swain via Rdkit-discuss
Hi Mark,

Have you ever looked at Optipharm for shape comparison?

https://www.nature.com/articles/s41598-018-37908-6 


Or Shape-it

http://silicos-it.be.s3-website-eu-west-1.amazonaws.com/software/shape-it/1.0.1/shape-it.html


Cheers

Chris



> On 4 Nov 2020, at 14:28, rdkit-discuss-requ...@lists.sourceforge.net wrote:
> 
> From: Mark Mackey <m...@cresset-group.com>
> To: Lewis Martin, RDKit Discuss
> Subject: Re: [Rdkit-discuss] GPU Implementation of shape-based 3D
>   overlap on rdkit?
> Content-Type: text/plain; charset="utf-8"
> 
> Hi Lewis,
> 
> The standard shape alignment algorithm that everyone uses is from Grant & 
> Pickup 1996 
> (https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291096-987X%2819961115%2917%3A14%3C1653%3A%3AAID-JCC7%3E3.0.CO%3B2-K).
> 
> It's a Taylor-series-like expansion using spherical Gaussians as stand-ins 
> for hard spheres - you take the atomic volumes, subtract off the pairwise 
> overlaps, add back in the three-way overlaps, subtract off the four-way 
> overlaps, and so on. I did a fair few tests some years back and you really 
> need to go to 6 terms to get decent accuracy. However, all of the commercial 
> algorithms (ROCS, Phase Shape, etc) seem to truncate at 2, so go figure. OTOH 
> the "high throughput" versions all seem to be operated with a ludicrously low 
> number of conformations, so the error from incomplete coverage of conformer 
> space dwarfs the 5% noise that you get from truncating at 2 terms rather than 
> 6.
> 
> If you want something slightly more accurate at the same computational cost, 
> look at WEGA (https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.23603 and 
> references therein) which heuristically corrects for some flaws in the 
> truncated Grant&Pickup calculations.
> 
> If you want a fast GPU-accelerated version, then forget about actually 
> applying the algorithm directly[*]. Instead, to compare a reference molecule 
> A to a database molecule B, precompute a grid over A containing the pairwise 
> overlap value of an atom at each point in the grid with A. You can then 
> compute the shape overlap for a given orientation of B by a simple 3D texture 
> lookup rather than faffing around trying to compute exponential functions. 
> This is simplified by assuming that all atoms have the same atomic radius and 
> neglecting hydrogens (we're going for speed over accuracy here, remember?). 
> You can get a similar lookup texture for gradients, I think. One thing GPUs 
> are really good at is texture lookups and interpolation. They're less good at 
> evaluating exponential functions. Your GPU algorithm is then a massively 
> parallel CG or NR optimiser with the objective function computing shape 
> overlap values for as many molecules as you can cram into GPU memory all in 
> parallel.
> 
> [*] gWEGA (I believe) is a GPU-accelerated version of the standard WEGA 
> algorithm and based on the published timings is an order of magnitude or more 
> slower than fastROCS
> 
> Having said all of that, our GPU-accelerated shape similarity function just 
> brute forces through the overlap series to sixth order, as (a) my happy place 
> is on the accuracy side of the speed/accuracy tradeoff, and (b) our 
> electrostatic similarity calculations are sufficiently complex that making 
> the shape function faster wouldn't be that much of a net win. As a result, 
> take all of the above with a grain of salt.
> 
> Regards,
> Mark
> 
> --
> Mark Mackey
> Chief Scientific Officer
> Cresset
> New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8 0SS, 
> UK
> tel: +44 (0)1223 858890
> mobile: +44 (0)7595 099165
> fax: +44 (0)1223 853667
> email: m...@cresset-group.com
> web: www.cresset-group.com
> skype: mark_cresset

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] rdkit-cartridge: Inserting new molecules

2020-10-26 Thread Brian Cole
Hi Thomas,

It's possible to use TEMPORARY TABLE for this purpose in a single
transaction. This is the scheme we use in order to convert the input
application SMILES into a canonicalized RDKit SMILES. We keep the RDKit
canonical SMILES around in the table for exact isomer look ups, but this
lets us throw away the input application SMILES. This scheme also lets us
bulk insert many rows at a time in order to gain decent insertion
performance.

BEGIN;
CREATE TEMPORARY TABLE temp_input_smiles ON COMMIT DROP AS TABLE
input_smiles WITH NO DATA;
CREATE TRIGGER canonicalize_new_smiles_trigger BEFORE INSERT OR UPDATE ON
temp_input_smiles FOR EACH ROW EXECUTE PROCEDURE canonicalize_new_smiles();
CREATE TRIGGER propagate_new_smiles_tags_trigger AFTER INSERT OR UPDATE ON
temp_input_smiles FOR EACH ROW EXECUTE PROCEDURE propagate_new_smiles();
ALTER TABLE temp_input_smiles ENABLE ALWAYS TRIGGER
canonicalize_new_smiles_trigger;
ALTER TABLE temp_input_smiles ENABLE ALWAYS TRIGGER
propagate_new_smiles_tags_trigger;
-- INSERT many SMILES data into temp_input_smiles --
COMMIT;

CREATE OR REPLACE FUNCTION canonicalize_new_smiles()
RETURNS TRIGGER AS $$
BEGIN
  IF (NEW.rdkit_smiles IS NULL) THEN
    NEW.rdkit_smiles := mol_to_smiles(NEW.smiles::mol);
  END IF;
  RETURN NEW;
EXCEPTION WHEN SQLSTATE '22000' THEN -- 22000 is 'data exception'
  NEW.rdkit_smiles := NULL;
  RETURN NEW;
END;
$$ language 'plpgsql';

CREATE OR REPLACE FUNCTION propagate_new_smiles()
RETURNS TRIGGER AS $$
DECLARE tag_val jsonb;
DECLARE offsets jsonb;
BEGIN
  IF (NEW.rdkit_smiles IS NOT NULL) THEN
    -- INSERT NEW.rdkit_smiles::mol into production table --
  END IF;
  RETURN NEW;
END;
$$ language 'plpgsql';

-Brian

On Mon, Oct 26, 2020 at 1:45 AM Thomas Strunz  wrote:

> Dear community,
>
> I was wondering how best to insert molecules into a mol field. The
> documentation only shows how to insert from a pre-existing table with a
> SMILES column and then use "mol_to_smiles".
>
> How can a molecule be inserted directly? E.g., what format needs to be
> submitted?
>
> The second point is how to make insertion simple, so that not every
> application connecting to the DB needs to be chemically aware (i.e. have
> RDKit available). I have played around with simply having a SMILES column and
> a mol column and then using a trigger function to convert the SMILES to a
> mol. But this duplicates all the data (and is even more wasteful with a CTAB
> file). Is it possible to avoid duplicating the data and still insert
> SMILES/CTAB directly?
>
> Best Regards,
>
> Thomas
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] rdkit-cartridge: Inserting new molecules

2020-10-25 Thread Thomas Strunz
Dear community,

I was wondering how best to insert molecules into a mol field. The 
documentation only shows how to insert from a pre-existing table with a SMILES 
column and then use "mol_to_smiles".

How can a molecule be inserted directly? E.g., what format needs to be submitted?

The second point is how to make insertion simple, so that not every application 
connecting to the DB needs to be chemically aware (i.e. have RDKit available). I 
have played around with simply having a SMILES column and a mol column and then 
using a trigger function to convert the SMILES to a mol. But this duplicates all 
the data (and is even more wasteful with a CTAB file). Is it possible to avoid 
duplicating the data and still insert SMILES/CTAB directly?

Best Regards,

Thomas
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Rdkit-discuss] MACCS keys - revisited

2020-09-08 Thread Andrew Dalke
On Sep 8, 2020, at 14:30, Mike Mazanetz  wrote:

> Does anyone know whether it’s possible to obtain not just the fingerprint keys 
> for MACCS (binary values) but also the number of occurrences of each key, 
> particularly for these:

The SMARTS patterns for most of the MACCS keys are available via:

>>> from rdkit.Chem import MACCSkeys
>>> for key, smarts in MACCSkeys.smartsPatts.items():
...   print("[%s] %s" % (key, smarts))
...
[1] ('?', 0)
[2] ('[#104]', 0)
[3] ('[#32,#33,#34,#50,#51,#52,#82,#83,#84]', 0)
[4] ('[Ac,Th,Pa,U,Np,Pu,Am,Cm,Bk,Cf,Es,Fm,Md,No,Lr]', 0)
[5] ('[Sc,Ti,Y,Zr,Hf]', 0)
[6] ('[La,Ce,Pr,Nd,Pm,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu]', 0)
 ...

There are two parts to the right-hand-side: SMARTS pattern and count.

If the SMARTS pattern is a "?", that means the pattern is not defined at the 
SMARTS level.

There must be at least count+1 matches. That is, if the count is 0 then there 
must be at least one match.

You write "the number of occurrences of the keys".

I don't know how that makes sense for all the keys. You have things like:

140: (key(164)-3 if key(164)>3; else 0)
141: (key(160)-2 if key(160)>2; else 0)
142: (key(161)-2 if key(161)>1; else 0)

These correspond to RDKit's definitions:

[140] ('[#8]', 3)
[141] ('[CH3]', 2)
[142] ('[#7]', 1)

How do you count those number of occurrences?


> On Sep 8, 2020, at 21:56, Mike Mazanetz  wrote:
>  The KNIME node does a lot of double counting for the RDKit Substructure 
> Counter, so it’s not a useful tool for counting MACCS keys.

Something like [11] ('*1~*~*~*~1', 0) has many matches due to symmetry.

You have to decide if you think this should be counted once, or if all 8 
matches should be counted. 

The molecule method 'GetSubstructMatches()' has a uniquify option; by default 
it only returns unique matches. ("Unique" is based on unique atoms, not unique 
atoms and bonds. I don't think that distinction affects the MACCS patterns.)
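
To make the two rules above concrete, here is a small sketch. `count_key_matches` wraps the `GetSubstructMatches()` call with its `uniquify` option; `bit_is_set` and `excess_over` are hypothetical helper names for the "at least count+1 matches" rule and for derived keys such as "key(164)-3 if key(164)>3; else 0". RDKit is imported inside the first function only so that the two arithmetic helpers can be used without it.

```python
def count_key_matches(mol, smarts, uniquify=True):
    """Count substructure matches of one SMARTS pattern in an RDKit molecule.

    With uniquify=True, matches that cover the same set of atoms count once;
    with uniquify=False every symmetry-equivalent mapping is counted.
    """
    from rdkit import Chem  # local import: the helpers below do not need RDKit
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError("could not parse SMARTS: %r" % (smarts,))
    return len(mol.GetSubstructMatches(pattern, uniquify=uniquify))


def bit_is_set(n_matches, threshold):
    """Binary rule: a (smarts, count) entry sets the bit when there are
    at least count + 1 matches."""
    return n_matches > threshold


def excess_over(n_matches, threshold):
    """Derived-count rule of the form 'key(N)-t if key(N)>t; else 0'."""
    return n_matches - threshold if n_matches > threshold else 0
```

For example, for the entry `('[#8]', 3)` one could call `count_key_matches(mol, '[#8]')` on a molecule from `Chem.MolFromSmiles(...)` and pass the result to `bit_is_set(n, 3)`. Entries whose pattern is `'?'` are not defined at the SMARTS level and need separate handling.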

Regards,

Andrew
da...@dalkescientific.com




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Rdkit-discuss] MACCS keys - revisited

2020-09-08 Thread Paolo Tosco
 heavy atoms
> 103: #chlorine atoms
> 104: #hets. 2 bonds from a CH2
> 105: #hets. ring bonded to a 3-ring bond X
> 106: #X bonded to >= 3 non-C
> 107: #XQ>3 bonded to at least 1 halogen
> 108: #CH3 4 bonds from a CH2
> 109: #O attached to CH2
> 110: #O 1 C from an N
> 111: #N 2 bonds from a CH2
> 112: #atoms with coordination number >= 4
> 113: #O in non-aromatic bonds to an [a]
> 114: #CH3 attached to CH2
> 115: #CH3 2 bonds from a CH2
> 116: #CH3 3 bonds from a CH2
> 117: #N 2 bonds from an O
> 118: (key(147)-1 if key(147)>1; else 0)
> 119: #N in double bonds
> 120: (key(137)-1 if key(137)>1; else 0)
> 121: #N in rings
> 122: #N with coordination number >=3
> 123: #O separated by 1 C
> 124: #het-het bonds
> 125: Is # AROMATIC RING > 1?
> 126: #non-ring O bonded to 2 heavy atoms
> 127: (key(143)-1 if key(143)>1; else 0)
> 128: #CH2s separated by 4 bonds
> 129: #CH2s separated by 3 bonds
> 130: (key(124)-1 if key(124)>1; else 0)
> 131: (# het atoms with H)
> 132: #O 2 bonds from CH2
> 133: #N non-ring bonded to a ring
> 134: #halogens
> 135: #N in a non-aromatic bond with [a]
> 136: Bit: is there more than 1 O=
> 137: Total # ring HETEROCYCLE atoms
> 138: (key(153)-1 if key(153)>1; else 0)
> 139: #OH groups
> 140: (key(164)-3 if key(164)>3; else 0)
> 141: (key(160)-2 if key(160)>2; else 0)
> 142: (key(161)-2 if key(161)>1; else 0)
> 143: #non ring O connected to a ring
> 144: #atoms separated by (!:):(!:)
> 145: #6M RING > 1
> 146: Key(164)-2 if key(164)>2; else 0
> 147: #CH2 attached to CH2
> 148: #non-C with coordination number >=3
> 149: (key(160)-1 if key(160)>1; else 0)
> 150: #X separated by (!r)-r-(!r)
> 151: #NH
> 152: #C bonded to >=2 C and 1 O
> 153: #non-carbons attached to CH2
> 154: #O in C=O
> 155: #non-ring CH2
> 156: #XN where coord. # of X>=3
> 157: #O in C-O single bonds
> 158: #N in C-N single bonds
> 159: Key(164)-1 if key(164)>1; else 0
> 160: #CH3 groups
> 161: #N
> 162: #aromatics
> 163: #atoms in 6 rings
> 164: #oxygens
> 165: #ring atoms
> 166: Is there more than 1 fragment?
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Rdkit-discuss] MACCS keys - revisited

2020-09-08 Thread Mike Mazanetz
Hi,

 

On second thoughts, the KNIME node does a lot of double counting for the
RDKit Substructure Counter, so it's not a useful tool for counting MACCS
keys.

 

Anyone got any better ideas?

 

Cheers,

mike

 

From: Mike Mazanetz  
Sent: 08 September 2020 18:42
To: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] MACCS keys

 

Hi folks,

 

I found that I can always use the KNIME nodes to count these, so no need to
reply.

 

Best,

mike

 

From: Mike Mazanetz  
Sent: 08 September 2020 13:30
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] MACCS keys

 

Hello Forum,

Does anyone know whether it's possible to obtain not just the fingerprint keys
for MACCS (binary values) but also the number of occurrences of each key,
particularly for these:

 

Thanks,

mike

 

1: #isotopes
2: #atoms with atomic number > 103
3: #group IVA, VA and VIA periods 4-6
4: #Actinides
5: #group IIIB, IVB elements
6: #Lanthanides
7: #group VB, VIB, VIIB elements
8: #heteroatoms in 4-membered rings
9: #group VIIIB elements
10: #alkaline earth elements
11: #atoms in 4 ring
12: #group IB, IIB elements
13: #N connected to 1 O and 2 C
14: #S atoms in S-S groups
15: #C connected to 3 O
16: #heteroatoms in 3-membered rings
17: #C in CC triple bonds
18: #group IIIA elements
19: #atoms in 7 ring
20: #silicon atoms
21: #C = bonded to C and 3 heavy atoms
22: #atoms in 3 ring
23: #C bonded 1 N and 2 O
24: #O-N single bonds
25: #C bonded to at least 3 N atoms
26: #C in 3 ring bonds and a double bond
27: #iodine atoms
28: #XCH2X, where X<>C
29: #phosphorous atoms
30: #non-C Q4 bonded to >= 3 C
31: #halogens connected to non carbons
32: #S bonded to an N and a C
33: #S atoms bonded to N
34: #CH2= units
35: #alkali (group IA ) elements
36: #S atoms in rings
37: #C bonded to >= 1 O & >=2 N
38: #C bonded >= 2 N and 1 C
39: #S atoms bonded to 3 O
40: #S single bonded to OQ2
41: #N in C#N
42: #fluorine atoms
43: #X-H heteroatoms 2 bonds from another
44: #other elements
45: #N atoms adjacent to -C=C
46: #bromine atoms
47: #S two bonds from an N
48: #non C bonded to >= 3 O
49: #charged atoms
50: #C in C=C bonded to >= 3 C
51: #S bonded to a C and an O
52: #N bonded to N
53: #QH 4 bonds from another QH
54: #QH 3 bonds from another QH
55: #S bonded to >=2 O
56: #N bonded to >= 2O and >= 1 C
57: #O in rings
58: #S bonded to >=2 non-carbon atoms
59: #non-aromatic S-[a]
60: #[S+]-[O-]
61: #SQ3
62: #non-ring bonds that connect rings
63: #N atoms in double bonds with O
64: #non-ring S attached to a ring
65: #N in aromatic bonds with C
66: #CX4 bonded to >=3 carbons
67: #S attached to heteroatoms
68: #QH bonded to another QH
69: #QH bonded to another Q
70: #N bonded to two non-C heavy atoms
71: #N bonded to O
72: #O separated by 3 bonds
73: #S in double/charge separated bonds
74: #dimethyl substituted atoms
75: #N non-ring bonded to a ring
76: #C in C=C bonded to >= 3 heavy atoms
77: #N separated by 2 bonds
78: #N double bonded to C
79: #N separated by 3 bonds
80: #N separated by 4 bonds
81: #S attached to Q >= 3 atoms
82: #heteratoms attached to a CH2
83: #heteroatoms in 5 ring
84: #NH2 groups
85: #N bonded to >= 3 C
86: #CH2 or CH3 separated by non-C
87: #halogens bonded to any ring
88: #sulfurs
89: #O separated by 4 bonds
90: #het. 3 bonds from a CH2
91: #het. 4 bonds from a CH2
92: #C bonded to >=1 N, >=1 C & >= 1 O
93: #methylated heteroatoms
94: #N bonded to non C
95: #O 3 bonds from an N
96: #atoms in 5-rings
97: #O 4 bonds from an N
98: #het. in 6-ring
99: #C in C=C
100: #N attached to CH2
101: #atoms in 8-ring or higher
102: #O bonded to non C heavy atoms
103: #chlorine atoms
104: #hets. 2 bonds from a CH2
105: #hets. ring bonded to a 3-ring bond X
106: #X bonded to >= 3 non-C
107: #XQ>3 bonded to at least 1 halogen
108: #CH3 4 bonds from a CH2
109: #O attached to CH2
110: #O 1 C from an N
111: #N 2 bonds from a CH2
112: #atoms with coordination number >= 4
113: #O in non-aromatic bonds to an [a]
114: #CH3 attached to CH2
115: #CH3 2 bonds from a CH2
116: #CH3 3 bonds from a CH2
117: #N 2 bonds from an O
118: (key(147)-1 if key(147)>1; else 0)
119: #N in double bonds
120: (key(137)-1 if key(137)>1; else 0)
121: #N in rings
122: #N with coordination number >=3
123: #O separated by 1 C
124: #het-het bonds
125: Is # AROMATIC RING > 1?
126: #non-ring O bonded to 2 heavy atoms
127: (key(143)-1 if key(143)>1; else 0)
128: #CH2s separated by 4 bonds
129: #CH2s separated by 3 bonds
130: (key(124)-1 if key(124)>1; else 0)
131: (# het atoms with H)
132: #O 2 bonds from CH2
133: #N non-ring bonded to a ring
134: #halogens
135: #N in a non-aromatic bond with [a]
136: Bit: is there more than 1 O=
137: Total # ring HETEROCYCLE atoms
138: (key(153)-1 if key(153)>1; else 0)
139: #OH groups
140: (key(164)-3 if key(164)>3; else 0)
141: (key(160)-2 if key(160)>2; else 0)
142: (key(161)-2 if key(161)>1; else 0)
143: #non ring O connected to a ring
144: #atoms separated by (!:):(!:)
145: #6M RING > 1
146: Key(164)-2 if key(164)>2; else 

Re: [Rdkit-discuss] RDkit: While converting sdf file to fingerprint, facing several error

2020-08-06 Thread Greg Landrum
M  V30 12 1 12 11
> M  V30 13 1 13 11
> M  V30 14 2 17 11
> M  V30 15 1 14 12
> M  V30 16 2 15 13
> M  V30 17 2 16 14
> M  V30 18 1 16 15
> M  V30 19 1 9 1
> M  V30 20 1 9 10
> M  V30 END BOND
> M  V30 END CTAB
> M  END
>
>
>
>
>
>
>
>
>
> On Thu, Aug 6, 2020 at 3:51 AM Greg Landrum 
> wrote:
>
>> Hi,
>>
>> Without seeing the SDF itself it's hard to be specific, but here's what
>> the error messages are telling you, in general:
>>
>> The first one normally indicates a badly formed record in the SDF. If you
>> look around that line in the file you will, hopefully, see a malformed
>> record.
>> The next one, "Explicit valence", indicates that the molecule has an atom
>> (in this case an "O") that has the equivalent of three bonds to it. That's
>> not chemically reasonable, so the software complains.
>> The error about "Alkyl" is self-explanatory: there's a molecule in the
>> SDF which has an atom with symbol "Alkyl".
>> The rest are warnings.
>>
>> In order to provide more specific help, we'll need to see the SDF you're
>> using (or at least the SDF for the molecules that are failing) as well as
>> information about which version of the RDKit you're using.
>>
>> -greg
>>
>>
>>
>> On Wed, Aug 5, 2020 at 11:43 PM Pitanti Chalowa  wrote:
>>
>>> Respected Altruistic Researcher,
>>> While converting one sdf file to fingerprint, I am facing several errors.
>>>
>>> My code
>>>
>>> from rdkit import Chem
>>>
>>> suppl = Chem.SDMolSupplier('1.sdf')
>>> for mol in suppl:
>>>   if mol is None: continue
>>>   # print(mol.GetNumAtoms())
>>>
>>> fps = [Chem.RDKFingerprint(x) for x in suppl]
>>>
>>> I am facing many errors
>>>
>>> ERROR: Problems encountered parsing Mol data, M  END missing around line 
>>> 16739...
>>> ERROR: Explicit valence for atom # 0 O, 3, is greater than permitted...
>>> ERROR: Could not sanitize molecule ending on line 78558...
>>> ERROR: Post-condition Violation
>>> RDKit ERROR: Element 'Alkyl' not found
>>> RDKit ERROR: Violation occurred on line 91 in file /home/conda/feedstock_root/build_artifacts/rdkit_1593788763912/work/Code/GraphMol/PeriodicTable.h
>>> RDKit ERROR: Failed Expression: anum > -1
>>> ...
>>> WARNING: not removing hydrogen atom without neighbors
>>>
>>> RDKit WARNING: atom 0 has specified valence (4) smaller than the drawn 
>>> valence 6.
>>>
>>> Please direct me to the references. How can I correct them?
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDkit: While converting sdf file to fingerprint, facing several error

2020-08-06 Thread Francois Berenger

On 07/08/2020 03:15, dmaziuk via Rdkit-discuss wrote:

On 8/6/2020 7:14 AM, Pitanti Chalowa wrote:
...


DTXCID601285170
   Mrv1805 05101813452D


Does it have to have a blank line after '' ?


No: having the molecule's name/identifier in there is quite standard.


Dima



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] RDkit: While converting sdf file to fingerprint, facing several error

2020-08-06 Thread dmaziuk via Rdkit-discuss

On 8/6/2020 7:14 AM, Pitanti Chalowa wrote:
...


DTXCID601285170
   Mrv1805 05101813452D


Does it have to have a blank line after '' ?

Dima



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDkit: While converting sdf file to fingerprint, facing several error

2020-08-06 Thread Pitanti Chalowa
a, M  END missing around line 
>> 16739...
>> ERROR: Explicit valence for atom # 0 O, 3, is greater than permitted...
>> ERROR: Could not sanitize molecule ending on line 78558...
>> ERROR: Post-condition Violation
>> RDKit ERROR: Element 'Alkyl' not found
>> RDKit ERROR: Violation occurred on line 91 in file /home/conda/feedstock_root/build_artifacts/rdkit_1593788763912/work/Code/GraphMol/PeriodicTable.h
>> RDKit ERROR: Failed Expression: anum > -1
>> ...
>> WARNING: not removing hydrogen atom without neighbors
>>
>> RDKit WARNING: atom 0 has specified valence (4) smaller than the drawn 
>> valence 6.
>>
>> Please direct me to the references. How can I correct them?
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDkit: While converting sdf file to fingerprint, facing several error

2020-08-06 Thread Greg Landrum
Hi,

Without seeing the SDF itself it's hard to be specific, but here's what the
error messages are telling you, in general:

The first one normally indicates a badly formed record in the SDF. If you
look around that line in the file you will, hopefully, see a malformed
record.
The next one, "Explicit valence", indicates that the molecule has an atom
(in this case an "O") that has the equivalent of three bonds to it. That's
not chemically reasonable, so the software complains.
The error about "Alkyl" is self-explanatory: there's a molecule in the SDF
which has an atom with symbol "Alkyl".
The rest are warnings.

In order to provide more specific help, we'll need to see the SDF you're
using (or at least the SDF for the molecules that are failing) as well as
information about which version of the RDKit you're using.
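
For the first kind of error (a record with no "M  END"), a small standard-library script can help find the offending records before handing the file to RDKit. This is a minimal sketch, not an RDKit feature: `find_records_missing_m_end` is a hypothetical helper that splits the file on the `$$$$` record separator and reports records that never contain an `M  END` line.

```python
def find_records_missing_m_end(lines):
    """Return (record_index, first_line_number) for every SDF record that has
    content but no 'M  END' line. record_index is 0-based, line numbers 1-based."""
    problems = []
    record_index = 0
    start_line = 1
    has_m_end = False
    has_content = False
    for line_no, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.strip() == "$$$$":
            # End of a record: report it if the mol block was never closed.
            if has_content and not has_m_end:
                problems.append((record_index, start_line))
            record_index += 1
            start_line = line_no + 1
            has_m_end = False
            has_content = False
            continue
        if text.strip():
            has_content = True
        if text.startswith("M  END"):
            has_m_end = True
    # A final record that is not terminated by '$$$$' is checked as well.
    if has_content and not has_m_end:
        problems.append((record_index, start_line))
    return problems
```

It could be run as `with open('1.sdf') as fh: print(find_records_missing_m_end(fh))`; the reported line numbers can then be compared with the "around line ..." hints in the RDKit messages.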

-greg



On Wed, Aug 5, 2020 at 11:43 PM Pitanti Chalowa  wrote:

> Respected Altruistic Researcher,
> While converting one sdf file to fingerprint, I am facing several errors.
>
> My code
>
> from rdkit import Chem
>
> suppl = Chem.SDMolSupplier('1.sdf')
> for mol in suppl:
>   if mol is None: continue
>   # print(mol.GetNumAtoms())
>
> fps = [Chem.RDKFingerprint(x) for x in suppl]
>
> I am facing many errors
>
> ERROR: Problems encountered parsing Mol data, M  END missing around line 
> 16739...
> ERROR: Explicit valence for atom # 0 O, 3, is greater than permitted...
> ERROR: Could not sanitize molecule ending on line 78558...
> ERROR: Post-condition Violation
> RDKit ERROR: Element 'Alkyl' not found
> RDKit ERROR: Violation occurred on line 91 in file /home/conda/feedstock_root/build_artifacts/rdkit_1593788763912/work/Code/GraphMol/PeriodicTable.h
> RDKit ERROR: Failed Expression: anum > -1
> ...
> WARNING: not removing hydrogen atom without neighbors
>
> RDKit WARNING: atom 0 has specified valence (4) smaller than the drawn 
> valence 6.
>
> Please direct me to the references. How can I correct them?
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] RDkit: While converting sdf file to fingerprint, facing several error

2020-08-05 Thread Pitanti Chalowa
Respected Altruistic Researcher,
While converting one sdf file to fingerprint, I am facing several errors.

My code

from rdkit import Chem

suppl = Chem.SDMolSupplier('1.sdf')
for mol in suppl:
  if mol is None: continue
  # print(mol.GetNumAtoms())

fps = [Chem.RDKFingerprint(x) for x in suppl]

I am facing many errors

ERROR: Problems encountered parsing Mol data, M  END missing around
line 16739...
ERROR: Explicit valence for atom # 0 O, 3, is greater than permitted...
ERROR: Could not sanitize molecule ending on line 78558...
ERROR: Post-condition Violation
RDKit ERROR: Element 'Alkyl' not found
RDKit ERROR: Violation occurred on line 91 in file /home/conda/feedstock_root/build_artifacts/rdkit_1593788763912/work/Code/GraphMol/PeriodicTable.h
RDKit ERROR: Failed Expression: anum > -1
...
WARNING: not removing hydrogen atom without neighbors

RDKit WARNING: atom 0 has specified valence (4) smaller than the drawn
valence 6.

Please direct me to the references. How can I correct them?
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit installation problem

2020-08-03 Thread Scalfani, Vincent
Hi Sebastian,

As far as I can tell, the latest version available in conda is 2020.03.3. As 
Dave mentioned, I was able to get this version by specifying python 3.7:

More information on this method here:

http://rdkit.blogspot.com/2019/10/sharing-conda-environments.html

You can also specify the rdkit version instead:

https://github.com/rdkit/conda-rdkit/issues/84

Vin




From: Francois Berenger 
Sent: Sunday, August 2, 2020 8:32 PM
To: Sebastián J. Castro 
Cc: rdkit-discuss@lists.sourceforge.net 
Subject: Re: [Rdkit-discuss] RDKit installation problem

Dear Sebastian,

Since last week, you should also be able to install rdkit on Linux
via linuxbrew:

---
sudo apt install linuxbrew-wrapper
brew tap rdkit/rdkit
brew update
brew install rdkit

# to test it
/home/linuxbrew/.linuxbrew/bin/python3
import rdkit
---

Thanks to Nuri Jung on github (@jnooree) for proposing a fix
to the brew rdkit install formula.

Regards,
F.

On 02/08/2020 03:03, Sebastián J. Castro wrote:
> I have tried the installation suggested at
> http://www.rdkit.org/docs/Install.html:
>
> $ conda create -c rdkit -n my-rdkit-env rdkit
>
> But I get the 2017 version instead of 2020 (the latest release).
>
> I don't know how to install it. Can you help me?
>
> I have Ubuntu 20.04 LTS
>
> Thank you
>
> Best regards!
>
> --
>
> Dr. Sebastián J. Castro
> Departamento de Ciencias Farmacéuticas
> Facultad de Ciencias Químicas
> Universidad Nacional de Córdoba
> UNITEFA-CONICET
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit installation problem

2020-08-02 Thread Francois Berenger

Dear Sebastian,

Since last week, you should also be able to install rdkit on Linux
via linuxbrew:

---
sudo apt install linuxbrew-wrapper
brew tap rdkit/rdkit
brew update
brew install rdkit

# to test it
/home/linuxbrew/.linuxbrew/bin/python3
import rdkit
---

Thanks to Nuri Jung on github (@jnooree) for proposing a fix
to the brew rdkit install formula.

Regards,
F.

On 02/08/2020 03:03, Sebastián J. Castro wrote:

I have tried the installation suggested at
http://www.rdkit.org/docs/Install.html:

$ conda create -c rdkit -n my-rdkit-env rdkit

But I get the 2017 version instead of 2020 (the latest release).

I don't know how to install it. Can you help me?

I have Ubuntu 20.04 LTS

Thank you

Best regards!

--

Dr. Sebastián J. Castro
Departamento de Ciencias Farmacéuticas
Facultad de Ciencias Químicas
Universidad Nacional de Córdoba
UNITEFA-CONICET
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit installation problem

2020-08-02 Thread Lukas Pravda
Hi Sebastian,

 

Looking quickly at the available builds in the rdkit conda channel 
(https://anaconda.org/rdkit/rdkit), it appears that you are pulling the Windows 
32-bit version of rdkit. Perhaps this is because you are using a 32-bit 
version of conda? Try installing a 64-bit version of conda and pulling again.

 

Best,

Lukas

From: "Sebastián J. Castro" 
Date: Saturday, 1 August 2020 at 20:29
To: 
Subject: [Rdkit-discuss] RDKit installation problem

 

I have tried the installation suggested at http://www.rdkit.org/docs/Install.html:

$ conda create -c rdkit -n my-rdkit-env rdkit

But I get the 2017 version instead of the 2020 one (the latest release).

I don't know how to install it. Can you help me?

I have Ubuntu 20.04 LTS

Thank you

Best regards!

-- 
Dr. Sebastián J. Castro
Departamento de Ciencias Farmacéuticas
Facultad de Ciencias Químicas
Universidad Nacional de Córdoba
UNITEFA-CONICET

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit installation problem

2020-08-01 Thread David Cosgrove
That happened to me, too, recently. I think I solved it by specifying the
python version explicitly (-python==3.7 IIRC).

Hope that helps,
Dave
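
After recreating the environment, it can help to confirm which release was actually resolved. The following is only a minimal sketch: it assumes RDKit release strings follow the `YYYY.MM.patch` pattern (for example `2020.03.1`) and that `rdkit.__version__` is available; `release_year` and `check_installed` are hypothetical helper names, not part of RDKit.

```python
import importlib


def release_year(version: str) -> int:
    """Return the year part of an RDKit-style version string such as '2020.03.1'."""
    return int(version.split(".")[0])


def check_installed(min_year: int = 2020) -> str:
    """Report whether the importable rdkit is at least from `min_year`."""
    try:
        rdkit = importlib.import_module("rdkit")
    except ImportError:
        return "rdkit is not importable in this environment"
    version = rdkit.__version__
    status = "ok" if release_year(version) >= min_year else "too old"
    return f"rdkit {version}: {status}"


if __name__ == "__main__":
    print(check_installed())
```

Running this inside the activated environment (here `my-rdkit-env`) shows at a glance whether conda picked the old 2017 build again.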


On Sat, 1 Aug 2020 at 19:30, Sebastián J. Castro  wrote:

> I have tried the installation suggested at
> http://www.rdkit.org/docs/Install.html:
>
> $ conda create -c rdkit -n my-rdkit-env rdkit
>
> But I get the 2017 version instead of the 2020 one (the latest release).
>
> I don't know how to install it. Can you help me?
>
> I have Ubuntu 20.04 LTS
>
> Thank you
>
> Best regards!
>
>
> --
> Dr. Sebastián J. Castro
> Departamento de Ciencias Farmacéuticas
> Facultad de Ciencias Químicas
> Universidad Nacional de Córdoba
> UNITEFA-CONICET
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] RDKit installation problem

2020-08-01 Thread Sebastián J . Castro
I have tried the installation suggested at
http://www.rdkit.org/docs/Install.html:

$ conda create -c rdkit -n my-rdkit-env rdkit

But I get the 2017 version instead of the 2020 one (the latest release).

I don't know how to install it. Can you help me?

I have Ubuntu 20.04 LTS

Thank you

Best regards!

-- 
Dr. Sebastián J. Castro
Departamento de Ciencias Farmacéuticas
Facultad de Ciencias Químicas
Universidad Nacional de Córdoba
UNITEFA-CONICET
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit/tautomers

2020-07-22 Thread Da'Adoosh Binyamin
Thanks for your helpful answer. I learned a lot.

I have a few more questions:

1. How do you generate a non-standard InChI? Is it available in RDKit?

2. What are the 15T and KET options?

3. Can your solution be applied systematically? As a systematic solution I tried:

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

enumerator = rdMolStandardize.TautomerEnumerator()

for smi in my_smi_list:
    m = Chem.MolFromSmiles(smi)
    m = enumerator.Canonicalize(m)
    inchi = Chem.rdinchi.MolToInchi(m)

The problem with this solution is that for very big molecules (for example, 
macrocycles) I get a 'MemoryError'.

4. In another case (not related to tautomers), I can't tell whether the InChI output 
is correct or not:

C[N+]1=C(\C=C\C2=CNC=C2)C=CC2=CC=CC=C12
C[N+]1=C(\C=C/C2=CNC=C2)C=CC2=CC=CC=C12

Usually, when I enter two E/Z stereoisomers, I get two different InChIs (and 
the difference is in the /b or /t layers, as it should be). However, this time 
(both in RDKit and OpenBabel) I get:

InChI=1S/C16H14N2/c1-18-15(8-6-13-10-11-17-12-13)9-7-14-4-2-3-5-16(14)18/h2-12H,1H3/p+1
InChI=1S/C16H14N2/c1-18-15(8-6-13-10-11-17-12-13)9-7-14-4-2-3-5-16(14)18/h2-12H,1H3/p+1

Only if I remove the charge (hydrogen instead of carbon on the methylquinoline) 
or modify the pyrrole group on the other side does it give me different InChIs. Why?

Thanks a lot,
Benny


From: Markus Sitzmann [mailto:markus.sitzm...@gmail.com]
Sent: Tuesday, July 21, 2020 2:47 PM
To: Da'Adoosh Binyamin 
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] RDKit/tautomers

Hi Benny,

This is purely an InChI problem (not an RDKit one). Back when the Standard 
InChI was defined, the 15T and KET options for the InChI calculation were 
either not yet available or still experimental (I don't remember :-)), so they didn't 
make it into the standard set of options for the Standard InChI calculation. 
Hence it isn't too surprising that this tautomer pair doesn't yield the 
same Standard InChI (InChI isn't/wasn't particularly strong regarding 
tautomerism outside rings). You might use (non-standard) InChI and switch the 
15T and KET options on; that should fix your particular case.

In general there are still ongoing efforts to make InChI stronger regarding 
tautomerism: https://pubmed.ncbi.nlm.nih.gov/32043883/

Markus


On Tue, Jul 21, 2020 at 12:11 PM Da'Adoosh Binyamin 
<daado...@tauex.tau.ac.il> wrote:
Hi,

I have a question about RDKit/tautomers.

Let's say I have smiles input:

C[CH]2CCC(=O)C1=C(O)[CH](O)C[CH](O)[CH]12
C[CH]2CCC(O)=C1C(=O)[CH](O)C[CH](O)[CH]12

Now, if I run this code for each input:

m = Chem.MolFromSmiles(input)
inchi = Chem.rdinchi.MolToInchi(m)

I get different InChIs:

InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,13-15H,2-4H2,1H3
InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,12-14H,2-4H2,1H3

My question is why this is happening. Usually if I enter two tautomers, they 
have the same InChI (as they are supposed to, according to the literature). 
What is the difference in this example?

Thanks,
Benny

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit installation for C++

2020-07-21 Thread Alan Kerstjens Medina
Hi Leon,

While I don’t recommend this because of how “hacky” it is, on Linux it's 
possible to use the headers and library files that come with the Anaconda 
distribution. To do so you could point the compiler and linker to the files 
within your RDKit Anaconda environment. If you use GCC this means adding the 
following arguments to your command line (adjust the file path to your Anaconda 
environment directory):

-I/home/user/anaconda3/envs/rdkit/include/rdkit/ 
-L/home/user/anaconda3/envs/rdkit/lib/

However, if you already have some of the RDKit’s dependencies installed in your 
system, you will have to be extra careful to avoid conflicts between different 
versions of them, since the RDKit's dependencies’ library files are also in the 
same directory. Additionally, as far as I’m aware, this won’t work on Windows 
because the Windows Anaconda distribution doesn’t ship with all the headers and 
libraries.
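
The `-I`/`-L` arguments above can also be derived from the environment path instead of being typed by hand. This is a rough sketch under the layout described in this message (headers under `include/rdkit`, libraries under `lib`); it assumes `CONDA_PREFIX` is set by `conda activate`, and `rdkit_gcc_flags` is a hypothetical helper name.

```python
import os
from pathlib import PurePosixPath
from typing import List


def rdkit_gcc_flags(env_prefix: str) -> List[str]:
    """Build the -I/-L arguments for an RDKit conda environment at `env_prefix`."""
    prefix = PurePosixPath(env_prefix)
    return [f"-I{prefix / 'include' / 'rdkit'}", f"-L{prefix / 'lib'}"]


if __name__ == "__main__":
    # Fall back to the example path from this message if no environment is active.
    prefix = os.environ.get("CONDA_PREFIX", "/home/user/anaconda3/envs/rdkit")
    print(" ".join(rdkit_gcc_flags(prefix)))
```

The printed flags still need the individual RDKit libraries your program uses appended as `-l...` arguments, and the caveat about conflicting dependency versions applies unchanged.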

Given how common questions and issues regarding building the RDKit are I think 
it would be very helpful to create a C++ RDKit distribution within some package 
manager. I personally like vcpkg (https://github.com/microsoft/vcpkg). It’s 
open-source, cross-platform and, much like Anaconda, you can download and 
install a package of interest and all its dependencies with a single command. I 
believe it also builds on CMake, so creating a port for the RDKit should be 
possible.

Best regards,
Alan

From: topgunhaides <sunzhi@gmail.com>
Sent: 21 July 2020 18:43
To: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: [Rdkit-discuss] RDKit installation for C++

Hello guys,

I am working on porting all of my RDKit Python code to C++. Before testing the new 
code, I need to get RDKit working with C++...

I am not experienced with this, so I have a couple of questions about the installation:
1. My Conda RDKit works well with Python on Linux. Do I have to install a new 
"C++ version" of RDKit? If not, how can I link against the current Conda RDKit?
2. If I need to reinstall it for C++ purposes, do I have to build it from 
source? My cmake and gcc are up to date on Linux.
3. I can try the Windows installation using the RDKit binaries, but the same question 
applies here: does it work fully with C++?

Can anyone offer some help? Thanks a lot!

Best,
Leon


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit installation for C++

2020-07-21 Thread Greg Landrum
Hi Leon,

On Tue, Jul 21, 2020 at 6:43 PM topgunhaides  wrote:

>
> Working on transferring my entire RDKit Python code to C++. Before testing
> new code, I need to make it working with C++...
>
> Not experienced with it, so I got a couple of questions about the
> installation:
> 1. My Conda RDKit works well with Python on Linux. Do I have to install a
> new "C++ version" of RDKit? If not, how can I link it to the current Conda
> RDKit?
>

You need to build the RDKit from scratch yourself. We haven't done an RDKit
conda package that's useable for development (though we could... it's worth
thinking about)


> 2. If I need to reinstall it for C++ purpose, do I have to build it from
> source? I have my cmake and gcc up-to-date on linux.
>

Yes, you need to build from source. There's some info about how to do
this in the documentation that will hopefully be helpful.
Note that if you have a conda environment available, you can get all
the prerequisites you need (like boost, cairo, eigen) from conda.


> 3. I can try the Windows Installation using RDKit binaries, but same
> question here: does it work fully with C++?
>

For Windows you will also need to build it yourself.

-greg



>
> Can anyone offer some help? Thanks a lot!
>
> Best,
> Leon
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit installation for C++

2020-07-21 Thread dmaziuk via Rdkit-discuss

On 7/21/2020 11:41 AM, topgunhaides wrote:

Hello guys,

Working on transferring my entire RDKit Python code to C++. Before testing
new code, I need to make it working with C++...


Law of The Hammer corollary: if your tool is C++, every problem looks 
like a thumb.


Dima



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] RDKit installation for C++

2020-07-21 Thread topgunhaides
Hello guys,

I am working on porting all of my RDKit Python code to C++. Before testing
the new code, I need to get RDKit working with C++...

I am not experienced with this, so I have a couple of questions about the
installation:
1. My Conda RDKit works well with Python on Linux. Do I have to install a
new "C++ version" of RDKit? If not, how can I link against the current Conda
RDKit?
2. If I need to reinstall it for C++ purposes, do I have to build it from
source? My cmake and gcc are up to date on Linux.
3. I can try the Windows installation using the RDKit binaries, but the same
question applies here: does it work fully with C++?

Can anyone offer some help? Thanks a lot!

Best,
Leon
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit/tautomers

2020-07-21 Thread Markus Sitzmann
Hi Benny,

This is purely an InChI problem (not an RDKit one). Back when the Standard
InChI was defined, the 15T and KET options for the InChI calculation
were either not yet available or still experimental (I don't remember :-)), so
they didn't make it into the standard set of options for the Standard InChI
calculation. Hence it isn't too surprising that this tautomer pair doesn't
yield the same Standard InChI (InChI isn't/wasn't particularly strong
regarding tautomerism outside rings). You might use (non-standard) InChI
and switch the 15T and KET options on; that should fix your particular case.
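
For reference, switching these options on from Python might look like the sketch below. It assumes that `Chem.MolToInchi` passes its `options` string through to the InChI library and that the bundled InChI version accepts the experimental `KET` (keto-enol) and `15T` (1,5-tautomerism) switches; whether the two results then coincide is not asserted here and should be checked on your own installation.

```python
from rdkit import Chem

# The two tautomers from the original question.
smiles = [
    "C[CH]2CCC(=O)C1=C(O)[CH](O)C[CH](O)[CH]12",
    "C[CH]2CCC(O)=C1C(=O)[CH](O)C[CH](O)[CH]12",
]

results = []
for smi in smiles:
    mol = Chem.MolFromSmiles(smi)
    standard = Chem.MolToInchi(mol)
    # Assumption: the InChI library shipped with RDKit understands these
    # experimental tautomerism options.
    nonstandard = Chem.MolToInchi(mol, options="-KET -15T")
    results.append((standard, nonstandard))

for standard, nonstandard in results:
    print(standard)
    print(nonstandard)
```

Comparing the second column for both inputs shows whether the extra tautomerism handling merges this particular pair.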

In general there are still ongoing efforts to make InChI stronger regarding
tautomerism: https://pubmed.ncbi.nlm.nih.gov/32043883/

Markus


On Tue, Jul 21, 2020 at 12:11 PM Da'Adoosh Binyamin <
daado...@tauex.tau.ac.il> wrote:

> Hi,
>
> I have a question about RDKit/tautomers.
>
> Let's say I have smiles input:
>
> C[CH]2CCC(=O)C1=C(O)[CH](O)C[CH](O)[CH]12
> C[CH]2CCC(O)=C1C(=O)[CH](O)C[CH](O)[CH]12
>
> Now, if I run this code for each input:
>
> m = Chem.MolFromSmiles(input)
> inchi = Chem.rdinchi.MolToInchi(m)
>
> I get different InChIs:
>
> InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,13-15H,2-4H2,1H3
> InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,12-14H,2-4H2,1H3
>
> My question is why this is happening. Usually if I enter two tautomers,
> they have the same InChI (as they are supposed to, according to the
> literature). What is the difference in this example?
>
> Thanks,
> Benny
>
> _______
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] RDKit/tautomers

2020-07-21 Thread Da'Adoosh Binyamin
Hi,

I have a question about RDKit/tautomers.

Let's say I have smiles input:

C[CH]2CCC(=O)C1=C(O)[CH](O)C[CH](O)[CH]12
C[CH]2CCC(O)=C1C(=O)[CH](O)C[CH](O)[CH]12

Now, if I run this code for each input:

m = Chem.MolFromSmiles(input)
inchi = Chem.rdinchi.MolToInchi(m)

I get different InChIs:

InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,13-15H,2-4H2,1H3
InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,12-14H,2-4H2,1H3

My question is why this is happening. Usually if I enter two tautomers, they 
have the same InChI (as they are supposed to, according to the literature). 
What is the difference in this example?

Thanks,
Benny

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] rdkit: get reasonable conformation of a small molecule with tri-phosphate from a smiles

2020-06-22 Thread Ming Hao
Hello,

Recently, I ran into the following problem.

I cannot get a good conformation for a small molecule with a
tri-phosphate group, like the one below:
https://www.rcsb.org/ligand/ET9

NC1=NC(=O)c2ncn([C@H]3C[C@H](O)[C@@H](CO[P](O)(=O)O[P](O)(=O)O[P](O)(O)=O)C3=C)c2N1

The most difficult part I think is that it includes the tri-phosphate (too
flexible).

Can RDKit handle small molecules with tri-phosphate and get reasonable
conformations?

Thanks,
Ming
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] rdkit SDMolSupplier stumbles over "M SAP" records::

2020-05-14 Thread Greg Landrum
Hi Thomas,

This is actually fixed on master already (Roger sent the patch in April:
https://github.com/rdkit/rdkit/pull/3072) The fix was supposed to be in the
2020.03.2 release, and it's in the release notes, but it looks like I
forgot to actually merge in the changes over onto the release branch.

I will make sure this makes it into the 2020.03.3 release, which I will
hopefully do next week.

Sorry about the inconvenience.
-greg


On Thu, May 14, 2020 at 5:49 PM  wrote:

> Hi,
>
> I need to process a number of sd files containing peptides with rdkit.
> Using the SDMolSupplier, I get the following error message when trying to
> read in (some of) my peptides:
>
> Pre-condition Violation
> atom bookmark not found
> Violation occurred on line 195 in file
> /home/conda/feedstock_root/build_artifacts/rdkit_1557999846784/work/Code/GraphMol/ROMol.cpp
> Failed Expression: d_atomBookmarks.count(mark) != 0
>
> [17:10:58] Unexpected error hit on line 36
> [17:10:58] ERROR: moving to the begining of the next molecule
>
> I traced this to “M SAP” records in my input file. Is there a way to tell
> rdkit to ignore these records? A temporary workaround is to preprocess the
> sdfiles and remove these lines, but there might be a time when I need this
> info, so I'm hesitant to throw it away.
>
> I have attached an example dipeptide, AG.sdf; here is the code snippet I
> used:
>
> from rdkit import Chem
>
> sdfile = '/home/foxt/AG.sdf'
> suppl = Chem.SDMolSupplier(sdfile)
>
> for mol in suppl:
>     if mol is None:
>         continue
>     molblock = Chem.MolToMolBlock(mol)
>     print(molblock)
>
> Any ideas? Is this a feature? Or is this something that is simply not
> implemented?
>
> Best,
> Th.
>
> Mit freundlichen Grüßen / Kind regards,
> Dr. Thomas Fox
>
> Boehringer Ingelheim Pharma GmbH & Co. KG
> Medicinal Chemistry
> Tel.: +49 (7351) 54-7585
> Fax: +49 (7351) 83-7585
> mailto:thomas@boehringer-ingelheim.com
> 
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] rdkit SDMolSupplier stumbles over "M SAP" records::

2020-05-14 Thread thomas.fox
Hi,

I need to process a number of sd files containing peptides with rdkit. Using 
the SDMolSupplier, I get the following error message when trying to read in 
(some of) my peptides:


Pre-condition Violation
atom bookmark not found
Violation occurred on line 195 in file 
/home/conda/feedstock_root/build_artifacts/rdkit_1557999846784/work/Code/GraphMol/ROMol.cpp
Failed Expression: d_atomBookmarks.count(mark) != 0


[17:10:58] Unexpected error hit on line 36
[17:10:58] ERROR: moving to the begining of the next molecule

I traced this to "M SAP" records in my input file. Is there a way to tell 
rdkit to ignore these records? A temporary workaround is to preprocess the 
sdfiles and remove these lines, but there might be a time when I need this info, so 
I'm hesitant to throw it away.

I have attached an example dipeptide, AG.sdf; here is the code snippet I used:

from rdkit import Chem

sdfile = '/home/foxt/AG.sdf'
suppl = Chem.SDMolSupplier(sdfile)

for mol in suppl:
    if mol is None:
        continue
    molblock = Chem.MolToMolBlock(mol)
    print(molblock)

Any ideas? Is this a feature? Or is this something that is simply not implemented?
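
The temporary workaround mentioned above (dropping the SAP lines before parsing while keeping the original file) could be sketched roughly as follows. This is only an illustration: it assumes the records appear as V2000 property lines beginning with `M`, whitespace, then `SAP`, and that no data field happens to start the same way; `strip_sap_records` and `write_stripped_copy` are hypothetical helper names.

```python
import re
from pathlib import Path

# V2000 property lines start with "M", whitespace, then a three-letter code.
SAP_LINE = re.compile(r"^M\s+SAP\b")


def strip_sap_records(sdf_text: str) -> str:
    """Return `sdf_text` without 'M  SAP' lines; every other line is kept."""
    kept = [line for line in sdf_text.splitlines(keepends=True)
            if not SAP_LINE.match(line)]
    return "".join(kept)


def write_stripped_copy(src: str, dst: str) -> None:
    """Write a SAP-free copy of `src` to `dst`, leaving the original untouched."""
    Path(dst).write_text(strip_sap_records(Path(src).read_text()))
```

Because the stripped text goes to a separate file, the attachment-point information stays available in the original for later use, and `Chem.SDMolSupplier` can be pointed at the copy.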

Best,
Th.
Mit freundlichen Grüßen / Kind regards,
Dr. Thomas Fox

Boehringer Ingelheim Pharma GmbH & Co. KG
Medicinal Chemistry
Tel.: +49 (7351) 54-7585
Fax: +49 (7351) 83-7585
mailto:thomas@boehringer-ingelheim.com



AG.sdf
Description: AG.sdf
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] [RDKit-discuss] rdchem.ResonanceMolSupplier cause segfault on some inputs

2020-04-29 Thread Paolo Tosco

Dear Victor,

I have tested this on the latest RDKit trunk code and it does not 
segfault; I believe this is the same bug described here:


https://github.com/rdkit/rdkit/issues/3048

and it is already fixed.

Cheers,
p.

On 29/04/2020 17:24, victor viterbo via Rdkit-discuss wrote:


 RDKit  3D

 23 25  0  0  0  0  0  0  0  0999 V2000
    0.3053    3.6659    0.0544 O   0  0  0  0  0  0  0  0  0  0 0 0
   -0.1542    2.5093    0.0748 N   0  0  0  0  0  0  0  0  0  0 0 0
   -1.5618    2.3423    0.0180 O   0  0  0  0  0  0  0  0  0  0 0 0
    0.4179    1.3007    0.1430 C   0  0  0  0  0  0  0  0  0  0 0 0
    1.7461    0.8529    0.2172 C   0  0  0  0  0  0  0  0  0  0 0 0
    1.9320   -0.5143    0.2726 C   0  0  0  0  0  0  0  0  0  0 0 0
    0.8664   -1.4626    0.2581 C   0  0  0  0  0  0  0  0  0  0 0 0
   -0.4354   -1.0479    0.1863 C   0  0  0  0  0  0  0  0  0  0 0 0
   -0.6211    0.3406    0.1300 C   0  0  0  0  0  0  0  0  0  0 0 0
   -1.8191    1.0230    0.0528 C   0  0  0  0  0  0  0  0  0  0 0 0
   -3.0437    0.2952    0.0215 C   0  0  0  0  0  0  0  0  0  0 0 0
   -2.9565   -1.0485    0.0711 C   0  0  0  0  0  0  0  0  0  0 0 0
   -1.6963   -1.8573    0.1583 C   0  0  0  0  0  0  0  0  0  0 0 0
    3.3058   -1.0642    0.3525 C   0  0  0  0  0  0  0  0  0  0 0 0
    3.5605   -2.2494    0.4147 O   0  0  0  0  0  0  0  0  0  0 0 0
    4.2582   -0.1354    0.3501 O   0  0  0  0  0  0  0  0  0  0 0 0
    2.5743    1.5404    0.2298 H   0  0  0  0  0  0  0  0  0  0 0 0
    1.1370   -2.5054    0.3065 H   0  0  0  0  0  0  0  0  0  0 0 0
   -3.9845    0.8147   -0.0397 H   0  0  0  0  0  0  0  0  0  0 0 0
   -3.8648   -1.6307    0.0494 H   0  0  0  0  0  0  0  0  0  0 0 0
   -1.7515   -2.4890    1.0517 H   0  0  0  0  0  0  0  0  0  0 0 0
    5.1452   -0.5339    0.4038 H   0  0  0  0  0  0  0  0  0  0 0 0
   -1.6720   -2.5533   -0.6873 H   0  0  0  0  0  0  0  0  0  0 0 0
  2  1  1  0
 14  6  1  0
 15 14  2  0
 16 14  1  0
 17  5  1  0
 18  7  1  0
 19 11  1  0
 20 12  1  0
 21 13  1  0
 22 16  1  0
 23 13  1  0
  3 10  1  0
  4  2  1  0
  2  3  1  0
 10  9  1  0
  8 13  1  0
 13 12  1  0
 12 11  1  0
  9  4  4  0
  5  6  4  0
  7  8  4  0
  4  5  4  0
  6  7  4  0
  8  9  4  0
 10 11  2  0
V    1 O
M  END




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] [RDKit-discuss] rdchem.ResonanceMolSupplier cause segfault on some inputs

2020-04-29 Thread victor viterbo via Rdkit-discuss

Dear RDKit community,

I am using rdkit 2019.09.3 on Ubuntu 20.04 focal, and for certain inputs 
like the one specified below I get a Segmentation Fault.


Here is a minimal script to reproduce the error, as well as the two SDF files 
that caused it.


from rdkit import Chem
from rdkit.Chem import rdchem

mol = Chem.SDMolSupplier('input.sdf', removeHs=False)[0]
flags = (Chem.KEKULE_ALL | Chem.ALLOW_INCOMPLETE_OCTETS |
         Chem.ALLOW_CHARGE_SEPARATION | Chem.UNCONSTRAINED_ANIONS |
         Chem.UNCONSTRAINED_CATIONS)

suppl = rdchem.ResonanceMolSupplier(mol, flags=flags)
if suppl:
    mol2 = [m for m in suppl][0]



 RDKit  3D

 24 26  0  0  0  0  0  0  0  0999 V2000
    3.7865   -0.2900    0.6422 C   0  0  0  0  0  0  0  0  0  0 0  0
    3.6385    0.6315   -0.3412 C   0  0  0  0  0  0  0  0  0  0 0  0
    2.3608    0.9715   -0.8135 C   0  0  0  0  0  0  0  0  0  0 0  0
    1.1943    0.3935   -0.3100 C   0  0  0  0  0  0  0  0  0  0 0  0
    1.2965   -0.5663    0.7084 C   0  0  0  0  0  0  0  0  0  0 0  0
    2.6181   -0.9571    1.2400 C   0  0  0  0  0  0  0  0  0  0 0  0
   -0.0570   -1.3484    1.4078 S   0  0  0  0  0  0  0  0  0  0 0  0
   -1.4374   -0.6473    0.5380 C   0  0  0  0  0  0  0  0  0  0 0  0
   -1.2617    0.3162   -0.4562 C   0  0  0  0  0  0  0  0  0  0 0  0
   -0.0175    0.7796   -0.8241 N   0  0  0  0  0  0  0  0  0  0 0  0
   -2.3988    0.8168   -1.0899 C   0  0  0  0  0  0  0  0  0  0 0  0
   -3.6585    0.3681   -0.7403 C   0  0  0  0  0  0  0  0  0  0 0  0
   -3.8156   -0.5908    0.2494 C   0  0  0  0  0  0  0  0  0  0 0  0
   -2.6995   -1.0990    0.8898 C   0  0  0  0  0  0  0  0  0  0 0  0
    4.7582   -0.5638    1.0181 H   0  0  0  0  0  0  0  0  0  0 0  0
    4.4954    1.1188   -0.7771 H   0  0  0  0  0  0  0  0  0  0 0  0
    2.2733    1.7084   -1.5969 H   0  0  0  0  0  0  0  0  0  0 0  0
    2.6397   -0.8019    2.3282 H   0  0  0  0  0  0  0  0  0  0 0  0
   -0.0002    1.4769   -1.5560 H   0  0  0  0  0  0  0  0  0  0 0  0
   -2.2855    1.5626   -1.8612 H   0  0  0  0  0  0  0  0  0  0 0  0
   -4.5238    0.7690   -1.2439 H   0  0  0  0  0  0  0  0  0  0 0  0
   -4.7977   -0.9409    0.5214 H   0  0  0  0  0  0  0  0  0  0 0  0
   -2.8007   -1.8452    1.6623 H   0  0  0  0  0  0  0  0  0  0 0  0
    2.7459   -2.0453    1.1511 H   0  0  0  0  0  0  0  0  0  0 0  0
 15  1  1  0
 16  2  1  0
 17  3  1  0
 18  6  1  0
 19 10  1  0
 20 11  1  0
 21 12  1  0
 22 13  1  0
 23 14  1  0
 24  6  1  0
 11 12  4  0
 13 14  4  0
  4 10  1  0
 10  9  1  0
  9  8  4  0
  8  7  1  0
  2  3  1  0
  4  5  1  0
  5  6  1  0
  6  1  1  0
  1  2  2  0
  3  4  2  0
  5  7  2  0
  8 14  4  0
  9 11  4  0
 12 13  4  0
M  CHG  1   7   1
V    1 C
M  END





 RDKit  3D

 23 25  0  0  0  0  0  0  0  0999 V2000
    0.3053    3.6659    0.0544 O   0  0  0  0  0  0  0  0  0  0 0  0
   -0.1542    2.5093    0.0748 N   0  0  0  0  0  0  0  0  0  0 0  0
   -1.5618    2.3423    0.0180 O   0  0  0  0  0  0  0  0  0  0 0  0
    0.4179    1.3007    0.1430 C   0  0  0  0  0  0  0  0  0  0 0  0
    1.7461    0.8529    0.2172 C   0  0  0  0  0  0  0  0  0  0 0  0
    1.9320   -0.5143    0.2726 C   0  0  0  0  0  0  0  0  0  0 0  0
    0.8664   -1.4626    0.2581 C   0  0  0  0  0  0  0  0  0  0 0  0
   -0.4354   -1.0479    0.1863 C   0  0  0  0  0  0  0  0  0  0 0  0
   -0.6211    0.3406    0.1300 C   0  0  0  0  0  0  0  0  0  0 0  0
   -1.8191    1.0230    0.0528 C   0  0  0  0  0  0  0  0  0  0 0  0
   -3.0437    0.2952    0.0215 C   0  0  0  0  0  0  0  0  0  0 0  0
   -2.9565   -1.0485    0.0711 C   0  0  0  0  0  0  0  0  0  0 0  0
   -1.6963   -1.8573    0.1583 C   0  0  0  0  0  0  0  0  0  0 0  0
    3.3058   -1.0642    0.3525 C   0  0  0  0  0  0  0  0  0  0 0  0
    3.5605   -2.2494    0.4147 O   0  0  0  0  0  0  0  0  0  0 0  0
    4.2582   -0.1354    0.3501 O   0  0  0  0  0  0  0  0  0  0 0  0
    2.5743    1.5404    0.2298 H   0  0  0  0  0  0  0  0  0  0 0  0
    1.1370   -2.5054    0.3065 H   0  0  0  0  0  0  0  0  0  0 0  0
   -3.9845    0.8147   -0.0397 H   0  0  0  0  0  0  0  0  0  0 0  0
   -3.8648   -1.6307    0.0494 H   0  0  0  0  0  0  0  0  0  0 0  0
   -1.7515   -2.4890    1.0517 H   0  0  0  0  0  0  0  0  0  0 0  0
    5.1452   -0.5339    0.4038 H   0  0  0  0  0  0  0  0  0  0 0  0
   -1.6720   -2.5533   -0.6873 H   0  0  0  0  0  0  0  0  0  0 0  0
  2  1  1  0
 14  6  1  0
 15 14  2  0
 16 14  1  0
 17  5  1  0
 18  7  1  0
 19 11  1  0
 20 12  1  0
 21 13  1  0
 22 16  1  0
 23 13  1  0
  3 10  1  0
  4  2  1  0
  2  3  1  0
 10  9  1  0
  8 13  1  0
 13 12  1  0
 12 11  1  0
  9  4  4  0
  5  6  4  0
  7  8  4  0
  4  5  4  0
  6  7  4  0
  8  9  4  0
 10 11  2  0
V    1 O
M  END





The molecule was created using rdkit's RWMol, and the mol works with 
every other function, including Chem.SanitizeMol(mol). The SDF was 
generated using rdkit.


The segfault occurs when accessing the supplier.

Would you know of any way to fix, circumvent, or predict the segfault?
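
One generic way to circumvent a crash like this, independent of RDKit, is to run the risky call in a child interpreter so that a segfault only kills the child. The sketch below is an assumption-laden illustration, not an RDKit feature: `run_isolated` and `survived` are hypothetical helper names, and the code string passed in would be whatever snippet builds the `ResonanceMolSupplier`.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run `code` in a separate Python interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def survived(result: subprocess.CompletedProcess) -> bool:
    """True if the child exited normally; a crash yields a non-zero return code
    (on POSIX, the negative signal number)."""
    return result.returncode == 0
```

Molecules for which `survived(...)` is false can then be skipped or logged in the parent process, at the cost of one interpreter start-up per check.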

Re: [Rdkit-discuss] RDKit C wrappers?

2020-04-27 Thread dmaziuk via Rdkit-discuss

On 4/26/2020 2:44 PM, Riccardo Vianello wrote:
...

as far as I know, this isn't (or at least wasn't) directly possible. SWIG
includes the core functionality required to generate these wrappers, but C
is not a supported target language. Last time I checked a mostly functional
but not ready to merge GSoC branch existed, that was aimed at providing
this feature, but I am not sure it made any progress.


Like Greg said, C does not have objects. It also lacks method 
overriding, and a few other things. While objects are trivial to emulate 
using structs w/ pointers to functions, I doubt an automated rewrite can 
do even that well enough. (E.g. if you want name mangling that makes sense.)


Once you run into more advanced use of templates: forget it.

Dima


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit C wrappers?

2020-04-27 Thread Axel Pahl



On 27.04.20 10:18, Stephan Michels wrote:



On 27.04.2020 at 09:58, Axel Pahl <axelp...@gmx.de> wrote:

yes, I agree that manual wrapping would be a ton of work (and hard to
maintain).
My question was not necessarily with a concrete language in mind, but
since you are asking (;-)), I am really excited about the Swift
programming language lately. Open Source, nice syntax, developed by
Apple and endorsed by Google as the next generation platform for
machine learning ("Swift for Tensorflow",
https://tryolabs.com/blog/2020/04/02/swift-googles-bet-on-differentiable-programming/).


Unfortunately Swift doesn’t support C++ yet. I use RDKit in my Swift
projects by using an Objective-C Framework, where I wrap all necessary
classes, which I need. These wrappers are far from complete. But I get
my things done and it is cross compiled for x86 and arm.

Regards,
Stephan Michels


Dear Stephan,

thanks a lot, that is really interesting.
I am using Swift on Linux, so I guess Objective-C wrappers are not
usable for me?

Kind regards,
Axel





Re: [Rdkit-discuss] RDKit C wrappers?

2020-04-27 Thread Stephan Michels

> On 27.04.2020 at 09:58, Axel Pahl wrote:
> 
> yes, I agree that manual wrapping would be a ton of work (and hard to 
> maintain).
> My question was not necessarily with a concrete language in mind, but since 
> you are asking (;-)), I am really excited about the Swift programming 
> language lately. Open Source, nice syntax, developed by Apple and endorsed by 
> Google as the next generation platform for machine learning ("Swift for 
> Tensorflow", 
> https://tryolabs.com/blog/2020/04/02/swift-googles-bet-on-differentiable-programming/).

Unfortunately Swift doesn’t support C++ yet. I use RDKit in my Swift projects 
by using an Objective-C Framework, where I wrap all necessary classes, which I 
need. These wrappers are far from complete. But I get my things done and it is 
cross compiled for x86 and arm.

Regards,
Stephan Michels



Re: [Rdkit-discuss] RDKit C wrappers?

2020-04-27 Thread Axel Pahl

Hi Greg,

yes, I agree that manual wrapping would be a ton of work (and hard to
maintain).
My question was not necessarily with a concrete language in mind, but
since you are asking (;-)), I am really excited about the Swift
programming language lately. Open Source, nice syntax, developed by
Apple and endorsed by Google as the next generation platform for machine
learning ("Swift for Tensorflow",
https://tryolabs.com/blog/2020/04/02/swift-googles-bet-on-differentiable-programming/).

Thanks a lot for the link to the MinimalLib, that looks very
interesting, I will definitely have a look.

Kind regards,
Axel

On 27.04.20 09:43, Greg Landrum wrote:

Hi Axel,

Doing complete C wrappers around the C++ library would be a ton of
work - you'd have to simulate the object model on the C side, and
that's "not easy".

Doing a "full" wrapper for another programming language is almost
certainly best handled using SWIG (assuming it has bindings for the
language you are interested in).

An alternate approach would be to look at the MinimalLib: the library
that's used to generate the Javascript wrappers: it's based on the
idea of exposing a small number of useful functions instead of
supporting the full API:
https://github.com/rdkit/rdkit/tree/master/Code/MinimalLib

-greg


On Sun, Apr 26, 2020 at 9:47 PM Riccardo Vianello
<riccardo.viane...@gmail.com> wrote:

Hi Axel,

On Sun, Apr 26, 2020 at 6:21 PM Axel Pahl <axelp...@gmx.de> wrote:

is anyone aware of efforts for creating "bridging" C wrappers
for the RDKit?
This would make it easier to bind the toolkit to other
programming languages, in addition to Python and Java.

I'm not aware of any active/existing projects, but I've been also
thinking about this possibility in a couple of occasions.

Could the existing SWIG interface be perused for this?

as far as I know, this isn't (or at least wasn't) directly
possible. SWIG includes the core functionality required to
generate these wrappers, but C is not a supported target language.
Last time I checked a mostly functional but not ready to merge
GSoC branch existed, that was aimed at providing this feature, but
I am not sure it made any progress.

Best,
Riccardo




Re: [Rdkit-discuss] RDKit C wrappers?

2020-04-27 Thread Greg Landrum
Hi Axel,

Doing complete C wrappers around the C++ library would be a ton of work -
you'd have to simulate the object model on the C side, and that's "not
easy".

Doing a "full" wrapper for another programming language is almost certainly
best handled using SWIG (assuming it has bindings for the language you are
interested in).

An alternate approach would be to look at the MinimalLib: the library
that's used to generate the Javascript wrappers: it's based on the idea of
exposing a small number of useful functions instead of supporting the full
API:
https://github.com/rdkit/rdkit/tree/master/Code/MinimalLib

-greg


On Sun, Apr 26, 2020 at 9:47 PM Riccardo Vianello <
riccardo.viane...@gmail.com> wrote:

> Hi Axel,
>
> On Sun, Apr 26, 2020 at 6:21 PM Axel Pahl  wrote:
>
>> is anyone aware of efforts for creating "bridging" C wrappers for the
>> RDKit?
>> This would make it easier to bind the toolkit to other programming
>> languages, in addition to Python and Java.
>>
>
> I'm not aware of any active/existing projects, but I've been also thinking
> about this possibility in a couple of occasions.
>
> Could the existing SWIG interface be perused for this?
>
> as far as I know, this isn't (or at least wasn't) directly possible. SWIG
> includes the core functionality required to generate these wrappers, but C
> is not a supported target language. Last time I checked a mostly functional
> but not ready to merge GSoC branch existed, that was aimed at providing
> this feature, but I am not sure it made any progress.
>
> Best,
> Riccardo
>
>
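
The appeal of a small, flat C interface is that most languages can call it with very little glue. As a rough illustration only, the sketch below uses Python's `ctypes` to call `strlen` from the C runtime. It is a stand-in, not an RDKit or MinimalLib function, and loading the library with `CDLL(None)` assumes a POSIX system. The wrapper name `c_strlen` is made up for the example.

```python
import ctypes

# POSIX assumption: CDLL(None) exposes the symbols already loaded into the
# current process, which includes the C runtime.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Call the C function strlen on a bytes object."""
    return libc.strlen(data)


print(c_strlen(b"RDKit"))
```

A C++ class has no such one-line description, which is why a full wrapper needs either a generator such as SWIG or a hand-written layer of plain functions.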


Re: [Rdkit-discuss] RDKit C wrappers?

2020-04-27 Thread Axel Pahl

Hi Riccardo,

thanks for the information.

Yes, I think it would be quite useful, but I guess it would have to be
mostly auto-generated, otherwise it would be too much work (and hard to
maintain).
The GSoC SWIG 2012 branch does indeed not seem to have made it into
master (https://github.com/swig/swig/tree/gsoc2012-c/).

Kind regards,
Axel


On 26.04.20 21:44, Riccardo Vianello wrote:

Hi Axel,

On Sun, Apr 26, 2020 at 6:21 PM Axel Pahl <axelp...@gmx.de> wrote:

is anyone aware of efforts for creating "bridging" C wrappers for
the RDKit?
This would make it easier to bind the toolkit to other programming
languages, in addition to Python and Java.

I'm not aware of any active/existing projects, but I've been also
thinking about this possibility in a couple of occasions.

Could the existing SWIG interface be perused for this?

as far as I know, this isn't (or at least wasn't) directly possible.
SWIG includes the core functionality required to generate these
wrappers, but C is not a supported target language. Last time I
checked a mostly functional but not ready to merge GSoC branch
existed, that was aimed at providing this feature, but I am not sure
it made any progress.

Best,
Riccardo




Re: [Rdkit-discuss] RDKit C wrappers?

2020-04-26 Thread Riccardo Vianello
Hi Axel,

On Sun, Apr 26, 2020 at 6:21 PM Axel Pahl  wrote:

> is anyone aware of efforts for creating "bridging" C wrappers for the
> RDKit?
> This would make it easier to bind the toolkit to other programming
> languages, in addition to Python and Java.
>

I'm not aware of any active/existing projects, but I've been also thinking
about this possibility in a couple of occasions.

Could the existing SWIG interface be perused for this?

as far as I know, this isn't (or at least wasn't) directly possible. SWIG
includes the core functionality required to generate these wrappers, but C
is not a supported target language. Last time I checked a mostly functional
but not ready to merge GSoC branch existed, that was aimed at providing
this feature, but I am not sure it made any progress.

Best,
Riccardo


[Rdkit-discuss] RDKit C wrappers?

2020-04-26 Thread Axel Pahl

Dear all,

is anyone aware of efforts for creating "bridging" C wrappers for the RDKit?
This would make it easier to bind the toolkit to other programming
languages, in addition to Python and Java.

Could the existing SWIG interface be perused for this?

Kind regards,
Axel


Re: [Rdkit-discuss] RDKit Application for Google Season of the Docs

2020-04-23 Thread Greg Landrum
Unfortunately not. That requires a technical solution.

There is actually some hope here though: SWIG 4.0 is capable of using
doxygen to generate javadoc:
http://www.swig.org/Doc4.0/SWIGDocumentation.html#Doxygen

That just requires someone with the time and motivation to try it out and
make whatever tweaks are required to the C++ documentation to make it
compatible.

-greg


On Thu, 23 Apr 2020 at 18:52, Tim Dudgeon  wrote:

> Would it be possible to add improving the docs for Java to the list?
>
> e.g. https://sourceforge.net/p/rdkit/mailman/message/36929992/
>
>
> Tim
>
>
> On 23/04/2020 16:17, Scalfani, Vincent wrote:
>
> Dear RDKit Community,
>
>
> Greg and I are putting together an application for the Google Season of
> the Docs program:
>
>
> https://developers.google.com/season-of-docs
>
>
> The program connects open source organizations with technical writers. A
> technical writer would work with several RDKit community mentors to advance
> the documentation over a 3 month period (starting in August).
>
>
> We have a few project ideas already including expansion of the RDKit Book
> to include additional undocumented methods, creating a beginner friendly
> guide to making the most of the Python API Docs, and creating an official
> RDKit with Pandas Book. Detailed project descriptions will be available on
> the RDKit blog soon, and I'm also happy to share directly with you.
>
>
> We are seeking mentors to help us advance the RDKit documentation and
> participate in Google Season of the Docs. Here are the mentor
> responsibilities:
>
>
>
> https://developers.google.com/season-of-docs/docs/admin-mentor-responsibilities
>
>
> We'll need to submit our application by May 4, so please do get in touch
> if you are interested.
>
>
> Thanks,
>
>
> Vin Scalfani
>
>
>
>
>
>
>


Re: [Rdkit-discuss] RDKit Application for Google Season of the Docs

2020-04-23 Thread Tim Dudgeon

Would it be possible to add improving the docs for Java to the list?

e.g. https://sourceforge.net/p/rdkit/mailman/message/36929992/


Tim


On 23/04/2020 16:17, Scalfani, Vincent wrote:


Dear RDKit Community,


Greg and I are putting together an application for the Google Season 
of the Docs program:



https://developers.google.com/season-of-docs


The program connects open source organizations with technical writers. 
A technical writer would work with several RDKit community mentors to 
advance the documentation over a 3 month period (starting in August).



We have a few project ideas already including expansion of the RDKit 
Book to include additional undocumented methods, creating a beginner 
friendly guide to making the most of the Python API Docs, and creating 
an official RDKit with Pandas Book. Detailed project descriptions will 
be available on the RDKit blog soon, and I'm also happy to share 
directly with you.



We are seeking mentors to help us advance the RDKit documentation and 
participate in Google Season of the Docs. Here are the mentor 
responsibilities:



https://developers.google.com/season-of-docs/docs/admin-mentor-responsibilities


We'll need to submit our application by May 4, so please do get in 
touch if you are interested.



Thanks,


Vin Scalfani









[Rdkit-discuss] RDKit Application for Google Season of the Docs

2020-04-23 Thread Scalfani, Vincent
Dear RDKit Community,


Greg and I are putting together an application for the Google Season of the 
Docs program:


https://developers.google.com/season-of-docs


The program connects open source organizations with technical writers. A 
technical writer would work with several RDKit community mentors to advance the 
documentation over a 3 month period (starting in August).


We have a few project ideas already including expansion of the RDKit Book to 
include additional undocumented methods, creating a beginner friendly guide to 
making the most of the Python API Docs, and creating an official RDKit with 
Pandas Book. Detailed project descriptions will be available on the RDKit blog 
soon, and I'm also happy to share directly with you.


We are seeking mentors to help us advance the RDKit documentation and 
participate in Google Season of the Docs. Here are the mentor responsibilities:


https://developers.google.com/season-of-docs/docs/admin-mentor-responsibilities


We'll need to submit our application by May 4, so please do get in touch if you 
are interested.


Thanks,


Vin Scalfani






Re: [Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...

2020-04-05 Thread Gustavo Seabra
Thanks. Yes, I too understood that it should get the connectivity from the 
distances.

I'm using PDB because it is the output of another program.

I'll see what I can change then.

Thanks,
Gustavo.

--
Gustavo Seabra


From: Alan Kerstjens Medina 
Sent: Sunday, April 5, 2020 9:15:26 AM
To: Gustavo Seabra ; 
rdkit-discuss@lists.sourceforge.net 
Subject: RE: [Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...


Hi Gustavo,



I haven’t looked into the RDKit source code for this but I assume this has to 
do with the lack of CONECT records in the PDB file you attached (i.e. you are 
only storing atom coordinates, not connectivity).



From what I could gather from the RDKit documentation, the default behaviour 
for the MolFromPDBFile function is to “sense” bonds based on atom proximity 
(proximityBonding=True), but I guess that isn’t happening. Maybe someone else 
could chime in and clarify how to make that feature work as intended.



Is there any particular reason you want to use PDB files for small molecules? 
They tend to be a bit of a headache and not particularly efficient 
storage-wise. If atom coordinates are important maybe it would be easier to use 
SDF or MOL2 files instead.



Best regards,

Alan



From: Gustavo Seabra <gustavo.sea...@gmail.com>
Sent: 04 April 2020 22:08
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...



Hi all,

I'm having another problem when reading a PDB file. Some files just return
"None", with no error message at all. For example, the attached file:

>>> Chem.MolFromPDBFile("./a3.pdb")

Does not return a Mol object. Does anyone know what is wrong with this file?
I can open it regularly in other programs. Is there any way to "force" rdkit
to recognize the file?

Thanks,
--
Gustavo Seabra




Re: [Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...

2020-04-05 Thread Alan Kerstjens Medina
Hi Gustavo,

I haven’t looked into the RDKit source code for this but I assume this has to 
do with the lack of CONECT records in the PDB file you attached (i.e. you are 
only storing atom coordinates, not connectivity).

From what I could gather from the RDKit documentation, the default behaviour 
for the MolFromPDBFile function is to “sense” bonds based on atom proximity 
(proximityBonding=True), but I guess that isn’t happening. Maybe someone else 
could chime in and clarify how to make that feature work as intended.

Is there any particular reason you want to use PDB files for small molecules? 
They tend to be a bit of a headache and not particularly efficient 
storage-wise. If atom coordinates are important maybe it would be easier to use 
SDF or MOL2 files instead.

Best regards,
Alan

From: Gustavo Seabra <gustavo.sea...@gmail.com>
Sent: 04 April 2020 22:08
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...

Hi all,

I'm having another problem when reading a PDB file. Some files just return
"None", with no error message at all. For example, the attached file:

>>> Chem.MolFromPDBFile("./a3.pdb")

Does not return a Mol object. Does anyone know what is wrong with this file?
I can open it regularly in other programs. Is there any way to "force" rdkit
to recognize the file?

Thanks,
--
Gustavo Seabra

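
A quick way to test the CONECT hypothesis is to count the record types in the file, and then to retry the parse without sanitization to see whether a molecule comes back at all. This is a sketch under assumptions: the helper names `count_pdb_records` and `load_pdb_leniently` are invented here, and the guess that a silent `None` comes from a sanitization failure is only one possible cause. The `sanitize` and `removeHs` arguments are existing keyword arguments of `Chem.MolFromPDBFile`.

```python
from collections import Counter


def count_pdb_records(lines):
    """Count PDB record types (ATOM, HETATM, CONECT, ...) by line prefix."""
    counts = Counter()
    for line in lines:
        record = line[:6].strip()  # the record name sits in columns 1-6
        if record:
            counts[record] += 1
    return counts


def load_pdb_leniently(path):
    """Try the default parse first, then retry without sanitization.

    Assumption: if the default call returns None, the unsanitized molecule
    may still load and can then be inspected for odd valences or bonds.
    """
    from rdkit import Chem  # imported lazily so the counter works without RDKit

    mol = Chem.MolFromPDBFile(path)
    if mol is None:
        mol = Chem.MolFromPDBFile(path, sanitize=False, removeHs=False)
    return mol


# Usage (file name taken from the question):
#   with open("a3.pdb") as handle:
#       print(count_pdb_records(handle))
#   mol = load_pdb_leniently("a3.pdb")
```

If the counter shows no CONECT records and the lenient load succeeds, the bonds came from proximity perception, and the failure most likely happens afterwards, during sanitization.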


[Rdkit-discuss] RDKit Chem.MolFromPDBFile ignores some files...

2020-04-04 Thread Gustavo Seabra
Hi all,

I'm having another problem when reading a PDB file. Some files just return
"None", with no error message at all. For example, the attached file:

>>> Chem.MolFromPDBFile("./a3.pdb")

Does not return a Mol object. Does anyone know what is wrong with this file?
I can open it regularly in other programs. Is there any way to "force" rdkit
to recognize the file?

Thanks,
--
Gustavo Seabra



a3.pdb
Description: Binary data


Re: [Rdkit-discuss] RDKit in C++

2020-03-27 Thread David Cosgrove
Hi Leon,
Sorry for the slow reply.  Greg has just merged in an updated C++ document
which will be available in the new release.  If you've used the old methods
for iterating over atoms, you might want to take a look at that bit at
least.  It is now vastly simpler.
It seems the RMS routines in C++ are still waiting for someone to be
stirred enough to do them.  I'm not sure they will make your life much
faster in any case, as it is an inherently expensive process and the basic
comparison of 2 conformers is compiled C++ code wrapped up for Python.
What the Python wrapper has that isn't in C++ is just a higher-level
function that does all conformers vs all conformers for a molecule.  The
overhead for those extra loops in Python is unlikely to be huge, I wouldn't
think.
Best,
Dave


On Wed, Feb 26, 2020 at 5:41 PM topgunhaides .  wrote:

> Hey Paolo and David,
>
> Thanks a lot!
> This is probably the most helpful resource I can use. It is great that you
> are planning to add new stuff in there and update things.
>
> One reason for me to transform my python code to c++ is to improve
> efficiency.
> (need to do a series of RDKit tasks like embedding conformers, RMS between
> confs, Shape Tanimoto distances, etc., with a lot of my own programming
> logic)
> In addition, profiling my python code showed the RMS (bestrms) step is the
> bottleneck, is the C++ version of RMS code coming soon?
>
> I will keep tracking the changes you make in the near future. Really
> appreciate it!
>
> Best,
> Leon
>
>
>
>
> On Wed, Feb 26, 2020 at 11:17 AM David Cosgrove <
> davidacosgrov...@gmail.com> wrote:
>
>> Hi Leon,
>> There is indeed such a thing.  It's not as complete as the Python one, as
>> it was rather more work than I anticipated.  Also, I haven't been keeping
>> the examples up to date, especially the newer ways of iterating over atoms
>> and bonds, and the CMakeLists.txt. It should give you some useful pointers,
>> however. You can find it here:
>> https://github.com/rdkit/rdkit/blob/master/Docs/Book/GettingStartedInC%2B%2B.md,
>> which should be in $RDBASE/Docs/Book if you have cloned the repo.  The
>> examples are in C++Examples in that directory also.
>> I will try and find time over the next few weeks to make the examples
>> current.  Also, underneath $RDBASE/Code there are lots of files called
>> test*cpp which are the unit tests for the various parts, and they have
>> useful stuff in them as well.
>> Cheers,
>> Dave
>>
>>
>> On Wed, Feb 26, 2020 at 3:53 PM topgunhaides . 
>> wrote:
>>
>>> Hi guys,
>>>
>>> I noticed that someone asked such question some years ago.
>>> Since it is now 2020, do we now have anything like "Getting Started with
>>> the RDKit in C++"?
>>>
>>> I am planning to transfer my RDKit python code to C++.
>>> Can anyone give me some resources? I found some, but just in case that I
>>> missed important ones. Any suggestions are very welcome. Thanks!
>>>
>>> Best,
>>> Leon
>>>
>>>
>>>
>>
>>
>> --
>> David Cosgrove
>> Freelance computational chemistry and chemoinformatics developer
>> http://cozchemix.co.uk
>>
>>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
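
The "all conformers vs all conformers" loop mentioned above is small enough to write by hand. The sketch below is an assumption about how one might do it, not the code of the Python wrapper itself: `pairwise_lower_triangle` and `conformer_best_rms_matrix` are made-up names, while `rdMolAlign.GetBestRMS` with its `prbId` and `refId` arguments is the existing RDKit call. The expensive part stays in compiled code; Python only drives the two loops.

```python
def pairwise_lower_triangle(n_items, distance):
    """Return distance(i, j) for all pairs with j < i, row by row."""
    values = []
    for i in range(1, n_items):
        for j in range(i):
            values.append(distance(i, j))
    return values


def conformer_best_rms_matrix(mol):
    """All-vs-all best RMS between the conformers of one molecule."""
    from rdkit.Chem import rdMolAlign  # lazy import: only needed here

    conf_ids = [conf.GetId() for conf in mol.GetConformers()]

    def best_rms(i, j):
        # GetBestRMS aligns the probe conformer onto the reference one
        # (moving the probe's coordinates) and returns the resulting RMSD.
        return rdMolAlign.GetBestRMS(
            mol, mol, prbId=conf_ids[i], refId=conf_ids[j]
        )

    return pairwise_lower_triangle(len(conf_ids), best_rms)
```

For n conformers this makes n(n-1)/2 calls, so the cost is dominated by the alignment itself rather than by the Python loop.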


Re: [Rdkit-discuss] RDkit/Anaconda: Fingerprints for a database

2020-03-12 Thread Omar H94
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>
>0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>
> >>>
>
> But like I said, I would like to keep nBits = 1024. This is not the main
> problem, though: I would like the result to be written to a file
> automatically. Is that possible? And above all, is it possible to do it for
> a file that has a SMILES and a name on each line? For example, a .txt file
> like this:
> CC1257
>    544235
> CC  9850982
> CCC   894983
>
> To do this I guess I have to use the list function like:
>
> >>> list = [r'C:\Users\HP\Desktop\Python_ex\smile_molecules.txt'] #which
> is the location of the file.
> >>>  for mol in list  #Here give me error SyntaxError:
> invalid syntax
>
> >>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024,
> bitInfo=info) #here maybe mol became list ?
>
> >>> vector = np.array(fp)
>
> >>> vector
>
> But obviously it doesn't work. I hope you can help me. I don't know if
> what I want to do is possible. If you know some similar work, I'm really
> glad to read it, and maybe I can use it as a guide.
>
> Good day and thank you very much for your availability and collaboration.
>
> Best regards,
> Francesco Coppola
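
For the question quoted above, one possible approach is to read the text file line by line, build a molecule from each SMILES, and write one fingerprint per line to an output file. This is a minimal sketch, assuming the input has a SMILES and a name separated by whitespace on each line; the function names `parse_smiles_lines` and `write_fingerprints` and the tab-separated output layout are choices made for this example.

```python
def parse_smiles_lines(lines):
    """Yield (smiles, name) pairs from whitespace-separated text lines."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        smiles = fields[0]
        name = fields[1] if len(fields) > 1 else ""
        yield smiles, name


def write_fingerprints(in_path, out_path, radius=2, n_bits=1024):
    """Write 'name<TAB>bitstring' for every parsable molecule in in_path."""
    from rdkit import Chem  # lazy imports: the parser above needs no RDKit
    from rdkit.Chem import AllChem

    with open(in_path) as source, open(out_path, "w") as out:
        for smiles, name in parse_smiles_lines(source):
            mol = Chem.MolFromSmiles(smiles)
            if mol is None:
                continue  # unparsable SMILES: skip rather than crash
            fp = AllChem.GetMorganFingerprintAsBitVect(
                mol, radius, nBits=n_bits
            )
            out.write(f"{name}\t{fp.ToBitString()}\n")


# Usage (hypothetical paths):
#   write_fingerprints("smile_molecules.txt", "fingerprints.txt")
```

The original attempt looped over a list that held the file path, so `mol` was a string rather than a molecule; opening the file and converting each SMILES with `Chem.MolFromSmiles` is the missing step.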


Re: [Rdkit-discuss] RDkit/Anaconda: Fingerprints for a database

2020-03-12 Thread Francois Berenger
 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0])





But like I said, I would like to keep nBits = 1024. This is not the
main problem, though: I would like the result to be written to a file
automatically. Is that possible? And above all, is it possible to do it
for a file that has a SMILES and a name on each line?
For example, a .txt file like this:
CC1257
   544235
CC  9850982
CCC   894983

To do this I guess I have to use the list function like:


list = [r'C:\Users\HP\Desktop\Python_ex\smile_molecules.txt']

#which is the location of the file.

 for mol in list  #Here give me error SyntaxError:

invalid syntax


fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024,

bitInfo=info) #here maybe mol became list ?


vector = np.array(fp)



vector

But obviously it doesn't work. I hope you can help me. I don't know if
what I want to do is possible. If you know some similar work, I'm
really glad to read it, and maybe I can use it as a guide.

Good day and thank you very much for your availability and
collaboration.

Best regards,
Francesco Coppola

