Re: [Open Babel] << ECFP4 in OpenBabel >>

2018-12-10 Thread Francois Berenger

On 07/12/2018 19:22, Noel O'Boyle wrote:

An ECFP4 implementation could use a single bit or a million bits. The
actual information being encoded is an element of a set with billions
of members or more (I forget the details), so it's hashed to
something manageable. The shorter the length, the more bit collisions
(everything will collide with a single bit, for example). Open Babel
uses 4096. I would regard this as the minimum.


Just FYI, in rdkit, 2048 bits is the default length.


When converting from hex, you could concatenate the binaries. Or you
could use pybel, which does the conversion for you:

pybel.readstring("smi", "c1c1C(=O)Cl").calcfp("ecfp4").bits

[556, 1348, 1509, 1547, 1993, 2078, 2089, 2378, 2487, 2531, 2700,
3017, 3023, 3117, 3324, 3395, 3599, 4036]

These are the bits that are set. If you use "len", you can get the
number of them.

Regards,

- Noel

___
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss


Re: [Open Babel] << ECFP4 in OpenBabel >>

2018-12-07 Thread Noel O'Boyle
Just a note regarding the use of fingerprints for repeating structures such
as nanotubes. The bits in the fingerprint quickly become saturated as they
are based on local structural information which is the same again and again
within the repeating structure. For this reason, all peptides, for example,
appear equally similar to all other peptides (to a first approximation).
And all carbon nanotubes may appear equally similar also. Something to bear
in mind. This is one of the arguments for measuring similarity in a
different way, for example graph edit distance, or a descriptor that
captures some measurable global structural property related to a physical
property of interest.
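
The saturation effect can be sketched with a toy model. This is only an
analogy, not ECFP4: a "molecule" is a string of repeating units, and its
"fingerprint" is the set of distinct substrings of a fixed width, standing
in for local atom environments. The names local_fragments and tanimoto are
made up for this illustration.

```python
def local_fragments(sequence, width=3):
    """Return the distinct substrings of the given width.

    These stand in for local atom environments; this is a toy model,
    not an actual circular fingerprint.
    """
    return {sequence[i:i + width] for i in range(len(sequence) - width + 1)}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


short_chain = "ABC" * 5    # 5 repeating units
long_chain = "ABC" * 50    # 50 repeating units

fragments_short = local_fragments(short_chain)
fragments_long = local_fragments(long_chain)

# Both chains contain exactly the same local fragments, so the
# similarity does not reflect the tenfold difference in length.
similarity = tanimoto(fragments_short, fragments_long)
print(len(fragments_short), len(fragments_long), similarity)
```

Once every distinct local environment has been seen, making the structure
longer adds nothing new to the feature set, which is why the two chains
look identical here.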

- Noel



Re: [Open Babel] << ECFP4 in OpenBabel >>

2018-12-07 Thread Noel O'Boyle
An ECFP4 implementation could use a single bit or a million bits. The
actual information being encoded is an element of a set with billions of
members or more (I forget the details), so it's hashed to something
manageable. The shorter the length, the more bit collisions (everything
will collide with a single bit, for example). Open Babel uses 4096. I would
regard this as the minimum.
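
A rough sketch of the length/collision trade-off, under stated
assumptions: the identifiers below are random 32-bit integers standing in
for hashed atom environments, and reducing them modulo the fingerprint
length is a simplification of whatever hashing a real implementation
uses. count_set_bits is a made-up helper.

```python
import random


def count_set_bits(identifiers, n_bits):
    """Map integer identifiers onto n_bits positions and count distinct bits."""
    return len({ident % n_bits for ident in identifiers})


rng = random.Random(0)  # fixed seed so the run is repeatable
# 500 made-up 32-bit identifiers (assumption: not real ECFP4 hashes)
identifiers = {rng.getrandbits(32) for _ in range(500)}

for n_bits in (1, 1024, 2048, 4096):
    used = count_set_bits(identifiers, n_bits)
    collisions = len(identifiers) - used
    print(f"{n_bits:5d} bits: {used} set, {collisions} identifiers lost to collisions")
```

With a single bit every identifier lands on the same position, and the
number of distinct bits can only stay the same or grow as the length goes
from 1024 to 2048 to 4096.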

When converting from hex, you could concatenate the binaries. Or you could
use pybel, which does the conversion for you:
>>> pybel.readstring("smi", "c1c1C(=O)Cl").calcfp("ecfp4").bits
[556, 1348, 1509, 1547, 1993, 2078, 2089, 2378, 2487, 2531, 2700, 3017,
3023, 3117, 3324, 3395, 3599, 4036]

These are the bits that are set. If you use "len", you can get the number
of them.
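
To get from that list to a string of ones and zeros, something like the
following minimal sketch should do. It assumes the positions are numbered
from 1 and that the fingerprint length is 4096, as mentioned above; both
are worth checking against your own Open Babel version.
bits_to_bitstring is a made-up helper, not part of pybel.

```python
def bits_to_bitstring(set_bits, n_bits=4096):
    """Build a '0'/'1' string of length n_bits from 1-based set-bit positions."""
    chars = ["0"] * n_bits
    for position in set_bits:
        if not 1 <= position <= n_bits:
            raise ValueError(f"bit position {position} outside 1..{n_bits}")
        chars[position - 1] = "1"  # shift to 0-based string index
    return "".join(chars)


# The list of set bits quoted in the message above
set_bits = [556, 1348, 1509, 1547, 1993, 2078, 2089, 2378, 2487, 2531, 2700,
            3017, 3023, 3117, 3324, 3395, 3599, 4036]

bitstring = bits_to_bitstring(set_bits)
print(len(bitstring), bitstring.count("1"))  # prints: 4096 18
```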

Regards,
- Noel




Re: [Open Babel] << ECFP4 in OpenBabel >>

2018-12-07 Thread I. Camps
@Geoff
I use Python.
I already made a script to convert hex to binary, but as I wrote
previously, the fingerprint (fp) from OpenBabel is in the form of a set of
hex numbers. I converted each one to binary and then concatenated all the
binaries. Is that okay?
If it is okay, the second problem is that the fp is much longer (6040) than
the RDKit one (1024). I really do not understand the "folded" issue because
everything I read about ECFP4 talks about a 1024-bit string and nothing
longer.

@Francois
I certainly will take a look!

Thank you both.

Camps




Re: [Open Babel] << ECFP4 in OpenBabel >>

2018-12-06 Thread Geoffrey Hutchison
> Using OpenBabel, I got a file stating that the fingerprint has 6040 bits
> set, followed by hexadecimal numbers.
> Using PyBioMed, which is based on RDKit, I got a binary string of 1024 bits,
> very different from that obtained with OpenBabel.

The RDKit binary string will be "folded" down to 1024 bits, so of course they 
will be very different bit strings.
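
As a sketch of what folding means, assuming it is done by wrapping bit
positions modulo the shorter length (one common convention, not
necessarily what either toolkit does internally), and using the list of
set bits that appears elsewhere in this thread. fold_bits is a made-up
helper.

```python
def fold_bits(set_bits, folded_length=1024):
    """Fold 1-based set-bit positions into a shorter fingerprint.

    Positions wrap around modulo folded_length; bits that land on the
    same position merge into one.
    """
    return sorted({(position - 1) % folded_length + 1 for position in set_bits})


# Set bits of a 4096-bit fingerprint, as quoted elsewhere in this thread
set_bits = [556, 1348, 1509, 1547, 1993, 2078, 2089, 2378, 2487, 2531, 2700,
            3017, 3023, 3117, 3324, 3395, 3599, 4036]

folded = fold_bits(set_bits)
print(len(set_bits), "bits before folding,", len(folded), "after")
```

Here bits 1993 and 3017 both wrap to position 969, so 18 bits become 17.
Even after folding to the same length, an Open Babel fingerprint would not
be expected to match the RDKit one bit for bit, since the two are
independent implementations with their own hashing.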

> 2-) How can I convert the ECFP4 obtained from OpenBabel in hexadecimal form 
> to a bit string with only ones and zeros?

What programming language are you using? For example in Python, a quick search 
on StackExchange:
 https://stackoverflow.com/questions/1425493/convert-hex-to-binary 
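
One pitfall when converting word by word is that leading zeros are easily
dropped (bin(int("00000001", 16)) gives just '0b1'), which changes the
total length. A minimal sketch that pads every word, assuming
whitespace-separated 32-bit hex words concatenated in the order they
appear; whether that order matches the bit numbering of the fingerprint
is an assumption to verify. hex_words_to_bitstring is a made-up helper
and the example words are invented, not real Open Babel output.

```python
def hex_words_to_bitstring(hex_text, bits_per_word=32):
    """Convert whitespace-separated hex words to one '0'/'1' string.

    Every word is zero-padded to bits_per_word so that leading zeros
    are kept and all words contribute the same number of bits.
    """
    pieces = []
    for word in hex_text.split():
        value = int(word, 16)
        if value >> bits_per_word:
            raise ValueError(f"{word!r} does not fit in {bits_per_word} bits")
        pieces.append(format(value, f"0{bits_per_word}b"))
    return "".join(pieces)


example = "00000001 80000000 0000ffff"  # invented words for illustration
bitstring = hex_words_to_bitstring(example)
print(len(bitstring), bitstring.count("1"))  # prints: 96 18
```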


Hope that helps,
-Geoff


Re: [Open Babel] << ECFP4 in OpenBabel >>

2018-12-06 Thread Francois Berenger

On 07/12/2018 02:43, I. Camps wrote:

Dear all,

I am trying to compute the ECFP4 fingerprint for a library of
molecules. The molecules are carbon nanotubes functionalized with -OH
groups.

Using OpenBabel, I got a file stating that the fingerprint has 6040
bits set, followed by hexadecimal numbers.

Using PyBioMed, which is based on RDKit, I got a binary string of 1024
bits, very different from that obtained with OpenBabel.

My questions are:
1-) How good and reliable is the ECFP4 calculated with OpenBabel?

2-) How can I convert the ECFP4 obtained from OpenBabel in hexadecimal
form to a bit string with only ones and zeros?


I have some code for rdkit's ECFP4 here:
https://github.com/UnixJunkie/consent/blob/master/bin/lbvs_consent_ecfp4.py

And some other code using open-babel, but it is for the MACCS 
fingerprint, there:

https://github.com/UnixJunkie/consent/blob/master/src/ob_maccs.cpp

Regards,
F.


I need the string in 1s and 0s because I will use it later to
calculate the Shannon entropy and the BiEntropy.

[]'s,

Camps



[Open Babel] << ECFP4 in OpenBabel >>

2018-12-06 Thread I. Camps
Dear all,

I am trying to compute the ECFP4 fingerprint for a library of molecules.
The molecules are carbon nanotubes functionalized with -OH groups.

Using OpenBabel, I got a file stating that the fingerprint has 6040 bits
set, followed by hexadecimal numbers.

Using PyBioMed, which is based on RDKit, I got a binary string of 1024
bits, very different from that obtained with OpenBabel.

My questions are:
1-) How good and reliable is the ECFP4 calculated with OpenBabel?
2-) How can I convert the ECFP4 obtained from OpenBabel in hexadecimal form
to a bit string with only ones and zeros?

I need the string in 1s and 0s because I will use it later to calculate
the Shannon entropy and the BiEntropy.
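
For reference, a minimal sketch of the Shannon entropy of such a string,
taken here as the entropy of the fraction of ones,
H = -p*log2(p) - (1-p)*log2(1-p), in bits per symbol. This is one common
reading of "Shannon entropy of a bit string" and is assumed rather than
taken from any toolkit; shannon_entropy is a made-up helper. BiEntropy is
not sketched here.

```python
import math


def shannon_entropy(bitstring):
    """Shannon entropy (bits per symbol) of a '0'/'1' string.

    Uses only the fraction of ones, so the order of the bits has no
    effect on the result.
    """
    if not bitstring:
        raise ValueError("empty bit string")
    p_one = bitstring.count("1") / len(bitstring)
    entropy = 0.0
    for p in (p_one, 1.0 - p_one):
        if p > 0.0:  # treat 0 * log2(0) as 0
            entropy -= p * math.log2(p)
    return entropy


print(shannon_entropy("0101"))  # half ones: 1.0
print(shannon_entropy("0000"))  # no ones: 0.0
```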

[]'s,

Camps