Re: [Rdkit-discuss] Question

2009-04-30 Thread Greg Landrum
[redirecting to list since this may be of general interest]

Yes, I generally store molecules in databases in blob columns
containing the pickles. The primary reason for this is that one can
then skip all the work of parsing the molecule, perceiving the
chemistry, etc.
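
A minimal sketch of this storage pattern in Python, using only the standard-library sqlite3 module. The table layout follows the schema shown further down in this message; the helper names `create_table`, `store_pickles` and `load_pickle` are made up for illustration. With RDKit, the bytes would typically come from `mol.ToBinary()` and be turned back into a molecule with `Chem.Mol(data)`; that part appears only in comments so the sketch runs without RDKit installed.

```python
import sqlite3


def create_table(conn):
    """Create the molecules table (same columns as the schema in this thread)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS molecules "
        "(compound_id varchar not null unique, molpkl blob)"
    )


def store_pickles(conn, records):
    """Insert (compound_id, pickle_bytes) pairs; bytes are stored as blobs."""
    conn.executemany(
        "INSERT INTO molecules (compound_id, molpkl) VALUES (?, ?)",
        [(cid, bytes(pkl)) for cid, pkl in records],
    )
    conn.commit()


def load_pickle(conn, compound_id):
    """Return the stored pickle bytes, or None if the id is unknown."""
    row = conn.execute(
        "SELECT molpkl FROM molecules WHERE compound_id = ?", (compound_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])


# Assumed RDKit usage (not executed here):
#   from rdkit import Chem
#   store_pickles(conn, [("cmpd-1", Chem.MolFromSmiles("CCO").ToBinary())])
#   mol = Chem.Mol(load_pickle(conn, "cmpd-1"))  # no parsing or perception needed
```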

I don't have a good general answer for how long pickles are. It really
depends on the molecules. One example I have handy is a sqlite
database containing the pubchem screening deck. The molecules are
stored as follows:
sqlite> .schema
CREATE TABLE molecules (compound_id varchar not null unique,molpkl blob);
sqlite> select count(*) from molecules;
214178

% ls -l Compounds.sqlt
-rw-r--r--  1 landrgr1  staff  167240704 Nov 22 07:28 Compounds.sqlt

There is, no doubt, some overhead associated with the sqlite data, but
this gives a rough estimate.
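
For a rough per-molecule figure, the file size can be divided by the row count. The database can also be asked for the actual blob lengths, which leaves out SQLite's own overhead. The sketch below assumes the `molecules` table from the schema above; the function names are illustrative, and it relies on SQLite's `length()` returning the number of bytes for blob values.

```python
import sqlite3


def average_bytes_per_row(file_size_bytes, n_rows):
    """File-level estimate: includes SQLite overhead and the compound_id column."""
    return file_size_bytes / n_rows


def pickle_size_stats(conn):
    """Return (row count, mean blob length, max blob length) for molpkl."""
    return conn.execute(
        "SELECT count(*), avg(length(molpkl)), max(length(molpkl)) FROM molecules"
    ).fetchone()


# Numbers taken from the listing in this message:
# works out to roughly 781 bytes per molecule, overhead included.
file_estimate = average_bytes_per_row(167240704, 214178)
```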

-greg

On Thu, Apr 30, 2009 at 10:55 AM, Evgueni Kolossov ekolos...@gmail.com wrote:
 And what is the length of the pickles?

 2009/4/30 Evgueni Kolossov ekolos...@gmail.com:
 Greg,

 In this case you are probably storing the pickles in the database together with
 the fingerprints. Am I right?

 Regards,
 Evgueni

 2009/4/30 Greg Landrum greg.land...@gmail.com:
 Nope... the transformation is a lossy one: a fingerprint does not keep enough
 information to rebuild the molecule.
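
 To illustrate why a hashed fingerprint cannot be inverted, here is a toy sketch in Python. It is not RDKit's fingerprinting algorithm: the integer feature ids, the modulo folding, and the name `toy_fingerprint` are all assumptions made for illustration. Distinct feature sets collapse onto the same bits, so the original input cannot be recovered from the bits alone.

```python
def toy_fingerprint(feature_ids, n_bits=8):
    """Fold a set of integer feature ids into a fixed-length bit tuple."""
    bits = [0] * n_bits
    for fid in feature_ids:
        bits[fid % n_bits] = 1  # different ids can land on the same bit
    return tuple(bits)


# Two different hypothetical feature sets...
fp_a = toy_fingerprint({3, 5})
fp_b = toy_fingerprint({5, 11, 13})

# ...produce identical bit patterns, so the mapping has no unique inverse.
same_bits = fp_a == fp_b
```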

 On Thu, Apr 30, 2009 at 9:56 AM, Evgueni Kolossov ekolos...@gmail.com 
 wrote:
 Hi Greg,

 Another, probably stupid, question: is it possible to re-create an ROMol
 from fingerprints?

 Regards,
 Evgueni





 --
 Dr. Evgueni Kolossov (PhD)
 ekolos...@gmail.com
 Tel.   +44(0)1628 627168
 Mob. +44(0)7812070446








Re: [Rdkit-discuss] Question

2009-04-30 Thread Evgueni Kolossov
Thanks Greg,

Unfortunately I did not quite get it - do you mean the size of your
example is 167240704 bytes?

Regards,
Evgueni




Re: [Rdkit-discuss] Question

2009-04-30 Thread Greg Landrum
Yes, the database file containing the 214K molecules is 167 MB.






Re: [Rdkit-discuss] Question

2009-04-30 Thread Greg Landrum
There really isn't a maximum. It depends on the number of atoms,
number of bonds, and number of conformers.

On Thu, Apr 30, 2009 at 9:09 PM, Evgueni Kolossov ekolos...@gmail.com wrote:
 OK, so the average size is about 781 bytes. What is the maximum size one molecule
 can be in theory?
