[Rdkit-discuss] Fwd: Question

2009-05-01 Thread Evgueni Kolossov
Thank you very much, Greg.

A couple more things to clarify:
- In the documentation you mention pickle files. Can you please
give an example of reading and writing this kind of file?
- In SDF the records are separated by a well-defined sequence, which allows
us to create an index file and have direct access to a particular record
number. Is this possible with pickle files?

Regards,
Evgueni

2009/4/30 Greg Landrum:
> There really isn't a maximum. It depends on the number of atoms,
> number of bonds, and number of conformers.
>
> On Thu, Apr 30, 2009 at 9:09 PM, Evgueni Kolossov wrote:
>> OK, so the average size is about 781 bytes. What is the maximum size
>> one molecule can be in theory?
>>
>> 2009/4/30 Greg Landrum:
>>> Yes, the database containing the 214K molecules is 167MB
>>>
>>>
>>> On Thu, Apr 30, 2009 at 7:55 PM, Evgueni Kolossov wrote:
>>>> Thanks Greg,
>>>>
>>>> Unfortunately I did not quite get it - you mean the size of your
>>>> example is 167240704 bytes?
>>>>
>>>> Regards,
>>>> Evgueni
>>>>
>>>> 2009/4/30 Greg Landrum:
>>>>> [redirecting to list since this may be of general interest]
>>>>>
>>>>> Yes, I generally store molecules in databases in blob columns
>>>>> containing the pickles. The primary reason for this is that one can
>>>>> then skip all the work of parsing the molecule, perceiving the
>>>>> chemistry, etc.
>>>>>
>>>>> I don't have a good general answer for how long pickles are. It really
>>>>> depends on the molecules. One example I have handy is a sqlite
>>>>> database containing the PubChem screening deck. The molecules are
>>>>> stored as follows:
>>>>> sqlite> .schema
>>>>> CREATE TABLE molecules (compound_id varchar not null unique,molpkl blob);
>>>>> sqlite> select count(*) from molecules;
>>>>> 214178
>>>>>
>>>>> % ls -l Compounds.sqlt
>>>>> -rw-r--r--  1 landrgr1  staff  167240704 Nov 22 07:28 Compounds.sqlt
>>>>>
>>>>> There is, no doubt, some overhead associated with the sqlite data, but
>>>>> this gives a rough estimate.
>>>>>
>>>>> -greg
>>>>>
>>>
>>
>>
>>
>> --
>> Dr. Evgueni Kolossov (PhD)
>> ekolos...@gmail.com
>> Tel.   +44(0)1628 627168
>> Mob. +44(0)7812070446
>>
>



--
Dr. Evgueni Kolossov (PhD)
ekolos...@gmail.com
Tel.   +44(0)1628 627168
Mob. +44(0)7812070446
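
As a rough illustration of the blob-column approach Greg describes in the
quoted message above, here is a minimal Python sketch using the standard
sqlite3 module and the same table schema. The byte strings and compound IDs
are made-up stand-ins, not real RDKit pickles; with RDKit installed one would
presumably store the result of mol.ToBinary() and rebuild the molecule with
Chem.Mol(blob). The last lines redo the size arithmetic from the thread:
dividing the file size by the row count gives only an upper bound on the
average pickle size, because the file also contains sqlite overhead.

```python
import sqlite3

# Stand-in blobs (assumption): in real use these would be RDKit pickles,
# e.g. mol.ToBinary(), restored later with Chem.Mol(blob).
records = {
    "CID-1": b"\x00fake-pickle-one",
    "CID-2": b"\x01fake-pickle-two-longer",
}

conn = sqlite3.connect(":memory:")
# Same schema as shown in the thread.
conn.execute(
    "CREATE TABLE molecules (compound_id varchar not null unique, molpkl blob)"
)
conn.executemany("INSERT INTO molecules VALUES (?, ?)", list(records.items()))
conn.commit()


def fetch_blob(connection, compound_id):
    """Return the stored bytes for one compound, or None if it is absent."""
    row = connection.execute(
        "SELECT molpkl FROM molecules WHERE compound_id = ?", (compound_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])


# Average blob size measured inside the database (no file overhead included).
count, avg_len = conn.execute(
    "SELECT count(*), avg(length(molpkl)) FROM molecules"
).fetchone()

# Upper bound from the numbers quoted in the thread: file size / row count.
upper_bound = 167240704 / 214178

print(count, avg_len, round(upper_bound))
```

Querying avg(length(molpkl)) on the real database would give the actual
average pickle size without the sqlite overhead.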






Re: [Rdkit-discuss] Fwd: Question

2009-05-01 Thread Greg Landrum
On Fri, May 1, 2009 at 8:06 AM, Evgueni Kolossov wrote:
> Thank you very much, Greg.
>
> A couple more things to clarify:
> - In the documentation you mention pickle files. Can you please
> give an example of reading and writing this kind of file?

The documentation is primarily focused on Python. Python has its own
method for serializing (pickling) objects. From C++ I have never really
done much with writing to or reading from binary files. I guess one could
just write the binary data directly to the stream and read it back the
same way, but this doesn't answer your next question:

> - In SDF the records are separated by a well-defined sequence, which allows
> us to create an index file and have direct access to a particular record
> number. Is this possible with pickle files?

If you create your own convention for how you write the files, sure.
Otherwise you have to just build files and then write out the result
of an fget after each read is finished. On reading you can seek to the
relevant position and then start reading.

It probably would be useful to have a standardized binary format for
reading from C++ (or python), but I have never had the pressing need;
so it hasn't happened.

-greg
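
One possible reading of "create your own convention" is sketched below in
Python. It is only an illustration, not an RDKit format: each record is
written as a 4-byte little-endian length followed by the raw bytes (a
hypothetical convention chosen here), the index is built by scanning the file
and noting the stream position before each record, and a single record is
then read by seeking to its offset. The payloads are arbitrary byte strings
standing in for molecule pickles.

```python
import io
import struct

# Hypothetical convention: every record is prefixed by its length,
# stored as a 4-byte little-endian unsigned integer.
LEN = struct.Struct("<I")


def write_records(stream, blobs):
    """Write each blob as <length><bytes>."""
    for blob in blobs:
        stream.write(LEN.pack(len(blob)))
        stream.write(blob)


def build_index(stream):
    """Scan the stream once and return the start offset of every record."""
    offsets = []
    stream.seek(0)
    while True:
        pos = stream.tell()
        header = stream.read(LEN.size)
        if not header:
            break  # clean end of file
        if len(header) < LEN.size:
            raise ValueError("truncated length header")
        (size,) = LEN.unpack(header)
        stream.seek(size, io.SEEK_CUR)  # skip the payload without reading it
        offsets.append(pos)
    return offsets


def read_record(stream, offsets, i):
    """Seek straight to record i and return its bytes."""
    stream.seek(offsets[i])
    (size,) = LEN.unpack(stream.read(LEN.size))
    return stream.read(size)


# Stand-in payloads; the second one contains bytes that look like a separator.
blobs = [b"first", b"\x00\x01$$$$\n", b"third record"]
buf = io.BytesIO()  # a real file opened in binary mode works the same way
write_records(buf, blobs)
index = build_index(buf)
print(index, read_record(buf, index, 2))
```

The length prefix means the reader never has to search the binary data for a
delimiter, so the payload may contain any byte sequence.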



Re: [Rdkit-discuss] Fwd: Question

2009-05-01 Thread Evgueni Kolossov
OK Greg,

What if we try to define the format, starting with the record separator
- maybe use the same one as SDF?
The index file can be created during writing.

Regards,
Evgueni
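
A minimal Python sketch of an index created during writing follows. All names
and the layout are assumptions made for illustration: the index is a second
stream holding one fixed-width 8-byte offset per record, so the entry for
record n sits at byte 8 * n of the index. The sketch uses a length prefix in
the data file rather than the SDF "$$$$" separator, because arbitrary binary
data can contain the separator bytes, whereas a length prefix cannot be
confused with payload.

```python
import io
import struct

LEN = struct.Struct("<I")  # 4-byte record length (hypothetical convention)
OFF = struct.Struct("<Q")  # 8-byte data offset per index entry (hypothetical)


def append_record(data, index, blob):
    """Append blob to the data stream and its start offset to the index."""
    data.seek(0, io.SEEK_END)
    index.seek(0, io.SEEK_END)
    index.write(OFF.pack(data.tell()))
    data.write(LEN.pack(len(blob)))
    data.write(blob)


def record_count(index):
    """Number of records, derived from the size of the index stream."""
    return index.seek(0, io.SEEK_END) // OFF.size


def get_record(data, index, n):
    """Direct access to record n via the fixed-width index."""
    if not 0 <= n < record_count(index):
        raise IndexError(n)
    index.seek(n * OFF.size)
    (offset,) = OFF.unpack(index.read(OFF.size))
    data.seek(offset)
    (size,) = LEN.unpack(data.read(LEN.size))
    return data.read(size)


# In-memory streams stand in for a data file and its index file.
data, index = io.BytesIO(), io.BytesIO()
payloads = [b"mol-0", b"contains $$$$ bytes", b"mol-2"]
for payload in payloads:
    append_record(data, index, payload)

print(record_count(index), get_record(data, index, 1))
```

Because every index entry has the same width, looking up record n costs two
seeks and two small reads regardless of how many records the file holds.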

2009/5/1 Greg Landrum 

> On Fri, May 1, 2009 at 8:06 AM, Evgueni Kolossov wrote:
> > Thank you very much, Greg.
> >
> > A couple more things to clarify:
> > - In the documentation you mention pickle files. Can you please
> > give an example of reading and writing this kind of file?
>
> The documentation is primarily focused on Python. Python has its own
> method for serializing (pickling) objects. From C++ I have never really
> done much with writing to or reading from binary files. I guess one could
> just write the binary data directly to the stream and read it back the
> same way, but this doesn't answer your next question:
>
> > - In SDF the records are separated by a well-defined sequence, which allows
> > us to create an index file and have direct access to a particular record
> > number. Is this possible with pickle files?
>
> If you create your own convention for how you write the files, sure.
> Otherwise you have to just build files and then write out the result
> of an fget after each read is finished. On reading you can seek to the
> relevant position and then start reading.
>
> It probably would be useful to have a standardized binary format for
> reading from C++ (or python), but I have never had the pressing need;
> so it hasn't happened.
>
> -greg
>



-- 
Dr. Evgueni Kolossov (PhD)
ekolos...@gmail.com
Tel.   +44(0)1628 627168
Mob. +44(0)7812070446