Looking at OBConversion.cpp, I already had the impression that it might
have something to do with the creation of a new zipstream every time I
call Convert(). So I tried working with the zipstream instead of
the ifstream directly, as shown here:

    OpenBabel::OBConversion conv;
    conv.SetInFormat("sdf");
    conv.SetOutFormat("can");
    for ( unsigned int i=1; i<100000; ++i ){
        // get file name of sd file from i and store it in d1
        // d1 is then of the form "/here/is/my/sdf/dir/mol.sdf.gz"
        int2dir(i,d1);
        std::ifstream ifs(d1.c_str());
        zlib_stream::zip_istream zIn(ifs);
        conv.Convert(&zIn,&std::cout);
        ifs.close();
    }

And yes, adding this zip_istream line solves the memory issue.
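
To illustrate the kind of pattern I suspect (this is only a sketch under my own assumptions, not the actual OBConversion code): if a wrapper stream is allocated on every conversion call and never released, the number of live wrappers grows with the number of files, whereas a wrapper owned by the caller on the stack is destroyed at the end of each iteration. CountingWrapper, convertLeaky, convertScoped and liveAfterRuns are hypothetical names made up for this example; nothing here uses OpenBabel or zlib.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a decompressing stream wrapper. It only counts
// how many instances are alive and forwards line reads to the wrapped stream.
class CountingWrapper {
public:
    explicit CountingWrapper(std::istream& in) : in_(in) { ++live_; }
    ~CountingWrapper() { assert(live_ > 0); --live_; }
    CountingWrapper(const CountingWrapper&) = delete;
    CountingWrapper& operator=(const CountingWrapper&) = delete;

    bool readLine(std::string& line) {
        return static_cast<bool>(std::getline(in_, line));
    }
    static std::size_t live() { return live_; }

private:
    std::istream& in_;
    static std::size_t live_;
};
std::size_t CountingWrapper::live_ = 0;

// Suspected pattern: a wrapper is heap-allocated on every call and never
// deleted, so one wrapper is leaked per processed file.
std::size_t convertLeaky(std::istream& in) {
    CountingWrapper* w = new CountingWrapper(in);  // no matching delete
    std::string line;
    std::size_t n = 0;
    while (w->readLine(line)) ++n;
    return n;
}

// Workaround pattern: the wrapper lives on the stack and is destroyed
// when the function returns.
std::size_t convertScoped(std::istream& in) {
    CountingWrapper w(in);
    std::string line;
    std::size_t n = 0;
    while (w.readLine(line)) ++n;
    return n;
}

// Runs one of the two variants `runs` times on a small in-memory input and
// returns how many wrappers are still alive afterwards.
std::size_t liveAfterRuns(bool leaky, std::size_t runs) {
    const std::size_t before = CountingWrapper::live();
    for (std::size_t i = 0; i < runs; ++i) {
        std::istringstream in("line one\nline two\n");
        if (leaky) {
            convertLeaky(in);
        } else {
            convertScoped(in);
        }
    }
    return CountingWrapper::live() - before;
}
```

Calling liveAfterRuns(true, 1000) leaves 1000 wrappers alive, while liveAfterRuns(false, 1000) leaves none, which would be consistent with the difference I see between the two versions of the loop.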
Gert
On Sep 15, 2010, at 4:28 PM, Noel O'Boyle wrote:
> How does it perform with an unzipped SD file?
>
> On 15 September 2010 15:19, Gert Thijs <[email protected]> wrote:
>> Dear all,
>>
>> I have encountered a memory issue when using OBConversion in a large
>> batch run. What I am trying to do is to process a large set of gzipped
>> SD files, transform them into canonical SMILES, and write these
>> SMILES strings to std::cout. The file names are generated on
>> the fly based on some information about the directory structure.
>>
>> Below I have copied the main code used in the test script in which I
>> encountered a serious memory error.
>>
>>     OpenBabel::OBConversion conv;
>>     conv.SetInFormat("sdf");
>>     conv.SetOutFormat("can");
>>
>>     for ( unsigned int i=1; i<100000; ++i ){
>>         // get file name of sd file from i and store it in d1
>>         // d1 is then of the form "/here/is/my/sdf/dir/mol.sdf.gz"
>>         int2dir(i,d1);
>>
>>         std::ifstream ifs(d1.c_str());
>>
>>         conv.Convert(&ifs,&std::cout);
>>
>>         ifs.close();
>>     }
>>
>>
>> If I run this code, I can see that it gradually eats all the RAM until
>> the program crashes with a memory allocation error. I have done
>> several tests to check where the problem could come from. As far as I
>> understand it, OBConversion seems to be the main source of the
>> problem. For instance, when I open the stream, read one line from it,
>> and print this line (without using OBConversion), the same program
>> can easily handle more than 1,000,000 files without any hassle.
>>
>> Furthermore, when I use the same code but recreate the
>> OBConversion object each time within the for loop, exactly the same
>> kind of behavior is observed.
>>
>>     for ( unsigned int i=1; i<100000; ++i ){
>>         // get file name of sd file from i
>>         // d1 = /my/dir/mol.sdf.gz
>>         int2dir(i,d1);
>>
>>         std::ifstream ifs(d1.c_str());
>>
>>         OpenBabel::OBConversion conv;
>>         conv.SetInFormat("sdf");
>>         conv.SetOutFormat("can");
>>         conv.Convert(&ifs,&std::cout);
>>
>>         ifs.close();
>>     }
>>
>> So my guess is that there is something strange going on within
>> OBConversion. But as I am not really familiar with the inner workings
>> of OBConversion, I am not sure where to start looking.
>>
>>
>> Any thoughts on this one?
>>
>> I am working on Mac OS X 10.5.8 using g++ 4.0.1
>>
>> many thanks,
>> Gert
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Start uncovering the many advantages of virtual appliances
>> and start using them to simplify application deployment and
>> accelerate your shift to cloud computing.
>> http://p.sf.net/sfu/novell-sfdev2dev
>> _______________________________________________
>> OpenBabel-Devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>>
Gert Thijs
Director Chemoinformatics
Silicos NV.
Wetenschapspark 7
B-3590 Diepenbeek
Belgium
Tel: +32 11 350703
Fax: +32 11 220525
http://www.silicos.com/