Just to add, I can confirm that rewriting this in Python did indeed fix
the memory issue I had been having.  Total consumption never exceeded 0.1% of
my system memory.  :)  That is far less than the 89% I was seeing with the
Java version of the same application!
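
For anyone who finds this thread and has to stay in Java, here is a rough
sketch of the bookkeeping discussed below: every wrapped object that is
not kept gets an explicit delete().  It does not use the real RDKit
wrappers.  NativeMol, its live counter, and selectBest are hypothetical
stand-ins made up for illustration, so treat this as a picture of the
pattern rather than a confirmed fix (delete() alone did not cure the
original program).  Note that in the posted loop the copy held in rdmol2
is deleted, but the rdmol returned by suppl.next() is not, and the last
group is never written; the sketch handles both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BestPerGroup {

    /** Hypothetical stand-in for a SWIG-wrapped molecule; NOT the RDKit ROMol class. */
    static final class NativeMol {
        static int live = 0;            // handles whose "native side" is still allocated
        final String smiles;
        final double score;
        private boolean deleted = false;

        NativeMol(String smiles, double score) {
            this.smiles = smiles;
            this.score = score;
            live++;
        }

        /** Mimics a wrapper's delete(): releases the native side at most once. */
        void delete() {
            if (!deleted) {
                deleted = true;
                live--;
            }
        }
    }

    /**
     * Keeps the best-scoring record from each run of consecutive records
     * that share a SMILES string, releasing every record it does not keep.
     */
    static List<String> selectBest(List<NativeMol> records, boolean highest) {
        List<String> out = new ArrayList<>();
        NativeMol best = null;
        for (NativeMol mol : records) {
            if (best == null) {
                best = mol;                              // first record overall
            } else if (!mol.smiles.equals(best.smiles)) {
                out.add(best.smiles + " " + best.score); // previous group is complete
                best.delete();
                best = mol;
            } else if (highest ? mol.score > best.score : mol.score < best.score) {
                best.delete();                           // replaced by a better record
                best = mol;
            } else {
                mol.delete();                            // not kept, release immediately
            }
        }
        if (best != null) {                              // flush the final group
            out.add(best.smiles + " " + best.score);
            best.delete();
        }
        return out;
    }

    public static void main(String[] args) {
        List<NativeMol> records = Arrays.asList(
            new NativeMol("CCO", 1.5), new NativeMol("CCO", 2.5),
            new NativeMol("c1ccccc1", 0.5), new NativeMol("c1ccccc1", 0.1));
        System.out.println(selectBest(records, true));
        System.out.println("live handles: " + NativeMol.live);
    }
}
```

The live counter exists only so the sketch can show that nothing is left
allocated once the loop finishes; with real wrapped objects the
equivalent check would be watching process memory.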

On Wed, Jul 15, 2015 at 2:05 PM, Matthew Lardy <mla...@gmail.com> wrote:

> Hi Greg,
>
> I know what you mean.  :)  I had tried that before, but executing an
> rdmol.delete() at the end of the loop didn't help.  And, I just re-tried
> that to no avail.
>
> I remember having a similar issue with the SDMolSupplier before, where
> just reading the file consumed a ton of memory.  This was patched, and all
> of the rest of my code runs well.  But if I want to sample from the
> SDMolSupplier stream, things go weird.  I had hoped that copying each rdmol
> to a new object (reducing the leak) when I wanted to hold it for a time
> would help, but it didn't.  I am deleting every molecule that I hold, but
> there appears to be no impact on memory consumption.  I think that the JVM
> is asleep at the wheel when it comes to killing these objects, as forcing
> it to do so (well, as much as one can) doesn't fix things.
>
> I may just have to write this in Python, where I am pretty certain the
> memory issues are non-existent.  :)  I was hopeful that someone else might
> have encountered this issue, and had a path around it.
>
> Thanks for taking a look Greg!
> Matt
>
>
> On Wed, Jul 15, 2015 at 1:57 AM, Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Hi,
>>
>> It's not easy (for me) to read through the Java code and figure out what
>> is going on, but it looks to me like you are "leaking" rdmol in each
>> iteration of your loop.
>>
>> The problem that the RDKit Java wrappers (really any Java wrapper created
>> with SWIG) have here is that the JVM doesn't know how big the underlying C++
>> object is, so it's not aggressive enough about cleaning up memory. I think
>> calling rdmol.delete() at the end of each iteration (this frees the
>> underlying C++ object) should help.
>>
>> -greg
>>
>>
>> On Tuesday, July 14, 2015, Matthew Lardy <mla...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I have had a strange issue that I can't seem to find a way around.  The
>>> following code block consumes a ton of memory, which is strange as just
>>> using the SD File reader I have no memory issues.  I think that the issue
>>> is related to the Java garbage collector not picking these objects up, even
>>> though I have attempted to force that (without success).
>>>
>>> All the following block does is iterate through an SD file and look for
>>> the highest (or lowest) scoring entry for each molecule.  The assumption
>>> is that all entries for the same molecule will be next to each other in the
>>> file (that part is not my problem).  Running this on an SD file of around 400K
>>> molecules consumes around 23GB of memory, so if anyone has an idea I will
>>> be most appreciative!
>>>
>>>    public static void main(String argv[])
>>>          throws IOException, InterruptedException
>>>    {
>>>       CommandLineParser cParser;
>>>       String[] modes    = {};
>>>       String[] parms    = {"-in", "-filterTag", "-direction", "-out"};
>>>       String[] reqParms = {"-in", "-filterTag", "-direction", "-out"};
>>>
>>>       String rdkitSO = System.getenv("RDKIT_SO");
>>>       System.load(rdkitSO);
>>>
>>>
>>>       String currentDir   = System.getProperty("user.dir");
>>>       File dir = new File(currentDir);
>>>
>>>       cParser = new CommandLineParser(EXPLAIN,0,0,argv,modes,parms,reqParms);
>>>
>>>       ROMol rdmol  = null;
>>>       ROMol rdmol2 = null;
>>>
>>>       SDMolSupplier suppl = new SDMolSupplier(cParser.getValue("-in"));
>>>       SDWriter writer = new SDWriter(cParser.getValue("-out"));
>>>       int count = 0;
>>>
>>>       while (!suppl.atEnd())
>>>       {
>>>           count++;
>>>           if (count % 1000 == 0)
>>>           {
>>>              System.out.println(count);
>>>           }
>>>           rdmol = suppl.next();
>>>           if (rdmol2 == null)
>>>           {
>>> //             rdmol2.delete();
>>>              rdmol2 = new ROMol(rdmol);
>>>              continue;
>>>           }
>>>           if (rdmol.MolToSmiles().equals(rdmol2.MolToSmiles()))
>>>           {
>>>               if ( cParser.getValue("-direction").equals("highest") )
>>>               {
>>>                  double value1 =
>>>                      Double.parseDouble(rdmol.getProp(cParser.getValue("-filterTag")));
>>>                  double value2 =
>>>                      Double.parseDouble(rdmol2.getProp(cParser.getValue("-filterTag")));
>>>                  //System.out.println("Val1 " + value1 + " Val2 " + value2);
>>>                  if (value1 > value2)
>>>                  {
>>>                      rdmol2.delete();
>>>                      rdmol2 = new ROMol(rdmol);
>>>                  }
>>>               }
>>>               else
>>>               {
>>>                  if (Double.parseDouble(rdmol.getProp(cParser.getValue("-filterTag"))) <
>>>                      Double.parseDouble(rdmol2.getProp(cParser.getValue("-filterTag"))))
>>>                  {
>>>                      rdmol2.delete();
>>>                      rdmol2 = new ROMol(rdmol);
>>>                  }
>>>               }
>>>           } else {
>>>               writer.write(rdmol2);
>>>               rdmol2.delete();
>>>               rdmol2 = new ROMol(rdmol);
>>>           }
>>>       }
>>>    }
>>>
>>>
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
