Hi,
It's not easy (for me) to read through the Java code and figure out what is
going on, but it looks to me like you are "leaking" rdmol in each iteration
of your loop.
The problem that the RDKit Java wrappers (really any Java wrappers created
with SWIG) have here is that the JVM doesn't know how big the underlying C++
object is, so it isn't aggressive enough about cleaning up memory. I think
calling rdmol.delete() at the end of each iteration (this frees the
underlying C++ object) should help.
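
To illustrate the pattern, here is a minimal sketch that runs without the
RDKit. FakeNativeMol is a hypothetical stand-in I made up for a SWIG proxy
like ROMol; it is not part of the wrappers, and the counter only pretends to
track C++ allocations. The point is just the shape of the loop: call delete()
in a finally block so that the native side is released on every iteration,
including the ones that exit early.

```java
import java.util.concurrent.atomic.AtomicInteger;

class DeleteEachIteration {
    // Hypothetical stand-in for a SWIG proxy such as ROMol: a few bytes on the
    // Java heap, but it "owns" a much larger native allocation.
    static final class FakeNativeMol {
        private final AtomicInteger liveNative;
        private boolean owned = true;

        FakeNativeMol(AtomicInteger liveNative) {
            this.liveNative = liveNative;
            liveNative.incrementAndGet(); // pretend C++ memory was allocated
        }

        // Plays the role of delete(): releases the native side, idempotent.
        void delete() {
            if (owned) {
                owned = false;
                liveNative.decrementAndGet();
            }
        }
    }

    // Returns how many native objects are still alive after the loop.
    static int processAll(int n, boolean callDelete) {
        AtomicInteger liveNative = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            FakeNativeMol mol = new FakeNativeMol(liveNative); // like suppl.next()
            try {
                // ... work with mol here ...
            } finally {
                if (callDelete) {
                    mol.delete(); // free the native object before the next iteration
                }
            }
        }
        return liveNative.get();
    }

    public static void main(String[] args) {
        System.out.println("still alive without delete(): " + processAll(1000, false));
        System.out.println("still alive with delete():    " + processAll(1000, true));
    }
}
```

In your loop the equivalent would be to call rdmol.delete() once you are
done with the molecule returned by suppl.next(), the same way you already do
for rdmol2.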
-greg
On Tuesday, July 14, 2015, Matthew Lardy <mla...@gmail.com> wrote:
> Hi all,
>
> I have had a strange issue that I can't seem to find a way around. The
> following code block consumes a ton of memory, which is strange as just
> using the SD File reader I have no memory issues. I think that the issue
> is related to the java garbage collection not being picked up, even though
> I have attempted to force that (to no success).
>
> All the following block does is iterate through an SD file and look for
> the highest (or lowest) scoring molecule for each molecule. The assumption
> is that all molecules of the same type will be next to each other in the
> file (which is not my problem). Running this on a SD file of around 400K
> molecules consumes around 23GB of memory, so if anyone has an idea I will
> be most appreciative!
>
> public static void main(String argv[]) throws IOException,
> InterruptedException
> {
> CommandLineParser cParser;
> String[] modes = {};
> String[] parms = {"-in", "-filterTag", "-direction", "-out"};
> String[] reqParms = {"-in", "-filterTag", "-direction", "-out"};
>
> String rdkitSO = System.getenv("RDKIT_SO");
> System.load(rdkitSO);
>
>
> String currentDir = System.getProperty("user.dir");
> File dir = new File(currentDir);
>
> cParser = new
> CommandLineParser(EXPLAIN,0,0,argv,modes,parms,reqParms);
>
> ROMol rdmol = null;
> ROMol rdmol2 = null;
>
> SDMolSupplier suppl = new SDMolSupplier(cParser.getValue("-in"));
> SDWriter writer = new SDWriter(cParser.getValue("-out"));
> int count = 0;
>
> while (!suppl.atEnd())
> {
> count++;
> if (count % 1000 == 0)
> {
> System.out.println(count);
> }
> rdmol = suppl.next();
> if (rdmol2 == null)
> {
> // rdmol2.delete();
> rdmol2 = new ROMol(rdmol);
> continue;
> }
> if (rdmol.MolToSmiles().equals(rdmol2.MolToSmiles()))
> {
> if ( cParser.getValue("-direction").equals("highest") )
> {
> double value1 =
> Double.parseDouble(rdmol.getProp(cParser.getValue("-filterTag")));
> double value2 =
> Double.parseDouble(rdmol2.getProp(cParser.getValue("-filterTag")));
> //System.out.println("Val1 " + value1 + " Val2 " + value2);
> if (value1 > value2)
> {
> rdmol2.delete();
> rdmol2 = new ROMol(rdmol);
> }
> }
> else
> {
> if (
> Double.parseDouble(rdmol.getProp(cParser.getValue("-filterTag"))) <
> Double.parseDouble(rdmol2.getProp(cParser.getValue("-filterTag"))) )
> {
> rdmol2.delete();
> rdmol2 = new ROMol(rdmol);
> }
> }
> } else {
> writer.write(rdmol2);
> rdmol2.delete();
> rdmol2 = new ROMol(rdmol);
> }
> }
> }
>
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss