[Rdkit-discuss] Issues with Java Compilation
Hi all,

I'm compiling the latest version, RDKit 2020_03, and I've hit an issue I can't figure out. I can compile RDKit with Python without issue. But when I set the flag for the Java SWIG wrappers, I get this after everything else has compiled:

    GraphMolJavaJAVA_wrap.cxx:227069:12: error: ‘jlong’ does not name a type; did you mean ‘ulong’?
     SWIGEXPORT jlong JNICALL Java_org_RDKit_RDKFuncsJNI_Point2D_1SWIGUpcast(JNIEnv *jenv, jclass jcls, jlong jarg1) {

So I went to look at the line specified by the compiler, and here are lines 12-14 in GraphMolJavaJAVA_wrap.cxx:

    #ifndef SWIGJAVA
    #define SWIGJAVA
    #endif

I've been kicking this thing for the last two weeks, off and on, trying to figure out what I am doing wrong. Any ideas?

Thanks,
Matt

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
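A note on reading the error above: GCC reports locations as file:line:column, so 227069 is the line and 12 is the column. Since jlong is a JNI type, one thing that may be worth checking is whether the JDK that the build picked up actually exposes the JNI headers. The thread does not confirm that this is the cause, so the sketch below is only a diagnostic under stated assumptions: it assumes the usual JDK layout (include/jni.h plus a platform subdirectory such as include/linux/ holding jni_md.h), and the helper name find_jni_headers is hypothetical.

```python
import os
from pathlib import Path


def find_jni_headers(java_home):
    """Return the jni.h and jni_md.h paths found under java_home/include.

    Assumption: the common JDK layout, with jni.h directly in include/
    and jni_md.h one level down in a platform-named directory.
    """
    include_dir = Path(java_home) / "include"
    found = {"jni.h": None, "jni_md.h": None}
    if not include_dir.is_dir():
        return found
    jni = include_dir / "jni.h"
    if jni.is_file():
        found["jni.h"] = jni
    # Take the first platform directory that contains jni_md.h.
    for candidate in sorted(include_dir.glob("*/jni_md.h")):
        found["jni_md.h"] = candidate
        break
    return found


if __name__ == "__main__":
    # JAVA_HOME is read from the environment; an unset value simply
    # reports both headers as missing.
    java_home = os.environ.get("JAVA_HOME", "")
    for name, path in find_jni_headers(java_home).items():
        print(f"{name}: {path if path else 'NOT FOUND'}")
```

If either header is reported missing, the JAVA_HOME in use probably points at a JRE or at a different JDK than the one intended for the build.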
[Rdkit-discuss] Compilation Issues
Hi all,

I've been fighting this again with the 2017_09_1 and 2018_03_1 builds. Both fail to pass all of the tests, and for neither does the Python wrapper work at all (the Java wrappers work fine). Here are the 2017_09_1 test results:

    99% tests passed, 1 tests failed out of 164
    Total Test time (real) = 411.95 sec
    The following tests FAILED:
        163 - pythonTestDirChem (Failed)
    Errors while running CTest
    make: *** [test] Error 8

And when you use the Python wrapper, this is what you get:

    Boost.Python.ArgumentError: Python argument types in
        SDWriter.write(SDWriter, NoneType)
    did not match C++ signature:
        write(RDKit::SDWriter {lvalue} self, RDKit::ROMol {lvalue} mol, int confId=-1)

So, it is unusable. Here are the 2018_03_1 test results:

    99% tests passed, 2 tests failed out of 167
    Total Test time (real) = 475.07 sec
    The following tests FAILED:
        138 - JavaDistanceGeometryTests (Failed)
        166 - pythonTestDirChem (Failed)
    Errors while running CTest
    make: *** [test] Error 8

The same versions of Boost (1.59.0) and SWIG (3.0.12), yet more errors? Again, the Python wrapper field test:

    Boost.Python.ArgumentError: Python argument types in
        SDWriter.write(SDWriter, NoneType)
    did not match C++ signature:
        write(RDKit::SDWriter {lvalue} self, RDKit::ROMol {lvalue} mol, int confId=-1)

I have never seen these issues with RDKit and have no idea what's going on. This is a build on a clean machine (CentOS 7) with Boost 1.59, CMake 3.0.11, and SWIG 3.0.12.

If anyone has seen this before and fixed it, please let me know what you did! Until then I am falling back to RDKit 2016, as that built without issues.

Thanks in advance!
Matt
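A note on the ArgumentError above: the reported argument types are (SDWriter, NoneType), so write() was handed None where a molecule was expected. RDKit's Python parsers return None when a molecule fails to parse, so it may help to find out which inputs are None before concluding that the writer itself is broken. Whether a broken build is making every parse fail is not something the thread settles. A minimal sketch, assuming only that writer is any object with a write(mol) method (for example an SDWriter); write_valid is a hypothetical helper, not part of RDKit:

```python
def write_valid(writer, mols):
    """Write every non-None molecule and return the indices that were skipped.

    `writer` is assumed to be any object exposing write(mol); `mols` is
    any iterable that may contain None for molecules that failed to parse.
    """
    skipped = []
    for index, mol in enumerate(mols):
        if mol is None:
            # Record the position so the bad input can be inspected later.
            skipped.append(index)
            continue
        writer.write(mol)
    return skipped
```

If every index comes back as skipped, the problem sits upstream of the writer, in parsing or in the build.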
Re: [Rdkit-discuss] RDKit Compile Issue
Hi Paolo,

Here are the results from test #5:

    test 5
        Start 5: testMultiFPB
    5: Test command: /gne/research/workspace/lardym/rdkit-Release_2017_03_2/build/Code/DataStructs/testMultiFPB
    5: Test timeout computed to be: 9.99988e+06
    5: [12:58:03] ---
    5: Testing MultiFPBReader basics
    5: [12:58:03] Finished
    5: [12:58:03] ---
    5: Testing MultiFPBReader Tanimoto
    5: [12:58:03] Finished
    5: [12:58:03] ---
    5: Testing MultiFPBReader Tversky
    5: [12:58:03] Finished
    5: [12:58:03] ---
    5: Testing MultiFPBReader contains search
    5: [12:58:03] Finished
    5: [12:58:03] ---
    5: Testing MultiFPBReader Similarity Threaded
    5: terminate called after throwing an instance of 'RDKit::BadFileException'
    5: what(): BadFileException
    1/1 Test #5: testMultiFPB .***Exception: Other 0.20 sec

    0% tests passed, 1 tests failed out of 1
    Total Test time (real) = 2.53 sec
    The following tests FAILED:
        5 - testMultiFPB (OTHER_FAULT)

Matt

On Wed, Jun 21, 2017 at 11:33 AM, Paolo Tosco <paolo.to...@unito.it> wrote:
> Hi Matthew,
>
> try running a couple of the failing tests in verbose mode:
>
> ctest -I 5,5 -V
> ctest -I 9,9 -V
>
> I'd also suggest to check that your Boost libraries are in your
> LD_LIBRARY_PATH, as you have built your own.
>
> Cheers,
> p.
>
> On 06/21/17 19:17, Matthew Lardy wrote:
>> Hi all,
>>
>> I'm trying to get a new build for any version of RDKit from 2016_4 to the
>> most recent release. Everything builds correctly, but when I perform the
>> tests each version fails at the same place:
>>
>>     Start 5: testMultiFPB
>> 5/116 Test #5: testMultiFPB ...***Exception: Other 0.12 sec
>>     Start 6: pyBV
>> 6/116 Test #6: pyBV ... Passed 3.65 sec
>>     Start 7: pyDiscreteValueVect
>> 7/116 Test #7: pyDiscreteValueVect Passed 2.18 sec
>>     Start 8: pySparseIntVect
>> 8/116 Test #8: pySparseIntVect Passed 2.41 sec
>>     Start 9: pyFPB
>> 9/116 Test #9: pyFPB ..***Failed 2.16 sec
>>
>> So here is what I'm using:
>> Boost v.1.59
>> CMake v 3.0.0
>> Swig v 3.0.8
>> CentOS 6 (yes, I know, it's terrible to build on)
>>
>> I've looked through the test log, but nothing caught my eye. Any ideas
>> why the testMultiFPB would fail?
>>
>> Thanks!
>> Matt
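Paolo's suggestion of re-running individual failures with "ctest -I N,N -V" can be applied mechanically to a longer failure list. The sketch below is one way to do that, assuming the summary has the "N - name (STATUS)" shape shown in the CTest output in this thread; the function names failed_tests and rerun_commands are hypothetical.

```python
import re

# Matches entries such as "5 - testMultiFPB (OTHER_FAULT)".
FAILED_ENTRY = re.compile(r"(\d+)\s+-\s+(\S+)\s+\(([^)]+)\)")
MARKER = "The following tests FAILED:"


def failed_tests(summary):
    """Return (number, name, status) tuples from a CTest summary string."""
    # Only look at the text after the marker, if the marker is present.
    _, found, tail = summary.partition(MARKER)
    text = tail if found else summary
    return [(int(n), name, status) for n, name, status in FAILED_ENTRY.findall(text)]


def rerun_commands(summary):
    """Build one verbose ctest command per failed test."""
    return [f"ctest -I {n},{n} -V" for n, _, _ in failed_tests(summary)]


if __name__ == "__main__":
    example = "The following tests FAILED: 5 - testMultiFPB (OTHER_FAULT) 9 - pyFPB (Failed)"
    for command in rerun_commands(example):
        print(command)
```

The commands are only printed, not executed, so they can be pasted into a shell inside the build directory one at a time.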
[Rdkit-discuss] RDKit Compile Issue
Hi all,

I'm trying to get a new build for any version of RDKit from 2016_4 to the most recent release. Everything builds correctly, but when I run the tests, each version fails at the same place:

        Start 5: testMultiFPB
    5/116 Test #5: testMultiFPB ...***Exception: Other 0.12 sec
        Start 6: pyBV
    6/116 Test #6: pyBV ... Passed 3.65 sec
        Start 7: pyDiscreteValueVect
    7/116 Test #7: pyDiscreteValueVect Passed 2.18 sec
        Start 8: pySparseIntVect
    8/116 Test #8: pySparseIntVect Passed 2.41 sec
        Start 9: pyFPB
    9/116 Test #9: pyFPB ..***Failed 2.16 sec

So here is what I'm using:

    Boost v.1.59
    CMake v 3.0.0
    Swig v 3.0.8
    CentOS 6 (yes, I know, it's terrible to build on)

I've looked through the test log, but nothing caught my eye. Any ideas why testMultiFPB would fail?

Thanks!
Matt
Re: [Rdkit-discuss] RDKit build issues
Hi Greg,

I've been doing that too, and I am very happy that I've got the Java wrappers built (v2016.03.1). I've got four Python tests that fail. I can send the log from ctest -V if you'd like:

    97% tests passed, 4 tests failed out of 133
    Total Test time (real) = 464.59 sec
    The following tests FAILED:
         44 - pyChemReactions (Failed)
         62 - pyForceFieldHelpers (Failed)
         86 - pyGraphMolWrap (Failed)
        133 - pythonTestDirChem (Failed)
    Errors while running CTest

Thanks for everyone's help!
Matt

On Tue, Apr 19, 2016 at 4:23 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> It's great that you're almost there.
>
> The best way to help diagnose problems running the tests is to run ctest
> in verbose mode with "-V". Here's an example on my machine:
>
> glandrum@Otter:/scratch/RDKit_git/build$ echo $RDBASE
> /scratch/RDKit_git
> glandrum@Otter:/scratch/RDKit_git/build$ echo $LD_LIBRARY_PATH
> /scratch/RDKit_git/lib:/usr/local/pgsql/lib:
> glandrum@Otter:/scratch/RDKit_git/build$ ctest -V -R JavaAromat
> UpdateCTestConfiguration from :/scratch/RDKit_git/build/DartConfiguration.tcl
> UpdateCTestConfiguration from :/scratch/RDKit_git/build/DartConfiguration.tcl
> Test project /scratch/RDKit_git/build
> Constructing a list of tests
> Done constructing a list of tests
> Checking test dependency graph...
> Checking test dependency graph end
> test 99
>     Start 99: JavaAromaticTests
>
> 99: Test command: /usr/bin/java "-Djava.library.path=/scratch/RDKit_git/Code/JavaWrappers/gmwrapper" "-cp" "/scratch/RDKit_git/External/java_lib/junit.jar:/scratch/RDKit_git/Code/JavaWrappers/gmwrapper/build-test:/scratch/RDKit_git/Code/JavaWrappers/gmwrapper/org.RDKit.jar" "org.RDKit.AromaticTests"
> 99: Test timeout computed to be: 9.99988e+06
> 99: JUnit version 4.10
> 99: ..
> 99: Time: 1.8
> 99:
> 99: OK (2 tests)
> 99:
> 1/1 Test #99: JavaAromaticTests Passed 1.99 sec
>
> The following tests passed:
> JavaAromaticTests
>
> 100% tests passed, 0 tests failed out of 1
>
> Total Test time (real) = 2.00 sec
> glandrum@Otter:/scratch/RDKit_git/build$
>
> I do not have CLASSPATH set and I don't think it's necessary in order to
> be able to run the tests.
>
> -greg
>
> On Mon, Apr 18, 2016 at 11:47 PM, Matthew Lardy <mla...@gmail.com> wrote:
>
>> Hi Brian,
>>
>> Thanks! I gave it a whirl, but no improvement. In previous installs
>> I've specified the shared object directly, but there isn't a good reason
>> for me to do so. I've done a make install, so everything should be built
>> wherever it is supposed to go.
>>
>> I've missed something simple. I just don't see it yet,
>> Matt
>>
>> On Mon, Apr 18, 2016 at 2:08 PM, Brian Kelley <fustiga...@gmail.com>
>> wrote:
>>
>>> > $RDBASE/Code/JavaWrappers/gmwrapper/libGraphMolWrap.so
>>>
>>> You probably need to drop the explicit .so, the library path is the path
>>> to the directory not to the .so itself.
>>>
>>> Here is what I normally use for testing ( note I usually build into an
>>> rdkit_build directory via: -DCMAKE_INSTALL_PREFIX=
>>> rdkit_build> )
>>>
>>> RDBASE=`pwd`/../../rdkit LD_LIBRARY_PATH=`pwd`/rdkit_build/lib PYTHONPATH=`pwd`/rdkit_build/lib/python2.7/site-packages ctest
>>>
>>> On Mon, Apr 18, 2016 at 4:57 PM, Matthew Lardy <mla...@gmail.com> wrote:
>>>
>>>> A quick question, what should be set up for the environment for the
>>>> Java wrappers? I've got the following set:
>>>>
>>>> setenv RDBASE $WORKSPACE/RDKit-2016/rdkit-Release_2016_03
>>>> setenv PATH ${PATH}:$RDBASE/lib
>>>>
>>>> setenv CLASSPATH $RDBASE/Code/JavaWrappers/gmwrapper/org.RDKit.jar
>>>> setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:$WORKSPACE/boost1.59/lib:$RDBASE/build/lib:$RDBASE/Code/JavaWrappers/gmwrapper/libGraphMolWrap.so
>>>> setenv PYTHONPATH ${PYTHONPATH}:$RDBASE
>>>>
>>>> The code builds without issue, save a few warnings. Here is the Java
>>>> error I am getting (during the tests):
>>>>
>>>> JUnit version 4.12
>>>> 119: Exception in thread "main" java.lang.NoClassDefFoundError:
>>>> org/hamcrest/SelfDescribing
>>>>
>>>> Is my CLASSPATH wrong? Did I forget a SO in my LD_LIBRARY_PATH?
>>>>
>>>> Thanks in advance!
>>>> Matt
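Brian's point that a library path entry should be a directory, not the .so itself, is easy to check mechanically. A minimal sketch, assuming a PATH-style string split on the platform separator; non_directory_entries is a hypothetical helper name:

```python
import os


def non_directory_entries(path_value, sep=os.pathsep):
    """Return the entries of a PATH-style string that are not existing directories.

    Empty entries (from leading, trailing, or doubled separators) are ignored.
    An entry that names a file, such as a .so, is reported.
    """
    return [
        entry
        for entry in path_value.split(sep)
        if entry and not os.path.isdir(entry)
    ]


if __name__ == "__main__":
    # Reports anything in the current LD_LIBRARY_PATH that is not a directory.
    for entry in non_directory_entries(os.environ.get("LD_LIBRARY_PATH", "")):
        print(f"not a directory: {entry}")
```

With the setenv lines quoted above, the last LD_LIBRARY_PATH entry (the path ending in libGraphMolWrap.so) would be reported, along with any directory that does not exist on the machine.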
Re: [Rdkit-discuss] RDKit build issues
Hi Brian,

Thanks! I gave it a whirl, but no improvement. In previous installs I've specified the shared object directly, but there isn't a good reason for me to do so. I've done a make install, so everything should be built wherever it is supposed to go.

I've missed something simple. I just don't see it yet,
Matt

On Mon, Apr 18, 2016 at 2:08 PM, Brian Kelley <fustiga...@gmail.com> wrote:
> > $RDBASE/Code/JavaWrappers/gmwrapper/libGraphMolWrap.so
>
> You probably need to drop the explicit .so, the library path is the path
> to the directory not to the .so itself.
>
> Here is what I normally use for testing ( note I usually build into an
> rdkit_build directory via: -DCMAKE_INSTALL_PREFIX=
> )
>
> RDBASE=`pwd`/../../rdkit LD_LIBRARY_PATH=`pwd`/rdkit_build/lib PYTHONPATH=`pwd`/rdkit_build/lib/python2.7/site-packages ctest
>
> On Mon, Apr 18, 2016 at 4:57 PM, Matthew Lardy <mla...@gmail.com> wrote:
>
>> A quick question, what should be set up for the environment for the Java
>> wrappers? I've got the following set:
>>
>> setenv RDBASE $WORKSPACE/RDKit-2016/rdkit-Release_2016_03
>> setenv PATH ${PATH}:$RDBASE/lib
>>
>> setenv CLASSPATH $RDBASE/Code/JavaWrappers/gmwrapper/org.RDKit.jar
>> setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:$WORKSPACE/boost1.59/lib:$RDBASE/build/lib:$RDBASE/Code/JavaWrappers/gmwrapper/libGraphMolWrap.so
>> setenv PYTHONPATH ${PYTHONPATH}:$RDBASE
>>
>> The code builds without issue, save a few warnings. Here is the Java
>> error I am getting (during the tests):
>>
>> JUnit version 4.12
>> 119: Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/hamcrest/SelfDescribing
>>
>> Is my CLASSPATH wrong? Did I forget a SO in my LD_LIBRARY_PATH?
>>
>> Thanks in advance!
>> Matt
>>
>> On Mon, Apr 18, 2016 at 9:56 AM, Matthew Lardy <mla...@gmail.com> wrote:
>>
>>> My thoughts exactly. I've forgotten to set RDBASE before and had a few tests fail. This is different.
It was everything and it took me days to >>> get to a place where some of the base code was passing tests. Someone >>> mentioned that I need to build boost with the version of Python that I am >>> using. This is problem in an environment where people choose their own >>> adventure with modules. I have no guarantee that they won't chose Python >>> 3.x when I've compiled everything with 2.7.7. That problem will be a good >>> to have when I get there. :) >>> >>> If my present build doesn't pass everything, and it won't, I'll send you >>> the fun directly. :) >>> >>> Thanks again Greg! >>> Matt >>> >>> >>> On Fri, Apr 15, 2016 at 9:34 PM, Greg Landrum <greg.land...@gmail.com> >>> wrote: >>> >>>> Brian's suggestion to take a look at the cmake invocation in the travis >>>> file is a good one. >>>> >>>> Based on the number of failing tests, I think something more than just >>>> an RDBASE problem is wrong. The best way to track this down is by sending >>>> me the cmake command you ran, the output of running make, and the output of >>>> running ctest (after you run ctest you will find this in the directory >>>> Testing/Temporary/LastTest.log). Because this is pretty large and likely >>>> not of interest to others, it's probably best to send it directly to me or >>>> to create a gist and send the link to that. >>>> >>>> -greg >>>> >>>> >>>> >>>> On Sat, Apr 16, 2016 at 2:22 AM, Brian Kelley <fustiga...@gmail.com> >>>> wrote: >>>> >>>>> Getting the right version of boost can be tricky. You can see our >>>>> normal cmake incantation here as well as how we set RDBASE for tests >>>>> >>>>> https://github.com/rdkit/rdkit/blob/master/.travis.yml >>>>> >>>>> Note the >>>>> >>>>> -D Boost_NO_SYSTEM_PATHS=ON >>>>> >>>>> When running cmake, otherwise cmake can get very confused. 
>>>>> >>>>> >>>>> >>>>> Brian Kelley >>>>> >>>>> On Apr 15, 2016, at 6:26 PM, Matthew Lardy <mla...@gmail.com> wrote: >>>>> >>>>> I'll add, that remembering that cmake and ccmake can produce different >>>>> outcomes I've gone back to trying cmake. But I can't overwrite the >>>>> variables
Re: [Rdkit-discuss] RDKit build issues
A quick question: what should be set up in the environment for the Java wrappers? I've got the following set:

    setenv RDBASE $WORKSPACE/RDKit-2016/rdkit-Release_2016_03
    setenv PATH ${PATH}:$RDBASE/lib

    setenv CLASSPATH $RDBASE/Code/JavaWrappers/gmwrapper/org.RDKit.jar
    setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:$WORKSPACE/boost1.59/lib:$RDBASE/build/lib:$RDBASE/Code/JavaWrappers/gmwrapper/libGraphMolWrap.so
    setenv PYTHONPATH ${PYTHONPATH}:$RDBASE

The code builds without issue, save a few warnings. Here is the Java error I am getting (during the tests):

    JUnit version 4.12
    119: Exception in thread "main" java.lang.NoClassDefFoundError: org/hamcrest/SelfDescribing

Is my CLASSPATH wrong? Did I forget a .so in my LD_LIBRARY_PATH?

Thanks in advance!
Matt

On Mon, Apr 18, 2016 at 9:56 AM, Matthew Lardy <mla...@gmail.com> wrote:
> My thoughts exactly. I've forgotten to set RDBASE before and had a few
> tests fail. This is different. It was everything, and it took me days to
> get to a place where some of the base code was passing tests. Someone
> mentioned that I need to build boost with the version of Python that I am
> using. This is a problem in an environment where people choose their own
> adventure with modules. I have no guarantee that they won't choose Python
> 3.x when I've compiled everything with 2.7.7. That will be a good problem
> to have when I get there. :)
>
> If my present build doesn't pass everything, and it won't, I'll send you
> the fun directly. :)
>
> Thanks again Greg!
> Matt
>
> On Fri, Apr 15, 2016 at 9:34 PM, Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Brian's suggestion to take a look at the cmake invocation in the travis
>> file is a good one.
>>
>> Based on the number of failing tests, I think something more than just an RDBASE problem is wrong.
The best way to track this down is by sending me >> the cmake command you ran, the output of running make, and the output of >> running ctest (after you run ctest you will find this in the directory >> Testing/Temporary/LastTest.log). Because this is pretty large and likely >> not of interest to others, it's probably best to send it directly to me or >> to create a gist and send the link to that. >> >> -greg >> >> >> >> On Sat, Apr 16, 2016 at 2:22 AM, Brian Kelley <fustiga...@gmail.com> >> wrote: >> >>> Getting the right version of boost can be tricky. You can see our >>> normal cmake incantation here as well as how we set RDBASE for tests >>> >>> https://github.com/rdkit/rdkit/blob/master/.travis.yml >>> >>> Note the >>> >>> -D Boost_NO_SYSTEM_PATHS=ON >>> >>> When running cmake, otherwise cmake can get very confused. >>> >>> >>> >>> Brian Kelley >>> >>> On Apr 15, 2016, at 6:26 PM, Matthew Lardy <mla...@gmail.com> wrote: >>> >>> I'll add, that remembering that cmake and ccmake can produce different >>> outcomes I've gone back to trying cmake. But I can't overwrite the >>> variables in cmake. Here are the results from trying to specify them: >>> >>> Manually-specified variables were not used by the project: >>> >>> BOOST_DIR >>> BOOST_INCLUDE_DIR >>> BOOST_LIBRARY_DIR >>> BOOST_PYTHON_LIBRARY_DEBUG >>> BOOST_PYTHON_LIBRARY_RELEASE >>> BOOST_REGEX_LIBRARY_DEBUG >>> BOOST_REGEX_LIBRARY_RELEASE >>> BOOST_SERIALIZATION_LIBRARY_DEBUG >>> BOOST_SERIALIZATION_LIBRARY_RELEASE >>> BOOST_SYSTEM_LIBRARY_DEBUG >>> BOOST_SYSTEM_LIBRARY_RELEASE >>> BOOST_THREAD_LIBRARY_DEBUG >>> BOOST_THREAD_LIBRARY_RELEASE >>> >>> cmake, as per it's usual, picked up an ancient version of boost (which I >>> want to override). I can get around this with ccmake, but nothing that I >>> compile can pass all of the tests. If someone knows what these variables >>> (shown in ccmake) are called, I'd love to know. >>> >>> Thanks in advance! 
>>> Matthew >>> >>> >>> On Fri, Apr 15, 2016 at 2:58 PM, Matthew Lardy <mla...@gmail.com> wrote: >>> >>>> Hi all, >>>> >>>> If someone has an insight I would love to hear it about how best to >>>> build RDKit from scratch. >>>> >>>> I am using the following to build the 2015.03 release: >>>> RedHat Linux version 6.4 >>>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) >>>> swig version 3.0.8 >>>> cmake vers
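On the NoClassDefFoundError for org/hamcrest/SelfDescribing asked about at the top of this message: starting with JUnit 4.11 the junit jar no longer bundles the Hamcrest classes, so a classpath that worked with JUnit 4.10 (the version in Greg's log) can fail with 4.12 unless a hamcrest-core jar is also on it. The sketch below only inspects a classpath string by file name, which is a heuristic and an assumption; the helper names and the example jar paths are hypothetical. Note too that the ctest runs in this thread pass their own "-cp" argument, so this applies mainly when launching the tests by hand.

```python
import os


def classpath_has_hamcrest(classpath, sep=os.pathsep):
    """Heuristic: True if any classpath entry's file name mentions hamcrest."""
    return any(
        "hamcrest" in os.path.basename(entry).lower()
        for entry in classpath.split(sep)
        if entry
    )


def with_hamcrest(classpath, hamcrest_jar, sep=os.pathsep):
    """Return the classpath, appending hamcrest_jar only if none is present."""
    if classpath_has_hamcrest(classpath, sep):
        return classpath
    return sep.join([classpath, hamcrest_jar]) if classpath else hamcrest_jar


if __name__ == "__main__":
    # Both jar locations below are placeholders, not paths from the thread.
    current = "/path/to/junit.jar:/path/to/org.RDKit.jar"
    print(with_hamcrest(current, "/path/to/hamcrest-core.jar", sep=":"))
```

The alternative, which the thread's working example happens to use, is a JUnit 4.10 jar, which still carries the Hamcrest classes itself.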
Re: [Rdkit-discuss] RDKit build issues
My thoughts exactly. I've forgotten to set RDBASE before and had a few tests fail. This is different. It was everything, and it took me days to get to a place where some of the base code was passing tests. Someone mentioned that I need to build boost with the version of Python that I am using. This is a problem in an environment where people choose their own adventure with modules. I have no guarantee that they won't choose Python 3.x when I've compiled everything with 2.7.7. That will be a good problem to have when I get there. :)

If my present build doesn't pass everything, and it won't, I'll send you the fun directly. :)

Thanks again Greg!
Matt

On Fri, Apr 15, 2016 at 9:34 PM, Greg Landrum <greg.land...@gmail.com> wrote:
> Brian's suggestion to take a look at the cmake invocation in the travis
> file is a good one.
>
> Based on the number of failing tests, I think something more than just an
> RDBASE problem is wrong. The best way to track this down is by sending me
> the cmake command you ran, the output of running make, and the output of
> running ctest (after you run ctest you will find this in the directory
> Testing/Temporary/LastTest.log). Because this is pretty large and likely
> not of interest to others, it's probably best to send it directly to me or
> to create a gist and send the link to that.
>
> -greg
>
> On Sat, Apr 16, 2016 at 2:22 AM, Brian Kelley <fustiga...@gmail.com>
> wrote:
>
>> Getting the right version of boost can be tricky. You can see our normal
>> cmake incantation here as well as how we set RDBASE for tests
>>
>> https://github.com/rdkit/rdkit/blob/master/.travis.yml
>>
>> Note the
>>
>> -D Boost_NO_SYSTEM_PATHS=ON
>>
>> When running cmake, otherwise cmake can get very confused.
>>
>> Brian Kelley
>>
>> On Apr 15, 2016, at 6:26 PM, Matthew Lardy <mla...@gmail.com> wrote:
>>
>> I'll add, that remembering that cmake and ccmake can produce different outcomes I've gone back to trying cmake.
But I can't overwrite the >> variables in cmake. Here are the results from trying to specify them: >> >> Manually-specified variables were not used by the project: >> >> BOOST_DIR >> BOOST_INCLUDE_DIR >> BOOST_LIBRARY_DIR >> BOOST_PYTHON_LIBRARY_DEBUG >> BOOST_PYTHON_LIBRARY_RELEASE >> BOOST_REGEX_LIBRARY_DEBUG >> BOOST_REGEX_LIBRARY_RELEASE >> BOOST_SERIALIZATION_LIBRARY_DEBUG >> BOOST_SERIALIZATION_LIBRARY_RELEASE >> BOOST_SYSTEM_LIBRARY_DEBUG >> BOOST_SYSTEM_LIBRARY_RELEASE >> BOOST_THREAD_LIBRARY_DEBUG >> BOOST_THREAD_LIBRARY_RELEASE >> >> cmake, as per it's usual, picked up an ancient version of boost (which I >> want to override). I can get around this with ccmake, but nothing that I >> compile can pass all of the tests. If someone knows what these variables >> (shown in ccmake) are called, I'd love to know. >> >> Thanks in advance! >> Matthew >> >> >> On Fri, Apr 15, 2016 at 2:58 PM, Matthew Lardy <mla...@gmail.com> wrote: >> >>> Hi all, >>> >>> If someone has an insight I would love to hear it about how best to >>> build RDKit from scratch. >>> >>> I am using the following to build the 2015.03 release: >>> RedHat Linux version 6.4 >>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) >>> swig version 3.0.8 >>> cmake version 3.0.0 >>> boost version 1.53.0 (I've also tried boost 1.59.0) >>> python version 2.7.7 >>> >>> I use ccmake and alter the things which need to be altered (as cmake >>> always grabs the wrong things for me). My build compiles nicely, and >>> without errors. But the tests are a mess. Here is the summary of my build: >>> >>> 26% tests passed, 87 tests failed out of 118 >>> >>> I've put the details of the tests, in the hope that someone sees a >>> pattern that I do not, below. >>> >>> Before someone recommends a clean install that is not possible. I also >>> do not have root, so I can't just wipe the machine and start over. I >>> cannot use an RPM, so that is out too. Any ideas are welcome! 
>>> >>> Thanks, >>> Matthew >>> >>> The following tests FAILED: >>> 2 - testDataStructs (OTHER_FAULT) >>> 3 - pyBV (Failed) >>> 4 - pyDiscreteValueVect (Failed) >>> 5 - pySparseIntVect (Failed) >>> 7 - testGrid (OTHER_FAULT) >>> 8 - testPyGeomet
Re: [Rdkit-discuss] RDKit build issues
Hi Brian,

The RDBASE issue was pointed out to me by others as well. It appears that rebuilding with cmake and forcing things via the command line is working out better than it did earlier. I'm now down to just the Python tests failing (although I haven't attempted to build the Java wrappers yet).

I am, clearly, all thumbs with cmake if things don't go well. :) Worse, nothing in my environment is stock. In fact, I have to go back and rebuild boost, as it also found the ancient libraries that it shouldn't be touching.

The C++ code is compiling and passing tests now. No more failures!

Thank you all!!
Matt

On Fri, Apr 15, 2016 at 5:22 PM, Brian Kelley <fustiga...@gmail.com> wrote:
> Getting the right version of boost can be tricky. You can see our normal
> cmake incantation here as well as how we set RDBASE for tests
>
> https://github.com/rdkit/rdkit/blob/master/.travis.yml
>
> Note the
>
> -D Boost_NO_SYSTEM_PATHS=ON
>
> When running cmake, otherwise cmake can get very confused.
>
> Brian Kelley
>
> On Apr 15, 2016, at 6:26 PM, Matthew Lardy <mla...@gmail.com> wrote:
>
> I'll add, that remembering that cmake and ccmake can produce different
> outcomes I've gone back to trying cmake. But I can't overwrite the
> variables in cmake. Here are the results from trying to specify them:
>
> Manually-specified variables were not used by the project:
>
> BOOST_DIR
> BOOST_INCLUDE_DIR
> BOOST_LIBRARY_DIR
> BOOST_PYTHON_LIBRARY_DEBUG
> BOOST_PYTHON_LIBRARY_RELEASE
> BOOST_REGEX_LIBRARY_DEBUG
> BOOST_REGEX_LIBRARY_RELEASE
> BOOST_SERIALIZATION_LIBRARY_DEBUG
> BOOST_SERIALIZATION_LIBRARY_RELEASE
> BOOST_SYSTEM_LIBRARY_DEBUG
> BOOST_SYSTEM_LIBRARY_RELEASE
> BOOST_THREAD_LIBRARY_DEBUG
> BOOST_THREAD_LIBRARY_RELEASE
>
> cmake, as per it's usual, picked up an ancient version of boost (which I
> want to override). I can get around this with ccmake, but nothing that I compile can pass all of the tests.
If someone knows what these variables > (shown in ccmake) are called, I'd love to know. > > Thanks in advance! > Matthew > > > On Fri, Apr 15, 2016 at 2:58 PM, Matthew Lardy <mla...@gmail.com> wrote: > >> Hi all, >> >> If someone has an insight I would love to hear it about how best to build >> RDKit from scratch. >> >> I am using the following to build the 2015.03 release: >> RedHat Linux version 6.4 >> gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) >> swig version 3.0.8 >> cmake version 3.0.0 >> boost version 1.53.0 (I've also tried boost 1.59.0) >> python version 2.7.7 >> >> I use ccmake and alter the things which need to be altered (as cmake >> always grabs the wrong things for me). My build compiles nicely, and >> without errors. But the tests are a mess. Here is the summary of my build: >> >> 26% tests passed, 87 tests failed out of 118 >> >> I've put the details of the tests, in the hope that someone sees a >> pattern that I do not, below. >> >> Before someone recommends a clean install that is not possible. I also >> do not have root, so I can't just wipe the machine and start over. I >> cannot use an RPM, so that is out too. Any ideas are welcome! 
>> >> Thanks, >> Matthew >> >> The following tests FAILED: >> 2 - testDataStructs (OTHER_FAULT) >> 3 - pyBV (Failed) >> 4 - pyDiscreteValueVect (Failed) >> 5 - pySparseIntVect (Failed) >> 7 - testGrid (OTHER_FAULT) >> 8 - testPyGeometry (Failed) >> 11 - pyAlignment (Failed) >> 14 - testMMFFForceField (OTHER_FAULT) >> 15 - pyForceFieldConstraints (Failed) >> 17 - pyDistGeom (Failed) >> 18 - graphmolTest1 (OTHER_FAULT) >> 21 - graphmolMolOpsTest (SEGFAULT) >> 23 - graphmoltestChirality (OTHER_FAULT) >> 24 - graphmoltestPickler (OTHER_FAULT) >> 26 - hanoiTest (OTHER_FAULT) >> 28 - testDepictor (OTHER_FAULT) >> 29 - pyDepictor (Failed) >> 32 - fileParsersTest1 (OTHER_FAULT) >> 33 - testMolSupplier (OTHER_FAULT) >> 34 - testMolWriter (OTHER_FAULT) >> 35 - testTplParser (OTHER_FAULT) >> 36 - testMol2ToMol (OTHER_FAULT) >> 38 - testReaction (OTHER_FAULT) >> 40 - pyChemReactions (Failed) >> 41 - testChemTransforms (OTHER_FAULT) >> 44 - testFragCatalog (OTHER_FAULT) >> 45 - pyFragCatalog (Failed) >> 46 - testDescriptors (OTHER_FAULT) >
Re: [Rdkit-discuss] RDKit build issues
I'll add that, remembering that cmake and ccmake can produce different outcomes, I've gone back to trying cmake. But I can't override the variables in cmake. Here are the results from trying to specify them:

    Manually-specified variables were not used by the project:

        BOOST_DIR
        BOOST_INCLUDE_DIR
        BOOST_LIBRARY_DIR
        BOOST_PYTHON_LIBRARY_DEBUG
        BOOST_PYTHON_LIBRARY_RELEASE
        BOOST_REGEX_LIBRARY_DEBUG
        BOOST_REGEX_LIBRARY_RELEASE
        BOOST_SERIALIZATION_LIBRARY_DEBUG
        BOOST_SERIALIZATION_LIBRARY_RELEASE
        BOOST_SYSTEM_LIBRARY_DEBUG
        BOOST_SYSTEM_LIBRARY_RELEASE
        BOOST_THREAD_LIBRARY_DEBUG
        BOOST_THREAD_LIBRARY_RELEASE

cmake, as per its usual, picked up an ancient version of boost (which I want to override). I can get around this with ccmake, but nothing that I compile can pass all of the tests. If someone knows what these variables (shown in ccmake) are called, I'd love to know.

Thanks in advance!
Matthew

On Fri, Apr 15, 2016 at 2:58 PM, Matthew Lardy <mla...@gmail.com> wrote:
> Hi all,
>
> If someone has an insight I would love to hear it about how best to build
> RDKit from scratch.
>
> I am using the following to build the 2015.03 release:
> RedHat Linux version 6.4
> gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
> swig version 3.0.8
> cmake version 3.0.0
> boost version 1.53.0 (I've also tried boost 1.59.0)
> python version 2.7.7
>
> I use ccmake and alter the things which need to be altered (as cmake
> always grabs the wrong things for me). My build compiles nicely, and
> without errors. But the tests are a mess. Here is the summary of my build:
>
> 26% tests passed, 87 tests failed out of 118
>
> I've put the details of the tests, in the hope that someone sees a pattern
> that I do not, below.
>
> Before someone recommends a clean install that is not possible. I also do
> not have root, so I can't just wipe the machine and start over. I cannot
> use an RPM, so that is out too. Any ideas are welcome!
> > Thanks, > Matthew > > The following tests FAILED: > 2 - testDataStructs (OTHER_FAULT) > 3 - pyBV (Failed) > 4 - pyDiscreteValueVect (Failed) > 5 - pySparseIntVect (Failed) > 7 - testGrid (OTHER_FAULT) > 8 - testPyGeometry (Failed) > 11 - pyAlignment (Failed) > 14 - testMMFFForceField (OTHER_FAULT) > 15 - pyForceFieldConstraints (Failed) > 17 - pyDistGeom (Failed) > 18 - graphmolTest1 (OTHER_FAULT) > 21 - graphmolMolOpsTest (SEGFAULT) > 23 - graphmoltestChirality (OTHER_FAULT) > 24 - graphmoltestPickler (OTHER_FAULT) > 26 - hanoiTest (OTHER_FAULT) > 28 - testDepictor (OTHER_FAULT) > 29 - pyDepictor (Failed) > 32 - fileParsersTest1 (OTHER_FAULT) > 33 - testMolSupplier (OTHER_FAULT) > 34 - testMolWriter (OTHER_FAULT) > 35 - testTplParser (OTHER_FAULT) > 36 - testMol2ToMol (OTHER_FAULT) > 38 - testReaction (OTHER_FAULT) > 40 - pyChemReactions (Failed) > 41 - testChemTransforms (OTHER_FAULT) > 44 - testFragCatalog (OTHER_FAULT) > 45 - pyFragCatalog (Failed) > 46 - testDescriptors (OTHER_FAULT) > 47 - pyMolDescriptors (Failed) > 48 - testFingerprints (OTHER_FAULT) > 50 - pyPartialCharges (Failed) > 51 - testMolTransforms (OTHER_FAULT) > 52 - pyMolTransforms (Failed) > 53 - testMMFFForceFieldHelpers (OTHER_FAULT) > 54 - testUFFForceFieldHelpers (OTHER_FAULT) > 55 - pyForceFieldHelpers (Failed) > 56 - testDistGeomHelpers (OTHER_FAULT) > 57 - pyDistGeom (Failed) > 58 - testMolAlign (OTHER_FAULT) > 59 - pyMolAlign (Failed) > 60 - testFeatures (OTHER_FAULT) > 61 - pyChemicalFeatures (Failed) > 62 - testShapeHelpers (OTHER_FAULT) > 63 - pyShapeHelpers (Failed) > 65 - pyMolCatalog (Failed) > 66 - moldraw2DTest1 (OTHER_FAULT) > 67 - pyMolDraw2D (Failed) > 69 - pyFMCS (Failed) > 72 - pyMolHash (Failed) > 74 - pySLNParse (Failed) > 75 - pyGraphMolWrap (Failed) > 76 - pyTestConformerWrap (Failed) > 79 - pyMatCalc (Failed) > 80 - pySimDivPickers (Failed) > 81 - pyRanker (Failed) > 83 - pyFeatures (Failed) > 84 - JavaAromaticTests (Failed) > 85 - JavaAtomPairsTests (Failed) > 86 
- JavaBasicMoleculeTests (Failed) > 87 - JavaBasicMolecule2Tests (Failed) > 88 - JavaChemAtomTests (Failed) > 89 - JavaChemBondTests (
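On the "Manually-specified variables were not used by the project" warning in this message: CMake variable names are case-sensitive, and CMake's FindBoost module documents its hints as BOOST_ROOT, BOOST_INCLUDEDIR and BOOST_LIBRARYDIR, with Boost_NO_SYSTEM_PATHS (the flag Brian mentions) to stop it searching system locations. The cache entries that ccmake shows are most likely spelled with a "Boost_" prefix rather than "BOOST_", which would explain why the all-caps names were ignored; that last point is an inference, not something confirmed in the thread. A minimal sketch that assembles such a command line, where boost_cmake_args and the example paths are hypothetical:

```python
import shlex


def boost_cmake_args(boost_root, source_dir=".."):
    """Build a cmake argument list that points FindBoost at one Boost install.

    BOOST_ROOT and Boost_NO_SYSTEM_PATHS are FindBoost variables; the
    paths passed in are whatever the caller's own layout uses.
    """
    return [
        "cmake",
        f"-DBOOST_ROOT={boost_root}",
        "-DBoost_NO_SYSTEM_PATHS=ON",
        source_dir,
    ]


if __name__ == "__main__":
    # The Boost location below is a placeholder, not a path from the thread.
    args = boost_cmake_args("/path/to/boost1.59")
    print(" ".join(shlex.quote(arg) for arg in args))
```

Running cmake in a fresh build directory matters here, since an old CMakeCache.txt can keep the previously found Boost paths.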
[Rdkit-discuss] RDKit build issues
Hi all,

If anyone has insight into how best to build RDKit from scratch, I would love to hear it. I am using the following to build the 2015.03 release:

RedHat Linux version 6.4
gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
swig version 3.0.8
cmake version 3.0.0
boost version 1.53.0 (I've also tried boost 1.59.0)
python version 2.7.7

I use ccmake and alter the things which need to be altered (as cmake always grabs the wrong things for me). My build compiles nicely, and without errors. But the tests are a mess. Here is the summary of my build:

26% tests passed, 87 tests failed out of 118

I've put the details of the tests below, in the hope that someone sees a pattern that I do not.

Before someone recommends a clean install: that is not possible. I also do not have root, so I can't just wipe the machine and start over. I cannot use an RPM, so that is out too.

Any ideas are welcome!

Thanks,
Matthew

The following tests FAILED:
  2 - testDataStructs (OTHER_FAULT)
  3 - pyBV (Failed)
  4 - pyDiscreteValueVect (Failed)
  5 - pySparseIntVect (Failed)
  7 - testGrid (OTHER_FAULT)
  8 - testPyGeometry (Failed)
 11 - pyAlignment (Failed)
 14 - testMMFFForceField (OTHER_FAULT)
 15 - pyForceFieldConstraints (Failed)
 17 - pyDistGeom (Failed)
 18 - graphmolTest1 (OTHER_FAULT)
 21 - graphmolMolOpsTest (SEGFAULT)
 23 - graphmoltestChirality (OTHER_FAULT)
 24 - graphmoltestPickler (OTHER_FAULT)
 26 - hanoiTest (OTHER_FAULT)
 28 - testDepictor (OTHER_FAULT)
 29 - pyDepictor (Failed)
 32 - fileParsersTest1 (OTHER_FAULT)
 33 - testMolSupplier (OTHER_FAULT)
 34 - testMolWriter (OTHER_FAULT)
 35 - testTplParser (OTHER_FAULT)
 36 - testMol2ToMol (OTHER_FAULT)
 38 - testReaction (OTHER_FAULT)
 40 - pyChemReactions (Failed)
 41 - testChemTransforms (OTHER_FAULT)
 44 - testFragCatalog (OTHER_FAULT)
 45 - pyFragCatalog (Failed)
 46 - testDescriptors (OTHER_FAULT)
 47 - pyMolDescriptors (Failed)
 48 - testFingerprints (OTHER_FAULT)
 50 - pyPartialCharges (Failed)
 51 - testMolTransforms (OTHER_FAULT)
 52 - pyMolTransforms (Failed)
 53 - testMMFFForceFieldHelpers (OTHER_FAULT)
 54 - testUFFForceFieldHelpers (OTHER_FAULT)
 55 - pyForceFieldHelpers (Failed)
 56 - testDistGeomHelpers (OTHER_FAULT)
 57 - pyDistGeom (Failed)
 58 - testMolAlign (OTHER_FAULT)
 59 - pyMolAlign (Failed)
 60 - testFeatures (OTHER_FAULT)
 61 - pyChemicalFeatures (Failed)
 62 - testShapeHelpers (OTHER_FAULT)
 63 - pyShapeHelpers (Failed)
 65 - pyMolCatalog (Failed)
 66 - moldraw2DTest1 (OTHER_FAULT)
 67 - pyMolDraw2D (Failed)
 69 - pyFMCS (Failed)
 72 - pyMolHash (Failed)
 74 - pySLNParse (Failed)
 75 - pyGraphMolWrap (Failed)
 76 - pyTestConformerWrap (Failed)
 79 - pyMatCalc (Failed)
 80 - pySimDivPickers (Failed)
 81 - pyRanker (Failed)
 83 - pyFeatures (Failed)
 84 - JavaAromaticTests (Failed)
 85 - JavaAtomPairsTests (Failed)
 86 - JavaBasicMoleculeTests (Failed)
 87 - JavaBasicMolecule2Tests (Failed)
 88 - JavaChemAtomTests (Failed)
 89 - JavaChemBondTests (Failed)
 90 - JavaChemReactionTests (Failed)
 91 - JavaChemSmartsTests (Failed)
 92 - JavaChemTests (Failed)
 93 - JavaChemv2Tests (Failed)
 94 - JavaConformerTests (Failed)
 95 - JavaDescriptorTests (Failed)
 96 - JavaDistanceGeometryTests (Failed)
 97 - JavaErrorHandlingTests (Failed)
 98 - JavaFingerprintsTests (Failed)
 99 - JavaForceFieldsTests (Failed)
100 - JavaHManipulationsTests (Failed)
101 - JavaLipinskiTests (Failed)
102 - JavaPicklingTests (Failed)
103 - JavaSmilesCreationTests (Failed)
104 - JavaSmilesDetailsTests (Failed)
105 - JavaSmilesTests (Failed)
106 - JavaSuppliersTests (Failed)
107 - JavaWrapperTests (Failed)
108 - JavaChemTransformsTests (Failed)
109 - JavaFMCSTests (Failed)
110 - JavaPDBTests (Failed)
111 - JavaAlignTests (Failed)
112 - pythonTestDbCLI (Failed)
113 - pythonTestDirML (Failed)
118 - pythonTestDirChem (Failed)
Errors while running CTest
Re: [Rdkit-discuss] RDKit 2016.03 Installation
Thanks Greg and Guillaume,

I caught the chatter about avoiding Boost 1.60, and have done so myself (I'm using boost 1.59). I'm also rolling with Cmake 3.0.0 and Swig 3.0.8.

I'll keep at it. I think I'm close, but I am still seeing 80% of the tests fail after compilation. Some of it is working, so I'm on the right track!

Thanks!
Matthew

On Thu, Apr 14, 2016 at 10:37 PM, Greg Landrum <greg.land...@gmail.com> wrote:
> Hi Matt,
>
> 2016.03 isn't quite there (we're still working out some compatibility
> things with older operating systems), but it should support a broad range
> of versions of each of those tools.
> Boost 1.60 is known to be problematic, so I'd avoid that.
>
> My primary development machine - and it's pretty safe to assume that the
> RDKit builds on that without problems ;-) - has:
> - Swig 3.0.2 (I think the 2.x series works, but you might as well use 3.x
>   if you can)
> - boost 1.58 (this is what comes with ubuntu 15.10)
> - cmake 3.2.2 (what comes with ubuntu 15.10)
>
> -greg
[Rdkit-discuss] RDKit 2016.03 Installation
Hi all,

Does someone know which version of boost, cmake, and swig seems to work best for the current release?

Thanks in advance!
Matt
Re: [Rdkit-discuss] Reaction Question
Hi Greg,

I see what I was doing wrong: I had put the ring closures inside the brackets when I was mapping the atoms, which explains why I didn't see any output there. That was a simple error on my side. Thanks so much, Greg!!

Matt

On Mon, Sep 21, 2015 at 11:32 PM, Greg Landrum <greg.land...@gmail.com> wrote:
> Hi Matt,
>
> The problem is that you haven't included any atom mapping information that
> allows the RDKit to know what to do with the reactants you provide.
>
> Here's a short demo of what you're doing:
>
> In [11]: rxn = AllChem.ReactionFromSmarts('C=C1CC=CC=C1>>Cc1ccccc1')
>
> In [12]: ps = rxn.RunReactants((Chem.MolFromSmiles('C=C1CC=CC=C1'),))
> [02:28:59] reactant 0 has no mapped atoms.
> [02:28:59] product 0 has no mapped atoms.
>
> In [13]: Chem.MolToSmiles(ps[0][0])
> Out[13]: 'Cc1ccccc1'
>
> In [14]: ps = rxn.RunReactants((Chem.MolFromSmiles('CC=C1C(F)C=CC=C1'),))
>
> In [15]: Chem.MolToSmiles(ps[0][0])
> Out[15]: 'Cc1ccccc1'
>
> (notice the warning after line [12])
>
> And here's how to fix it:
>
> In [16]: rxn = AllChem.ReactionFromSmarts('[C:1]=[C:2]1[C:3][C:4]=[C:5][C:6]=[C:7]1>>[C:1][c:2]1[c:3][c:4][c:5][c:6][c:7]1')
>
> In [17]: ps = rxn.RunReactants((Chem.MolFromSmiles('C=C1CC=CC=C1'),))
>
> In [18]: Chem.MolToSmiles(ps[0][0])
> Out[18]: 'Cc1ccccc1'
>
> In [19]: ps = rxn.RunReactants((Chem.MolFromSmiles('CC=C1C(F)C=CC=C1'),))
>
> In [20]: Chem.MolToSmiles(ps[0][0])
> Out[20]: 'CCc1ccccc1F'
>
> I hope that helps,
> -greg
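Greg's diagnosis in the reply above comes down to whether both sides of the reaction SMARTS carry atom-map numbers (the `:1`, `:2`, ... inside the brackets). Here is a minimal sketch of a plain-text sanity check for that. It assumes a simple `reactants>>products` string with no agents and no nested brackets; `map_numbers` and `check_mapping` are hypothetical helper names, not RDKit functions, and this is only a regex scan, not a real SMARTS parser.

```python
import re

# Matches the map number inside a bracket atom such as [C:1] or [c:12].
# Assumption: no nested brackets (e.g. recursive SMARTS) inside the atom.
_MAP_RE = re.compile(r"\[[^\[\]]*:(\d+)\]")


def map_numbers(smarts_side):
    """Return the set of atom-map numbers found in one side of a reaction SMARTS."""
    return {int(n) for n in _MAP_RE.findall(smarts_side)}


def check_mapping(reaction_smarts):
    """Split 'reactants>>products' and report which map numbers appear where."""
    reactants, products = reaction_smarts.split(">>")
    r_maps = map_numbers(reactants)
    p_maps = map_numbers(products)
    return {
        "reactant_maps": r_maps,
        "product_maps": p_maps,
        "only_in_reactants": r_maps - p_maps,
        "only_in_products": p_maps - r_maps,
    }


if __name__ == "__main__":
    unmapped = "C=C1CC=CC=C1>>Cc1ccccc1"
    mapped = ("[C:1]=[C:2]1[C:3][C:4]=[C:5][C:6]=[C:7]1"
              ">>[C:1][c:2]1[c:3][c:4][c:5][c:6][c:7]1")
    for smarts in (unmapped, mapped):
        print(smarts, check_mapping(smarts))
```

For the unmapped SMARTS from this thread both sets come back empty, which lines up with the "has no mapped atoms" warnings in Greg's session; for the mapped version both sides carry the same seven map numbers.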
Re: [Rdkit-discuss] Reaction Question
I've repeated the previously seen behavior in Python in case anyone would like to take a look there. I am trying to clean a dirty library from a vendor and I just want to re-aromatize these molecules. Sorry that the code looks so ugly, but the output is exactly the same.

Thanks!
Matt

Output (not to file):
Cc1ccccc1
Cc1ccccc1

Cc1ccccc1
Cc1ccccc1

The code that generated it:
#!/usr/bin/python

from rdkit import Chem
from rdkit.Chem import AllChem,Draw
from rdkit.Chem import ChemicalFeatures
from rdkit import RDConfig
import os
import sys
import gzip

suppl = Chem.SDMolSupplier('w.sdf')
rxn = AllChem.ReactionFromSmarts('C=C1CC=CC=C1>>Cc1ccccc1')

gz = gzip.open('output.sdf.gz', 'w+')
writer = Chem.SDWriter(gz)

for m in suppl:
    if not m: continue
    ps = rxn.RunReactants((m,))
    if (len(ps) > 0):
        # uniq = set([Chem.MolToSmiles(x[0],isomericSmiles=True) for x in ps])
        print(Chem.MolToSmiles(ps[0][0]))
        #for p in uniq:
            #print Chem.MolToSmiles(p)
        #for prod in uniq:
        #    writer.write(prod)
    else:
        writer.write(m)
writer.close()
gz.close()
[Rdkit-discuss] Reaction Question
Hi all,

I am attempting to transform a functional group in a series of molecules. The reaction is pretty simple (a re-aromatization):

C=C1CC=CC=C1>>Cc1ccccc1

The code which generates this runs without error (and it was written in Java). What I don't understand is that the products of the reaction are just Cc1ccccc1. The rest of the molecule is completely missing. Trying to map these atoms didn't reproduce the error; it did not run at all. Is there a trick to simply run something like this on every occurrence in a molecule?

Thanks in advance, and my code fragment is below,
Matt

Here is my code:

SDMolSupplier suppl1 = new SDMolSupplier(cParser.getValue("-in"));
ROMol rdmol;
//String line = "";

while (!suppl1.atEnd())
{
    try {
        rdmol = suppl1.next();
        molId++;

        ROMol_Vect reacts = new ROMol_Vect();
        reacts.add(rdmol);

        String ID = rdmol.getProp("_Name");
        System.err.println("MOL_ID: " + ID);
        ROMol_Vect_Vect prods = umr.runReactants(reacts);

        System.out.println("Reagents: " + reacts.size());
        System.err.println("Product AMT: " + prods.size());
        if (prods.size() < 1) {
            // Write out untransformed molecules if they don't have the pattern
            //writer.write(rdmol);
            ;
        } else {
            for (int i = 0; i < prods.size(); i++)
            {
                System.err.println("In here finally");
                prods.get(i).get(0).setProp("_Name",ID);
                // Why am I getting the query back to me?
                writer.write(prods.get(i).get(0));
            }
        }
    } catch (Exception e) {
        System.err.println(e);
    } catch (Error e) {
        System.err.println(e);
    }
}
Re: [Rdkit-discuss] Memory Issue
Hi Greg,

I know what you mean. :) I had tried that before, but executing an rdmol.delete() at the end of the loop didn't help. And, I just re-tried that to no avail. I remember having a similar issue with the SDMolSupplier before, where just reading the file consumed a ton of memory. This was patched, and all of the rest of my code runs well. But if I want to sample from the SDMolSupplier stream, things go weird.

I had hoped that copying each rdmol to a new object (reducing the leak) would help if I wanted to hold it for a time, but that didn't help either. I am deleting every molecule that I hold, but there appears to be no impact on memory consumption. I think the JVM is simply not collecting these objects, as forcing it to do so (well, as much as one can) doesn't fix things.

I may just have to write this in Python, where I am pretty certain the memory issues are non-existent. :) I was hopeful that someone else may have encountered this issue, and had a path around it.

Thanks for taking a look Greg!
Matt

On Wed, Jul 15, 2015 at 1:57 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> Hi,
>
> It's not easy (for me) to read through the Java code and figure out what is
> going on, but it looks to me like you are leaking rdmol in each iteration of
> your loop.
>
> The problem that the RDKit Java wrappers (really any Java wrapper created
> with SWIG) have here is that the JVM doesn't know how big the underlying C++
> object is, so it's not aggressive enough while cleaning up memory. I think
> calling rdmol.delete() at the end of each iteration (this frees the
> underlying C++ object) should help.
>
> -greg
Re: [Rdkit-discuss] Memory Issue
Just to add: I can confirm that re-writing this in Python did indeed get rid of the memory issue I've been having. Total consumption never crossed 0.1% of my system memory. :) Way less than the 89% I was seeing with the Java version of the same application!
[Rdkit-discuss] Memory Issue
Hi all,

I have had a strange issue that I can't seem to find a way around. The following code block consumes a ton of memory, which is strange as just using the SD file reader I have no memory issues. I think that the issue is related to the Java garbage collection not kicking in, even though I have attempted to force that (to no success).

All the following block does is iterate through an SD file and look for the highest (or lowest) scoring entry for each molecule. The assumption is that all molecules of the same type will be next to each other in the file (which is not my problem). Running this on an SD file of around 400K molecules consumes around 23GB of memory, so if anyone has an idea I will be most appreciative!

public static void main(String argv[]) throws IOException, InterruptedException {
    CommandLineParser cParser;
    String[] modes = {};
    String[] parms = {"-in", "-filterTag", "-direction", "-out"};
    String[] reqParms = {"-in", "-filterTag", "-direction", "-out"};

    String rdkitSO = System.getenv("RDKIT_SO");
    System.load(rdkitSO);
    String currentDir = System.getProperty("user.dir");
    File dir = new File(currentDir);

    cParser = new CommandLineParser(EXPLAIN,0,0,argv,modes,parms,reqParms);

    ROMol rdmol = null;
    ROMol rdmol2 = null;
    SDMolSupplier suppl = new SDMolSupplier(cParser.getValue("-in"));
    SDWriter writer = new SDWriter(cParser.getValue("-out"));

    int count = 0;
    while (!suppl.atEnd()) {
        count++;
        if (count % 1000 == 0) {
            System.out.println(count);
        }
        rdmol = suppl.next();
        if (rdmol2 == null) {
            // rdmol2.delete();
            rdmol2 = new ROMol(rdmol);
            continue;
        }
        if (rdmol.MolToSmiles().equals(rdmol2.MolToSmiles())) {
            if ( cParser.getValue("-direction").equals("highest") ) {
                double value1 = Double.parseDouble(rdmol.getProp(cParser.getValue("-filterTag")));
                double value2 = Double.parseDouble(rdmol2.getProp(cParser.getValue("-filterTag")));
                //System.out.println(Val1 + value1 + Val2 + value2);
                if (value1 > value2) {
                    rdmol2.delete();
                    rdmol2 = new ROMol(rdmol);
                }
            } else {
                if ( Double.parseDouble(rdmol.getProp(cParser.getValue("-filterTag"))) < Double.parseDouble(rdmol2.getProp(cParser.getValue("-filterTag"))) ) {
                    rdmol2.delete();
                    rdmol2 = new ROMol(rdmol);
                }
            }
        } else {
            writer.write(rdmol2);
            rdmol2.delete();
            rdmol2 = new ROMol(rdmol);
        }
    }
}
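The selection logic described in this post (adjacent records belong to the same molecule; keep only the highest or lowest scoring one from each run) can be sketched in a streaming style so that only one run is held in memory at a time. The sketch below is an illustration under stated assumptions, not the code from the thread: records are plain `(key, score)` tuples, where in practice the key might be a canonical SMILES and the score the value of the filter tag, and `best_per_run` is a hypothetical name.

```python
from itertools import groupby


def best_per_run(records, direction="highest"):
    """Yield one (key, score) record per run of adjacent records sharing a key.

    `records` is any iterable of (key, score) tuples. Only adjacent duplicates
    are merged, mirroring the "same molecules sit next to each other"
    assumption from the post.
    """
    pick = max if direction == "highest" else min
    for _key, run in groupby(records, key=lambda rec: rec[0]):
        # max()/min() consume the run lazily and keep the first of any ties.
        yield pick(run, key=lambda rec: rec[1])


if __name__ == "__main__":
    demo = [("CCO", 1.0), ("CCO", 3.5), ("c1ccccc1", 2.0), ("CCO", 0.5)]
    for record in best_per_run(demo, direction="highest"):
        print(record)
```

Because `groupby` emits every run, including the final one, this version also returns the last molecule in the file. As far as I can tell, the Java loop above only writes a held molecule when the next different one arrives, so its final held molecule would never reach the writer.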
[Rdkit-discuss] Java and sdf.gz files
Hi all,

I am trying to open and write compressed SD files with the Java wrappers. I know how to do this in Python, and have done so, but has anyone cracked how to read and write sdf.gz files in Java?

Thanks in advance!
Matthew
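One toolkit-independent way to approach compressed SD files is to handle the gzip layer yourself and pass the toolkit one record at a time. The sketch below shows that idea using only the Python standard library, since Python is what the rest of these threads use. It assumes every record ends with a `$$$$` line; `iter_sd_records` and `write_sd_records` are hypothetical names. In Java, `java.util.zip.GZIPInputStream` and `GZIPOutputStream` play the role of `gzip.open`; whether the RDKit Java wrappers can consume such a stream directly is not something this sketch assumes.

```python
import gzip


def iter_sd_records(path):
    """Yield each SD record as text (including its '$$$$' line) from a .sdf.gz file."""
    lines = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            lines.append(line)
            if line.strip() == "$$$$":
                yield "".join(lines)
                lines = []
    # Any trailing text without a '$$$$' terminator is ignored in this sketch.


def write_sd_records(path, records):
    """Write an iterable of SD record strings to a gzip-compressed file."""
    with gzip.open(path, "wt") as handle:
        for record in records:
            handle.write(record)
```

Each yielded string is a complete record, so it could then be handed to whatever mol-block parser the toolkit provides.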
Re: [Rdkit-discuss] RDKit SDMolSupplier
Hi Christos,

Thanks. Presently I iterate over all of the tags and copy them over. That's really ugly, and it is identical to your suggestion. I was hopeful that there was a more elegant way of doing things, but I guess not. The pickling thing really caught me by surprise. I can't tell you how many hours I burned until I realized that was the problem.

Thanks for the quick response!
Matthew

On Thu, Mar 12, 2015 at 8:46 AM, Christos Kannas <chriskan...@gmail.com> wrote:
> Hi Matthew,
>
> In order to store the tags and the data associated with them prior to
> writing the molecule to an SDF/SMILES file you have to use SetProps(props),
> e.g. outfile.SetProps(props), where props is a list of the tag names your
> molecules have.
>
> Regarding pickling: when you pickle molecules you lose that extra
> information. You can convert your molecules to PropertyMol, e.g.
>
> from rdkit.Chem.PropertyMol import PropertyMol
> pmol = PropertyMol(mol)
>
> Hope the information above helps a bit.
>
> Regards,
> Christos
>
> Christos Kannas
> Researcher
> Ph.D Student
[Rdkit-discuss] RDKit SDMolSupplier
Hi,

I've noticed some strangeness between the Java and Python wrappers that I have spent some time battling. If I load an SD file via the SDMolSupplier and then write out the file with SDWriter, I lose all of the tags for the molecules. Also, if I pickle an SD file, reload it and then iterate over it, the tags are missing again. Is this the expected behaviour, or am I doing something wrong?

Thanks in advance!
Matthew
Re: [Rdkit-discuss] SimilarityMaps
Hi Sereina,

Yep, that fixed it! Thanks!

Matthew

On Fri, Feb 20, 2015 at 10:59 PM, Sereina <sereina.rini...@gmail.com> wrote:
> Hi Matthew,
>
> I think this is related to a previous mailing list item (
> https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03528.html
> ). It probably has something to do with the bounding boxes (they get scaled
> during the map generation process). In the previous case it was enough to
> set bbox_inches='tight' when saving the image to solve the problem.
>
> I hope this helps.
>
> Best,
> Sereina
>
> On 21 Feb 2015, at 02:28, Matthew Lardy <mla...@gmail.com> wrote:
>> Hi,
>>
>> I am having an issue with the python similaritymaps. I am only seeing a
>> fraction of the molecule. Anyone else have this issue?
>>
>> Thanks in advance!
>> Matthew
[Rdkit-discuss] SimilarityMaps
Hi,

I am having an issue with the python similaritymaps. I am only seeing a fraction of the molecule. Anyone else have this issue?

Thanks in advance!
Matthew
Re: [Rdkit-discuss] A RDKit/Scikit-learn question
It is a really nice combo! I got clued in that the variable was a 2D array, and once I saw that the rest was easy. :) Now I just have to walk the feature weights back and see how significant these things are! After double checking that the arrays are of the same size, of course. :)

Thanks Greg!
Matthew

On Thu, Feb 19, 2015 at 10:29 PM, Greg Landrum <greg.land...@gmail.com> wrote:
> On Thu, Feb 19, 2015 at 11:59 PM, Matthew Lardy <mla...@gmail.com> wrote:
>> I have been able to build models via scikit-learn with the RDKit python
>> wrappers. That all works beautifully!
>
> It's a nice combination, isn't it?
>
>> What I am struggling to get are the weights, or scalers, applied to each
>> bit position. For a SVM regression model (SVR) I think that the values I
>> seek are in the coef_ (if the model is created via the linear kernel).
>> But, all I get is something like this when I print that out:
>>
>> [[-0. -0.87146158 -0.46331996 ..., 0.31076767 -0. -0.81882195]]
>
> I don't really know the SVM regression approach particularly well, but it
> looks like that's a vector of vectors. Is the length of the inner vector the
> same as the length of the fingerprint/descriptor vector you are providing?
>
> -greg
Re: [Rdkit-discuss] A RDKit/Scikit-learn question
Hi Maciek,

Thanks! My brain was stuck on this for a while, as it has been ages since I have written any Python. BTW, I also took a look at your ODDT, and it reminded me that I need to get the OB python wrappers re-compiled. :)

Thanks,
Matthew

On Fri, Feb 20, 2015 at 1:06 AM, Maciek Wójcikowski mac...@wojcikowski.pl wrote:
> Hello,
>
> If I remember correctly, the coefficients are a NumPy array. You can try model.coef_.flatten() to get a flat NumPy array. If you really want a Python list, then you should probably wrap it with list(model.coef_.flatten()). The main reason the vector is nested is that you can have many output values for one feature vector.
>
> PS. I could also recommend my Open Drug Discovery Toolkit for playing around with RDKit and scikit-learn. https://github.com/oddt/oddt
>
> Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
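To make the shape issue in this thread concrete, here is a minimal sketch that uses a stand-in NumPy array in place of model.coef_. The values, the six-bit fingerprint length, and the variable names are made up for illustration; the only thing taken from the thread is that a linear-kernel model prints a nested (1, n_features) array and that flatten() gives one weight per feature.

```python
import numpy as np

# Stand-in for model.coef_ from a linear-kernel SVR (hypothetical values).
coef = np.array([[-0.0, -0.87146158, -0.46331996, 0.31076767, -0.0, -0.81882195]])

# The outer dimension has length 1; flattening gives one weight per feature.
weights = coef.flatten()

n_bits = 6  # assumed fingerprint length for this toy example
assert weights.shape == (n_bits,), "weights and fingerprint length should match"

# Rank bit positions by absolute weight, largest first.
ranked = sorted(enumerate(weights.tolist()), key=lambda pair: abs(pair[1]), reverse=True)
for bit, weight in ranked[:3]:
    print(f"bit {bit}: weight {weight:+.4f}")
```

Checking the flattened length against the fingerprint length first, as Matthew mentions, guards against pairing weights with the wrong bit positions.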
[Rdkit-discuss] A RDKit/Scikit-learn question
Hi all,

I know that this isn't the best forum for this, but I am stuck and was hopeful that someone else has made it through this. I have been able to build models via scikit-learn with the RDKit python wrappers. That all works beautifully! What I am struggling to get are the weights, or scalars, applied to each bit position. For an SVM regression model (SVR) I think that the values I seek are in coef_ (if the model is created with the linear kernel). But all I get is something like this when I print that out:

[[-0. -0.87146158 -0.46331996 ..., 0.31076767 -0. -0.81882195]]

Has anyone flipped this back into a flat array, or am I looking at the wrong thing here? Forgive my weak Python skills for not knowing how to do this automatically; it has been an extremely long time since I have written this much Python. (And I should add that I have burned an incredible amount of time looking for example code doing exactly this.)

Thanks!
Matthew
Re: [Rdkit-discuss] Replacing H's with F's
Thanks Peter and Greg! I had a three-atom query to restrict where I was putting F's, otherwise I would have done as Peter had suggested. Granted, my path to flush out the duplicates by pushing this out into Java (using the RDKit SWIG bindings) was way more involved than this! Thanks for the walkthrough Greg! It was very helpful!

Thanks again!
Matthew

On Sat, Jan 31, 2015 at 1:58 AM, Greg Landrum greg.land...@gmail.com wrote:
> For anyone interested in this topic, I just did an RDKit blog post that has a somewhat expanded version of this answer: http://rdkit.blogspot.com/2015/01/chemical-reaction-notes-i.html
>
> Best,
> -greg

On Sat, Jan 31, 2015 at 7:59 AM, Greg Landrum greg.land...@gmail.com wrote:
> Hi Matthew,
>
> On Fri, Jan 30, 2015 at 11:06 PM, Matthew Lardy mla...@gmail.com wrote:
>> I am having an issue using the Smarts based Reaction transformations in RDKit. This is a weird transformation, but I wanted to replace any or all of the protons on an aromatic ring with an F. The original transformation that I tried was: c(F)c But that didn't work. So then I tried a couple of other transformations:
>> [c:1][c:2][c:3]>>[c:1][c:2]([F])[c:3]
>> That failed (as these things generally were failing):
>> ps = rxn.RunReactants(mol1)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> Boost.Python.ArgumentError: Python argument types in
>>     ChemicalReaction.RunReactants(ChemicalReaction, Mol)
>> did not match C++ signature:
>>     RunReactants(class RDKit::ChemicalReaction *, class boost::python::list)
>>     RunReactants(class RDKit::ChemicalReaction *, class boost::python::tuple)
>
> The hint to what is going on is in the error message: you called the RunReactants method with a Mol (the ChemicalReaction in the argument list is the self argument) and it was expecting either a list or a tuple. Here's a version that works:
>
> In [8]: rxn = AllChem.ReactionFromSmarts('[c:1][c:2][c:3]>>[c:1][c:2]([F])[c:3]')
> In [9]: m = Chem.MolFromSmiles('c1ccccc1')
> In [10]: ps = rxn.RunReactants((m,))
> In [11]: len(ps)
> Out[11]: 12
> In [12]: Chem.MolToSmiles(ps[0][0])
> Out[12]: 'Fc1ccccc1'
>
> Note that this still doesn't really do what you want, because it's encoded to add an F to an aromatic carbon. Here's an example that shows that:
>
> In [15]: m = Chem.MolFromSmiles('c1ccc(C)cc1')
> In [16]: ps = rxn.RunReactants((m,))
> In [17]: len(ps)
> Out[17]: 12
> In [18]: set([Chem.MolToSmiles(x[0],True) for x in ps])
> Out[18]: {'Cc1(F)ccccc1', 'Cc1ccc(F)cc1', 'Cc1cccc(F)c1', 'Cc1ccccc1F'}
>
> Note the first product: the F was also added to the carbon with the methyl group. We can fix that by specifying that the reacting carbon must have an H attached:
>
> In [22]: rxn = AllChem.ReactionFromSmarts('[c:1][cH:2][c:3]>>[c:1][c:2]([F])[c:3]')
> In [23]: ps = rxn.RunReactants((m,))
> In [24]: len(ps)
> Out[24]: 10
> In [25]: set([Chem.MolToSmiles(x[0],True) for x in ps])
> Out[25]: {'Cc1ccc(F)cc1', 'Cc1cccc(F)c1', 'Cc1ccccc1F'}
>
> There's still the question of why so many products are being produced. Look at Out[24]: why do we get 10 products? The answer is the symmetry in the query describing the reactant. Everywhere this query can match, it matches twice, frontwards and backwards. So instead of five products, three of which are unique, we get ten. This can be handled by recognizing that [c:1] and [c:3] are not actually involved in the reaction; they are just there to define the environment of [c:2]. We can do the same thing with a recursive SMARTS:
>
> In [30]: rxn = AllChem.ReactionFromSmarts('[cH$(c(c)c):2]>>[c:2][F]')
> In [31]: ps = rxn.RunReactants((m,))
> In [32]: len(ps)
> Out[32]: 5
> In [33]: set([Chem.MolToSmiles(x[0],True) for x in ps])
> Out[33]: {'Cc1ccc(F)cc1', 'Cc1cccc(F)c1', 'Cc1ccccc1F'}
>
> Hope this helps,
> -greg
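The product counts in Greg's walkthrough (12, 10, and 5) can be reproduced by counting substructure matches by hand. The sketch below is plain Python, not RDKit: it models the six-membered aromatic ring as a cycle of atom indices, and the ring_matches helper and the choice of atom 3 as the methyl-bearing carbon are assumptions made purely for illustration.

```python
RING_SIZE = 6
METHYL_CARBON = 3  # hypothetical index of the ring carbon that carries the CH3


def neighbors(atom):
    """Return the two ring neighbors of an atom in a six-membered cycle."""
    return ((atom - 1) % RING_SIZE, (atom + 1) % RING_SIZE)


def ring_matches(center_needs_h, substituted=()):
    """Enumerate ordered matches of a three-atom query [c:1][c:2][c:3].

    Each match is a tuple (a, b, c) with b in the middle. Because the query
    is symmetric, every center atom is matched twice: (a, b, c) and (c, b, a).
    """
    matches = []
    for center in range(RING_SIZE):
        if center_needs_h and center in substituted:
            continue  # a substituted ring carbon has no H, so [cH] cannot match
        left, right = neighbors(center)
        matches.append((left, center, right))
        matches.append((right, center, left))
    return matches


# Benzene, query [c:1][c:2][c:3]: 6 centers x 2 directions.
benzene = ring_matches(center_needs_h=False)

# Toluene, query [c:1][cH:2][c:3]: the methyl-bearing carbon drops out.
toluene = ring_matches(center_needs_h=True, substituted={METHYL_CARBON})

# A query that maps only the center atom, as with the recursive SMARTS,
# yields one match per center.
unique_centers = {center for _, center, _ in toluene}

print(len(benzene), len(toluene), len(unique_centers))  # 12 10 5
```

Collapsing the products into a set of SMILES, as in the session above, removes the remaining duplicates that come from ring symmetry rather than from query symmetry.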
[Rdkit-discuss] Replacing H's with F's
Hi all,

I am having an issue using the SMARTS-based reaction transformations in RDKit. This is a weird transformation, but I wanted to replace any or all of the protons on an aromatic ring with an F. The original transformation that I tried was:

c(F)c

But that didn't work. So then I tried a couple of other transformations:

[c:1][c:2][c:3]>>[c:1][c:2]([F])[c:3]

That failed (as these things generally were failing):

ps = rxn.RunReactants(mol1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Boost.Python.ArgumentError: Python argument types in
    ChemicalReaction.RunReactants(ChemicalReaction, Mol)
did not match C++ signature:
    RunReactants(class RDKit::ChemicalReaction *, class boost::python::list)
    RunReactants(class RDKit::ChemicalReaction *, class boost::python::tuple)

Then I got desperate:

[#6:1][#6:2]([#1])[#6:3].[H][#9:4]>>[#6:1][#6:2]([#9:4])[#6:3]

Any mention of an explicit H caused issues, so I dropped it and re-ran things again. No luck. I should mention that I am using the pre-built Python RDKit wrappers for Windows, and if I use the Java wrappers on Linux I get different errors but the same outcome. I should add that the molecule that I read (and the molecule for HF) were both loaded without issue. Has anyone else tried to do something like this?

Matthew
Re: [Rdkit-discuss] Using ChemicalReactions to set double bond (E/Z)
Hi Greg,

It has been a long time since I have done things like this with a SMARTS-based transformation. You are right, I forgot the specification for the double bond on the other side. If I add it, there isn't any change in the output on my side. I am using the most recent release of RDKit (I compiled it a couple of weeks ago). For some reason I have had issues compiling the Python wrappers, but I am on CentOS 7, so I am just glad that I got the Java wrappers to work. I was following the examples that I saw in the Code folder pretty tightly, but I am glad to see that the Python-wrapped libs seem to work as I would expect.

I'll track down the stderr log, but a priori I was unaware that one was being produced by RDKit. :) That might take some time to find. :)

This all stems from my inability to keep stereochemistry from RWMol.MolFromSmiles(smiString), which is another issue I am having at the moment. I have the double bonds set appropriately and then (after transforming them through RDKit) I lose the stereochemistry. I am pretty certain that I am losing this through the perception of the molecule, but I also need the ability to transform double bonds, so I have to find a path through this as well.

Thanks, as always, for responding so quickly! I'll share the logs once I locate them!

Matthew

On Tue, Nov 11, 2014 at 8:46 PM, Greg Landrum greg.land...@gmail.com wrote:
> Hi Matthew,
>
> On Wed, Nov 12, 2014 at 12:24 AM, Matthew Lardy mla...@gmail.com wrote:
>> What I hope is a quick question. I have a smiles string of a compound with an exo-double bond that does not specify E or Z. I want to take these geometry unspecified smiles strings and force one conformation and return a smiles string. So I have the following SMARTS transformation:
>> [C:1]=[C:2][C:3](=[O:4])>>[C:1]=[C:2]\[C:3](=[O:4])
>> Which from all I can tell is valid and recognized by RDKit. I get an error when I try to run the reaction:
>> Exception in thread "main" org.RDKit.ChemicalReactionException
>>     at org.RDKit.RDKFuncsJNI.ChemicalReaction_runReactants(Native Method)
>>     at org.RDKit.ChemicalReaction.runReactants(ChemicalReaction.java:129)
>> Has anyone else seen this? Or, have I just entered a semi-valid SMARTS transformation?
>
> I can't reproduce it in Python:
>
> In [4]: rxn = AllChem.ReactionFromSmarts(r'[C:1]=[C:2][C:3](=[O:4])>>[C:1]=[C:2]\[C:3](=[O:4])')
> In [5]: ps = rxn.RunReactants((Chem.MolFromSmiles('C=CC=O'),))
> In [7]: Chem.MolToSmiles(ps[0][0],True)
> Out[7]: 'C=CC=O'
>
> Which version of the RDKit are you using and what are you providing as an input? If you can find the log file from that java process, the error message that shows up there would also be useful to see. We discovered recently that it's not always trivial to find the log file that contains the stderr stream, but it would certainly be helpful.
>
> Note that the reaction as specified doesn't do what I think you want it to. For that you need to specify the directionality of single bonds on both sides of the double bond in the products:
>
> In [8]: rxn = AllChem.ReactionFromSmarts(r'[C:5][C:1]=[C:2][C:3](=[O:4])>>[C:5]/[C:1]=[C:2]\[C:3](=[O:4])')
> In [9]: ps = rxn.RunReactants((Chem.MolFromSmiles('CC=CC=O'),))
> In [10]: Chem.MolToSmiles(ps[0][0],True)
> Out[10]: 'C/C=C\\C=O'
>
> Best,
> -greg
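One detail in the session above is easy to trip over: the \ bond-direction mark in a SMARTS or SMILES string is also Python's escape character. The short sketch below uses only plain Python strings (no RDKit calls, and only the product side of the reaction string) to show why Greg's reaction string carries an r prefix and why Out[10] displays a doubled backslash.

```python
# The product SMILES as displayed in Out[10] is the repr of the string.
displayed = 'C/C=C\\C=O'

# The actual string holds a single backslash character.
print(displayed)       # C/C=C\C=O
print(len(displayed))  # 9

# A raw string avoids having to double the backslash when typing SMARTS.
raw_smarts = r'[C:5]/[C:1]=[C:2]\[C:3](=[O:4])'
escaped_smarts = '[C:5]/[C:1]=[C:2]\\[C:3](=[O:4])'
print(raw_smarts == escaped_smarts)  # True
```

The same care applies when a SMILES with bond-direction marks is written as a Java string literal, where the backslash also has to be doubled.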
[Rdkit-discuss] Using ChemicalReactions to set double bond (E/Z)
Hi all,

What I hope is a quick question. I have a smiles string of a compound with an exo-double bond that does not specify E or Z. I want to take these geometry unspecified smiles strings and force one conformation and return a smiles string. So I have the following SMARTS transformation:

[C:1]=[C:2][C:3](=[O:4])>>[C:1]=[C:2]\[C:3](=[O:4])

Which from all I can tell is valid and recognized by RDKit. I get an error when I try to run the reaction:

Exception in thread "main" org.RDKit.ChemicalReactionException
    at org.RDKit.RDKFuncsJNI.ChemicalReaction_runReactants(Native Method)
    at org.RDKit.ChemicalReaction.runReactants(ChemicalReaction.java:129)

Has anyone else seen this? Or, have I just entered a semi-valid SMARTS transformation?

Thanks in advance!
Matthew
Re: [Rdkit-discuss] java wrapper problem on CentOS 6.5
Hi Mattie,

I would also caution you that the version of CMake appears to be very important. I have had no issues building the wrappers using CMake (v2.8.10.2), Boost (v1.55), and SWIG (v2.0.10) on CentOS 6.4-6.5. It took me a long time to figure that out. Hopefully that is helpful!

Matthew

On Tue, Sep 9, 2014 at 3:13 PM, Whitmore, Mattie [USA] whitmore_mat...@bah.com wrote:
> Dear All,
>
> I am building RDKit with java wrappers using the following command:
>
> cmake -D BOOST_ROOT=/usr/local -D PYTHON_LIBRARY=/usr/local/lib/python2.7/config/libpython2.7.a -D PYTHON_INCLUDE_PATH=/usr/local/include/python2.7/ -D PYTHON_EXECUTABLE=/usr/local/bin/python2.7 -D RDK_BUILD_SWIG_WRAPPERS=ON ..
>
> and I am getting the error:
>
> make[2]: *** [Code/JavaWrappers/gmwrapper/GraphMolJavaJAVA_wrap.cxx] Error 1
> make[1]: *** [Code/JavaWrappers/gmwrapper/CMakeFiles/GraphMolWrap.dir/all] Error 2
> make: *** [all] Error 2
>
> I noticed others getting this error as well, so I'm trying to upgrade to boost 1.46.1. I realize this is not exactly the correct mailing list for this, but I am still running into issues with the proper make commands. I have run:
>
> ./bootstrap.sh -with-libraries=python -with-python=Python2.7 -with-toolset=gcc
> ./bjam -a --layout=tagged -q
>
> libs/python/src/numeric.cpp:22: warning: ‘boost::python::numeric::unnamed::array_module’ defined but not used
> g++ -ftemplate-depth-128 -O3 -finline-functions -Wno-inline -Wall -pthread -DBOOST_ALL_NO_LIB=1 -DBOOST_PYTHON_SOURCE -DBOOST_PYTHON_STATIC_LIB -DNDEBUG -I. -I/usr/include/python2.6 -c -o bin.v2/libs/python/build/gcc-4.4.7/release/link-static/threading-multi/numeric.o libs/python/src/numeric.cpp
> ...failed gcc.compile.c++ bin.v2/libs/python/build/gcc-4.4.7/release/link-static/threading-multi/numeric.o...
> ...failed updating 1 target...
>
> Does anyone have suggestions on building boost with python libraries?
>
> Thanks in advance
> Mattie
Re: [Rdkit-discuss] MaxMin Picker and Python
Hi Greg,

Thanks! My Python is really rusty at the moment, so I am always unsure whether I am just not going through the steps in the most efficient manner, or whether the path that I am following is far from ideal. Granted, I would prefer to write this in Java, but I should probably move back towards C++ when I do stuff like this. The code snippet you pointed to worked great for me!

I hear you about random picking being about as good. I'd prefer to have a method for this, but I understand that random is in itself a method. :) In the past I have used a diversity algorithm by Gobbi & Lee to filter hitlists, and it has worked really well for me. Rather than re-write that, I wanted to give the diversity pickers in RDKit a whirl first.

Thanks again Greg,
Matt

On Wed, Jul 16, 2014 at 9:19 PM, Greg Landrum greg.land...@gmail.com wrote:
> one other short thing. If this is the code you are using for the distance matrix:
>
> On Thu, Jul 17, 2014 at 12:18 AM, Matthew Lardy mla...@gmail.com wrote:
>> dm=[]
>> for i,fp in enumerate(zims_fps[:26000]): # only 1000 in the demo (in the interest of time)
>>     dm.extend(DataStructs.BulkTanimotoSimilarity(fp,zims_fps[1+1:26000],returnDistance=True))
>> dm = array(dm)
>
> Then at least part of the problem is that you are generating the full matrix. I think you intend to have:
>
> dm.extend(DataStructs.BulkTanimotoSimilarity(fp,zims_fps[i+1:26000],returnDistance=True))
>
> in there. That typo was in the original notebook that you used; I'm going to have to figure out how to fix that.
>
> -greg
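To see what the 1+1 versus i+1 slip does to the size of dm, the sketch below counts entries with plain Python lists standing in for the fingerprints (the helper names are made up for this illustration). With the corrected slice the list holds one entry per unordered pair, n(n-1)/2; with the typo every row contributes n-2 entries.

```python
def entries_with_typo(n):
    """Entries produced when every row is sliced with [1+1:n]."""
    items = list(range(n))
    return sum(len(items[1 + 1:n]) for _ in items)


def entries_fixed(n):
    """Entries produced when row i is sliced with [i+1:n]."""
    items = list(range(n))
    return sum(len(items[i + 1:n]) for i, _ in enumerate(items))


n = 1000
print(entries_with_typo(n))  # 998000
print(entries_fixed(n))      # 499500

# Closed forms, evaluated for the 26,000-molecule case from the thread.
big_n = 26000
typo_entries = big_n * (big_n - 2)
fixed_entries = big_n * (big_n - 1) // 2
print(typo_entries, fixed_entries)

# Rough lower bound on storage if each distance is an 8-byte float.
print(round(fixed_entries * 8 / 1e9, 2), "GB")
```

A Python list of float objects needs considerably more than 8 bytes per entry, so accumulating the distances in a list before converting to an array is likely to cost several times this lower bound.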
Re: [Rdkit-discuss] MaxMin Picker and Python
Hi Dave,

That's interesting, and I'll look into it. As I wrote to Greg, I am an aficionado of the Gobbi & Lee method described here (since we are sharing our favourite methods): http://pubs.acs.org/doi/abs/10.1021/ci025554v

While the set I am looking at the moment contains only 26K molecules, I have significantly larger sets behind this one (thus my interest in getting control of the memory consumption). I'll give the BigPicker a try, as I continue searching for a light diversity selection algorithm for ever increasing data sets. :)

Thanks Dave!
Matt

On Thu, Jul 17, 2014 at 12:59 AM, David Cosgrove davidacosgrov...@gmail.com wrote:
> If you don't mind writing some extra code, we've had good success with a Monte Carlo implementation of a maximin diversity picker called BigPicker, described in Blomberg et al, JCAMD, 23, 513-525 (2009). With this implementation, you only need to keep the subset distance matrix in memory. At each step, one of the 2 molecules involved in the shortest subset distance is swapped out, and a randomly chosen molecule from the pool replaces it. The relevant row/column of the subset distance matrix is updated and the new minimum interdistance found. A Monte Carlo criterion is used to decide whether to accept the swap or not.
>
> As the name suggests, it can be used on very large datasets. Indeed, in our implementation we allowed for the case where the subset was too large for the subset distance matrix to be held in memory and the minimum distance was calculated from fingerprints on the fly at each step. That really was slow, but if it's the only way of solving the problem...
>
> It's worth recognising that this sort of algorithm spends a lot of time mucking about improving the interdistance in the 4th or 5th decimal place. It's not clear that a subset with a minimum interdistance of 0.41567 is definitively better than one of 0.41568, so a fairly loose convergence criterion is usually ok. In our experience a larger number of shorter runs, to avoid convergence on a bad local minimum, is more reliable.
>
> Having said all that, I'd be inclined to agree with Greg that if you're only picking 200 compounds from 26000 you're probably going to do just as well with a pin. You could be slightly cleverer by only accepting the next random selection if it's above a threshold distance from anything you've already selected, to avoid the pathological case he describes.
>
> Dave
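The swap-based procedure Dave describes can be sketched in a few dozen lines of plain Python. This is not the BigPicker implementation from Blomberg et al.; it is a toy reconstruction of the steps listed above, with several assumptions: the function names are invented, the acceptance rule is a Metropolis-style test (the thread only says "a Monte Carlo criterion"), the closest pair is recomputed from scratch instead of updating one row/column, and the "molecules" are random numbers with absolute difference as the distance.

```python
import math
import random


def min_pair(items, dist):
    """Return (distance, i, j) for the closest pair among the items."""
    best = (float("inf"), None, None)
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            d = dist(items[a], items[b])
            if d < best[0]:
                best = (d, a, b)
    return best


def monte_carlo_pick(pool, n_pick, dist, steps=2000, temperature=0.01, seed=0):
    """Swap-based maximin picker: try to raise the smallest subset distance."""
    rng = random.Random(seed)
    subset = rng.sample(range(len(pool)), n_pick)
    items = [pool[i] for i in subset]
    current, a, b = min_pair(items, dist)
    best_subset, best_score = list(subset), current

    for _ in range(steps):
        # Swap out one of the two members that define the shortest distance.
        slot = rng.choice((a, b))
        candidate = rng.randrange(len(pool))
        if candidate in subset:
            continue
        old = subset[slot]
        subset[slot] = candidate
        items[slot] = pool[candidate]
        score, new_a, new_b = min_pair(items, dist)

        # Metropolis-style acceptance (an assumption, see the note above).
        delta = score - current
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, a, b = score, new_a, new_b
            if current > best_score:
                best_subset, best_score = list(subset), current
        else:
            # Reject: restore the previous subset member.
            subset[slot] = old
            items[slot] = pool[old]

    return best_subset, best_score


data_rng = random.Random(42)
pool = [data_rng.random() for _ in range(500)]  # toy "molecules": points in [0, 1]


def distance(x, y):
    """Toy distance between two points on a line."""
    return abs(x - y)


picked, score = monte_carlo_pick(pool, 10, distance)
print(len(picked), round(score, 3))
```

Only the picked subset and its pairwise distances are held at any time, which is the property Dave highlights; following his advice, several short runs with different seeds would be a reasonable way to use a routine like this.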
[Rdkit-discuss] MaxMin Picker and Python
Hi all,

I have been playing with the diversity selection in RDKit. I am running through a set of ~26,000 molecules to pick a set of 200 diverse molecules. I saw some examples of how to do this in Python (my variant of their script below), but the memory consumption is massive. I burned through ~15GB of memory before I killed it off. Is this about what others have seen, or should I move to doing this in C++ or Java (assuming that others have seen a significantly lower level of memory consumption)? Here is the script:

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import DataStructs
import gzip
from rdkit.Chem import Draw
from rdkit.SimDivFilters import rdSimDivPickers

zims = [x for x in Chem.ForwardSDMolSupplier(gzip.open('a.sdf.gz')) if x is not None]
zims_fps=[AllChem.GetMorganFingerprintAsBitVect(x,2) for x in zims]
dm=[]
for i,fp in enumerate(zims_fps[:26000]): # only 1000 in the demo (in the interest of time)
    dm.extend(DataStructs.BulkTanimotoSimilarity(fp,zims_fps[1+1:26000],returnDistance=True))
dm = array(dm)
picker = rdSimDivPickers.MaxMinPicker()
ids = picker.Pick(dm,26000,200)
list(ids[:200])

Thanks in advance!
Matt
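The memory cost in the script above comes from materializing every pairwise distance before picking. As a point of comparison, here is a minimal pure-Python sketch of a greedy MaxMin selection that computes distances on demand and keeps only one number per candidate. It is an illustration of the idea, not RDKit's MaxMinPicker: the function names are invented, fingerprints are modeled as Python sets of on-bit indices, and the toy data are made up. (RDKit's picker also has a LazyPick variant that takes a distance function; check the documentation of your RDKit version for its exact arguments.)

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def maxmin_pick(fps, n_pick, first=0):
    """Greedy MaxMin selection without storing a full distance matrix.

    Keeps, for every candidate, only its distance to the nearest item
    picked so far, so memory grows linearly with len(fps).
    """
    picked = [first]
    nearest = [tanimoto_distance(fp, fps[first]) for fp in fps]
    while len(picked) < min(n_pick, len(fps)):
        # Choose the candidate farthest from everything already picked.
        nxt = max(range(len(fps)), key=nearest.__getitem__)
        picked.append(nxt)
        for i, fp in enumerate(fps):
            d = tanimoto_distance(fp, fps[nxt])
            if d < nearest[i]:
                nearest[i] = d
    return picked


# Toy fingerprints: sets of on-bit indices (hypothetical data).
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {8, 9, 10}, {20, 21}]
print(maxmin_pick(fps, 3))  # [0, 2, 5]
```

The trade-off is time for memory: each pick costs one pass over the pool, so picking k items from n costs on the order of k*n distance evaluations. The sketch does not handle pools with exact duplicates, where an already picked index could be selected again.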
Re: [Rdkit-discuss] MaxMin Picker and Python
Hi Igor,

Thanks! Maybe I am a throwback, but I prefer the command line to a GUI. Still I'll give it a whirl! :) If you are handling millions of molecules without issue, then my Python skills are really, really rusty. Or, I shouldn't be using Python to handle this much data. :)

Thanks for the info!
Matt

On Wed, Jul 16, 2014 at 3:31 PM, Igor Filippov igor.v.filip...@gmail.com wrote:
> Matthew,
>
> Two lines of shameless self-promotion: This is exactly the kind of problem for Diversity Genie - http://www.diversitygenie.com/ It is using RDKit library underneath, but wraps it in a simple, easy to use GUI front-end.
>
> Best regards,
> Igor
Re: [Rdkit-discuss] MaxMin Picker and Python
Hi Markus,

It looks like the memory consumption (initially) drops. Still it gets out of control, likely after the file is read. Here is the file info:

-rw-rw-r--. 1 mlardy mlardy 1.6M Jul 16 16:40 a.sdf.gz

Looking into Patrick's suggestion, I got the first error:

NameError: name 'array' is not defined

Which I fixed by adding another import statement:

from array import array

This generated another error:

TypeError: 'generator' object is unsubscriptable

Sorry that I am all thumbs with Python, but thank you for the help so far!

Matt

On Wed, Jul 16, 2014 at 3:48 PM, Markus Sitzmann markus.sitzm...@gmail.com wrote:

Hi Matt, maybe squeeze these two lines

zims = [x for x in Chem.ForwardSDMolSupplier(gzip.open('a.sdf.gz')) if x is not None]
zims_fps=[AllChem.GetMorganFingerprintAsBitVect(x,2) for x in zims]

into one:

zims_fps=[AllChem.GetMorganFingerprintAsBitVect(x,2) for x in Chem.ForwardSDMolSupplier(gzip.open('a.sdf.gz')) if x is not None]

because zims keeps the whole file in memory for no good reason :-) (is that sdf.gz big?)

Markus

On Thu, Jul 17, 2014 at 12:43 AM, Matthew Lardy mla...@gmail.com wrote:

Hi Igor, Thanks! Maybe I am a throwback, but I prefer the command line to a GUI. Still I'll give it a whirl! :) If you are handling millions of molecules without issue; then my Python skills are really, really, rusty. Or, I shouldn't be using Python to handle this much data. :) Thanks for the info! Matt

On Wed, Jul 16, 2014 at 3:31 PM, Igor Filippov igor.v.filip...@gmail.com wrote:

Matthew, Two lines of shameless self-promotion: This is exactly the kind of problem for Diversity Genie - http://www.diversitygenie.com/ It is using RDKit library underneath, but wraps it in a simple, easy to use GUI front-end. Best regards, Igor

On Wed, Jul 16, 2014 at 6:18 PM, Matthew Lardy mla...@gmail.com wrote:

Hi all, I have been playing with the diversity selection in RDKit. I am running through a set of ~26,000 molecules to pick a set of 200 diverse molecules. I saw some examples of how to do this in Python (my variant of their script below), but the memory consumption is massive. I burned through ~15GB of memory before I killed it off. Is this about what others have seen, or should I move to doing this in C++ or Java (assuming that others have seen a significantly lower level of memory consumption)?

Here is the script:

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import DataStructs
import gzip
from rdkit.Chem import Draw
from rdkit.SimDivFilters import rdSimDivPickers

zims = [x for x in Chem.ForwardSDMolSupplier(gzip.open('a.sdf.gz')) if x is not None]
zims_fps=[AllChem.GetMorganFingerprintAsBitVect(x,2) for x in zims]
dm=[]
for i,fp in enumerate(zims_fps[:26000]): # only 1000 in the demo (in the interest of time)
    dm.extend(DataStructs.BulkTanimotoSimilarity(fp,zims_fps[1+1:26000],returnDistance=True))
dm = array(dm)
picker = rdSimDivPickers.MaxMinPicker()
ids = picker.Pick(dm,26000,200)
list(ids[:200])

Thanks in advance! Matt

___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
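A rough sketch of why the script above runs out of memory, and of one way around it. For n fingerprints the lower triangle holds n(n-1)/2 distances, and a plain Python list stores each one as a separate float object, so ~26,000 molecules come to hundreds of millions of objects (the loop as posted slices with 1+1 rather than 1+i, which would roughly double that). The second half is a toy MaxMin picker that keeps only one running distance per candidate. Everything here is illustrative: fingerprints are modelled as Python sets of bit indices, and the function names are made up for this example rather than taken from RDKit.

```python
import sys


def lower_triangle_size(n):
    """Number of distinct pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def python_list_bytes(n_values, pointer_size=8):
    """Rough bytes for a list of n_values separate float objects.

    Assumes CPython on a 64-bit build: one float object plus one list
    pointer per value. List over-allocation is ignored.
    """
    return n_values * (sys.getsizeof(1.0) + pointer_size)


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two sets of 'on' bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - float(len(a & b)) / union


def maxmin_pick(fps, n_pick, first=0):
    """Greedy MaxMin selection that never stores pairwise distances.

    min_dist[i] is the distance from item i to its nearest picked item,
    so memory grows with len(fps), not with the number of pairs.
    """
    if not fps:
        return []
    picked = [first]
    remaining = set(range(len(fps))) - {first}
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    while remaining and len(picked) < n_pick:
        # Next pick: the candidate farthest from everything picked so far
        # (ties go to the lowest index).
        nxt = max(sorted(remaining), key=lambda i: min_dist[i])
        picked.append(nxt)
        remaining.discard(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], tanimoto_distance(fps[i], fps[nxt]))
    return picked


if __name__ == "__main__":
    pairs = lower_triangle_size(26000)
    print(pairs)                           # number of pairwise distances
    print(python_list_bytes(pairs) / 1e9)  # rough GB as a Python list
    print(pairs * 8 / 1e9)                 # rough GB as packed doubles

    toy = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {7, 8}]
    print(maxmin_pick(toy, 3))
```

RDKit's MaxMinPicker also has a LazyPick method that takes a distance callback instead of a precomputed matrix; whether it is available, and its exact signature, depends on the RDKit version, so check the documentation for your build.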
Re: [Rdkit-discuss] installation problem
Hi Sergio,

What version of Swig, Boost and Cmake are you using? I found that switching to Swig (v2.0.10), Boost (v1.55), and Cmake (v2.8.10.2) resolved those types of errors when I was having them. At least that worked on CentOS 6.x. :)

Matt

On Thu, Jun 5, 2014 at 1:09 PM, Wong, Sergio E. wong...@llnl.gov wrote:

Hi; I am trying to install the CDKit, and in particular, the python wrapper on a redhat x86_64 desktop from source. The cmake command I used is:

cmake -DBOOST_ROOT=/home/wong105/usr/boost_1_47_0 -D PYTHON_LIBRARY=/usr/lib64/python2.4/config/libpython2.4.a -D PYTHON_INCLUDE_DIR=/usr/include/python2.4 -D PYTHON_EXECUTABLE=/usr/bin/python2.4 -D RDK_BUILD_SWIG_WRAPPERS=ON -D PYTHON_NUMPY_INCLUDE_PATH=/usr/lib64/python2.4/site-packages/numpy/core/include -D SWIG_DIR=/home/wong105/usr/swig-3.0.2/share/swig/3.0.2/ -D SWIG_EXECUTABLE=/home/wong105/usr/swig-3.0.2/bin/swig ../

and at the very end, the following set of warnings/errors occur:

[ 97%] Built target rdChemicalFeatures
[ 97%] Swig source
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/GraphMolJava.i:67: Warning 302: Identifier 'int64_t' redefined (ignored),
/home/wong105/usr/swig-3.0.2/share/swig/3.0.2/stdint.i:21: Warning 302: previous definition of 'int64_t'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/GraphMolJava.i:68: Warning 302: Identifier 'uint64_t' redefined (ignored),
/home/wong105/usr/swig-3.0.2/share/swig/3.0.2/stdint.i:31: Warning 302: previous definition of 'uint64_t'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/GraphMolJava.i:69: Warning 302: Identifier 'int_least64_t' redefined (ignored),
/home/wong105/usr/swig-3.0.2/share/swig/3.0.2/stdint.i:44: Warning 302: previous definition of 'int_least64_t'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/GraphMolJava.i:70: Warning 302: Identifier 'uint_least64_t' redefined (ignored),
/home/wong105/usr/swig-3.0.2/share/swig/3.0.2/stdint.i:54: Warning 302: previous definition of 'uint_least64_t'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/GraphMolJava.i:71: Warning 302: Identifier 'int_fast64_t' redefined (ignored),
/home/wong105/usr/swig-3.0.2/share/swig/3.0.2/stdint.i:67: Warning 302: previous definition of 'int_fast64_t'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/GraphMolJava.i:72: Warning 302: Identifier 'uint_fast64_t' redefined (ignored),
/home/wong105/usr/swig-3.0.2/share/swig/3.0.2/stdint.i:79: Warning 302: previous definition of 'uint_fast64_t'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/GraphMolJava.i:73: Warning 302: Identifier 'intmax_t' redefined (ignored),
/home/wong105/usr/swig-3.0.2/share/swig/3.0.2/stdint.i:99: Warning 302: previous definition of 'intmax_t'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/GraphMolJava.i:74: Warning 302: Identifier 'uintmax_t' redefined (ignored),
/home/wong105/usr/swig-3.0.2/share/swig/3.0.2/stdint.i:100: Warning 302: previous definition of 'uintmax_t'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/../BitOps.i:47: Warning 302: Identifier 'AllProbeBitsMatch' redefined (ignored) (Renamed from 'AllProbeBitsMatch< ExplicitBitVect >'),
/home/wong105/usr/RDKit/Code/DataStructs/BitOps.h:72: Warning 302: previous definition of 'AllProbeBitsMatch'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/../BitOps.i:48: Warning 302: Identifier 'AllProbeBitsMatch' redefined (ignored) (Renamed from 'AllProbeBitsMatch< ExplicitBitVect >'),
/home/wong105/usr/RDKit/Code/DataStructs/BitOps.h:72: Warning 302: previous definition of 'AllProbeBitsMatch'.
/home/wong105/usr/RDKit/Code/JavaWrappers/gmwrapper/../BitOps.i:60: Warning 302: Identifier 'NumBitsInCommon' redefined (ignored) (Renamed from 'NumBitsInCommon< ExplicitBitVect,ExplicitBitVect >'),
/home/wong105/usr/RDKit/Code/DataStructs/BitOps.h:219: Warning 302: previous definition of 'NumBitsInCommon'.
/home/wong105/usr/RDKit/Code/RDBoost/Exceptions.h:18: Warning 401: Nothing known about base class 'std::runtime_error'. Ignored.
/home/wong105/usr/RDKit/Code/RDBoost/Exceptions.h:31: Warning 401: Nothing known about base class 'std::runtime_error'. Ignored.
/home/wong105/usr/RDKit/Code/RDBoost/Exceptions.h:46: Warning 401: Nothing known about base class 'std::runtime_error'. Ignored.
/home/wong105/usr/RDKit/Code/GraphMol/QueryOps.h:347: Warning 401: Nothing known about base class 'Queries::EqualityQuery< int,ConstAtomPtr,true >'. Ignored.
/home/wong105/usr/RDKit/Code/GraphMol/QueryOps.h:347: Warning 401: Maybe you forgot to instantiate 'Queries::EqualityQuery< int,ConstAtomPtr,true >' using %template.
/home/wong105/usr/RDKit/Code/GraphMol/QueryOps.h:387: Warning 401: Nothing known about base class 'Queries::SetQuery< int,Atom const *,true >'. Ignored.
/home/wong105/usr/RDKit/Code/GraphMol/QueryOps.h:387: Warning 401: Maybe you forgot to instantiate 'Queries::SetQuery< int,Atom const *,true >' using %template.
/home/wong105/usr/RDKit/Code/GraphMol/SanitException.h:26:
Re: [Rdkit-discuss] Molecule reading issues
Hi all,

Thanks all! I should have been a bit more explicit. This is the context of the offending line:

try { smi = rdmol.MolToSmiles(); } catch (org.RDKit.MolSanitizeException e) { System.err.println("Bad Mol Found: " + smi); }

I thought after reading the comments, that maybe I was being too specific. So, I tried the same try/catch block with a more general exception (which I hadn't tried before):

try { smi = rdmol.MolToSmiles(); } catch (Exception e) { System.err.println("Bad Mol Found: " + smi); }

Nothing changed (I reached the same error):

Exception in thread "main" org.RDKit.MolSanitizeException
at org.RDKit.RDKFuncsJNI.RWMol_MolFromSmiles__SWIG_3(Native Method)
at org.RDKit.RWMol.MolFromSmiles(RWMol.java:422)

Hopefully my issue is a bit more clear now. There is another route around this issue, reading the input file line by line and excising those offending lines. This would work, but I would rather not have to buffer my RDKit code. :)

Thanks! Matt

On Sun, Jun 1, 2014 at 12:55 AM, Toby Wright toby.wri...@inhibox.com wrote:

If you just want to ignore the error add a try...catch block around the offending line. Yours, Toby Wright

On 31 May 2014 00:03, Matthew Lardy mla...@gmail.com wrote:

Hi all, I am having this issue with the Java wrapper while trying to create a smiles string from a RWMol class object. I don't care about trying to figure out what is going wrong, I just want to bypass this record without my application closing. Any ideas? Here is the offending line:

rdmol.MolToSmiles();

The error:

Exception in thread "main" org.RDKit.MolSanitizeException
at org.RDKit.RDKFuncsJNI.RWMol_MolFromSmiles__SWIG_3(Native Method)
at org.RDKit.RWMol.MolFromSmiles(RWMol.java:422)

Thanks in advance! Matt
Re: [Rdkit-discuss] Molecule reading issues
Hi Jan,

The exception should be an org.RDKit.MolSanitizeException, but the following code fails too:

String smi = ""; try { smi = rdmol.MolToSmiles(); } catch (Exception e) { System.err.println("Bad Mol Found: "); }

Shouldn't this catch everything? Because that isn't working either. That's why I am confused as to what to do (in fact that type of catch works fine with other APIs [e.g. ChemAxon]).

Matt

On Mon, Jun 2, 2014 at 11:12 AM, Jan Holst Jensen j...@biochemfusion.com wrote:

Hi Matt,

You are catching exceptions for Mol*To*Smiles(), but the exception that is giving you trouble is caused by Mol*From*Smiles():

at org.RDKit.RDKFuncsJNI.RWMol_Mol*From*Smiles__SWIG_3(Native Method)
at org.RDKit.RWMol.Mol*From*Smiles(RWMol.java:422)

So my guess is that 'smi' is used in downstream code with an invalid value that causes the error? And it might be fixed by

try { smi = rdmol.MolToSmiles(); } catch (org.RDKit.MolSanitizeException e) { smi = "CC"; // Some valid dummy value in case of read errors. System.err.println("Bad Mol Found: " + smi); }

? Does the catch-clause work? Does it output an error to stderr?

Cheers -- Jan
Re: [Rdkit-discuss] Molecule reading issues
Hi Jan,

Wow, I need to cut down on my caffeine intake. This application reads an sdf file and writes out smis. The exception that you pointed out was for code reached before this MolToSmiles block. A simple catch (Exception e) appears to have fixed everything. Thanks for pointing that out, Jan!

Matt
Re: [Rdkit-discuss] RDKit, Java, and Memory
Hi Greg,

So I pulled the development version, and the memory issue is indeed resolved! I had no issues building the wrappers using CMake (v2.8.10.2), Boost (v1.55), and Swig (v2.0.10) on CentOS 6.4-6.5. Thanks so much Greg!

Matt

On Thu, May 29, 2014 at 10:44 AM, Greg Landrum greg.land...@gmail.com wrote:

The code distribution on SourceForge hasn't been updated in over a year. Please use GitHub instead.

On Thursday, May 29, 2014, Matthew Lardy mla...@gmail.com wrote:

Hi Greg, So I took a development copy from SourceForge, but I am now unable to compile the C++ code. Is anyone else having this issue?

Linking CXX executable graphmolIterTest
libSmilesParse.so.1.2013.06.1pre: undefined reference to `yysmarts_debug'

Thanks in advance! Matt
Re: [Rdkit-discuss] RDKit, Java, and Memory
Thanks Greg! Yep, I built the wrappers with Swig myself. I'll update on Tuesday and give it a whirl. I noticed on another thread that someone needed the incantation to build the jar file in CentOS. I'll also pull back my conditions and post them there. Thanks again! Matt

On Fri, May 23, 2014 at 9:17 PM, Greg Landrum greg.land...@gmail.com wrote:

Hi Matt,

On Fri, May 23, 2014 at 6:01 PM, Matthew Lardy mla...@gmail.com wrote:

> Hi all, I am loving the Swig based wrappers for RDKit,

Great to hear!

> but I keep running across one issue. Memory use. Using the SDMolSupplier I typically eat 2-3GBs of memory to process 10-20K molecules.

Less great to hear. ;-)

> Is anyone else having this issue and, if so, has anyone solved it?

I just set up a quick test, confirmed the problem, and fixed it. The fix is checked in, are you using a self-built version of the wrappers?

-greg
[Rdkit-discuss] RDKit, Java, and Memory
Hi all, I am loving the Swig based wrappers for RDKit, but I keep running across one issue. Memory use. Using the SDMolSupplier I typically eat 2-3GBs of memory to process 10-20K molecules. Is anyone else having this issue and, if so, has anyone solved it? Thanks in advance! Matt
[Rdkit-discuss] MolToPDBFile
Hi all, Hopefully this is easy, but I have an ROMol object that I want to convert into a PDB file. It seems to me that the MolToPDBFile is exactly what I should be using, but I am missing the protons in the resulting file. What's weird is that the input file has protons, and I have added a step to protonate the molecule after reading just to ensure that they are there. Any idea of how to move through this? Thanks in advance! Matt
Re: [Rdkit-discuss] MolToPDBFile
Hi Greg,

Yep, the input molecule (a ROMol object read from a SDF) is indeed protonated. Which was why I was surprised that I didn't see any protons after I wrote out the PDB object. (I am not doing anything but using RDKit to pump out independent PDBs for each MOL record in the SD file.) I'll check to ensure that it is indeed seeing the protons, and get back to you.

Thanks! Matt

On Thu, May 1, 2014 at 8:44 PM, Greg Landrum greg.land...@gmail.com wrote:

Hi Matt,

On Thu, May 1, 2014 at 11:19 PM, Matthew Lardy mla...@gmail.com wrote:

> Hopefully this is easy, but I have an ROMol object that I want to convert into a PDB file. It seems to me that the MolToPDBFile is exactly what I should be using, but I am missing the protons in the resulting file. What's weird is that the input file has protons, and I have added a step to protonate the molecule after reading just to ensure that they are there. Any idea of how to move through this?

If the Hs are there, they should end up being written to the PDB. Here's an example:

In [5]: m = Chem.MolFromSmiles('C')
In [6]: m=Chem.AddHs(m)
In [7]: print Chem.MolToPDBBlock(m)
HETATM    1  C1  UNL     1       0.000   0.000   0.000  1.00  0.00           C
HETATM    2  H1  UNL     1       0.000   0.000   0.000  1.00  0.00           H
HETATM    3  H2  UNL     1       0.000   0.000   0.000  1.00  0.00           H
HETATM    4  H3  UNL     1       0.000   0.000   0.000  1.00  0.00           H
HETATM    5  H4  UNL     1       0.000   0.000   0.000  1.00  0.00           H
CONECT    1    2    3    4    5
END

Are you sure that the molecule actually has had Hs added? You can check this with the Debug() method:

In [8]: m.Debug()
Atoms:
0 6 C chg: 0 deg: 4 exp: 4 imp: 0 hyb: 4 arom?: 0 chi: 0
1 1 H chg: 0 deg: 1 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
2 1 H chg: 0 deg: 1 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
3 1 H chg: 0 deg: 1 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
4 1 H chg: 0 deg: 1 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
Bonds:
0 0-1 order: 1 conj?: 0 aromatic?: 0
1 0-2 order: 1 conj?: 0 aromatic?: 0
2 0-3 order: 1 conj?: 0 aromatic?: 0
3 0-4 order: 1 conj?: 0 aromatic?: 0

Note: you mention Hs being removed on reading your input files. If you want to preserve the Hs that are present in the input file, you can do this from a mol block/file like this:

In [14]: print mb

     RDKit

  5  4  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
  1  4  1  0
  1  5  1  0
M  END

In [15]: nm = Chem.MolFromMolBlock(mb,removeHs=False)
In [16]: nm.GetNumAtoms()
Out[16]: 5

The PDB parsers take a similar argument.

-greg
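Greg's checks above look at the molecule in memory. A complementary check, sketched here with no RDKit dependency, is to tally element symbols in the PDB text that was actually written, which shows directly whether any H records made it into the file. The function names are invented for this example, and it assumes the element symbol sits in columns 77-78 of each ATOM/HETATM record, as the PDB format specifies; files from other tools may leave that field blank.

```python
def pdb_element(line):
    """Element symbol of an ATOM/HETATM record (columns 77-78)."""
    symbol = line[76:78].strip().upper()
    return symbol if symbol else "?"  # "?" marks a missing element field


def count_elements(pdb_block):
    """Tally element symbols over all ATOM and HETATM records."""
    counts = {}
    for line in pdb_block.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            symbol = pdb_element(line)
            counts[symbol] = counts.get(symbol, 0) + 1
    return counts


def make_hetatm(serial, name, element):
    """Build a fixed-column HETATM line at the origin (demo data only)."""
    return (
        "HETATM{:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
        "          {:>2s}"
    ).format(serial, name, "UNL", " ", 1, 0.0, 0.0, 0.0, 1.0, 0.0, element)


def demo_methane_block():
    """A methane-like PDB block: one carbon and four hydrogens."""
    lines = [make_hetatm(1, " C1", "C")]
    lines += [make_hetatm(i + 1, " H%d" % i, "H") for i in range(1, 5)]
    lines += ["CONECT    1    2    3    4    5", "END"]
    return "\n".join(lines)


if __name__ == "__main__":
    counts = count_elements(demo_methane_block())
    print(counts)
    if counts.get("H", 0) == 0:
        print("no hydrogens were written")
```

In practice the block would come from reading the written .pdb file rather than from demo_methane_block(); a count of zero for H would point back at the reading step (for example a removeHs default) rather than at the writer.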
[Rdkit-discuss] SD Tag Reording
Hi all, I've noticed that I am unable to reorder SD tags in RDKit. It appears that no matter what I try, they get reordered in alphabetical order. Is anyone else experiencing this behaviour? Thanks! Matt
Re: [Rdkit-discuss] RDKit Java/Swig Boost
Thanks Oriol and Greg! I wasn't aware of needing to do that, but I will give it a whirl. :) Thanks so much for your help! Matt

On Tue, Jan 14, 2014 at 2:34 AM, Greg Landrum greg.land...@gmail.com wrote:

On Tue, Jan 14, 2014 at 3:37 AM, Oriol López Massaguer olo...@imim.es wrote:

> I use RDKit from Scala/Java and experienced the same issue. Despite having GraphMolWrap.so in the LD_LIBRARY_PATH the code fails with a UnsatisfiedLinkError. I've solved it by loading explicitly the native library: System.load("/opt/collector/lib/libGraphMolWrap.so") I don't know if there is some other way to solve it.

For what it's worth, this was also necessary to make the RDKit knime nodes work on some systems. I'm not enough of a java expert to know exactly why that is. Starting out by adding an explicit path to the System.load call definitely cannot hurt though.

-greg
[Rdkit-discuss] RDKit Java/Swig Boost
Hi all,

First Swig and Java: While I was able to build the Java wrappers (2013-09-01-beta), and the resulting jars passed the tests (at least the Java wrappers did), I am being frustrated by the following error:

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.RDKit.RDKFuncsJNI.RWMol_MolFromMolFile__SWIG_2(Ljava/lang/String;)

Now I have org.RDKit.jar in my classpath, and can compile this application without issue, but I am unsure how to proceed getting this app to run. I have the path to org.RDKit.jar, libGraphMolWrap.so, and GraphMolJava.i (which are all the same path) set in LD_LIBRARY_PATH. Even with this, I get the same error. I must be missing something simple, but I couldn't figure out what it is after a lot of searching. At least I hope that I have just missed something simple.

Boost: I should add that trying to move from the 2013-09-01-beta to the actual release was impossible (I have been trying to make the jump for weeks). I was unable to get cmake to find my copy of boost (even when installed from an rpm). This is a bit frightening, as the beta accepted it after a lot of effort. Building boost as described on the page didn't fix anything, as the incantations to override cmake's search failed. Removal of my rpm-installed boost, in an attempt to force cmake to use my user-compiled version, did nothing as well. I have seen, reading through the mailing list, that this has been a continual issue for users, and I should add my voice to those. I don't know whose fault it is (CentOS's for rolling with a prehistoric version of boost, CMake's for just being frustrating in general, or Boost's for not building the FindBoost.cmake from a user compile). If anyone has a workaround, I am all ears. :)

Build environment:
CentOS 6.4 (fresh install)
Boost 1.41
Swig 2.0.10
GNU compilers

Thanks in advance! Matt
Re: [Rdkit-discuss] install RDKit on CentOS
Hi Yingfeng,

Forgive me if I am jumping in without knowing all that you have tried, but did you try pulling boost with Yum? I was able to get RDKit, and the Java wrappers, to cleanly compile on CentOS 6.4 with that flavor of boost.

Matt

On Mon, Jul 15, 2013 at 9:39 PM, Greg Landrum greg.land...@gmail.com wrote:

On Mon, Jul 15, 2013 at 7:49 AM, Yingfeng Wang ywang...@gmail.com wrote:

> I also tried boost 1.49.0, and met same problem. If necessary, I can try 1.51.

hmm, ok, so much for that idea. Let's try something else.

On Sunday, July 14, 2013, Yingfeng Wang wrote:

> I failed to install RDKit on CentOS. The error message is
>
> [ 85%] Built target MolChemicalFeatures
> Linking CXX executable testSLNParse
> [ 85%] Built target rdSLNParse
> /home/yingfeng/software/boost/boost_1_54_0/stage/lib/libboost_regex.so: undefined reference to `std::__detail::_List_node_base::_M_unhook()@GLIBCXX_3.4.15'
> /home/yingfeng/software/boost/boost_1_54_0/stage/lib/libboost_regex.so: undefined reference to `std::overflow_error::~overflow_error()@GLIBCXX_3.4.15'
> /home/yingfeng/software/boost/boost_1_54_0/stage/lib/libboost_regex.so: undefined reference to `std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@GLIBCXX_3.4.15'
> /home/yingfeng/software/boost/boost_1_54_0/stage/lib/libboost_regex.so: undefined reference to `std::ctype<char>::_M_widen_init() const@GLIBCXX_3.4.11'
> /home/yingfeng/software/boost/boost_1_54_0/stage/lib/libboost_regex.so: undefined reference to `std::invalid_argument::~invalid_argument()@GLIBCXX_3.4.15'
> /home/yingfeng/software/boost/boost_1_54_0/stage/lib/libboost_regex.so: undefined reference to `std::__detail::_List_node_base::_M_transfer(std::__detail::_List_node_base*, std::__detail::_List_node_base*)@GLIBCXX_3.4.15'
> collect2: ld returned 1 exit status
> make[2]: *** [Code/GraphMol/SLNParse/testSLNParse] Error 1
> make[1]: *** [Code/GraphMol/SLNParse/CMakeFiles/testSLNParse.dir/all] Error 2

This error is happening during the build of the SLN parser. The easiest solution, if you aren't planning on using SLN, is to disable that part of the build by providing cmake with the argument:

-DRDK_BUILD_SLN_SUPPORT=OFF

and then re-running make.

If you want to find the real solution, I need a bit more information. It looks like there is something strange/wrong about your boost.regex build (it looks like it could be using a different version of libstdc++), so we're going to have to track that down.

Start with sending the output of this command:

ldd /home/yingfeng/software/boost/boost_1_54_0/stage/lib/libboost_regex.so

Next, look in the lib/ directory of wherever you're doing the RDKit build and send the result of running ldd on one of the .so files in that directory.

Finally, do the following:

VERBOSE=1 make testSLNParse

and send the part of the (very long) output that includes the actual build command and the error message. This will be towards the bottom of the output and the build command will look something like this (the paths will of course be completely different for you):

/usr/bin/c++ -O3 -DNDEBUG -Wl,-search_paths_first -Wl,-headerpad_max_install_names CMakeFiles/testSLNParse.dir/test.cpp.o -o testSLNParse -L/usr/local/lib ../../../lib/libSLNParse.2013.06.1beta1.dylib ../../../lib/libSmilesParse.2013.06.1beta1.dylib ../../../lib/libSubstructMatch.2013.06.1beta1.dylib ../../../lib/libGraphMol.2013.06.1beta1.dylib ../../../lib/libRDGeometryLib.2013.06.1beta1.dylib ../../../lib/libRDGeneral.2013.06.1beta1.dylib /usr/local/lib/libboost_regex.dylib ../../../lib/libDataStructs.2013.06.1beta1.dylib ../../../lib/libRDGeneral.2013.06.1beta1.dylib -lpthread

-greg
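Each undefined reference quoted above ends in a GLIBCXX_x.y.z symbol version, which is the clue behind Greg's suspicion that Boost was built against a different libstdc++. As a small illustration (the helper names are made up here, and this is not part of RDKit or its build), the versions can be pulled out of pasted linker output and the newest one reported, so it can be compared with what the system libstdc++ actually provides.

```python
import re

# Matches the "@GLIBCXX_3.4.15" suffix that ld prints on versioned symbols.
SYMBOL_VERSION = re.compile(r"@GLIBCXX_(\d+(?:\.\d+)*)")


def required_versions(linker_output):
    """Sorted, de-duplicated GLIBCXX versions named in the linker output."""
    found = set()
    for match in SYMBOL_VERSION.finditer(linker_output):
        found.add(tuple(int(part) for part in match.group(1).split(".")))
    return sorted(found)


def newest_required(linker_output):
    """Highest GLIBCXX version required, as a dotted string, or None."""
    versions = required_versions(linker_output)
    if not versions:
        return None
    return ".".join(str(part) for part in versions[-1])


SAMPLE = """\
libboost_regex.so: undefined reference to `std::__detail::_List_node_base::_M_unhook()@GLIBCXX_3.4.15'
libboost_regex.so: undefined reference to `std::ctype<char>::_M_widen_init() const@GLIBCXX_3.4.11'
libboost_regex.so: undefined reference to `std::invalid_argument::~invalid_argument()@GLIBCXX_3.4.15'
"""

if __name__ == "__main__":
    print(required_versions(SAMPLE))
    print(newest_required(SAMPLE))
```

The versions a given libstdc++ provides can usually be listed with something like strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX (the library path is an assumption and varies by system). If the newest required version is missing from that list, the Boost library was most likely built with a newer compiler than the one linking RDKit.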