On Mon, 2007-01-29 at 14:03 +0200, Nina Jeliazkova wrote:
> Hello,
>
> Egon Willighagen <[EMAIL PROTECTED]> wrote:
>
> > On Sunday 28 January 2007 21:58, Rajarshi Guha wrote:
> > > Invariably I will know the names of my SD tags, so I can always create a
> > > temporary HashMap, extract the relevant properties via getProperty(),
> > > populate the HashMap and then send it to setSdFields().
> >
> > Well, I guess my hesitation comes from the fact that I have to know about SD
> > internals when wanting to read or store simple molecular properties... I
> > understand the step is not large, the math rather simple, but still... CDK
> > has set/getProperty() to store properties... why add another layer?
> >
>
> I agree with Egon, adding another layer doesn't seem an elegant approach. A few
> more thoughts:
To a large extent I agree. My objections are from an 'ease of use' point of
view (which is debatable :)

> 1) User calculates some descriptors and saves them as SDF file properties. Then
> at some moment this SDF file is read and the properties loaded - how should they
> be loaded - as special_SDF_properties or as regular properties?
>
> 2) Imagine another file format appears that is able to store properties -
> should we introduce another my_file_special_Properties?
> That's not hypothetical - I have already implemented classes extending
> DefaultChemObjectWriter able to store / read compounds (as SMILES) along with
> properties in comma-delimited, tab-delimited and MS Excel XLS files. That was
> done by user request - is the CDK team interested in these classes?

I would be interested in seeing an XLS output writer - will it be able to run
on Linux? Or does it need Windows DLLs?

> 3) My problem with SDF fields is that they come with many different names -
> for example CAS number can be "CAS RN", "CAS", "CAS#", "CAS Registry Number",
> etc. The same with chemical names, SMILES, descriptor names. Once properties
> are loaded, there is no easy way to recognise which field means what, without
> asking the user. When working with multiple SDF files coming from different
> sources, and trying to combine information, it becomes a nightmare.

I agree - but this is going to happen even if we place all the SD tag values
as molecule properties directly (i.e., the current situation).

> Some solutions to be discussed:
> - introduce something like a "translator facility" to perform the translation
>   property_name_in_file -> property_name_to_use_in_this_software
>
> - separate properties into different Hashmaps - like identifiers (cas,
>   names), descriptors, measured properties, etc.

I don't think this is a great idea since this leads to much more
specialization than simply saying that these sets of properties are derived
from an SD file.
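For illustration, the "translator facility" suggested in point 3 could start out as a hand-maintained alias table that maps each spelling found in a file onto one canonical name. This is only a sketch under that assumption: the class `PropertyNameTranslator`, its methods and the canonical name "CAS" are hypothetical and not part of the CDK; the alias spellings are the CAS examples quoted above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of CDK): maps the tag names used in
// different files onto a single canonical property name.
class PropertyNameTranslator {
    private final Map<String, String> aliases = new HashMap<>();

    // Register a canonical name together with its file-specific spellings.
    public void addAliases(String canonical, String... namesInFiles) {
        aliases.put(normalize(canonical), canonical);
        for (String name : namesInFiles) {
            aliases.put(normalize(name), canonical);
        }
    }

    // Return the canonical name, or the original tag name if it is unknown.
    public String translate(String nameInFile) {
        return aliases.getOrDefault(normalize(nameInFile), nameInFile);
    }

    // Rename every key of a property map, keeping the values. If two tags
    // translate to the same canonical name, the last one iterated wins.
    public Map<String, Object> translateAll(Map<String, Object> properties) {
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            result.put(translate(e.getKey()), e.getValue());
        }
        return result;
    }

    // Match tag names case-insensitively, ignoring surrounding whitespace.
    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        PropertyNameTranslator t = new PropertyNameTranslator();
        t.addAliases("CAS", "CAS RN", "CAS#", "CAS Registry Number");
        System.out.println(t.translate("cas rn"));        // CAS
        System.out.println(t.translate("Melting Point")); // Melting Point
    }
}
```

Unknown tags pass through unchanged, so the table can grow as new sources turn up; it does not remove the need for someone to decide what each tag means.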
I think this is especially important since an SDF can contain arbitrary
properties - I don't see any way of making sense of those properties other
than knowing what the tags mean, and I think this drawback is a feature of the
SD format (lack of dictionaries).

My original point in this was to suggest a way to make clear that a set of
properties did come from SD tags. I realize that this is a lazy approach and I
understand that this leads to a specialization of the properties - but it
seems to me that SD properties are special (as opposed to properties that I
might set when manipulating a molecule in my code).

In this sense, then, one can view the set of tags in an SD file as a
'property' of that molecule - justifying inclusion of all the SD tags as a
single property in the molecule. The fact that this 'property' has several
properties within it is a feature of the SD tags property.

-------------------------------------------------------------------
Rajarshi Guha <[EMAIL PROTECTED]>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
A hacker does for love what others would not do for money.

_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user
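A sketch of the single-property approach argued for in the last paragraph: all SD tags of one record are stored together as a map under one molecule property, so they stay distinguishable from properties set in code. `SimpleMolecule` is a hypothetical stand-in that only mimics a generic setProperty()/getProperty() pair; it is not a CDK class, and the `SD_TAGS` key and `SdTags` helper are likewise assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a molecule with generic key/value properties.
class SimpleMolecule {
    private final Map<Object, Object> properties = new HashMap<>();

    public void setProperty(Object key, Object value) { properties.put(key, value); }
    public Object getProperty(Object key) { return properties.get(key); }
}

class SdTags {
    // Hypothetical key under which all SD tags of one record are kept.
    public static final String SD_TAGS = "SD_TAGS";

    // Store every tag of an SD record as a single property of the molecule.
    // A LinkedHashMap copy preserves the tag order of the original record.
    public static void store(SimpleMolecule mol, Map<String, String> tags) {
        mol.setProperty(SD_TAGS, new LinkedHashMap<>(tags));
    }

    // Read the SD tags back as a read-only view; empty if none were stored.
    @SuppressWarnings("unchecked")
    public static Map<String, String> get(SimpleMolecule mol) {
        Object value = mol.getProperty(SD_TAGS);
        if (value instanceof Map) {
            return Collections.unmodifiableMap((Map<String, String>) value);
        }
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        SimpleMolecule mol = new SimpleMolecule();
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("CAS RN", "50-00-0");
        tags.put("NAME", "formaldehyde");
        store(mol, tags);
        // A property set in code lives next to, not inside, the SD tags.
        mol.setProperty("myDescriptor", 1.0);
        System.out.println(get(mol).keySet()); // [CAS RN, NAME]
    }
}
```

The trade-off raised in the thread remains: code that only wants "the CAS number" must now know to look inside the SD tag map rather than calling getProperty() directly.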

