On Jan 17, 2013, at 6:20 PM, Kirill Okhotnikov wrote:
> I decided to implement bzip2 pack/unpack functionality in open-babel 
> (#91 bzip2 compression/decompression).

What does the "#91" mean? It's not the Open Babel bug id, and a search
of the mailing list finds nothing matching "bzip" or "bzip2".

I do not think bzip2 support is important. I have rarely come across
people using it for cheminformatics data. For example, while PubChem,
ChEMBL and others release their data sets with gzip compression, I
don't know of anyone who releases bzip2 files.

Which of the data sets you use are big enough that bzip2's better
compression becomes worthwhile?

As you found out, bzip2 doesn't support random seeking. It *can*
be emulated, which Python's bz2 module does, but "depending on
the parameters the operation may be extremely slow."
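
To make the cost concrete, here is a minimal Python sketch of what that
emulated seek looks like from the caller's side. The helper name
read_at and the synthetic 14-byte records are my own, made up for
illustration; only bz2.BZ2File, its seek()/read() methods, and
bz2.compress come from the standard library.

```python
import bz2
import io


def read_at(compressed: bytes, offset: int, size: int) -> bytes:
    """Return `size` bytes starting at uncompressed position `offset`.

    Hypothetical helper for illustration. The seek is emulated: the
    stream is decompressed from the start and the bytes before
    `offset` are thrown away, so the cost grows with `offset`.
    """
    with bz2.BZ2File(io.BytesIO(compressed)) as f:
        f.seek(offset)
        return f.read(size)


# Synthetic fixed-width records, 14 bytes each ("record NNNNNN\n").
payload = b"".join(b"record %06d\n" % i for i in range(10000))
compressed = bz2.compress(payload)

# Fetch record 5000 by its uncompressed byte offset.
chunk = read_at(compressed, 14 * 5000, 14)
```

The call looks like an ordinary random access, but each one pays for
decompressing everything in front of the requested offset.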


Personally, I think bzip2 is no longer a useful format. If you're
willing to take the extra CPU time then use the LZMA-based methods,
like .xz. Here's how the Python source distribution compresses
with each of the three methods:

        • Gzipped source tar ball ~ 16 MB
        • Bzipped source tar ball ~ 14 MB
        • XZ compressed source tar ball ~ 11 MB
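
If you want to run the same comparison on your own files, a rough
sketch using Python's standard gzip, bz2 and lzma modules follows. The
function name compressed_sizes and the sample input are assumptions of
mine; each module's default settings are used, so the numbers will not
match the command-line tools exactly and will vary with the data.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for each method.

    Hypothetical helper for illustration; uses each module's default
    compression settings. lzma.compress writes the .xz container
    format by default.
    """
    return {
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),
    }


# Made-up, highly repetitive sample input; substitute a real file's
# contents, e.g. open(path, "rb").read(), to get meaningful numbers.
sample = b"C1=CC=CC=C1 benzene\n" * 5000
sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```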


I also think that supporting Boost is a nuisance. However, if
only some Boost functionality is needed, why not just include its
header files?


> I think, that it will be good idea to have boost library to be required by 
> the project (connected permanently). Some other boost libraries can be 
> useful. For example, Program Options, Geometry, RegExp. In the future 
> developers can easily use this powerful well known library.

Developers can already use Boost, by installing it themselves.

The only reason for switching to Boost is if Open Babel would make
effective use of what Boost provides. But the examples you list aren't
things which would easily change:

  - who would rewrite the options parser to use the Boost one?

  - why replace the existing geometry code with an alternative?

  - what advantages does RegExp have over C++'s <regex>? (I see
       that src/formats/gamessukformat.cpp already uses the regex
       library that the C++ compiler provides.)

  - how much code would break?

Now, there are answers to this. For example, perhaps the Boost
geometry code makes parts of Open Babel 3x faster, or perhaps
20 of the format parsers could be shortened by 90% while being
more maintainable. But given the known work in rewriting those
parts of the code, and the known difficulty of supporting
Boost - something I've experienced myself - it's not as easy as
saying that other people might find it useful.

> 3) Can somebody help me to compile and test the system under MS Windows?

If all else fails, you might look into using an Amazon instance
running MS Windows, then install Visual Studio Express to compile
C++ code from the command-line.

Cheers,

                                Andrew
                                da...@dalkescientific.com



_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel