Re: [boost] Filesystem library name
Rene Rivera wrote:
> Not totally right... It should be: libboost__.lib. Putting the version at the end is somewhat standard. And in my current case of OpenBSD required.

That may be standard on OpenBSD, but it's not on Windows, where the last part of the filename is used to tell the type of the file. On Windows, library files usually end in ".lib".

BTW, Boost already uses (excruciatingly) long pathnames to select among different versions of the same library. I suggest adopting a fully "tagged" name scheme only for those files, like DLLs or shared libraries, that are probably going to be installed in some specific folder (on the PATH, for example). For example, although on my system the STLport DLL is named stlport_vc750.dll, a name that carries both the platform "vc7" and the version "50", the corresponding library is simply named stlport_vc7.lib. I believe that even the "vc7" could have been removed from the lib's name if a pathname scheme like Boost's had been implemented.

Alberto Barbati

___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
[boost] New version of UTF library (was Re: UTF library available for review)
Hi,

I just uploaded a new version of the UTF library here: http://groups.yahoo.com/group/boost/files/utf/

The changes are:
1) Added missing typename keywords and used BOOST_DEDUCED_TYPENAME in every applicable place.
2) Added safety checks on buffer size. (Thanks to Dietmar Kuehl)
3) The state type is no longer assumed to be an integer type. To access the state, two unqualified free functions, get_state() and set_state(), are used instead. File utf_config.hpp provides a default *non-portable* implementation that relies on reinterpret_cast, which should be specialized for each platform. (Thanks to Dietmar Kuehl)

The suite now compiles correctly on gcc cygwin, yet it fails to link because it complains about missing wchar_t specializations. Can anyone help me with this? It also seems that gcc does not provide specializations of any library class (basic_filebuf, char_traits, etc.) for internal types other than char and wchar_t. Could anyone confirm this? This could be a problem if the user wants to use UTF-32 facets but the platform's wchar_t is 16 bits wide. I can easily provide an implementation of char_traits for implementations lacking it. Should I do it?

Alberto Barbati wrote: > Dietmar Kuehl wrote: >> Alberto Barbati wrote:

The problem is that if char does not have 8 bits, then I cannot be sure that the underlying implementation reads from a file 8 bits at a time. Please correct me if I'm wrong on this point. That requirement is essential for the UTF-8 encoding.

Does anyone have any comment about this? I don't have access to any implementation where char has more than 8 bits to verify.

There already exists a facility to select the correct facet according to the byte order mark. It's very simple to use:

    std::wifstream file("MyFile", std::ios_base::binary);
    boost::utf::imbue_detect_from_bom(file);

that's it.
I have seen this possibility and I disagree that it is very simple to use for several reasons: - There is at least one implementation which does not allow changing the locale after the file was opened. This is a reasonable restriction which seems to be covered by the standard (I thought otherwise myself but haven't found any statement supporting a different view). Thus, changing the code conversion facet without > closing the file may or may not be possible. Closing and reopening > a file may also be impossible for certain kinds of files. I guess you are mentioning 27.8.1.4, clauses 19 (description of function filebuf::imbue): "Note: This may require reconversion of previously converted characters. This in turn may require the implementation to be able to reconstruct the original contents of the file." That may indeed be a problem. In my humble opinion, the use of "may" is quite unfortunate... it seems that implementation need not reconvert previous characters and leaves unspecified (not even "undefined" nor "implementation defined") what happens if the implementation cannot perform the reconstruction. In which way is imbue implemented in the implementation you were mentioning? I looked deeper into the question. Of the three implementations I checked (VS.Net/Dinkumware, STLport, gcc 3.2 prerelease) none of them implement clause 19. gcc even has an explicit comment about this. All of them allows imbue() in the middle of a file. Which implementation where you talking about? I am considering writing a mega-facet that automatically adapts to the file encoding according to the BOM. It could easily be done for UTF-32 as the conversion code is already factored out of the facet classes (splitted into file utfXX_algo.hpp and utf32_strategy.hpp). I plan to do the same factorization for UTF-16 facets also; it is already done for facet utf8_utf16. 
However, please bear in mind that such a facet can't be as performant as the little ones, because each of the do_in/do_out/do_length functions has to be a large switch over the several implementations, and such a switch needs to be executed each of the several times do_XXX is called for each character.

BTW, this mega-facet is OK when reading from a file. How should it behave when writing? Would it be OK to return an error until an encoding is chosen? In fact, reading *and* writing at the same time to a Unicode file is IMHO a sure way to disaster, unless writing always occurs at the end of the file with std::ios_base::app.

I am considering adding stream classes, derived from the std::basic_* classes (or maybe from the boost::filesystem classes?), as a convenience. What do you think?

Alberto
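For readers following the BOM discussion in this message, here is a minimal sketch of how a byte order mark could be recognized from the first octets of a file. It is not the code behind boost::utf::imbue_detect_from_bom(); the names bom_encoding and detect_bom are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>

// Hypothetical names; this is NOT the interface of the UTF library discussed above.
enum bom_encoding { bom_none, bom_utf8, bom_utf16le, bom_utf16be, bom_utf32le, bom_utf32be };

// Inspect up to the first four octets of a file and report which BOM, if any, they form.
inline bom_encoding detect_bom(const unsigned char* p, std::size_t n)
{
    // UTF-32LE must be tested before UTF-16LE: its BOM (FF FE 00 00) starts with FF FE.
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00) return bom_utf32le;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF) return bom_utf32be;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return bom_utf8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) return bom_utf16le;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) return bom_utf16be;
    return bom_none;  // no signature: the caller has to pick a default encoding
}
```

One known weakness of this ordering: a UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE. A real implementation would have to decide how to treat that case.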
[boost] Re: Next revision of boost::thread
Stefano Delli Ponti wrote:
> From: "David Abrahams" <[EMAIL PROTECTED]>
>> "William E. Kempf" <[EMAIL PROTECTED]> writes:
>>> That's a good idea. So would users prefer new exception types here, or should I use the std:: exceptions?
>> IMO, it's always safer to use an exception type which provides more-specific information.
> Agreed. And we should keep coherence with the filesystem library.

Agreed.

Alberto
[boost] Re: Next revision of boost::thread
William E. Kempf wrote:
> * Are there concerns about using conditional compilation and optional portions of the library, as POSIX does? I believe this is the only way Boost.Threads and the C++ standard will be able to provide "portable" threading libraries that don't restrict implementation to a least common denominator approach.

What about using property maps? (I mean the Boost Property Map Library.)

> * Are there issues with throwing std::invalid_argument for both invalid and unsupported values? Should I define Boost.Threads specific exceptions instead, separating out the two exception types?

If you want to use std:: exception classes, for an "unsupported" value you could also use std::domain_error. Defining two new classes in the boost namespace is also an option.

> Beyond this, I'd appreciate any other feedback as well.

I'm quite busy now, but I'm enjoying the thread library so much that I will be pleased to help you as much as I can.

Alberto
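To make the two options in this reply concrete, here is a small sketch that combines them: two library-specific exception classes, one derived from std::invalid_argument and one from std::domain_error. All names (invalid_thread_argument, unsupported_thread_option, checked_priority) and the priority ranges are hypothetical; none of this is Boost.Threads code.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception types, not part of Boost.Threads.
class invalid_thread_argument : public std::invalid_argument
{
public:
    explicit invalid_thread_argument(const std::string& what_arg)
        : std::invalid_argument(what_arg) {}
};

class unsupported_thread_option : public std::domain_error
{
public:
    explicit unsupported_thread_option(const std::string& what_arg)
        : std::domain_error(what_arg) {}
};

// Hypothetical check: priorities outside [0, 10] are invalid everywhere;
// priorities above 5 are assumed to be valid but unsupported on this platform.
inline int checked_priority(int priority)
{
    if (priority < 0 || priority > 10)
        throw invalid_thread_argument("priority out of range");
    if (priority > 5)
        throw unsupported_thread_option("priority not supported on this platform");
    return priority;
}
```

A caller that only cares about the std:: hierarchy can still catch std::invalid_argument or std::domain_error (or their common base std::logic_error), while a caller that wants the more specific information can catch the derived types.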
[boost] Re: UTF library available for review
Dietmar Kuehl wrote: Alberto Barbati wrote: One can use a char traits class different from std::char_traits, that defines a suitable state type.

This is not really viable due to 27.8.1.1 paragraph 4: An instance of basic_filebuf behaves as described in lib.filebuf provided traits::pos_type is fpos<traits::state_type>. Otherwise the behavior is undefined.

Thanks for pointing that out, I missed it. However, it's not really a problem: you can add a pos_type typedef to our test_traits, like this:

    template <class T>
    struct test_traits : public std::char_traits<T>
    {
        typedef boost::uint32_t state_type;
        typedef std::fpos<state_type> pos_type;
    };

It would be possible to create a conversion stream buffer (which is probably a good idea anyway) which removes this requirement, but even then things don't really work out: streams using a different character traits class are not compatible with the normal streams. I haven't worked much with wide characters and don't know how important it is to have e.g. the possibility of using a wide character file stream and one of the standard wide character streams (e.g. 'std::wcout') be replaceable. I think it is crucial that the library supports 'std::mbstate_t', although this will require platform specific stuff. It should be factored out and documented such that porting to a new platform consists basically of looking up how 'std::mbstate_t' is defined.

That's a better argument. I will think about it. As I said, I'm definitely not against adding accessors to mbstate_t, I just have to think about what's the best way to do it.

I forgot to say in my previous post that this version of the library only supports platforms where type char is exactly 8 bits. This assumption is required because I have to work at the octet level while reading from/writing to a stream.

I don't see why this would be required, however. This would only be necessary if you try to cast a sequence of 'char's into a sequence of 'wchar_t's.
Converting between these is also possible in a portable way (well, at least portable across platforms with identical size of 'wchar_t' even if 'char' has different sizes). The problem is that if char does not have 8 bits, then I cannot be sure that the underlying implementation reads from a file 8 bits at a time. Please correct me if I'm wrong on this point. That requirement is essential for the UTF-8 encoding. Such decision is very strong, I know. Yet, one of the main problems with the acceptance of Unicode as a standard is that there are too many applications around that uses only a subset of it. For example, one of the first feedback I got, at the beginning of this work, was "I don't need to handle surrogates, could you provide an optimized facet for that case?". The answer was "Yes, I could, but I won't". As I said, I don't have strong feelings about this (and I have implemented such a facet myself already anyway...). However, note that I requested something quite different: I definitely want to detect if a character cannot be represented using the internally used character. In fact, I would like to see this happen even for a 16 bit internal type because UTF-16 processing is considerably more complex than UC2 processing and I can see people falling into the trap of testing only cases where UC2 is used. That is, the implicit choice of using UTF-16 is actually a pretty dangerous one, IMO. I know it's dangerous, but I prefer that way. I would like this to be "The UTF Library", not just some "conversion library". I also want to support the Unicode standard to its full extent. Supporting a conversion not covered by Unicode, just because someone finds it useful, does not go in that direction. If this position would stop my proposal to be accepted in Boost, I would just retire it. There already exist a facility to select the correct facet according to the byte order mark. 
It's very simple to use: std::wifstream file("MyFile", std::ios_base::binary); boost::utf::imbue_detect_from_bom(file); that's it. I have seen this possibility and I disagree that it is very simple to use for several reasons: - There is at least one implementation which does not allow changing the locale after the file was opened. This is a reasonable restriction which seems to be covered by the standard (I thought otherwise myself but haven't found any statement supporting a different view). Thus, changing the code conversion facet without > closing the file may or may not be possible. Closing and reopening > a file may also be impossible for certain kinds of files. I guess you are mentioning 27.8.1.4, clauses 19 (description of function filebuf::imbue): "Note: This may require reconversion of previously converted characters. This in turn may require the implementation to be able to recon
[boost] Re: UTF library available for review
First of all, thanks to everybody for your feedback. I realized that my message was a bit arrogant about the lack of interest... I apologize for that and I promise I'll do my best to get this library up to Boost standards!

Dietmar Kuehl wrote:

- The 'state' argument for the various facet functions is normally accessed as a plain integer in your code. However, e.g. 'std::mbstate_t' is not an integer at all. In fact, it is implementation defined and can thus be something entirely different on each platform. I'd suggest providing some form of accessor functions to read or write objects of this type, possibly in the form of some traits class. This could then be used to cope with differences and/or a more complex organization of the state type.

That's a good point. On my platform (Win32) mbstate_t is indeed a typedef for "int" and thus satisfies all requirements (i.e. being an integer type capable of containing an unsigned 21-bit value). I was aware that this is not the case on other platforms, but I made the assumption anyway simply because the user is not really required to use mbstate_t. One can use a char traits class different from std::char_traits, that defines a suitable state type. For example:

    template <class T>
    struct MyTraits : public std::char_traits<T>
    {
        typedef boost::uint32_t state_type;
    };

then use basic_fstream<T, MyTraits<T> > instead of basic_fstream<T>. [I just tried that and found a typo in correctness_test.hpp that prevents the test suite from compiling, but nothing too serious] Of course I have to document this ;)

If you believe that harnessing with the char traits class is undesirable, or simply "too much", I see no problem in adding those accessor functions as you suggest.

- Falling back to UTF-16 in cases where the Unicode characters may not fit into the words is one possible approach to deal with the characters. Another approach is to just indicate an error.
For example, I know that certain XML-files I'm using exclusively store ASCII characters and there is no need to use a different internal character type than 'char'. If I ever come across a non-ASCII character I don't really want to have it encoded in something like UTF-8 but I want to fail reading the file (and possibly retry reading using a bigger character type). I would appreciate if such a possibility would be incorporated, too (this is, however, nothing I feel too strongly about). I forgot to say in my previous post that this version of the library only supports platforms where type char is exactly 8 bits. This assumption is required because I have to work at the octet level while reading from/writing to a stream. That said, I have deliberately decided not to allow "char" as the internal type. The internal type must be an integral type able to represent an unsigned 16-bit value (for UTF-16) or an unsigned 21-bit value (UTF-32). The choice between UTF-16 and UTF-32 as the internal encoding is done at compile-time based on the sizeof of the internal char type, which must be either 2 or 4. Such decision is very strong, I know. Yet, one of the main problems with the acceptance of Unicode as a standard is that there are too many applications around that uses only a subset of it. For example, one of the first feedback I got, at the beginning of this work, was "I don't need to handle surrogates, could you provide an optimized facet for that case?". The answer was "Yes, I could, but I won't". Yours is a very special case. Frankly, I'd rather not support it. In fact, it would be extremely very simple to do: you just take the facet declared in file detail/degenerate.hpp and change a few lines. However, it would be out of the intent of this library, which is to provide UTF conversion according to Unicode 3.2 requirements, no more, no less. 
- In the context I want to use the facets [at least those I'm implementing] I don't really want to bother about the details of the encoding. That is, I just want to 'imbue()' an appropriate 'std::locale' object to the stream and have the facet figure out what encoding is used, especially when reading from some file. The basic idea is that each file is either started by a byte order mark from which UTF-16BE or UTF16LE can be deduced or it is in UTF-8. The encoding could eg. be stored in typical 'std::mbstate_t' objects (these are often a struct with a 'wchar_t' and a count or something like this). To enable something like this, it would be helpful if the actual conversion functions were actually separated from the facets: The facets would just call the corresponding conversion functions. The actual conversion is, apart from the state argument, entirely stateless and there is no need to bind it to any facet. There already exist a facility to select the correct facet according to the byte order mark. It's very simple to use: std::wifstream file("MyFile", std::ios_base::binary); boost::utf::imbue_detect_from_bom(file); that's it
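The suggestion in this message, that the conversion itself can live in plain functions which a facet merely calls, can be illustrated with a small sketch. The function below decodes a single code point from UTF-8 and rejects non-shortest forms, surrogates and values above U+10FFFF. The names code_point, invalid_code_point and decode_utf8 are hypothetical; this is not the library's actual code.

```cpp
#include <cstddef>

typedef unsigned long code_point;                     // wide enough for a 21-bit value
const code_point invalid_code_point = 0xFFFFFFFFul;   // hypothetical error marker

// Decode one code point from the octets [p, p + n).
// On success, 'used' receives the number of octets consumed.
// On malformed, truncated or non-shortest input, returns invalid_code_point and sets 'used' to 0.
inline code_point decode_utf8(const unsigned char* p, std::size_t n, std::size_t& used)
{
    used = 0;
    if (n == 0) return invalid_code_point;

    const unsigned char lead = p[0];
    std::size_t len;
    code_point cp;
    code_point min;   // smallest value that legitimately needs 'len' octets

    if (lead < 0x80)                { len = 1; cp = lead;        min = 0x0;     }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min = 0x80;    }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800;   }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }
    else return invalid_code_point;                   // stray continuation or invalid lead octet

    if (n < len) return invalid_code_point;           // truncated sequence

    for (std::size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) return invalid_code_point;   // not a continuation octet
        cp = (cp << 6) | (p[i] & 0x3F);
    }

    if (cp < min) return invalid_code_point;                        // non-shortest form
    if (cp > 0x10FFFF) return invalid_code_point;                   // beyond the 17 planes
    if (cp >= 0xD800 && cp <= 0xDFFF) return invalid_code_point;    // surrogate code point

    used = len;
    return cp;
}
```

Because the function keeps no state of its own, a codecvt facet could call it from do_in() and store only what it needs (for example a pending surrogate) in the state argument.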
[boost] Re: UTF library available for review
Alberto Barbati wrote:
> * a comprehensive test suite (with Jamfile)

I almost forgot! The test suite requires the filesystem library. Special thanks to Beman Dawes for it!

Alberto
[boost] UTF library available for review
Hi Boosters,

I have put the first version of my UTF library in the Boost file section. You can find it here: http://groups.yahoo.com/group/boost/files/utf/

A couple of months ago, I posted a message to check if there was interest in such a library and I got just one answer, from Vladimir Prus (hi, Vladimir!). I hope that a full-blown, working library might get more attention.

What you will find in the library:
* codecvt facets for the following external encodings: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE. The facets are templated, in order to avoid any reference to the platform wchar_t type (if present). The internal encoding can be either UTF-16 or UTF-32. A convenience interface is provided to automatically select the internal encoding according to the size (2 or 4 bytes) of the character type used internally. The facets correctly handle the following Unicode features:
  - all 17 character planes
  - non-characters (U+xFFFE, U+xFFFF, U+FDD0 - U+FDEF)
  - UTF-16 surrogate pairs (both externally and internally)
  - UTF-8 non-shortest forms (externally)
* a convenience interface to autodetect the correct facet according to the file signature (BOM)
* a comprehensive test suite (with Jamfile)
* a little example (with Jamfile)

What you won't find in the library:
* documentation :-( I'm working on it!!! I swear. Give me some more time! (and a little feedback)
* facets for UCS-2 or UCS-4 (these encodings are very similar to UTF-16 and UTF-32 but are *not* the same!)
* facets that use UTF-8 internally (this is too complex and won't work portably, believe me!)

Compatibility

The test suite and the example have been tested with VS.NET with both the native STL and STLport 4.5.3. However, STLport has major bugs in the codecvt interface and in the basic_filebuf implementation, so in order to compile and run the wchar_t and uint16_t tests you need to apply a patch that is provided with the library and *rebuild* STLport.
The uint32_t test won't compile in any case due to an incomplete implementation of the entire locale suite (I am going to contact Boris Fomitchev in order to see how we can make a patch). The test suite will compile and run correctly even in presence of the /Zc:wchar_t option (that's why there are a wchar_t and a uint16_t test in the first place). The facets that have UTF16 internally were a major challenge. I provided two different implementations. The default one is a "compatibility" one that should work with most STL implementations (including VS.NET and STLport that have a minor flaw in them :-( ). The other one should be a little more performant but I don't know on how many compiler it will work. The alternative implementation can be selected from file config.hpp. In that file you can also find a #define that should be changed if your implementation correctly implements Library Issue 75 about the prototype of function do_length(). I hope you enjoy this library and find it useful. According to the feedback I receive, I will go on writing a decent document in view of a formal submission. Thanks in advance for your time and help, Alberto Barbati ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
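Since surrogate handling is one of the features listed in this announcement, a short sketch of the UTF-16 surrogate arithmetic may help readers who have not met it before. The type and function names (code_point, utf16_unit, encode_utf16, combine_surrogates) are hypothetical and not taken from the library.

```cpp
#include <cstddef>

typedef unsigned long  code_point;   // holds a 21-bit Unicode scalar value
typedef unsigned short utf16_unit;   // assumed to hold at least 16 bits

// Encode 'cp' as UTF-16. Writes one or two units to 'out' and returns how many were written.
// Returns 0 if 'cp' cannot be encoded (a surrogate code point or a value above U+10FFFF).
inline std::size_t encode_utf16(code_point cp, utf16_unit out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x10000) {
        out[0] = static_cast<utf16_unit>(cp);            // Basic Multilingual Plane: one unit
        return 1;
    }
    cp -= 0x10000;                                       // 20 bits remain
    out[0] = static_cast<utf16_unit>(0xD800 + (cp >> 10));     // high surrogate: top 10 bits
    out[1] = static_cast<utf16_unit>(0xDC00 + (cp & 0x3FF));   // low surrogate: bottom 10 bits
    return 2;
}

// Inverse operation: rebuild the code point from a valid high/low surrogate pair.
inline code_point combine_surrogates(utf16_unit high, utf16_unit low)
{
    return 0x10000
         + ((static_cast<code_point>(high) - 0xD800) << 10)
         + (static_cast<code_point>(low) - 0xDC00);
}
```

For example, U+1F600 becomes the pair D83D DE00, which is why a facet with UTF-16 as the internal encoding may have to emit two internal characters for a single external sequence.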
[boost] Re: filesystem feature request: temporary path and files
Thomas Witt wrote: Hi Alberto, Alberto Barbati wrote: Hi, first of all, I want to thank Beman Dawes and all others that contributed with the design and development of the Filesystem library. It's a wonderful piece of work. I just would like to propose a couple of additions that I believe are very useful. Both features regard temporary files. First proposal: I propose to add a function with a signature of this kind: path generate_path_for_temp_file(); IIRC functions like this are considered a bad idea. They are subject to race conditions and a potential security problem. I agree with you, that the functionality would be really helpfull. The usual solution to the race condition problem would be to have a function that returns a stream. See mkstemp on POSIX. Win32 has a similar facility. Then what do you think about my second proposal? (the tempstream class that was in the attachment.) The best thing to do would be to have that one implemented as a "primitive" and not implementing generate_path_for_temp_file() at all. However, that is not easily achievable in a portable way, because the interface of std::basic_fstream takes a pathname and not a stream id or FILE* :-( Moreover, there's no way to specify that the file is to be open exclusively, so complete security will never be granted if we derive from std::basic_fstream. The weak link here is the std::basic_filebuf class, is there someone out there who wants to write a (possibly portable) replacement of basic_filebuf that overcome these limitations? It seems an interesting but huge task, to me. By the way, the Win32 facility that you are talking about is GetTempFileName()? That function creates the file but does not open it, so it's different from mkstemp() that also opens the file exclusively. So GetTempFileName() is only safe against non-malicious race conditions. 
On the other hand, mkstemp() can easily be downgraded to GetTempFileName() by just keeping the pathname and closing the file ;)

Alberto
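As a rough illustration of the mkstemp() behaviour discussed in this exchange, the sketch below creates and opens a uniquely named file in one step on a POSIX system. The helper name open_temp_file and the "/tmp/boost_sketch_XXXXXX" pattern are assumptions made for the example; nothing here is part of the Filesystem library.

```cpp
#include <stdlib.h>    // mkstemp (POSIX)
#include <unistd.h>    // close, unlink (POSIX)
#include <string>
#include <vector>

// Hypothetical helper: creates a unique file and opens it in one step.
// Returns the open descriptor and stores the generated name in 'name'; returns -1 on failure.
inline int open_temp_file(std::string& name)
{
    // mkstemp() rewrites the trailing XXXXXX in place, so it needs a writable buffer.
    const std::string pattern = "/tmp/boost_sketch_XXXXXX";   // assumes /tmp exists and is writable
    std::vector<char> buffer(pattern.begin(), pattern.end());
    buffer.push_back('\0');

    const int fd = mkstemp(&buffer[0]);
    if (fd != -1)
        name.assign(&buffer[0]);
    return fd;
}
```

Closing the descriptor while keeping the name gives the "downgrade" mentioned above: the file exists and the name is known, but nothing prevents another process from tampering with it before it is reopened.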
[boost] filesystem feature request: temporary path and files
Hi,

first of all, I want to thank Beman Dawes and all the others who contributed to the design and development of the Filesystem library. It's a wonderful piece of work. I would just like to propose a couple of additions that I believe are very useful. Both features concern temporary files.

First proposal: I propose to add a function with a signature of this kind:

    path generate_path_for_temp_file();

The effect would be to generate a new (potentially unique) path suitable to be used for a temporary file. A sample POSIX implementation could be as simple as:

    #include <stdio.h>  // tmpnam, L_tmpnam (header name lost in the archive; assumed)

    path generate_path_for_temp_file()
    {
        char tmp_name[L_tmpnam];
        return path(tmpnam(tmp_name), native);
    }

but there could also be platform-specific implementations. For example, a Win32 sample implementation could use GetTempPath/GetTempFileName to create the path in the correct directory, as in:

    #include <windows.h>  // GetTempPathA, GetTempFileNameA (header name lost in the archive; assumed)

    path generate_path_for_temp_file()
    {
        char tmp_dir_path[MAX_PATH];
        char tmp_name[MAX_PATH];
        if(GetTempPathA(sizeof(tmp_dir_path), tmp_dir_path) == 0
            || GetTempFileNameA(tmp_dir_path, "$$$", 0, tmp_name) == 0)
        {
            boost::throw_exception(
                filesystem_error("unable to generate path for temporary file", system_error));
        }
        return path(tmp_name, native);
    }

Open issues (to be discussed):

1) On Win32, GetTempFileName also creates an empty file with the returned name. Other platforms also have functions that atomically generate the name *and* create a file with that name. Should there be a postcondition about the existence (or non-existence) of such a file?

2) Another useful signature could be:

    path generate_path_for_temp_file(const path& location_hint)

that would use the specified path as a hint (in an unspecified platform-dependent way) to generate the path. For example, one may want to generate a temporary file in a specified directory or physical drive, overriding the system default, if any.
On Win32 such a function could be implemented as:

    #include <windows.h>  // GetTempFileNameA (header name lost in the archive; assumed)

    path generate_path_for_temp_file(const path& hint)
    {
        char tmp_name[MAX_PATH];
        if(GetTempFileNameA(hint.native_directory_string().c_str(), "$$$", 0, tmp_name) == 0)
        {
            boost::throw_exception(
                filesystem_error("unable to generate path for temporary file", system_error));
        }
        return path(tmp_name, native);
    }

Second proposal: a stream class that encapsulates a temporary file, that is, a stream based on a file that is automatically deleted in the stream's destructor. I am attaching a sample implementation. It's implemented as a simple wrapper around fs::basic_fstream and generate_path_for_temp_file() above.

Cheers,
Alberto

    #ifndef BOOST_FILESYSTEM_TEMPSTREAM_HPP
    #define BOOST_FILESYSTEM_TEMPSTREAM_HPP

    #include <boost/filesystem/fstream.hpp>  // header name lost in the archive; assumed

    namespace boost
    {
      namespace filesystem
      {
        path generate_path_for_temp_file();
        path generate_path_for_temp_file(const path& hint);

        template < class charT, class traits = std::char_traits<charT> >
        class basic_tempstream : public basic_fstream<charT, traits>
        {
        public:
          // ctor always opens file
          explicit basic_tempstream(std::ios_base::openmode mode = std::ios_base::in|std::ios_base::out)
            : m_path(generate_path_for_temp_file())
          {
            basic_fstream<charT, traits>::open(m_path, mode);
          }

          explicit basic_tempstream(const path& hint,
              std::ios_base::openmode mode = std::ios_base::in|std::ios_base::out)
            : m_path(generate_path_for_temp_file(hint))
          {
            basic_fstream<charT, traits>::open(m_path, mode);
          }

          // the file is deleted when the stream is destroyed
          virtual ~basic_tempstream()
          {
            remove(m_path);
          }

          // intentionally hide inherited open()
          void open(std::ios_base::openmode mode = std::ios_base::in|std::ios_base::out)
          {
            basic_fstream<charT, traits>::open(m_path, mode);
          }

          const path& path() const { return m_path; }

        private:
          filesystem::path m_path;
        };

        typedef basic_tempstream<char> tempstream;
    # ifndef BOOST_NO_STD_WSTRING
        typedef basic_tempstream<wchar_t> wtempstream;
    # endif

      } // namespace filesystem
    } // namespace boost

    #endif // BOOST_FILESYSTEM_TEMPSTREAM_HPP
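The idea of a stream whose file disappears when the stream is destroyed can also be tried out with the standard library alone. The sketch below is a simplified stand-in, not the proposed basic_tempstream: the class name scoped_temp_fstream is hypothetical, the caller supplies the file name instead of a generated path, and std::remove replaces the Filesystem library's remove().

```cpp
#include <cstdio>     // std::remove
#include <fstream>
#include <string>

// Hypothetical, simplified stand-in for the proposed tempstream.
class scoped_temp_fstream : public std::fstream
{
public:
    // Always opens (and truncates or creates) the named file for reading and writing.
    explicit scoped_temp_fstream(const std::string& name)
        : std::fstream(name.c_str(),
                       std::ios_base::in | std::ios_base::out | std::ios_base::trunc),
          m_name(name)
    {
    }

    ~scoped_temp_fstream()
    {
        close();                       // close first: some platforms refuse to delete an open file
        std::remove(m_name.c_str());   // failure to delete is silently ignored in this sketch
    }

    const std::string& name() const { return m_name; }

private:
    std::string m_name;
};
```

Used inside a block, the file exists only for the lifetime of the object, which is the property the second proposal is after. Generating a safe, unique name is a separate problem and is deliberately left out here.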
[boost] Re: #pragma once
Gennaro Prota wrote:
> Hmm... frankly I haven't used it anymore. I'm under the impression they have fixed it now, but last time I checked it had a lot of bugs. It was easy, for instance, to end up including the same file twice if it was reached through different paths (e.g.: #include "subdir/file.h" in A.cpp and #include "file.h" in subdir/file2.h). Usually one doesn't notice the error, because he uses both the pragma and the canonical include guard, but that means of course that the speed gain comes to nothing. Probably it was VC++ 5.0 though.

You are missing the point here. "#pragma once" is not really "functional", in the sense that it's not (or should not be) used *alone* to achieve the "include me once" effect. That effect is better achieved by the canonical include guards, which should be used anyway (and *will* be used anyway in portable code). "#pragma once" is just an optimization. If a file is included a second time in the same TU, a dumb compiler will re-open it and give it to the preprocessor, which strips its contents entirely because the include guard is already defined. Re-opening plus preprocessing takes a small but significant amount of time. With "#pragma once" the programmer just gives the compiler a hint that re-opening is unnecessary, so the compiler can ignore the #include directive immediately. You see, there *is* a speed gain even in the presence of canonical include guards.

Of course, a smarter compiler like g++ will recognize the canonical include guards and deduce that re-opening is unnecessary without any explicit hint given by the programmer.

> [...] In effect, the whole story about file inclusion should be the other way round: any source file is included at most once for each TU, unless the programmer requires otherwise ;-)

I agree, but we can't put the burden of avoiding multiple inclusions on the "user". Realistically, in large projects that is hardly achievable by anyone.
Alberto
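To see what the canonical include guard does on a second inclusion, the sketch below pastes the body of a hypothetical header (point.hpp, with the made-up guard SKETCH_POINT_HPP) twice into one translation unit, which is what two #include directives amount to after preprocessing.

```cpp
// ---- first "#include" of the hypothetical header point.hpp ----
#ifndef SKETCH_POINT_HPP
#define SKETCH_POINT_HPP

struct point { int x; int y; };

// Sum of the absolute values of both coordinates.
inline int manhattan(const point& p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

#endif // SKETCH_POINT_HPP

// ---- second "#include" of the same header ----
// The guard macro is already defined, so the preprocessor discards everything up to
// the matching #endif and 'point' is not redefined. The text still had to be opened
// and scanned, which is the cost "#pragma once" (or a guard-aware compiler) avoids.
#ifndef SKETCH_POINT_HPP
#define SKETCH_POINT_HPP

struct point { int x; int y; };

inline int manhattan(const point& p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

#endif // SKETCH_POINT_HPP
```

Without the guard, the second copy would be a redefinition of struct point and the translation unit would not compile.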
[boost] Re: #pragma once
Gennaro Prota wrote:
> [snip] The fact that it is useful to reduce compilation time is just a result of some compiler writers' attitude to prefer encouraging the use of non-standard features rather than e.g. recognizing the include guard idiom and optimize away the subsequent #includes (as for instance g++ does).

I don't want to start a religious war about which compiler is better or smarter or encourages a better style. I agree that recognizing the include guard idiom is a good thing, probably the best thing for a compiler to do. Yet I'm stuck with MSVC anyway, like many other programmers out there, and #pragma once may have a significant effect on compilation time on that compiler.

I just suggested a way to allow other compilers (there may be fewer than I think, obviously) to use the #pragma and also to be a little more descriptive. It's just a matter of a three-line addition in config/compiler/visualc.hpp:

    #if _MSC_VER >= 1020
    #define BOOST_HAS_PRAGMA_ONCE
    #endif

For example, Metrowerks CodeWarrior also supports #pragma once, so the #define could also be added to config/compiler/metrowerks.hpp.

Cheers,
Alberto Barbati
[boost] #pragma once
Hi Everybody,

I saw the following lines in a lot of Boost header files:

    #if _MSC_VER >= 1020
    #pragma once
    #endif

or even better:

    #if _MSC_VER+0 >= 1020
    #pragma once
    #endif

But MS compilers are not the only ones that have #pragma once, which is in my opinion very useful. Why don't we define a BOOST_HAS_PRAGMA_ONCE in the compiler-specific config headers? Then we could write:

    #if BOOST_HAS_PRAGMA_ONCE
    #pragma once
    #endif

Alberto Barbati
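A sketch of how the proposed macro might be wired up, shown in a single file for brevity. The header guard name and the helper has_pragma_once() are made up for the example. One detail worth noting: if the macro is defined with no value, a test written as `#if BOOST_HAS_PRAGMA_ONCE` does not compile, so this sketch tests it with `#ifdef` instead; defining the macro to 1 would be the other way around that.

```cpp
// --- what a compiler-specific config header might contain (sketch) ---
#if defined(_MSC_VER) && _MSC_VER >= 1020
#  define BOOST_HAS_PRAGMA_ONCE
#endif

// --- what a library header would then start with (sketch) ---
#ifndef SKETCH_LIBRARY_HEADER_HPP
#define SKETCH_LIBRARY_HEADER_HPP

#ifdef BOOST_HAS_PRAGMA_ONCE
#  pragma once   // optimization hint only; the include guard above still does the real work
#endif

// Reports whether the hint was enabled for the compiler in use.
inline bool has_pragma_once()
{
#ifdef BOOST_HAS_PRAGMA_ONCE
    return true;
#else
    return false;
#endif
}

#endif // SKETCH_LIBRARY_HEADER_HPP
```

Supporting another compiler would then only mean adding the #define to that compiler's config header, with no change to the library headers themselves.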
[boost] Re: (corrected) review of optional library
Fernando Cacciola wrote:
>> As an example, currently the C++ standard includes
>>     T & stack::top(), with precondition !(stack.empty()).
>> Instead, it could be
>>     may_be<T> & stack::top(); // no precondition required
>> could be improved also, instead of:
>>     pair<iterator, bool> set::insert(const value_type & x)
>> we would use
>>     may_be<iterator> set::insert(const value_type & x)
> Those are interesting examples! Thanks. They can all be paralleled with optional<> whatever the model and interface we choose. Which kind of shows that the concept is useful.

If I may say so, I don't think that they are really good examples.

The stack::top signature is wrong: it should return a value and not a reference, and thus it may require a copy of the returned object. This makes the proposed signature potentially less efficient. Moreover, the result of top() could not be used by the caller to modify the top-most element. For this reason, a more correct signature would be

    may_be<T&> stack::top();

but, unfortunately, I don't think it's legal.

The proposed signature of set::insert is a downgrade and not an improvement. Even if the element is not inserted, I still may want to have the iterator. In order to perform its operation, insert() will have to compute that iterator anyway, so what's the point in discarding it?

Just my opinion,
-Alberto
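Both objections in this reply can be checked against the standard containers. In the sketch below, insert_or_find shows that the iterator returned by std::set::insert is usable whether or not the insertion happened, and try_top uses a plain pointer as a stand-in for the "optional reference" that the reply doubts is legal. Both function names are hypothetical and neither is part of the optional library under review.

```cpp
#include <set>
#include <stack>
#include <utility>

// std::set::insert returns the position of the element AND whether it was inserted.
// The iterator is valid in both cases, so the caller can always reach the element.
inline int insert_or_find(std::set<int>& s, int value, bool& inserted)
{
    std::pair<std::set<int>::iterator, bool> result = s.insert(value);
    inserted = result.second;
    return *result.first;
}

// Hypothetical precondition-free top(): a null pointer plays the role of an empty
// "may_be" holding a reference. Returning a pointer avoids copying the element and
// still lets the caller modify it in place.
template <class T>
T* try_top(std::stack<T>& s)
{
    return s.empty() ? 0 : &s.top();
}
```

A value-returning variant of try_top would need a copy of the element on every call, which is the efficiency concern raised above.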
[boost] Re: unicode support
Vladimir Prus wrote:
> First interpretation is that you're interested in support for different Unicode encodings, via appropriate facets. Then Alberto Barbati is the last person who touches this matter, in news://news.gmane.org:119/aq72e4$pog$[EMAIL PROTECTED] I assume he's holding a lock on implementation work. Alberto, did you get anywhere?

Yes, despite the clear lack of interest from Boosters about this issue, I'm still working on it (but I don't have any "lock" ;) ). I had a few problems with the interpretation of the standard, but thanks to a few guys from comp.std.c++ I can now say that I have a working implementation of facets to convert from UTF-8/16/32 (external) to UTF-16/32 (internal), with endian variants, a total of 10 facets. The implementation passes a basic suite of tests on VS.NET with both the native STL and STLport.

The facets conform to the Unicode 3.2 requirements about non-characters, use of surrogates and non-shortest UTF-8 sequences. After a private discussion with a field expert, I decided to drop the UCS-2 facets, so surrogate support is no longer optional. I also decided to drop facets with UTF-8 as the internal encoding because they are not very useful and the current wording of the C++ standard de facto disallows a portable implementation :(. I hope the LWG will consider clarifying the issue.

My next steps will be to polish the code, write the docs and prepare a more complete test suite. If everything goes well, I think I could submit the library for review by the end of the month.

> Second interpretation is conversion between all the 8-bit encodings out there. E.g. from koi8-r to windows-1251. Since there's GNU iconv already, I'd rather see a tiny wrapper over it. (GNU iconv works on Windows, too).

Here things become more complex. UTF conversions are just algorithmic stuff, easy to do. Other conversions, like koi8-r to windows-1251, require look-up tables, and simply gathering the data for all of them would be equivalent to rewriting a part of ICU, which is a huge piece of work.

The idea of wrapping ICU is very interesting. However, the Boost policy explicitly disallows dependencies on external libraries, so this solution is out of the question. Moreover, the only things ICU is missing are the conversion facets. I don't see any reason to wrap anything else. Unfortunately, as I said before, not all conversions can be portably expressed as a facet with the current C++ standard, so even writing wrapping facets has little meaning.

Alberto Barbati
[boost] Re: boost::pool feature requests
Thanks Steve, for considering my issues.

>> 4) what's the use of ordered_malloc/ordered_free? I made a few tests and they are indeed a bit slower than regular malloc/free, without any apparent advantage. Am I missing something?
>
> Keeping the free list ordered allows algorithms that traverse the free list along with the memory owned by pool to work correctly/more efficiently:
> 1) array allocations will be more efficient (pool_allocator keeps its free list ordered, whereas fast_pool_allocator does not)
> 2) release_memory() will work correctly
> 3) object_pool uses the ordered property to efficiently implement the automatic destructor calls for allocated objects

Sorry to bother you, I have a few more questions on this topic:

1) from what I understand, the "ordered" property depends on the fact that allocation and deallocation calls are correctly paired, in the sense that if I always call free() in the opposite order of the respective malloc() calls the pool is still considered to be ordered. Is this right? In this special case, would I get a benefit by calling ordered_malloc()/ordered_free()?

2) can ordered_malloc() be called on a non-ordered pool? Does the call make the pool ordered? In this case, is the complexity still O(1)?

3) can ordered_free() be called on a non-ordered pool? Does the call make the pool ordered?

4) ordered_malloc() is described as "merges the free list to preserve order". Does this mean that unused, but potentially usable, chunks are removed from the free list?

I think the "order" property is very useful and powerful, but it's not terribly clear from the docs how it can be exploited to its full potential. The descriptions of the methods are missing post-conditions that are, in my opinion, as important as pre-conditions. Changes in the "ordered" state of the pool could be stated more explicitly there. Moreover, if a function can be called on both ordered and un-ordered pools, it would be interesting to state explicitly whether there's a difference in behaviour and/or complexity.

Thanks for your patience,
Alberto Barbati
[boost] boost::pool feature requests
Hi Everybody,

I recently used the boost::pool library; it's a very nice library and extremely useful. I would like to submit a few requests and ask one question about it.

1) purge_memory() does not reset the member next_size, but in my opinion it should. The rationale is that after a purge_memory() the pool should be in the exact state it was in at the moment of construction. For example, in an application of mine, I have a long-lived pool object (in fact a singleton_pool) that is emptied with purge_memory() from time to time. The first allocation after purge_memory() allocates a block of size next_size, then multiplies next_size by 2. As next_size is never reset, it grows exponentially. Imagine my face when I found that allocating a 12 byte object from an empty pool required more than 1 GB...

2) the name release_memory() confuses me. It makes me think that all memory is being released, a task accomplished by purge_memory(). I think a better name could be release_unused_memory(). This function should also reduce the value of next_size, for example by considering the size of the largest block left in the pool.

3) I would like to have a function to free the entire pool _without_ releasing the memory. The rationale for this is in the example above. Most probably, the number of objects allocated in each iteration of my application is almost the same. At the end of an iteration, I would like to free all allocated objects with one single call (as I have thousands of objects, it makes a difference), yet it's a waste of time to free the memory, because I will need the same amount again in the next iteration. Such a function could be called free_all_chunks() or something similar.

4) what's the use of ordered_malloc/ordered_free? I made a few tests and they are indeed a bit slower than regular malloc/free, without any apparent advantage. Am I missing something?

Thanks in advance,
Alberto Barbati
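The growth problem in request 1) can be illustrated with a toy model. This is only a sketch under stated assumptions: toy_pool, grow() and the reset_next_size flag are invented for the illustration and do not reflect how boost::pool is actually implemented; the model merely doubles a next_size counter on every new block, as described in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: it mimics the "double next_size on every new block"
// behaviour described in the message. It is NOT the boost::pool code.
class toy_pool {
public:
    explicit toy_pool(std::size_t start_size = 32)
        : start_size_(start_size), next_size_(start_size) {}

    // Allocate a new block of next_size() bytes, then double next_size().
    void grow()
    {
        blocks_.emplace_back(next_size_);
        next_size_ *= 2;
    }

    // Release every block. With reset_next_size == false the counter keeps
    // growing across purges (the reported behaviour); with true the pool
    // returns to its state at construction (the requested behaviour).
    void purge_memory(bool reset_next_size)
    {
        blocks_.clear();
        if (reset_next_size)
            next_size_ = start_size_;
    }

    std::size_t next_size() const { return next_size_; }
    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t start_size_;
    std::size_t next_size_;
    std::vector<std::vector<unsigned char> > blocks_;
};
```

With the flag set to false, every purge/grow cycle leaves next_size one doubling larger than before, which is how a long-lived pool can end up requesting a huge block for a tiny object.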
[boost] Re: Serialization Library Review (pointer serialization facility)
Matthias Troyer wrote:

> On Monday, November 18, 2002, at 02:30 PM, Yitzhak Sapir wrote:
>
>> I think taking out the pointer facility into a separate class would be better design. By this I mean, that register_type<> and the logic for identifying and maintaining pointers would be in a separate class from the archive. The archive would hold an instance of this class (given in the constructor), and use member functions of this container class to determine what to do when it encounters a new pointer/alias. But it seems to me this is not the case yet (again, correct me if I'm wrong).
>
> I want to second that vote. I would prefer a separate facility for pointer serialization, as an add-on to a serialization library if that is possible. Robert, what do you think, could it be separated out?

I already proposed a similar facility (under the unfortunate name of "registry") in a few previous posts of mine. Yitzhak Sapir has expressed my idea with much better words, so you may count my vote for it, too.

Alberto Barbati
[boost] Re: Serialization Submission version 6
Robert Ramey wrote:

> 1) changing the state of the stream while serializing. My implementation initialized the stream and never contemplated that the same stream might be used for other things. That is that serialized data might be "embedded" as part of a larger stream. Apparently this is an issue for some people. I don't see it as a large issue but it is easy to address.

In fact the issue is so easy to address that I don't understand why we are still discussing it :) If you are willing to accept my solution, please say so immediately, so we won't waste any more time.

> One method of storing/recovering the data is to use a sequence of characters or wide characters. That is a C++ stream. This has some major benefits:
> a) All the code required to convert any C++ datatype into characters or wide characters exists and is part of the standard library and is guaranteed to work.

This is not true, and I proved it to you with a code snippet in a recent post of mine. The standard *does not* provide a way to output (i.e.: to write to a disk file) a stream of wide characters. You can put wide characters into a wide stream, but you will always obtain a file of "narrow" characters, obtained through a "degenerate conversion" as explicitly specified in the standard.

Moreover, I have very bad news. I just found that the C++ implementation shipped with .NET is not conformant on this point. Consider the following program:

    #include <fstream>

    int main()
    {
        std::wofstream out("test.txt", std::ios::binary);
        out << L"I owe you \x20ac 1\n"; // \x20ac is the Euro sign
        return 0;
    }

On .NET with STLport you get the incorrect, but ANSI-conforming, result:

    "I owe you ¬ 1"

'¬' being the character with code 0xac. On .NET with its native STL implementation you get

    "I owe you "

The program chokes when writing the Euro sign and leaves the stream in "failed" state :( Here Microsoft seems to have really screwed something up.

> Another observation: I note that my test.cpp program includes wchar_t member variables initialized to values in excess of 256. The system doesn't seem to lose any information in storing/loading to a stream with classic locale. I double checked. I have functions in both char and wchar_t versions of text archives to handle both strings of chars and wstrings. This created a couple of problems. The most obvious was what about strings containing embedded blanks. - and other punctuation. Single characters such a space was also a problem. First I implemented them a sequence of short integers. That worked fine but I was concerned that it wasted space, was slow, and inconvenient for debugging. So I made special functions for i/o of string and wstring which just write a string length and then stream out the string buffer as binary. So I never have the problem that unicode or locale or anything else interferes with my serialization. This is a side effect of the fact that the usage of the stream was carefully limited to the purpose at hand.

You should triple check, then. Following my previous example, this program:

    #include <cassert>
    #include <fstream>
    #include <string>
    // plus the archive header of the library under review

    int main()
    {
        std::wstring outs(L"I owe you \x20ac 1"), ins;
        {
            std::wofstream out("test.txt", std::ios::binary);
            boost::woarchive ar(out);
            ar << outs;
        }
        {
            std::wifstream in("test.txt", std::ios::binary);
            boost::wiarchive ar(in);
            ar >> ins;
        }
        assert(outs == ins);
        return 0;
    }

fails on at least two platforms (.NET/native STL and .NET/STLport), in two different ways.

> Of course this raises the question why support wstreams at all? We're not using its advantages (unless we have a lot of unicode text to store) and it doubles the required space.

Let's replace wide streams and archives with narrow ones in the previous example. The program indeed runs successfully on both STLport and the .NET native STL, but let's have a look at the archive file:

---begin file
22 serialization::archive 1 0 1 13 73 32 111 119 101 32 121 111 117 32 8364 32 49
---end file

This alternative requires from 2 to 6 (six!) bytes per Unicode character. Even up to 12 if you use surrogates, which become 8 if your wchar_t is 32 bits wide (:o another platform-specific issue has leaked in!). If I had lots of Unicode strings I would have no doubt about which is the better solution.

I hope you realize that Unicode output is a lot more complex than it seems. I am just asking you to allow the programmer to avoid overriding the locale; overriding can still be the default option. Am I asking too much?

Alberto Barbati
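The byte counts quoted in this message can be checked with a small calculation. The two functions below are illustrative helpers written for this purpose (their names are not taken from any library): one counts the bytes used when a code unit is written as decimal text plus one separator, as in the archive dump, and the other counts the bytes of the UTF-8 encoding of the same character.

```cpp
#include <cassert>
#include <cstddef>

// Bytes used when one code unit is written as decimal text followed by a
// single separator character, as in the archive dump above.
std::size_t decimal_text_bytes(unsigned long code_unit)
{
    std::size_t digits = 1;
    while (code_unit >= 10) {
        code_unit /= 10;
        ++digits;
    }
    return digits + 1; // +1 for the separating space
}

// Bytes used by the UTF-8 encoding of one Unicode scalar value.
// Returns 0 for values that are not valid scalar values.
std::size_t utf8_bytes(char32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; // surrogate code points
    if (cp < 0x10000) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}
```

For the Euro sign (8364, U+20AC) the decimal text form takes 5 bytes against 3 bytes in UTF-8; a 16-bit code unit needs at most 6 bytes as text, a surrogate pair at most 12, and a single 32-bit value such as 1114111 (U+10FFFF) takes 8.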
[boost] Re: Re Serialization - locale
Robert Ramey wrote:

> How does this strike you?
>
>     text_iarchive(Stream &_is) : is(_is)
>     {
>         // archives always use classic locale
>     #ifndef BOOST_NO_STD_LOCALE
>         plocale = is.imbue(std::locale::classic());
>     #endif
>         init();
>     }
>     ~text_iarchive()
>     {
>     #ifndef BOOST_NO_STD_LOCALE
>         is.imbue(plocale);
>     #endif
>     }

This solution does not address the objections in my last post in the original thread. You seem really concerned about this. We could meet in the middle with this solution, instead:

    text_iarchive(Stream &_is, bool _overrideLocale = true) : is(_is)
    {
        // archives always use classic locale
    #ifndef BOOST_NO_STD_LOCALE
        if(_overrideLocale)
            plocale = is.imbue(std::locale::classic());
        else
            plocale = is.getloc();
    #endif
        init();
    }

> Another observation: I note that my test.cpp program includes wchar_t member variables initialized to values in excess of 256. The system doesn't seem to lose any information in storing/loading to a stream with classic locale.

Which platform are you working on? On Win32, VC++ 6sp5, STLport, the following test program produces the output 52 (0x34) instead of 4660 (0x1234). According to the standard, this behaviour is perfectly conformant. (The ios::binary is required, because the I/O library could apply CRLF translation to a part of a two-byte character.)

    #include <fstream>
    #include <iostream>

    int main()
    {
        wchar_t a = L'\x1234';
        std::wofstream out("test.txt", std::ios::binary);
        out << a;
        out.close();
        std::wifstream in("test.txt", std::ios::binary);
        in >> a;
        in.close();
        std::cout << unsigned(a) << std::endl;
        return 0;
    }

If this code produces the correct output on your platform, either your char type has 16 bits or *your* platform is non-conformant, according to my interpretation of the standard.

Alberto Barbati
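The save/override/restore pattern discussed in this message can also be written as a small RAII helper. This is a sketch only: basic_locale_guard is a hypothetical name and is not part of the serialization library under review. It takes a std::basic_ios reference so that imbue() also reaches the stream buffer.

```cpp
#include <cassert>
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the library under review.
// Remembers the stream's locale, optionally imbues the classic locale,
// and restores the remembered locale on destruction.
template <class CharT, class Traits = std::char_traits<CharT> >
class basic_locale_guard {
public:
    basic_locale_guard(std::basic_ios<CharT, Traits>& stream,
                       bool override_locale = true)
        : stream_(stream), saved_(stream.getloc())
    {
        if (override_locale)
            stream_.imbue(std::locale::classic());
    }

    ~basic_locale_guard() { stream_.imbue(saved_); }

    basic_locale_guard(const basic_locale_guard&) = delete;
    basic_locale_guard& operator=(const basic_locale_guard&) = delete;

private:
    std::basic_ios<CharT, Traits>& stream_;
    std::locale saved_;
};
```

An archive could hold such a guard as a member, constructed from the stream and the override flag; passing false leaves whatever locale the programmer imbued untouched.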
[boost] Re: Serialization Submission version 6
Robert Ramey wrote:

> register_cross_program_class_identifier(const char *id="T")

An alternative could be to use register_type<> as it is, but augment the serialization traits class to provide a

    const char* serialization::get_cross_program_class_identifier();

This solution has the advantage that the identifier string can be physically located near the load/save/version functions (which are usually near the class itself).

Alberto Barbati
[boost] Re: String algorithm library
Rozental, Gennadiy wrote:

> Seq& trim_copy( Seq& input, Seq& trim_func( Seq&, const std::locale& ) );

In my opinion all algorithms without a suffix should perform in-place operations, while the ones that make copies should have the suffix "_copy". That would be more intuitive and also more consistent with the STL.

Alberto Barbati
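The naming convention suggested in this message could look like the following sketch. The signatures are illustrative only and are not those of the string algorithm library under discussion: the unsuffixed function modifies its argument, while the "_copy" variant leaves the input untouched and returns the result.

```cpp
#include <cassert>
#include <locale>
#include <string>

// In-place version: no suffix, modifies its argument.
void trim(std::string& s, const std::locale& loc = std::locale())
{
    std::string::size_type first = 0;
    while (first < s.size() && std::isspace(s[first], loc))
        ++first;
    std::string::size_type last = s.size();
    while (last > first && std::isspace(s[last - 1], loc))
        --last;
    s.erase(last);      // drop trailing whitespace
    s.erase(0, first);  // drop leading whitespace
}

// Copying version: "_copy" suffix, takes its argument by value and
// returns the trimmed copy.
std::string trim_copy(std::string s, const std::locale& loc = std::locale())
{
    trim(s, loc);
    return s;
}
```

This mirrors pairs such as std::replace and std::replace_copy in the STL, where the suffix signals that the input is not modified.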
[boost] Re: Serialization Submission version 6
Vahan Margaryan wrote:

> Eric Woodruff wrote:
>
>> type_info is not portable in the slightest.
>
> I realize that. I just pointed out that it's not so convenient to have user-supplied string ids because of the template classes.

As pointed out by Robert, the user-supplied string id could be made optional. For the lazy user we might imagine a default value obtained in some programmatic way, for example a possibly pre-processed type_info::name().

Alberto Barbati
[boost] Re: Serialization Submission version 6
Robert Ramey wrote:

> I believe I have found an acceptable resolution to the "registration" conundrum.

(Note: I consider this "registration" topic a different issue from my registry class proposal. This one relates to the "identification" of user classes, while mine is just an issue of factoring responsibilities among library classes.)

I agree that this solution to the "identification" issue is quite right and would be very beneficial to the overall usefulness of the library. Bravo to Robert and Vladimir.

> I conjure up something like (pseudo code):
>
>     register_cross_program_class_identifier(const char *id="T")

The perfect place for this function could be as a method of my registry class, don't you think? ;)

Alberto Barbati
[boost] Re: Serialization Submission version 6
Vladimir Prus wrote:

> Robert Ramey wrote:
>
>> Now I remember why I included this. Suppose that an archive is created where the default locale is a Spanish-speaking country where the number 123 thousand is written 123.000. The archive is sent to another country where the default locale is an English-speaking country where the string 123.000 means 123. That's why I set the locale to classic.

I'm well aware of this issue. See below.

> is it a good idea to change stream locale without user's consent. Maybe, archive should create *their own* (i/o)stream, sharing streambuffer with the stream the user has passed, and with appropriately modified locale?

In my opinion the archive should not change the stream locale without the programmer's consent. The main reason for this is that she may indeed want to use her own locale, for example to allow Unicode output. Overriding only num_put/num_get (and why not ctype also?) is not a nice solution, in my opinion; it's just a hack.

Moreover, I can imagine a brave programmer who is aware that her serialized data will not be read by any application except hers and decides to have the text output follow her native language conventions.

In the end, between the two possibilities:

1) override the locale (entirely or partially), reducing the programmer's freedom to customize the output but guaranteeing a perfectly portable output;

2) not override the locale, leaving to the programmer the complete responsibility to set the right one that satisfies her specific requirements, with the risk that she messes things up;

I vote without doubt for number 2.

To be paranoid, on output we could write in each archive a magic number like 12345.678; on input we try to read it and, if it doesn't match the magic number, we issue an error. I know that this hack won't catch 100% of the problems, but it will catch most of them and is no less safe than writing the sizeof() of the basic types as we are currently doing.

Alberto Barbati
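The magic-number check proposed in this message can be sketched as follows. Everything here is illustrative: write_magic, check_magic and the comma_decimal_point facet are hypothetical names, and the facet only stands in for a locale whose numeric conventions differ from the classic one.

```cpp
#include <cassert>
#include <cmath>
#include <ios>
#include <istream>
#include <locale>
#include <ostream>
#include <sstream>

// Example facet for the demonstration: a locale whose decimal point is ','.
struct comma_decimal_point : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

const double magic_number = 12345.678;

// Writes the magic number using whatever locale the stream currently has.
void write_magic(std::ostream& os)
{
    const std::streamsize old_precision = os.precision(8);
    os << magic_number << ' ';
    os.precision(old_precision);
}

// Reads one number back and reports whether it matches the magic number.
bool check_magic(std::istream& is)
{
    double value = 0.0;
    if (!(is >> value))
        return false;
    return std::fabs(value - magic_number) < 1e-6;
}
```

When writer and reader agree on the locale the check passes. When the writer used a ',' decimal point and the reader uses the classic locale, parsing stops at the comma, the value read is 12345, and the mismatch is reported.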
[boost] Re: Serialization Library Review
Robert Ramey wrote:

> Alberto Barbati wrote
>
>> Please note that the "registry" class I described *does not* attempt to solve the broader issue of UUIDs I read about in the discussion between you and Vladimir. My proposal is just a way to separate the "registration" part from the "serialization" into two different components. Still the classes will be matched according to their registration order, as it happens now. I am ready to discuss the opportunity and/or usefulness of this approach, but I don't see the reason why this could not be done.
>
> I considered this approach and found the following problem

It seems from the objections you raise that I did not explain myself well enough. Before answering them, I think a code snippet would help. Consider this code, which uses the current implementation:

---begin code
std::ofstream s("...");
boost::oarchive a(s);
a.register_type();
a.register_type();
a.register_type();
a.register_type();
// do serialization
---end code

What I am suggesting is to allow, possibly in addition to that form, the following form:

---begin code
boost::serialization_registry r;
r.register_type();
r.register_type();
r.register_type();
r.register_type();
std::ofstream s("...");
boost::oarchive a(s, r); // in the body of the constructor an
                         // equivalent of register_type()
                         // is immediately called 4 times
// do serialization
---end code

I am *not* suggesting to change the serialization process in *any way*. The purpose of serialization_registry is just to allow for a finer granularity of responsibilities.

> a) register all the types in the global collection in the archive.
> bad idea - this would require that the reading program register
> all the types of the writing program. An intolerable requirement

With my approach you are going to register the same types you would register anyway. Not one less, not one more. Why would it be intolerable?

> b) register types as needed as the library is written
> wouldn't work - on loading, we wouldn't know which types to register
> c) after creating the archive, append a "registration file" on loading,
> process the "registration file first. In my view this cure is worse
> than the disease.

The types are going to be physically output in exactly the same way as is done by the current implementation. So these two objections do not apply.

So you may be wondering, what's the point in having this registry class? There are two main advantages:

1) the module that sets up the registry can be distinct from the one that effectively performs the serialization. This can solve a lot of dependency issues.

2) the registration can be done in *one* place for *both* input and output. With the current implementation, the registration code will be duplicated, and duplication is always a bad thing. Imagine yourself trying to keep two (possibly very) long lists of register_type calls synchronized.

> I'm still at a loss. Aside from addressing the "classic" above, what exactly do you recommend I do? How about if I change the wording to specify that the library supports wide streams and leave unicode out of it?

Yes, it all boils down to changing the wording as you said: you just can't mention Unicode. I got a little carried away because I'm working on the subject and I'm having a hard time making people aware of the issue. I apologize for that.

>> On a different topic, I found a portability issue. The current implementation record in the archives the size of the basic types int, long, float and double. This gives you a *false* sense of security that the "writing" and "reading" platform agree on the type size.
>
> this is for the native binary archive which is explicitly described as being non-portable. It was included because some users felt it would be more efficient. It has no pretensions at all to portability. Knowing that some one will ignore this admonishment and try to move such a binary to another machine architecture, Included free detection so that it would crash in a more graceful manner.

All right, but the size_t issue with read_string/write_string is still a bug, as it does not involve cross-platform use. With the current implementation a program will fail to read its own serialized strings on at least one platform (i64).

Alberto Barbati
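The separation of responsibilities argued for in this message can be sketched as below. All names (type_registry, toy_archive, circle, square) are hypothetical and the archive is reduced to a stub; the only point shown is that a single registry, filled once, fixes the registration order for both the writing and the reading side.

```cpp
#include <cassert>
#include <cstddef>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical registry: records types in registration order.
class type_registry {
public:
    template <class T>
    void register_type()
    {
        order_.push_back(std::type_index(typeid(T)));
    }

    // Position of T in registration order, or size() if T is unknown.
    template <class T>
    std::size_t index_of() const
    {
        const std::type_index wanted(typeid(T));
        for (std::size_t i = 0; i < order_.size(); ++i)
            if (order_[i] == wanted)
                return i;
        return order_.size();
    }

    std::size_t size() const { return order_.size(); }

private:
    std::vector<std::type_index> order_;
};

// Stub archive: it only keeps a reference to the registry it was given.
struct toy_archive {
    explicit toy_archive(const type_registry& r) : registry(r) {}
    const type_registry& registry;
};

// Example user types.
struct circle {};
struct square {};
```

The module that fills the registry needs to know the user types, while the code that creates the input and output archives only needs the registry object, so the two lists of register_type calls collapse into one.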
[boost] Re: Serialization Library Review
Robert Ramey wrote:

> From: Alberto Barbati <[EMAIL PROTECTED]>
>
>> 1) I don't like the non-intrusive way of specifying the version/save/load operation.
>
> a non-intrusive method is required to implement serialization for classes that you don't want to change. For example, the library includes serialization for all STL containers without changing STL itself. This would permit easy and optional addition of serialization to any class that might benefit from it

As pointed out by Vladimir, I am not disputing the presence of the non-intrusive method, which is indeed necessary. It's the way it is implemented (through template specialization) that I don't like. Free functions are a much better option, in my opinion.

>> 2) A most needed addition to the design is to provide a sort of "registry" object.
>
> This has been a hot topic. It is really not possible to achieve the desired results. I will add a section to the rationale explaining this in detail,

Please note that the "registry" class I described *does not* attempt to solve the broader issue of UUIDs I read about in the discussion between you and Vladimir. My proposal is just a way to separate the "registration" part from the "serialization" into two different components. Still the classes will be matched according to their registration order, as it happens now. I am ready to discuss the opportunity and/or usefulness of this approach, but I don't see the reason why this could not be done.

>> One note: the library, as it is, *does not* support Unicode output, as stated. The library supports wide streams, yes, but that does not mean Unicode support.
>
> So what do I have to do exactly in the warchive specialization to generate Unicode output?

As I said in my post, you have to imbue the stream with a locale holding the correct codecvt facet before using it to serialize. Unfortunately, such a facet is not part of the ANSI standard and is the subject of a separate proposal of mine (see thread "codecvt facets for utf8/utf16/utf32").

BTW, with the current implementation even doing so is completely useless, as there are lines like this:

    os.imbue(std::locale::classic());

that reset the locale of the streams to the "dumb" default. Such lines are IMO both unnecessary and conceptually wrong, and should be removed.

On a different topic, I found a portability issue. The current implementation records in the archives the size of the basic types int, long, float and double. This gives you a *false* sense of security that the "writing" and "reading" platforms agree on the type size.

First objection: you should also check the number of bits in a char, which is not necessarily 8, and the size of shorts.

However, it gets worse. On both x86 and i64 platforms int, long and float are 4 bytes, while double is 8 bytes, so they "agree" according to this test. However, type size_t has 4 bytes on x86, while it has 8 bytes on i64. It's not hard to imagine an application that tries to serialize a size_t. The library won't detect the issue, but the compiler will use different overloads of operators << and >>, with clearly bad consequences. The same problem may happen with other typedef'd types.

Maybe it would be a good idea to add a note to the documentation warning about this case and encouraging the use of fixed-size types (like boost::int8_t, etc.) for the members of a serializable class, at least when multi-platform use is an issue.

Moreover, this issue is now present in the library itself! In file archive.cpp, function write_string() writes the length of the string as a size_t, while read_string() reads it as an unsigned int, which may have a different size.

Alberto Barbati
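One way to avoid the write_string()/read_string() mismatch described in this message is to use the same fixed-size type for the length on both sides. The sketch below is not the code from archive.cpp; it is a minimal illustration using std::uint32_t from C++11 (the message suggests the Boost fixed-size types for the same purpose), and it writes the length in native byte order, so it makes no claim of portability across architectures.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Writes a 32-bit length prefix (native byte order) followed by the bytes.
void write_string(std::ostream& os, const std::string& s)
{
    if (s.size() > std::numeric_limits<std::uint32_t>::max())
        throw std::length_error("string too long for a 32-bit length prefix");
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&length), sizeof length);
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Reads back exactly what write_string() wrote, using the same length type.
std::string read_string(std::istream& is)
{
    std::uint32_t length = 0;
    is.read(reinterpret_cast<char*>(&length), sizeof length);
    if (!is)
        throw std::runtime_error("cannot read string length");
    std::string s(length, '\0');
    if (length != 0)
        is.read(&s[0], static_cast<std::streamsize>(length));
    if (!is)
        throw std::runtime_error("cannot read string contents");
    return s;
}
```

Because both functions name the same type, the prefix occupies four bytes whether size_t is 4 or 8 bytes wide on the platform.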
[boost] Re: Serialization Library Review
rams with two different compilers. That was not a deep test with live data, although I plan to do it in the next few days. I tested both polymorphic and non-polymorphic types, but no templates.

On VC++ 7.0 (.NET) it all went well at the first try. However, the restriction, cited in the documentation, of having to add /OPT:NOREF to the linker options is too hard to swallow. In a typical setting, this can increase the executable size by up to 100% or even more, as such is the typical size of unreferenced data. This problem will have to be addressed explicitly.

On Metrowerks CodeWarrior I had a few problems. A few of them I already described above. Another one comes from the fact that the library uses (correctly) the trait is_base_and_derived in the implementation of template class base_object. is_base_and_derived depends on is_convertible, which is broken on CodeWarrior. I had to comment out the static assert line to let the program compile.

- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I read the documentation quickly, then wrote those test programs. I traced the execution in the debugger to discover the seekp(0) issue and to have a glance at the library internals, but that does not count as in-depth study.

- Are you knowledgeable about the problem domain?

I can say that I am a bit knowledgeable. I worked on (and know all the quirks of) the MFC implementation and tried at least two different approaches myself, although they were limited to the scope of the respective applications and were not general-purpose components.

- Do you think the library should be accepted as a Boost library?

My opinion is that the library should not be accepted as it is, but has huge potential. There is indeed little space for improvements, but a few features, such as the registration of polymorphic types, should definitely be addressed before prime time.

Alberto Barbati
[boost] config missing #define for Metrowerks compiler EH support
Hi,

would it be possible to add the following piece of code somewhere in boost\config\Metrowerks.hpp:

- begin code
#if !__option(exceptions)
# define BOOST_NO_EXCEPTIONS
#endif
- end code

I guess the code is self-explanatory.

Alberto Barbati