Re: [boost] Filesystem library name

2003-01-16 Thread Alberto Barbati
Rene Rivera wrote:

Not totally right... It should be:

libboost__.lib.

Putting the version at the end is somewhat standard. And in my current case
of OpenBSD required.


That may be standard on OpenBSD, but it's not on Windows, where the last 
part of the filename is used to tell the type of the file. On Windows, 
library files usually end in ".lib".

BTW, Boost already uses (excruciatingly) long pathnames to select among 
different versions of the same library. I suggest the adoption of a 
fully "tagged" name scheme only for those files, like DLLs or shared 
libraries, that are probably going to be installed in some specific 
folder (on the PATH, for example).

For example, although on my system the STLport DLL is named 
stlport_vc750.dll, a name that carries both the platform "vc7" and the 
version "50", the corresponding library is simply named stlport_vc7.lib. 
I believe that even the "vc7" could have been removed from the lib's 
name if a pathname scheme like Boost's had been implemented.

Alberto Barbati


___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost


[boost] New version of UTF library (was Re: UTF library available for review)

2003-01-12 Thread Alberto Barbati
Hi,

I just uploaded here http://groups.yahoo.com/group/boost/files/utf/ a 
new version of the UTF library. The changes are:

1) Added missing typename keywords and used BOOST_DEDUCED_TYPENAME in 
every applicable place

2) Added safety checks on buffer size. (Thanks to Dietmar Kuehl)

3) Now the state type is not assumed to be an integer type. In order to 
access the state two unqualified free functions get_state() and 
set_state() are used instead. File utf_config.hpp provides a default 
*non-portable* implementation that relies on reinterpret_cast, which 
should be specialized for each platform. (Thanks to Dietmar Kuehl)
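
A minimal sketch of what such a pair of accessors could look like. This 
is not the code in utf_config.hpp: it uses std::memcpy rather than 
reinterpret_cast, it assumes that std::mbstate_t is at least 32 bits wide, 
and the exact signatures are an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>   // std::memcpy
#include <cwchar>    // std::mbstate_t

// Assumption of this sketch: the opaque state object is large enough
// to hold one 32-bit value. A real port would check this per platform.
static_assert(sizeof(std::mbstate_t) >= sizeof(std::uint32_t),
              "mbstate_t is too small for this sketch");

// Read the value previously stored in the state object.
inline std::uint32_t get_state(const std::mbstate_t& state)
{
    std::uint32_t value;
    std::memcpy(&value, &state, sizeof value);
    return value;
}

// Store a value (at most 21 bits, enough for any Unicode code point)
// in the first bytes of the state object.
inline void set_state(std::mbstate_t& state, std::uint32_t value)
{
    assert(value <= 0x1FFFFFu);
    std::memcpy(&state, &value, sizeof value);
}
```

Copying bytes avoids aliasing problems, but it is still non-portable in 
the sense that the layout of std::mbstate_t is implementation-defined.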

The suite now compiles correctly on gcc cygwin yet it fails to link 
because it complains about missing wchar_t specializations. Can anyone 
help me on this?

It also seems that gcc does not provide specializations of any library 
class (basic_filebuf, char_traits, etc.) for internal types other than 
char and wchar_t. Could anyone confirm this? This could be a problem if 
the user wants to use UTF-32 facets but wchar_t is only 16 bits wide. 
I can easily provide an implementation of char_traits for 
implementations lacking it. Should I do it?

Alberto Barbati wrote:
> Dietmar Kuehl wrote:
>> Alberto Barbati wrote:
The problem is that if char does not have 8 bits, then I cannot be sure 
that the underlying implementation reads from a file 8 bits at a time. 
Please correct me if I'm wrong on this point. That requirement is 
essential for the UTF-8 encoding.

Has anyone any comment about this? I don't have access to any 
implementation where char has more than 8 bits to verify.

There already exists a facility to select the correct facet according to
the byte order mark. It's very simple to use:

std::wifstream file("MyFile", std::ios_base::binary);
boost::utf::imbue_detect_from_bom(file);

that's it.



I have seen this possibility and I disagree that it is very simple to use
for several reasons:

- There is at least one implementation which does not allow changing
  the locale after the file was opened. This is a reasonable
  restriction which seems to be covered by the standard (I thought
  otherwise myself but haven't found any statement supporting a
  different view).  Thus, changing the code conversion facet without


 >   closing the file may or may not be possible. Closing and reopening
 >   a file may also be impossible for certain kinds of files.

I guess you are referring to 27.8.1.4, clause 19 (description of function 
filebuf::imbue):

"Note: This may require reconversion of previously converted characters. 
This in turn may require the implementation to be able to reconstruct 
the original contents of the file."

That may indeed be a problem. In my humble opinion, the use of "may" is 
quite unfortunate... it seems that implementations need not reconvert 
previous characters, and the text leaves unspecified (not even 
"undefined" or "implementation defined") what happens if the 
implementation cannot perform the reconstruction.

In which way is imbue implemented in the implementation you were 
mentioning?

I looked deeper into the question.

Of the three implementations I checked (VS.Net/Dinkumware, STLport, gcc 
3.2 prerelease) none implements clause 19. gcc even has an explicit 
comment about this. All of them allow imbue() in the middle of a file. 
Which implementation were you talking about?

I am considering writing a mega-facet that automatically adapts to the 
file encoding according to the BOM. It could easily be done for UTF-32 
as the conversion code is already factored out of the facet classes 
(split into files utfXX_algo.hpp and utf32_strategy.hpp). I plan to do 
the same factorization for the UTF-16 facets too; it is already done 
for facet utf8_utf16. However, please bear in mind that such a facet 
can't be as performant as the little ones, because each of the 
do_in/do_out/do_length functions has to be a large switch over the 
several implementations, and that switch needs to be executed every 
time do_XXX is called.
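
The first step of such a facet, recognising the signature, could be 
sketched as follows. The enum and the function name are hypothetical and 
are not part of the uploaded library; only the byte sequences come from 
the Unicode standard.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
enum bom_encoding
{
    bom_none, bom_utf8, bom_utf16le, bom_utf16be, bom_utf32le, bom_utf32be
};

// Inspect the first bytes of a file and report which BOM, if any, they form.
inline bom_encoding detect_bom(const unsigned char* p, std::size_t n)
{
    assert(p != 0 || n == 0);

    // UTF-32 must be tested first: FF FE 00 00 begins with the
    // UTF-16LE signature FF FE.
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return bom_utf32le;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return bom_utf32be;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return bom_utf8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return bom_utf16le;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return bom_utf16be;
    return bom_none;
}
```

Note the inherent ambiguity: a UTF-16LE file whose first character after 
the BOM is U+0000 starts with the same four bytes as a UTF-32LE file.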

BTW, this mega-facet is ok when reading from a file. How should it 
behave when writing? Will it be ok to return error until an encoding is 
chosen? In fact, reading *and* writing at the same time to a Unicode 
file is IMHO a sure way to disaster, unless writing always occurs at 
the end of the file with std::ios_base::app.

I am considering adding stream classes, derived from std::basic_* 
classes (or maybe from boost::filesystem classes?) as a convenience. 
What do you think?

Alberto




[boost] Re: Next revision of boost::thread

2003-01-09 Thread Alberto Barbati
Stefano Delli Ponti wrote:

From: "David Abrahams" <[EMAIL PROTECTED]>


"William E. Kempf" <[EMAIL PROTECTED]> writes:



That's a good idea.  So would users prefer new exception types here,
or should I use the std:: exceptions?


IMO, it's always safer to use an exception type which provides
more-specific information.



Agreed. And we should keep coherence with the filesystem library.


Agreed.

Alberto





[boost] Re: Next revision of boost::thread

2003-01-08 Thread Alberto Barbati
William E. Kempf wrote:

* Are there concerns about using conditional compilation and optional portions of the library, as POSIX does?  I believe this is the only way Boost.Threads and the C++ standard will be able to provide "portable" threading libraries that don't restrict implementation to a least common denominator approach.


What about using property maps? (I mean the Boost Property Map Library).


* Are there issues with throwing std::invalid_argument for both invalid and unsupported values?  Should I define Boost.Threads specific exceptions instead, separating out the two exception types?


If you want to use std:: exception classes, for "unsupported" value you 
could also use std::domain_error. Defining two new classes in the boost 
namespace is also an option.
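
A sketch of the second option, with two distinct classes so that callers 
can tell "invalid" from "unsupported". All names below (the two exception 
classes and the set_stack_size_hint function) are hypothetical and do not 
come from Boost.Threads.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical: the value makes no sense on any platform.
class invalid_thread_argument : public std::invalid_argument
{
public:
    explicit invalid_thread_argument(const std::string& what_arg)
        : std::invalid_argument(what_arg) {}
};

// Hypothetical: the value is meaningful, but this platform cannot honour it.
class unsupported_thread_option : public std::domain_error
{
public:
    explicit unsupported_thread_option(const std::string& what_arg)
        : std::domain_error(what_arg) {}
};

// Hypothetical setter that distinguishes the two failure modes.
inline void set_stack_size_hint(long bytes, bool platform_supports_it)
{
    if (bytes <= 0)
        throw invalid_thread_argument("stack size must be positive");
    if (!platform_supports_it)
        throw unsupported_thread_option("stack size is not configurable here");
    // ... a real implementation would store the value here ...
}
```

Deriving from the std:: classes keeps existing handlers for 
std::invalid_argument, std::domain_error and std::logic_error working.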

Beyond this, I'd appreciate any other feedback as well.


I'm quite busy now, but I'm enjoying the thread library so much that I 
will be pleased to help you as much as I can.

Alberto




[boost] Re: UTF library available for review

2003-01-08 Thread Alberto Barbati
Dietmar Kuehl wrote:

Alberto Barbati wrote:


One can use a char traits class different from
std::char_traits, that defines a suitable state type.



This is not really viable due to 27.8.1.1 paragraph 4:

  An instance of basic_filebuf behaves as described in lib.filebuf
  provided traits::pos_type is fpos<traits::state_type>. Otherwise the
  behavior is undefined.


Thanks for pointing that out, I missed it. However it's not really a 
problem, you can add a pos_type typedef to our test_traits, like this:

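#include <ios>               // std::fpos
#include <string>            // std::char_traits
#include <boost/cstdint.hpp> // boost::uint32_t

template <class charT>
struct test_traits : public std::char_traits<charT>
{
    typedef boost::uint32_t state_type;
    typedef std::fpos<state_type> pos_type;
};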

It would be possible to create a conversion stream buffer (which is
probably a good idea anyway) which removes this requirement but even
then things don't really work out: stream using a different character
traits are not compatible with the normal streams. I haven't worked
much with wide character and don't know how important it is to have
eg. the possibility of using a wide character file stream and one of
the standard wide character streams (eg. 'std::wcout') be replacable.
I think it is crucial that the library supports 'std::mbstate_t'
although this will require platform specific stuff. It should be
factored out and documented such that porting to new platform consists
basically of looking up how 'std::mbstate_t' is defined.


That's a better argument. I will think about it. As I said, I'm 
definitely not against adding accessors to mbstate_t, I just have to 
think about the best way to do it.

I forgot to say in my previous post that this version of the library
only supports platforms where type char is exactly 8 bits. This
assumption is required because I have to work at the octet level while
reading from/writing to a stream.



I don't see why this would be required, however. This would only be
necessary if you try to cast a sequence of 'char's into a sequence
of 'wchar_t's. Converting between these is also possible in a portable
way (well, at least portable across platforms with identical size of
'wchar_t' even if 'char' has different sizes).


The problem is that if char does not have 8 bits, then I cannot be sure 
that the underlying implementation reads from a file 8 bits at a time. 
Please correct me if I'm wrong on this point. That requirement is 
essential for the UTF-8 encoding.

Such a decision is very strong, I know. Yet, one of the main problems with
the acceptance of Unicode as a standard is that there are too many
applications around that use only a subset of it. For example, one of
the first pieces of feedback I got, at the beginning of this work, was
"I don't need to handle surrogates, could you provide an optimized facet
for that case?". The answer was "Yes, I could, but I won't".



As I said, I don't have strong feelings about this (and I have
implemented such a facet myself already anyway...). However, note that
I requested something quite different: I definitely want to detect if
a character cannot be represented using the internally used character.
In fact, I would like to see this happen even for a 16 bit internal type
because UTF-16 processing is considerably more complex than UCS-2
processing and I can see people falling into the trap of testing only
cases where UCS-2 is used. That is, the implicit choice of using UTF-16
is actually a pretty dangerous one, IMO.


I know it's dangerous, but I prefer it that way. I would like this to be 
"The UTF Library", not just some "conversion library". I also want to 
support the Unicode standard to its full extent. Supporting a conversion 
not covered by Unicode, just because someone finds it useful, does not 
go in that direction. If this position prevented my proposal from being 
accepted into Boost, I would simply withdraw it.

There already exists a facility to select the correct facet according to
the byte order mark. It's very simple to use:

std::wifstream file("MyFile", std::ios_base::binary);
boost::utf::imbue_detect_from_bom(file);

that's it.


I have seen this possibility and I disagree that it is very simple to use
for several reasons:

- There is at least one implementation which does not allow changing
  the locale after the file was opened. This is a reasonable
  restriction which seems to be covered by the standard (I thought
  otherwise myself but haven't found any statement supporting a
  different view).  Thus, changing the code conversion facet without

>   closing the file may or may not be possible. Closing and reopening
>   a file may also be impossible for certain kinds of files.

I guess you are mentioning 27.8.1.4, clauses 19 (description of function 
filebuf::imbue):

"Note: This may require reconversion of previously converted characters. 
This in turn may require the implementation to be able to recon

[boost] Re: UTF library available for review

2003-01-07 Thread Alberto Barbati
First of all, thanks to everybody for your feedback. I realized that my 
message was a bit arrogant about the lack of interest... I apologize for 
that and I promise I'll give my best to get this library to boost standards!

Dietmar Kuehl wrote:
- The 'state' argument for the various facet functions is normally accessed
  as a plain integer in your code. However, eg. 'std::mbstate_t' is not an
  integer at all. In fact, it is implementation defined and can thus be
  something entirely different on each platform. I'd suggest to provide some
  form of accessor functions to read or write objects of this type, possibly
  in the form of some traits class. This could then be used to cope with
  differences and/or more complex organization of the state type.


That's a good point. On my platform (Win32) mbstate_t indeed is a 
typedef to "int" and thus satisfies all requirements (i.e.: being an 
integer type capable of containing an unsigned 21-bit value). I was 
aware that this is not the case on other platforms, but I made the 
assumption anyway simply because the user is not really required to use 
mbstate_t. One can use a char traits class different from 
std::char_traits, that defines a suitable state type. For example:

template <class charT>
struct MyTraits : public std::char_traits<charT>
{
    typedef boost::uint32_t state_type;
};

then use basic_fstream<charT, MyTraits<charT> > instead of
basic_fstream<charT>.

[I just tried that and found a typo in correctness_test.hpp that 
prevents the test suite from compiling, but nothing too serious]

Of course I have to document this ;)

If you believe that tinkering with the char traits class is 
undesirable, or simply "too much", I see no problem in adding the 
accessor functions you suggest.

- Falling back to UTF-16 in cases where the Unicode characters may not
  fit into the words is one possible approach to deal with the characters.
  Another approach is to just indicate an error. For example, I know that
  certain XML-files I'm using exclusively store ASCII characters and there
  is no need to use a different internal character type than 'char'. If I ever
  come across a non-ASCII character I don't really want to have it encoded
  in something like UTF-8 but I want to fail reading the file (and possibly
  retry reading using a bigger character type). I would appreciate if such a
  possibility would be incorporated, too (this is, however, nothing I feel too
  strongly about).


I forgot to say in my previous post that this version of the library 
only supports platforms where type char is exactly 8 bits. This 
assumption is required because I have to work at the octet level while 
reading from/writing to a stream.

That said, I have deliberately decided not to allow "char" as the 
internal type. The internal type must be an integral type able to 
represent an unsigned 16-bit value (for UTF-16) or an unsigned 21-bit 
value (UTF-32). The choice between UTF-16 and UTF-32 as the internal 
encoding is done at compile-time based on the sizeof of the internal 
char type, which must be either 2 or 4.
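
The compile-time selection can be pictured roughly like this. The names 
are hypothetical (the library's real convenience interface may look quite 
different); the technique shown is plain specialization on sizeof, with no 
primary definition so that any other size fails to compile.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag for the two supported internal encodings.
enum internal_encoding { internal_utf16, internal_utf32 };

// Primary template deliberately left undefined: an internal character
// type whose size is neither 2 nor 4 is rejected at compile time.
template <std::size_t Size>
struct encoding_for_size;

template <>
struct encoding_for_size<2>
{
    static const internal_encoding value = internal_utf16;
};

template <>
struct encoding_for_size<4>
{
    static const internal_encoding value = internal_utf32;
};

// Maps an internal character type to its encoding via sizeof.
template <class InternT>
struct select_internal_encoding : encoding_for_size<sizeof(InternT)>
{
};
```

With this, select_internal_encoding<char>::value does not compile on a 
platform where sizeof(char) is 1, which matches the decision not to allow 
"char" as the internal type.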

Such a decision is very strong, I know. Yet, one of the main problems 
with the acceptance of Unicode as a standard is that there are too many 
applications around that use only a subset of it. For example, one of 
the first pieces of feedback I got, at the beginning of this work, was 
"I don't need to handle surrogates, could you provide an optimized facet 
for that case?". The answer was "Yes, I could, but I won't".

Yours is a very special case. Frankly, I'd rather not support it. In 
fact, it would be extremely simple to do: you just take the facet 
declared in file detail/degenerate.hpp and change a few lines. However, 
it would be outside the intent of this library, which is to provide UTF 
conversion according to Unicode 3.2 requirements, no more, no less.

- In the context I want to use the facets [at least those I'm implementing]
  I don't really want to bother about the details of the encoding. That is, I
  just want to 'imbue()' an appropriate 'std::locale' object to the stream and
  have the facet figure out what encoding is used, especially when reading
  from some file. The basic idea is that each file is either started by a byte
  order mark from which UTF-16BE or UTF16LE can be deduced or it is in
  UTF-8. The encoding could eg. be stored in typical 'std::mbstate_t'
  objects (these are often a struct with a 'wchar_t' and a count or something
  like this). To enable something like this, it would be helpful if the actual
  conversion functions were actually separated from the facets: The facets
  would just call the corresponding conversion functions. The actual
  conversion is, apart from the state argument, entirely stateless and there
  is no need to bind it to any facet.


There already exists a facility to select the correct facet according to 
the byte order mark. It's very simple to use:

std::wifstream file("MyFile", std::ios_base::binary);
boost::utf::imbue_detect_from_bom(file);

that's it

[boost] Re: UTF library available for review

2003-01-05 Thread Alberto Barbati
Alberto Barbati wrote:

* a comprehensive test suite (with Jamfile)


I almost forgot! The test suite requires the filesystem library. Special 
thanks to Beman Dawes for it!

Alberto





[boost] UTF library available for review

2003-01-05 Thread Alberto Barbati
Hi Boosters,

I have put in the Boost file section the first version of my UTF 
library. You can find it here:

http://groups.yahoo.com/group/boost/files/utf/

A couple of months ago, I posted a message to check if there was 
interest in such a library and I got just one answer from Vladimir Prus 
(hi, Vladimir!). I hope that a full-blown, working library will attract 
more attention.

What you will find in the library:

* codecvt facets for the following external encodings: UTF-8, UTF-16LE, 
UTF-16BE, UTF-32LE, UTF-32BE. The facets are templated, in order to 
avoid any reference to the platform wchar_t type (if present).

The internal encoding can be either UTF-16 or UTF-32. A convenience 
interface is provided to automatically select the internal encoding 
according to the size (2 or 4 bytes) of the character type used internally.

The facets will perform correct handling of the following Unicode features:

  - all 17 character planes
  - non-characters (U+XFFFE, U+XFFFF, U+FDD0 - U+FDEF)
  - UTF-16 surrogate pairs (both externally and internally)
  - UTF-8 non-shortest forms (externally)

* a convenience interface to autodetect the correct facet according to 
the file signature (BOM)

* a comprehensive test suite (with Jamfile)

* a little example (with Jamfile)
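
To make two of the features listed above concrete, here is a small sketch 
of a non-character test and of the UTF-16 surrogate arithmetic. The 
function names are hypothetical and this is not the library's code; the 
ranges and formulas are those of the Unicode standard.

```cpp
#include <cassert>
#include <cstdint>

// True for U+FDD0..U+FDEF and for the last two code points of each of
// the 17 planes (U+xFFFE and U+xFFFF).
inline bool is_noncharacter(std::uint32_t cp)
{
    return (cp >= 0xFDD0 && cp <= 0xFDEF)
        || (cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE);
}

// Split a supplementary-plane code point into a surrogate pair.
inline void to_surrogates(std::uint32_t cp,
                          std::uint16_t& high, std::uint16_t& low)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const std::uint32_t v = cp - 0x10000;             // 20 significant bits
    high = static_cast<std::uint16_t>(0xD800 + (v >> 10));
    low  = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));
}

// Recombine a valid high/low surrogate pair into a code point.
inline std::uint32_t from_surrogates(std::uint16_t high, std::uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low  >= 0xDC00 && low  <= 0xDFFF);
    return 0x10000
         + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

For instance, U+1D11E splits into the pair D834 DD1E, and U+10FFFF, the 
last code point of plane 16, into DBFF DFFF.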

What you won't find in the library:

* documentation :-( I'm working on it!!! I swear. Give me some more 
time! (and a little feedback)

* facets for UCS-2 or UCS-4 (these encodings are very similar to UTF-16 
and UTF-32 but are *not* the same!)

* facets that use UTF-8 internally (this is too complex and won't work 
portably, believe me!)

Compatibility

The test suite and the example have been tested with VS.NET with both 
the native STL and STLport 4.5.3. However, STLport has major bugs in 
the codecvt interface and in the basic_filebuf implementation, so in 
order to compile and run the wchar_t and uint16_t tests you need to 
apply a patch that is provided with the library and *rebuild* STLport. 
The uint32_t test won't compile in any case due to an incomplete 
implementation of the entire locale suite (I am going to contact Boris 
Fomitchev in order to see how we can make a patch).
The test suite will compile and run correctly even in presence of the 
/Zc:wchar_t option (that's why there are a wchar_t and a uint16_t test 
in the first place).

The facets that use UTF-16 internally were a major challenge. I provided 
two different implementations. The default one is a "compatibility" one 
that should work with most STL implementations (including VS.NET and 
STLport, which have a minor flaw in them :-( ). The other one should be 
a little more performant but I don't know on how many compilers it will 
work. The alternative implementation can be selected in file 
config.hpp. In that file you can also find a #define that should be 
changed if your implementation correctly implements Library Issue 75 
about the prototype of function do_length().

I hope you enjoy this library and find it useful. Depending on the 
feedback I receive, I will go on to write proper documentation with a 
view to a formal submission.

Thanks in advance for your time and help,

Alberto Barbati





[boost] Re: filesystem feature request: temporary path and files

2003-01-02 Thread Alberto Barbati
Thomas Witt wrote:


Hi Alberto,

Alberto Barbati wrote:


Hi,

first of all, I want to thank Beman Dawes and all others that
contributed to the design and development of the Filesystem library. 
It's a wonderful piece of work.

I just would like to propose a couple of additions that I believe are 
very useful. Both features regard temporary files.

First proposal: I propose to add a function with a signature of this 
kind:

path generate_path_for_temp_file();


IIRC functions like this are considered a bad idea. They are subject to 
race conditions and a potential security problem.

I agree with you that the functionality would be really helpful. The 
usual solution to the race condition problem would be to have a function 
that returns a stream. See mkstemp on POSIX. Win32 has a similar facility.

Then what do you think about my second proposal? (the tempstream class 
that was in the attachment.) The best thing to do would be to have that 
one implemented as a "primitive" and not implement 
generate_path_for_temp_file() at all. However, that is not easily 
achievable in a portable way, because the interface of 
std::basic_fstream takes a pathname and not a stream id or FILE* :-(
Moreover, there's no way to specify that the file is to be opened 
exclusively, so complete security can never be guaranteed if we derive 
from std::basic_fstream.

The weak link here is the std::basic_filebuf class. Is there someone out 
there who wants to write a (possibly portable) replacement for 
basic_filebuf that overcomes these limitations? It seems an interesting 
but huge task, to me.

By the way, is the Win32 facility that you are talking about 
GetTempFileName()? That function creates the file but does not open it, 
so it's different from mkstemp(), which also opens the file exclusively. 
So GetTempFileName() is only safe against non-malicious race conditions.

On the other hand, mkstemp() can easily be downgraded to 
GetTempFileName() by just keeping the pathname and closing the file ;)
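
For reference, a POSIX-only sketch of the mkstemp() approach. The helper 
name and the "/tmp/boost_sketch_XXXXXX" template are hypothetical; 
mkstemp(), close() and unlink() are the standard POSIX calls.

```cpp
#include <cassert>
#include <stdlib.h>   // mkstemp (POSIX)
#include <unistd.h>   // close, unlink, access (POSIX)
#include <string>
#include <vector>

// Atomically creates *and opens* a uniquely named file.
// Returns the open file descriptor, or -1 on failure; on success
// path_out receives the generated name.
inline int open_temp_file(std::string& path_out)
{
    const char pattern[] = "/tmp/boost_sketch_XXXXXX";   // hypothetical
    // mkstemp() rewrites the trailing XXXXXX in place, so it needs a
    // writable, NUL-terminated buffer.
    std::vector<char> buffer(pattern, pattern + sizeof pattern);
    const int fd = ::mkstemp(&buffer[0]);
    if (fd != -1)
        path_out.assign(&buffer[0]);
    return fd;
}
```

Keeping only path_out and calling close(fd) gives the weaker 
GetTempFileName()-like behaviour: the file exists, but nothing prevents 
another process from replacing it before it is reopened.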

Alberto





[boost] filesystem feature request: temporary path and files

2003-01-02 Thread Alberto Barbati
Hi,

first of all, I want to thank Beman Dawes and all others that
contributed to the design and development of the Filesystem library. 
It's a wonderful piece of work.

I just would like to propose a couple of additions that I believe are 
very useful. Both features regard temporary files.

First proposal: I propose to add a function with a signature of this kind:

path generate_path_for_temp_file();

the effect would be to generate a new (potentially unique) path suitable 
to be used for a temporary file. A sample POSIX implementation could be 
as simple as:

#include <cstdio>
path generate_path_for_temp_file()
{
    char tmp_name[L_tmpnam];
    return path(std::tmpnam(tmp_name), native);
}

but there could also be platform-specific implementations. For example, 
a Win32 sample implementation could use the GetTempPath/GetTempFileName 
to create the path in the correct directory as in:

#include <windows.h>
path generate_path_for_temp_file()
{
    char tmp_dir_path[MAX_PATH];
    char tmp_name[MAX_PATH];

    if(GetTempPathA(sizeof(tmp_dir_path), tmp_dir_path) == 0
        || GetTempFileNameA(tmp_dir_path, "$$$", 0, tmp_name) == 0)
    {
        boost::throw_exception(
            filesystem_error("unable to generate path for temporary file",
                             system_error));
    }

    return path(tmp_name, native);
}

Open issues (to be discussed):

1) on Win32, GetTempFileName also creates an empty file with the returned 
name. Other platforms also have functions that atomically generate the 
name *and* create a file with that name. Should there be a 
postcondition about the existence (or non-existence) of such a file?

2) Another useful signature could be:

path generate_path_for_temp_file(const path& location_hint)

that would use the specified path as a hint (in an unspecified 
platform-dependent way) to generate the path. For example, one may want 
to generate a temporary file in a specified directory or physical drive, 
overriding the system default, if any. On Win32 such a function could be 
implemented as:

#include <windows.h>
path generate_path_for_temp_file(const path& hint)
{
    char tmp_name[MAX_PATH];

    if(GetTempFileNameA(hint.native_directory_string().c_str(),
                        "$$$", 0, tmp_name) == 0)
    {
        boost::throw_exception(
            filesystem_error("unable to generate path for temporary file",
                             system_error));
    }

    return path(tmp_name, native);
}


Second proposal: a stream class that encapsulates a temporary file, that 
is a stream based on a file that is automatically deleted in the 
stream's destructor. I am attaching a sample implementation. It's 
implemented as a simple wrapper around fs::basic_fstream and 
generate_path_for_temp_file() above.

Cheers,

Alberto
#ifndef BOOST_FILESYSTEM_TEMPSTREAM_HPP
#define BOOST_FILESYSTEM_TEMPSTREAM_HPP

#include <boost/filesystem/fstream.hpp>
#include <boost/filesystem/operations.hpp>  // remove()

namespace boost
{
  namespace filesystem
  {
    path generate_path_for_temp_file();
    path generate_path_for_temp_file(const path& hint);

    template < class charT, class traits = std::char_traits<charT> >
    class basic_tempstream : public basic_fstream<charT, traits>
    {
    public:
      // ctor always opens file
      explicit basic_tempstream(std::ios_base::openmode mode =
          std::ios_base::in|std::ios_base::out)
        : m_path(generate_path_for_temp_file())
      {
        basic_fstream<charT, traits>::open(m_path, mode);
      }

      // the type is spelled filesystem::path because this class also
      // declares a member function named path()
      explicit basic_tempstream(const filesystem::path& hint,
          std::ios_base::openmode mode = std::ios_base::in|std::ios_base::out)
        : m_path(generate_path_for_temp_file(hint))
      {
        basic_fstream<charT, traits>::open(m_path, mode);
      }

      // close the stream first: some platforms refuse to delete an open file
      virtual ~basic_tempstream()
      {
        this->close();
        remove(m_path);
      }

      // intentionally hide inherited open()
      void open(std::ios_base::openmode mode = std::ios_base::in|std::ios_base::out)
      {
        basic_fstream<charT, traits>::open(m_path, mode);
      }

      const filesystem::path& path() const
      {
        return m_path;
      }

    private:
      filesystem::path m_path;
    };

    typedef basic_tempstream<char> tempstream;
#   ifndef BOOST_NO_STD_WSTRING
    typedef basic_tempstream<wchar_t> wtempstream;
#   endif

  } // namespace filesystem

} // namespace boost

#endif  // BOOST_FILESYSTEM_TEMPSTREAM_HPP




[boost] Re: #pragma once

2002-12-30 Thread Alberto Barbati
Gennaro Prota wrote:

Hmm... frankly I haven't used it anymore. I'm under the impression
they have fixed it now, but last time I checked it had a lot of bugs.
It was easy, for instance, to end up including the same file twice if
it was reached through different paths (e.g.: #include "subdir/file.h"
in A.cpp and #include "file.h" in subdir/file2.h). Usually one doesn't
notice the error, because he uses both the pragma and the canonical
include guard, but that means of course that the speed gain comes to
nothing. Probably it was VC++ 5.0 though.


You are missing the point here. "#pragma once" is not really 
"functional",  in the sense that it's not (or should not be) used 
*alone* to realize the "include me once" effect. That effect is better 
achieved by the canonical include guards that should be used anyway (and 
*will* be used anyway in portable code).

"#pragma once" is just an optimization issue. If a file is included a 
second time in the same TU, a dumb compiler will re-open it and give it 
to the preprocessor who strips its contents entirely because the include 
guard is already defined. Re-opening and preprocessing take a small but 
significant amount of time.

With "#pragma once" the programmer just gives the compiler a hint that 
re-opening is unnecessary, so the compiler can ignore the #include 
directive immediately. You see, there *is* a speed gain even in the 
presence of canonical include guards.

Of course, a smarter compiler like g++ will recognize the canonical 
include guards and deduce that re-opening is unnecessary without any 
explicit hint given by the programmer.

[...] In effect, the whole story about file inclusion should be
the other way round: any source file is included at most once for each
TU, unless the programmer requires otherwise ;-)


I agree, but we can't give the burden of avoiding multiple inclusions to 
the "user". Realistically, with large projects such intent is hardly 
achievable by anyone.

Alberto





[boost] Re: #pragma once

2002-12-29 Thread Alberto Barbati
Gennaro Prota wrote:

[snip] The fact that it is useful to reduce compilation
time is just a result of some compiler writers' attitude to prefer encouraging
the use of non-standard features rather than e.g. recognizing the include guard
idiom and optimize away the subsequent #includes (as for instance g++ does).


I don't want to start a religious war about which compiler is better or 
smarter or encourages a better style. I agree that recognizing the 
include guard idiom is a good thing, probably the best thing to do for a 
compiler.

Yet I'm stuck with MSVC anyway, as many other programmers out there, and 
#pragma once may have a significant effect on compilation time on that 
compiler. I just suggested a way to allow other compilers (there may be 
fewer than I may think of, obviously) to use the #pragma and also to be 
a little more descriptive. It's just a matter of a three line addition 
in config/compiler/visualc.hpp:

#if _MSC_VER >= 1020
#define BOOST_HAS_PRAGMA_ONCE
#endif

For example, Metrowerks CodeWarrior also supports #pragma once, so the 
#define could also be added to config/compiler/metrowerks.hpp.
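
Put together, the scheme might look like the sketch below, shown as a 
single file for brevity. The compiler checks are reduced to the two 
compilers mentioned above (accepting any CodeWarrior version is an 
assumption), and EXAMPLE_WIDGET_HPP and widget_answer() are hypothetical 
names standing in for an ordinary library header.

```cpp
#include <cassert>

// --- what the compiler-specific config headers would contribute ---
#if defined(_MSC_VER) && (_MSC_VER >= 1020)
#  define BOOST_HAS_PRAGMA_ONCE
#elif defined(__MWERKS__)   // assumption: any CodeWarrior version
#  define BOOST_HAS_PRAGMA_ONCE
#endif

// --- what an ordinary header would then contain ---
#ifndef EXAMPLE_WIDGET_HPP
#define EXAMPLE_WIDGET_HPP

// Optimization hint only; the include guard above remains the
// portable mechanism that guarantees single inclusion.
#ifdef BOOST_HAS_PRAGMA_ONCE
#  pragma once
#endif

inline int widget_answer() { return 42; }

#endif // EXAMPLE_WIDGET_HPP
```

Because the macro is defined without a value, the header tests it with 
#ifdef rather than #if.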

Cheers,

Alberto Barbati





[boost] #pragma once

2002-12-28 Thread Alberto Barbati
Hi Everybody,

I saw in a lot of boost header files the following lines:

#if _MSC_VER >= 1020
#pragma once
#endif

or even better:

#if _MSC_VER+0 >= 1020
#pragma once
#endif

But MS compilers are not the only ones that support #pragma once, which 
is in my opinion very useful. Why don't we define a 
BOOST_HAS_PRAGMA_ONCE in the compiler-specific config headers? Then we 
could write:

#ifdef BOOST_HAS_PRAGMA_ONCE
#pragma once
#endif

Alberto Barbati





[boost] Re: (corrected) review of optional library

2002-12-14 Thread Alberto Barbati
Fernando Cacciola wrote:

As an example, currently the C++ standard includes
   T & stack::top(), with precondition !(stack.empty()).

Instead, it could be
   may_be<T> & stack::top();  // no precondition required


 could be improved also, instead of:
   pair<iterator, bool> set::insert(const value_type & x)

we would use
   may_be<iterator> set::insert(const value_type & x)


Those are interesting examples! Thanks.
They can all be paralleled with optional<> whatever the model and interface
we choose.
Which kind of shows that the concept is useful.


If I can say it, I don't think that they are really good examples.

The stack::top signature is wrong: it should return a value and not a 
reference, and thus it may require a copy of the returned object. This 
makes the proposed signature potentially less efficient. Moreover, 
the result of top() could not be used by the caller to modify the 
top-most element. For this reason, a more correct signature would be

may_be<T&> stack::top();

but, unfortunately, I don't think it's legal.

The proposed signature of set::insert is a downgrade and not an 
improvement. Even if the element is not inserted, I still may want to 
have the iterator. In order to perform its operation, insert() will have 
to compute such iterator, so what's the point in discarding it?
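
A small sketch of why the iterator is still useful when nothing is inserted, using the existing std::set interface. The helper element_after_insert() is hypothetical and exists only for illustration:

```cpp
#include <set>
#include <utility>

// Inserts x and returns the element that ends up in the set.
// 'inserted' tells whether x was new; the iterator is valid either way.
inline int element_after_insert(std::set<int>& s, int x, bool& inserted) {
    std::pair<std::set<int>::iterator, bool> result = s.insert(x);
    inserted = result.second;
    return *result.first;  // points at the new or the already present element
}
```

With a return type that is empty on failure, the caller would need a second lookup to reach the element that blocked the insertion.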

Just my opinion,

-Alberto





[boost] Re: unicode support

2002-12-05 Thread Alberto Barbati
Vladimir Prus wrote:

First interpretation is that you're interested in support for
different Unicode encodings, via appropriate facets. Then
Alberto Barbati is the last person who touches this matter,
in
news://news.gmane.org:119/aq72e4$pog$[EMAIL PROTECTED]

I assume he's holding a lock on implementation work. Alberto,
did you get anywhere?


Yes, despite the clear lack of interest from Boosters about this issue, 
I'm still working on it ( but I don't have any "lock" ;) ).

I had a few problems with the interpretation of the standard, but thanks 
to a few guys from comp.std.c++ I can now say that I have a working 
implementation of facets to convert from UTF-8/16/32 (external) to 
UTF-16/32 (internal), with endian variants, a total of 10 facets. The 
implementation passes a basic suite of tests on VS.NET with both the 
native STL and STLport.

The facets are conformant to Unicode 3.2 requirements about 
non-characters, use of surrogates and non-shortest UTF-8 sequences. 
After a private discussion with a field expert, I decided to drop the 
UCS-2 facets, so surrogate support is no longer optional. I also decided 
to drop facets with UTF-8 as the internal encoding because they are not 
very useful and the current wording of the C++ standard de facto 
disallows a portable implementation :(. I hope the LWG will consider 
clarifying the issue.

My next steps would be to polish the code, write the docs and prepare a 
more complete test suite. If everything goes well, I think I could 
submit the library for review by the end of the month.

Second interpretation is conversion between all the 8-bit encodings
out there. E.g. from koi8-r to windows-1251. Since there's GNU
iconv already, I'd rather see a tiny wrapper over it. (GNU iconv works
on Windows, too).


Here things become more complex. UTF conversions are just algorithmic 
stuff, easy to do. Other conversions like koi8-r to windows-1251 require 
look-up tables, and simply gathering the data for all of them would be 
equivalent to rewriting a part of ICU, which is a huge piece of work.
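
To illustrate the "algorithmic" nature of UTF conversions, here is a minimal sketch of a UTF-8 encoder for a single code point. It is not the code of the facets discussed here; the function name encode_utf8() is hypothetical, and only surrogates and out-of-range values are rejected:

```cpp
#include <string>

// Encodes one Unicode scalar value as UTF-8.
// Returns an empty string for surrogates and for values above U+10FFFF.
inline std::string encode_utf8(unsigned long cp) {
    std::string out;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

No table is needed: every byte follows from shifts and masks, which is what separates these conversions from legacy 8-bit encodings.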

The idea of wrapping ICU is very interesting. However, the Boost policy 
explicitly disallows dependencies on external libraries, so this 
solution is out of the question. Moreover, the only things ICU is missing 
are the conversion facets. I don't see any reason to wrap anything else. 
Unfortunately, as I said before, not all conversions can be portably 
expressed as a facet with the current C++ standard, so even writing 
wrapping facets has little meaning.

Alberto Barbati






[boost] Re: boost::pool feature requests

2002-11-23 Thread Alberto Barbati
Thanks Steve, for considering my issues.


4) what's the use of ordered_malloc/ordered_free? I made a few tests and 
they are indeed a bit slower than regular malloc/free, without any 
apparent advantage. Am I missing something?

Keeping the free list ordered allows algorithms that traverse the free list
along with the memory owned by pool to work correctly/more efficiently:
  1) array allocations will be more efficient (pool_allocator keeps its free
list ordered, whereas fast_pool_allocator does not)
  2) release_memory() will work correctly
  3) object_pool uses the ordered property to efficiently implement the
automatic destructor calls for allocated objects


Sorry to bother you, I have a few more questions on this topic:

1) from what I understand, the "ordered" property depends on the fact 
that allocation and deallocation calls are correctly paired, in the 
sense that if I always call free() in the opposite order of the 
respective malloc() calls the pool is still considered to be ordered. Is 
this right? In this special case, would I get a benefit by calling 
ordered_malloc()/ordered_free()?

2) can ordered_malloc() be called on a non-ordered pool? Does the call 
make the pool ordered? In this case, is the complexity still O(1)?

3) can ordered_free() be called on a non-ordered pool? Does the call 
make the pool ordered?

4) ordered_malloc() is described as "merges the free list to preserve 
order", does this mean that unused, but potentially usable, chunks are 
removed from the free list?

I think the "order" property is very useful and powerful, but it's not 
terribly clear from the docs how it can be exploited to full potential. 
The descriptions of the methods are missing post-conditions that are, in 
my opinion, as important as pre-conditions. Change in the "ordered" 
state of the pool could be added more explicitly there. Moreover, if a 
function can be called on both ordered and un-ordered pools, it would be 
interesting to state explicitly if there's a difference in behaviour 
and/or complexity.

Thanks for your patience,

Alberto Barbati





[boost] boost::pool feature requests

2002-11-21 Thread Alberto Barbati
Hi Everybody,

I recently used the boost::pool library; it's very nice and 
extremely useful. I would like to submit a few requests and ask one 
question about it.

1) purge_memory() does not reset the member next_size, but in my opinion 
it should. The rationale is that after a purge_memory() the pool should 
be in the exact state it was in at the moment of construction. For example, 
in an application of mine, I have a long-lived pool object (in fact a 
singleton_pool) that is emptied with purge_memory() from time to time. 
The first allocation after purge_memory() allocates a block of size 
next_size, then multiplies next_size by 2. As next_size is never 
reset, it grows exponentially. Imagine my face when I found that 
allocating a 12-byte object from an empty pool required more than 1 GB...
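
The growth described in point 1 can be checked with a few lines of arithmetic. This is a simplified model, not boost::pool's actual code: it assumes each block holds next_size chunks, that next_size doubles after every block allocation, and it ignores per-block bookkeeping and chunk-size rounding. The function name is hypothetical:

```cpp
// Counts how many doublings of next_size it takes before a single block
// request exceeds limit_bytes (simplified model, see assumptions above).
inline unsigned doublings_until_block_exceeds(unsigned long long chunk_size,
                                              unsigned long long first_next_size,
                                              unsigned long long limit_bytes) {
    unsigned long long block_bytes = chunk_size * first_next_size;
    if (block_bytes == 0)
        return 0;                 // degenerate input, nothing to model
    unsigned doublings = 0;
    while (block_bytes <= limit_bytes) {
        block_bytes *= 2;         // next_size is never reset, so it keeps doubling
        ++doublings;
    }
    return doublings;
}
```

Under these assumptions, 12-byte chunks with a starting next_size of 32 pass the 1 GiB mark after 22 purge-and-allocate cycles.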

2) the name release_memory() confuses me. It makes me think that all 
memory is being released, a task accomplished by purge_memory(). I think 
a better name could be release_unused_memory(). This function also 
should reduce the value of next_size, for example by considering the 
size of the largest block left in the pool.

3) I would like to have a function to free the entire pool _without_ 
releasing the memory. The rationale for this is in the example above. 
Most probably, the number of objects allocated in each iteration of my 
application is almost the same. At the end of an iteration, I would like 
to free all allocated objects with one single call (as I have thousands 
of objects, it makes a difference), yet it's a waste of time to free 
the memory, because I will need the same amount again in the next 
iteration. Such a function could be called free_all_chunks() or something 
similar.
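
The idea in point 3 can be sketched with a toy fixed-capacity arena. This is not boost::pool and has no free list; the class chunk_arena is hypothetical and assumes a non-zero chunk size that suits the alignment of the stored objects. It only shows "free everything, keep the memory":

```cpp
#include <cstddef>
#include <vector>

// Toy arena handing out fixed-size chunks from one buffer that it never
// returns to the system until destruction.
class chunk_arena {
public:
    chunk_arena(std::size_t chunk_size, std::size_t chunk_count)
        : chunk_size_(chunk_size), storage_(chunk_size * chunk_count), used_(0) {}

    // Returns the next unused chunk, or a null pointer when the arena is full.
    void* allocate() {
        if (used_ + chunk_size_ > storage_.size())
            return 0;
        void* p = &storage_[used_];
        used_ += chunk_size_;
        return p;
    }

    // Marks every chunk as free in O(1); the buffer itself is retained.
    void free_all_chunks() { used_ = 0; }

    std::size_t chunks_in_use() const { return used_ / chunk_size_; }

private:
    std::size_t chunk_size_;
    std::vector<unsigned char> storage_;
    std::size_t used_;
};
```

Note that no destructors are run by free_all_chunks() in this sketch, so it only suits objects that do not need one.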

4) what's the use of ordered_malloc/ordered_free? I made a few tests and 
they are indeed a bit slower than regular malloc/free, without any 
apparent advantage. Am I missing something?

Thanks in advance,

Alberto Barbati





[boost] Re: Serialization Library Review (pointer serialization facility)

2002-11-20 Thread Alberto Barbati
Matthias Troyer wrote:


On Monday, November 18, 2002, at 02:30 PM, Yitzhak Sapir wrote:


I think taking out the pointer facility into a separate class would be 
better design.  By this I mean, that register_type<> and the logic for 
identifying and maintaining pointers would be in a separate class from 
the archive.  The archive would hold an instance of this class (given 
in the constructor), and use member functions of this container class 
to determine what to do when it encounters a new pointer/alias.  But 
it seems to me this is not the case yet (again, correct me if I'm wrong).


I want to second that vote. I would prefer a separate facility for 
pointer serialization, as an add-on to a serialization library if that 
is possible. Robert, what do you think, could it be separated out?

I already proposed a similar facility (under the unfortunate name of 
"registry") in a few previous posts of mine. Yitzhak Sapir has expressed 
my idea with much better words, so you may count my vote for it, too.

Alberto Barbati





[boost] Re: Serialization Submission version 6

2002-11-17 Thread Alberto Barbati
Robert Ramey wrote:

1) changing the state of the stream while serializing.  My implementation initialized the stream
and never contemplated that the same stream might be used for other things.  That is that
serialized data might be "embedded" as part of a larger stream.

Apparently this is an issue for some people. I don't see it as a large issue but
it is easy to address.  

In fact the issue is so easy to address that I don't understand why we 
are still discussing it :) If you are willing to accept my 
solution, please say so immediately, so we won't waste any more time.

One method of storing/recovering the data is to use a sequence of characters
or wide characters.  That is a C++ stream.

This has some major benefits:

a) All the code required to convert any C++ datatype into characters or wide
characters exists and is part of the standard library and is guaranteed to work.


This is not true, and I proved it to you with a code snippet in a recent 
post of mine. The standard *does not* provide a way to output (i.e.: to 
write to a disk file) a stream of wide characters. You can put wide 
characters into a wide stream but you will always obtain a file of 
"narrow" characters, obtained through a "degenerate conversion" as 
explicitly specified in the standard.

Moreover, I have very bad news. I just found that the C++ implementation 
shipped with .NET is not conformant on this point. Consider the 
following program:

#include <fstream>

int main()
{
    std::wofstream out("test.txt", std::ios::binary);
    out << L"I owe you \x20ac 1\n"; // \x20ac is the Euro sign
    return 0;
}

On .NET with STLport you get the incorrect, but ANSI-conforming, result:

"I owe you ¬ 1"

'¬' being the character with code 0xAC in Latin-1 (the low byte of 
0x20AC). On .NET with its native STL implementation you get

"I owe you "

the program chokes when writing the Euro sign and leaves the stream in 
"failed" state :( Here Microsoft seems to have really screwed up something.

Another observation:

I note that my test.cpp program includes wchar_t member variables initialized
to values in excess of 256.
The system doesn't seem to lose any information in storing/loading to a stream
with classic locale.




I double checked.

I have functions in both char and wchar_t versions of text archives to handle both strings
of chars and wstrings.  This created a couple of problems.  The most obvious was what about
strings containing embedded blanks. - and other punctuation.  Single characters such a space
was also a problem. First I implemented them a sequence of short integers. That worked
fine but I was concerned that it wasted space, was slow, and inconvenient for debugging.
So I made special functions for i/o of string and wstring which just write a string length
and then stream out the string buffer as binary.

So I never have the problem that unicode or locale or anything else interferes with my serialization.
This is a side effect of the fact that the usage of the stream was carefully limited to the purpose at hand.


You should triple check, then. Following my previous example, this program:

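#include <cassert>
#include <fstream>
#include <string>
// plus the archive header of the serialization library under review

int main()
{
    std::wstring outs(L"I owe you \x20ac 1"), ins;

    {
        // write the string through a wide output archive
        std::wofstream out("test.txt", std::ios::binary);
        boost::woarchive ar(out);
        ar << outs;
    }

    {
        // read it back through a wide input archive
        std::wifstream in("test.txt", std::ios::binary);
        boost::wiarchive ar(in);
        ar >> ins;
    }

    assert(outs == ins);
    return 0;
}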

fails on at least two platforms (.NET/native STL and .NET/STLport), in 
two different ways.

Of course this raises the question why support wstreams at all? We're not using its advantages
(unless we have a lot of unicode text to store) and it doubles the required space.


Let's replace wide streams and archives with narrow ones in the previous 
example. The program does run successfully on both STLport and the .NET 
native STL, but let's have a look at the archive file:

---begin file
22 serialization::archive 1
0 1 13  73 32 111 119 101 32 121 111 117 32 8364 32 49
---end file

This alternative requires from 2 to 6 (six!) bytes per Unicode 
character. Even up to 12 if you use surrogates, which becomes 8 if your 
wchar_t is 32 bits wide (:o another platform-specific issue has leaked 
in!). If I had lots of Unicode strings I would have no doubt about which 
is the better solution.
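
The byte counts above follow from writing each code unit as decimal text plus one separator. A small sketch of that arithmetic, assuming exactly one space per value as in the dump shown (the function name is hypothetical):

```cpp
#include <cstddef>

// Bytes used when one code unit is written as decimal digits followed by a
// single separating space, e.g. "8364 " takes 5 bytes.
inline std::size_t text_bytes_per_code_unit(unsigned long value) {
    std::size_t digits = 1;
    while (value >= 10) {
        value /= 10;
        ++digits;
    }
    return digits + 1;  // +1 for the separator
}
```

A surrogate pair costs two such entries of up to 6 bytes each, while a 32-bit wchar_t holding U+10FFFF (decimal 1114111) costs 8.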

I hope you realize that Unicode output is a lot more complex than it 
seems. I am just asking you to allow the programmer to avoid overriding 
the locale, which still can be the default option. Am I asking too much?

Alberto Barbati





[boost] Re: Re Serialization - locale

2002-11-16 Thread Alberto Barbati
Robert Ramey wrote:


How does this strike you?

	text_iarchive(Stream  &_is) : 
		is(_is)
	{
		// archives always use classic locale
		#ifndef BOOST_NO_STD_LOCALE
		plocale = is.imbue(std::locale::classic());
		#endif
		init();
	}
	~text_iarchive(){
		#ifndef BOOST_NO_STD_LOCALE
		is.imbue(plocale);
		#endif
	}

This solution does not address the objections in my last post in the 
original thread. You seem really concerned about this. We could meet in 
the middle with this solution, instead:

	text_iarchive(Stream  &_is, bool _overrideLocale = true) :
		is(_is)
	{
		// use the classic locale unless the caller opts out
		#ifndef BOOST_NO_STD_LOCALE
		if(_overrideLocale)
			plocale = is.imbue(std::locale::classic());
		else
			plocale = is.getloc();
		#endif
		init();
	}
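
The same save-and-restore logic can be sketched independently of the archive classes as a small scope guard. Everything below is hypothetical illustration code (scoped_classic_locale, comma_decimal and guard_demo are not part of the library under review); it assumes a narrow stream and uses a custom numpunct facet so the effect is visible without any named system locale:

```cpp
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Optionally switches a stream to the classic locale and always restores the
// previous locale when the guard goes out of scope.
class scoped_classic_locale {
public:
    explicit scoped_classic_locale(std::ios& stream, bool override_locale = true)
        : stream_(stream), saved_(stream.getloc()) {
        if (override_locale)
            stream_.imbue(std::locale::classic());
    }
    ~scoped_classic_locale() { stream_.imbue(saved_); }

private:
    scoped_classic_locale(const scoped_classic_locale&);            // non-copyable
    scoped_classic_locale& operator=(const scoped_classic_locale&);
    std::ios& stream_;
    std::locale saved_;
};

// Facet that prints ',' as the decimal point, standing in for a user locale.
struct comma_decimal : std::numpunct<char> {
    char do_decimal_point() const { return ','; }
};

// Writes one number under the guard and one after it.
inline std::string guard_demo() {
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), new comma_decimal));
    {
        scoped_classic_locale guard(os);
        os << 1.5;            // formatted with the classic locale
    }
    os << ' ' << 2.5;         // formatted with the user's locale again
    return os.str();
}
```

Passing false as the second argument leaves the user's locale in place, which is the opt-out asked for here.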

Another observation:

I note that my test.cpp program includes wchar_t member variables initialized to values in excess of 256.
The system doesn't seem to lose any information in storing/loading to a stream with classic locale.


Which platform are you working on? On Win32, VC++ 6 SP5 with STLport, the 
following test program produces the output 52 (0x34) instead of 4660 
(0x1234). According to the standard, this behaviour is perfectly 
conformant. (The ios::binary flag is required, because the I/O library could 
apply CRLF translation to a part of a two-byte character.)

#include <fstream>
#include <iostream>

int main()
{
wchar_t a = L'\x1234';

std::wofstream out("test.txt", std::ios::binary);
out << a;
out.close();

std::wifstream in("test.txt", std::ios::binary);
in >> a;
in.close();

std::cout << unsigned(a) << std::endl;
return 0;
}

If this code produces the correct output on your platform, either your 
char type has 16 bits or *your* platform is non-conformant, according to 
my interpretation of the standard.

Alberto Barbati





[boost] Re: Serialization Submission version 6

2002-11-15 Thread Alberto Barbati
Robert Ramey wrote:
>

register_cross_program_class_identifier(const char *id="T")

>

An alternative could be to use register_type<> as it is, but augment the 
serialization traits class to provide a

const char* serialization::get_cross_program_class_identifier();

This solution has the advantage that the identifier string can be 
physically located near the load/save/version functions (which are 
usually near the class itself).

Alberto Barbati





[boost] Re: String algorithm library

2002-11-15 Thread Alberto Barbati
Rozental, Gennadiy wrote:

Seq& trim_copy( Seq& input, Seq& trim_func( Seq&, const std::locale& ) );


In my opinion all algorithms without suffix should perform in-place 
operations, while the ones that make copies should have suffix "_copy". 
That would be more intuitive and also more consistent with the STL.
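
A minimal sketch of that naming convention, assuming std::string and whitespace trimming via std::isspace. These two functions are illustrations only, not the interface of the proposed string algorithm library:

```cpp
#include <cctype>
#include <string>

// No suffix: modifies its argument in place.
inline void trim(std::string& s) {
    std::string::size_type first = 0;
    while (first < s.size() && std::isspace(static_cast<unsigned char>(s[first])))
        ++first;
    std::string::size_type last = s.size();
    while (last > first && std::isspace(static_cast<unsigned char>(s[last - 1])))
        --last;
    s = s.substr(first, last - first);
}

// "_copy" suffix: leaves the input untouched and returns the result.
inline std::string trim_copy(std::string s) {
    trim(s);
    return s;
}
```

This mirrors pairs such as std::replace and std::replace_copy in the standard algorithms.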

Alberto Barbati





[boost] Re: Serialization Submission version 6

2002-11-15 Thread Alberto Barbati
Vahan Margaryan wrote:

Eric Woodruff wrote:

type_info is not portable in the slightest.

I realize that. I just pointed out that it's not so convenient to have
user-supplied string ids because of the template classes.


As pointed out by Robert, the user-supplied string id could be made 
optional. For the lazy user we might imagine a default value obtained in 
some programmatic way, for example a possibly pre-processed 
type_info::name().
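
A sketch of such an optional identifier with a programmatic default. The function class_identifier() is hypothetical, and the fallback uses the raw type_info::name() without any pre-processing, so the default is implementation-specific and not portable across compilers:

```cpp
#include <string>
#include <typeinfo>

// Returns the user-supplied identifier when given, otherwise falls back to
// the implementation-defined name reported by typeid.
template <class T>
std::string class_identifier(const char* user_id = 0) {
    return user_id ? std::string(user_id) : std::string(typeid(T).name());
}
```

Only the explicit form, for example class_identifier<int>("my.int"), is suitable for archives exchanged between programs built with different compilers.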

Alberto Barbati





[boost] Re: Serialization Submission version 6

2002-11-15 Thread Alberto Barbati
Robert Ramey wrote:

I believe I have found an acceptable resolution to the "registration" conundrum.


(note: I consider this "registration" topic to be a different issue from my 
registry class proposal. This one relates to "identification" of user 
classes, while mine is just an issue of factoring responsibilities 
among library classes)

I agree that this solution of the "identification" issue is quite right 
and would be very beneficial to the overall usefulness of the library. 
Bravo to Robert and Vladimir.

I conjure up something like (pseudo code):

register_cross_program_class_identifier(const char *id="T")


The perfect place for this function could be as a method of my registry 
class, don't you think? ;)

Alberto Barbati





[boost] Re: Serialization Submission version 6

2002-11-15 Thread Alberto Barbati
Vladimir Prus wrote:

Robert Ramey wrote:

Now I remember why I included this.

Suppose that an archive is created where the default locale is that of a 
Spanish-speaking country, where the number 123 thousand is written

123.000

The archive is sent to another country where the default locale is that of an 
English-speaking country, where the string

123.000

means 123
That's why I set the locale to classic.

I'm well aware of this issue. See below.


Is it a good idea to change the stream locale without the user's consent? Maybe
the archive should create *its own* (i/o)stream, sharing the streambuffer with
the stream the user has passed, and with an appropriately modified locale?


In my opinion the archive should not change the stream locale without 
the programmer's consent. The main reason for this is that she may 
indeed want to use her own locale, for example to allow Unicode output.

Overriding only num_put/num_get (and why not ctype also?) is not a nice 
solution, in my opinion; it's just a hack. Moreover, I can imagine a 
brave programmer who is aware that her serialized data will not be read 
by any application other than hers and decides to have the text output 
follow her native language conventions.

In the end, between the two possibilities:

1) override the locale (entirely or partially), reducing the programmer's 
freedom to customize the output but guaranteeing perfectly portable 
output;

2) not override the locale, leaving to the programmer the complete 
responsibility to set the right one that satisfies her specific 
requirements, with the risk that she messes things up;

I vote without doubt for number 2.

To be paranoid, on output we could write in each archive a magic number 
like 12345.678, on input we try to read it and if it doesn't match the 
magic number, we issue an error. I know that this hack won't catch 100% 
of the problems, but it will catch most of them and is not less safe 
than writing the sizeof() of the basic types as we are currently doing.
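
The magic-number check could look roughly like this. It is a sketch under stated assumptions: narrow streams, a probe written with 8 significant digits, and hypothetical names throughout (write_locale_probe, check_locale_probe, european_punct and probe_survives are not part of the library). The custom facet stands in for a locale that writes 12345.678 as 12.345,678:

```cpp
#include <cmath>
#include <istream>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

const double locale_probe = 12345.678;

// Writes the probe using whatever locale the stream currently has.
inline void write_locale_probe(std::ostream& os) {
    std::streamsize old_precision = os.precision(8);  // enough digits for the probe
    os << locale_probe << ' ';
    os.precision(old_precision);
}

// Reads the probe back; false means the number formats do not match.
inline bool check_locale_probe(std::istream& is) {
    double value = 0;
    is >> value;
    return !is.fail() && std::fabs(value - locale_probe) < 1e-6;
}

// Stand-in for a locale with ',' as decimal point and '.' as group separator.
struct european_punct : std::numpunct<char> {
    char do_decimal_point() const { return ','; }
    char do_thousands_sep() const { return '.'; }
    std::string do_grouping() const { return "\3"; }
};

// Writes with one locale, reads with another, and reports the check result.
inline bool probe_survives(const std::locale& writer, const std::locale& reader) {
    std::stringstream archive;
    archive.imbue(writer);
    write_locale_probe(archive);
    archive.imbue(reader);
    return check_locale_probe(archive);
}
```

As noted above, such a probe cannot catch every mismatch; it only flags differences that affect how this one number is written and parsed.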

Alberto Barbati





[boost] Re: Serialization Library Review

2002-11-14 Thread Alberto Barbati
Robert Ramey wrote:

Alberto Barbati wrote

Please note that the "registry" class I described *does not* attempt to 
solve the broader issue of UUIDs I read about in the discussion between 
you and Vladimir. My proposal is just a way to separate the 
"registration" part from the "serialization" into two different 
components. Still the classes will be matched according to their 
registration order, as it happens now. I am ready to discuss the 
opportunity and/or usefulness of this approach, but I don't see the 
reason why this could not be done.


I considered this approach and found the following problem



It seems from the objections you raise that I did not explain myself 
well enough. Before answering them, I think a code snippet would 
help. Consider this code, which uses the current implementation:

---begin code

  std::ofstream s("...");
  boost::oarchive a(s);
  a.register_type();
  a.register_type();
  a.register_type();
  a.register_type();

  // do serialization

---end code

What I am suggesting is to allow, possibly in addition to that form, the 
following form:

---begin code

  boost::serialization_registry  r;
  r.register_type();
  r.register_type();
  r.register_type();
  r.register_type();

  std::ofstream s("...");
  boost::oarchive a(s, r); // in the body of the constructor an
   // equivalent of register_type()
   // is immediately called 4 times
  // do serialization

---end code

I am *not* suggesting to change in *any way* the serialization process. 
The purpose of serialization_registry is just to allow for a finer 
granularity of responsibilities.

> a) register all the types in the global collection in the archive.
> bad idea - this would require that the reading program register
> all the types of the writing program.  An intolerable requirement

With my approach you are going to register the same types you would 
register anyway. Not one less, not one more. Why would it be intolerable?

> b) register types as needed as the library is written
> wouldn't work - on loading, we wouldn't know which types to register
> c) after creating the archive, append a "registration file"  on loading,
> process the "registration file first. In my view this cure is worse
> than the disease.

The types are going to be physically output in exactly the same way as 
it's being done by the current implementation. So these two objections do 
not apply.

So you may be wondering, what's the point in having this registry class? 
There are two main advantages:

1) the module that sets up the registry can be distinct from the one 
that effectively performs the serialization. This can solve a lot of 
dependency issues.

2) the registration can be done in *one* place for *both* input and 
output. With the current implementation, the registration code will be 
duplicated, and duplication is always a bad thing. Imagine yourself 
trying to keep two (possibly very) long lists of register_type calls 
synchronized.
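
Both advantages can be illustrated with a toy model. Nothing below is the library's real interface: serialization_registry is reduced to a list of types in registration order, toy_archive stands in for both oarchive and iarchive, and circle and square are placeholder classes:

```cpp
#include <cstddef>
#include <typeinfo>
#include <vector>

// Records types in registration order; the index serves as the class id.
class serialization_registry {
public:
    template <class T>
    void register_type() { types_.push_back(&typeid(T)); }

    // Returns the registration index of T, or -1 if T was never registered.
    template <class T>
    int id_of() const {
        for (std::size_t i = 0; i < types_.size(); ++i)
            if (*types_[i] == typeid(T))
                return static_cast<int>(i);
        return -1;
    }

private:
    std::vector<const std::type_info*> types_;
};

struct circle {};
struct square {};

// The single place where the registration list is written down.
inline serialization_registry make_shape_registry() {
    serialization_registry r;
    r.register_type<circle>();
    r.register_type<square>();
    return r;
}

// Stand-in for an archive that receives its registrations at construction.
class toy_archive {
public:
    explicit toy_archive(const serialization_registry& r) : registry_(r) {}

    template <class T>
    int class_id() const { return registry_.id_of<T>(); }

private:
    serialization_registry registry_;
};
```

Because the writing and the reading side are both built from make_shape_registry(), their class ids cannot drift apart.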

I'm still at a loss.  Aside from addressing the "classic" above, what exactly
do you recommend I do?  How about if I change the wording to specify
that the library supports wide streams and leave unicode out of it?


Yes, it all boils down to changing the wording as you said: you just can't 
mention Unicode. I got a little carried away because I'm working on the 
subject and I'm having a hard time making people aware of the issue. I 
apologize for that.

On a different topic, I found a portability issue. The current 
implementation records in the archives the size of the basic types int, 
long, float and double. This gives you a *false* sense of security that 
the "writing" and "reading" platforms agree on the type sizes.


this is for the native binary archive which is explicitly described as
being non-portable.  It was included because some users felt it
would be more efficient.  It has no pretensions at all to portability.
Knowing that some one will ignore this admonishment and
try to move such a binary to another machine architecture, 
Included free detection so that it would crash in a more 
graceful manner.

All right, but the size_t issue with read_string/write_string is still a 
bug, as it does not involve cross-platform use. With the current 
implementation a program will fail to read its own serialized strings on at 
least one platform (i64).

Alberto Barbati





[boost] Re: Serialization Library Review

2002-11-13 Thread Alberto Barbati
Robert Ramey wrote:

From: Alberto Barbati <[EMAIL PROTECTED]>
1) I don't like the non-intrusive way of specifying the 
version/save/load operation.
a non-intrusive method is required to implement serialization for classes that
you don't want to change.  For example, the library includes serialization
for all STL containers without changing STL itself.  This would
permit easy and optional addition of serialization to any class that
might benefit from it


As pointed out by Vladimir, I am not disputing the presence of the 
non-intrusive method, which is indeed necessary. It's the way that is 
implemented (through template specialization) that I don't like. Free 
functions are a much better option, in my opinion.

2) A most needed addition to the design is to provide a sort of 
"registry" object.

This has been a hot topic.  It is really not possible to achieve the
desired results.  I will add a section to the rationale explaining this
in detail, 

Please note that the "registry" class I described *does not* attempt to 
solve the broader issue of UUIDs I read about in the discussion between 
you and Vladimir. My proposal is just a way to separate the 
"registration" part from the "serialization" into two different 
components. Still the classes will be matched according to their 
registration order, as it happens now. I am ready to discuss the 
opportunity and/or usefulness of this approach, but I don't see the 
reason why this could not be done.

One note: the library, as it is, *does not* support Unicode output, as 
stated. The library supports wide streams, yes, but that does not mean 
Unicode support.
So what do I have to do exactly in the warchive specialization to generate
Unicode output?


As I said in my post, you have to imbue the stream with a locale holding 
the correct codecvt facet, before using it to serialize. Unfortunately, 
such a facet is not part of the ANSI standard and is the subject of a 
separate proposal of mine (see thread "codecvt facets for 
utf8/utf16/utf32").

BTW, with the current implementation even doing so is completely 
useless, as there are lines like this:

os.imbue(std::locale::classic());

that reset the locale of the streams to the "dumb" default. Such lines 
are IMO both unnecessary and conceptually wrong, and should be removed.

On a different topic, I found a portability issue. The current 
implementation records in the archives the size of the basic types int, 
long, float and double. This gives you a *false* sense of security that 
the "writing" and "reading" platforms agree on the type sizes. First 
objection: you should also check the number of bits in a char, which is 
not necessarily 8, and the size of short. However, it gets worse. On 
both x86 and i64 platforms int, long and float are 4 bytes, while double 
is 8 bytes, so they "agree" according to this test. However, size_t 
is 4 bytes on x86 and 8 bytes on i64. It's not hard to imagine 
an application that tries to serialize a size_t. The library won't 
detect the issue, but the compiler will select different overloads of 
operators << and >>, with clearly bad consequences. The same problem may 
happen with other typedef'd types. Maybe it would be a good idea to add 
a note in the documentation warning about this case and encouraging the 
use of fixed-size types (like boost::int8_t, etc.) for the members of a 
serializable class, at least when multi-platform use is an issue.

Moreover, this issue is now present in the library itself! In file 
archive.cpp, function write_string() writes the length of the string as 
a size_t, while read_string() reads it as an unsigned int, which may 
have a different size.
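
A sketch of the fixed-size approach applied to the string length. These are not the functions from archive.cpp; the code assumes a native binary stream, uses std::uint32_t from <cstdint> where boost::uint32_t would serve equally, and does not address byte order or strings longer than 4 GB:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes the length with one fixed-width type on every platform.
inline void write_string(std::ostream& os, const std::string& s) {
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&length), sizeof length);
    os.write(s.data(), static_cast<std::streamsize>(length));
}

// Reads the length with the very same type, so both sides always agree.
inline std::string read_string(std::istream& is) {
    std::uint32_t length = 0;
    is.read(reinterpret_cast<char*>(&length), sizeof length);
    std::string s(length, '\0');
    if (length != 0)
        is.read(&s[0], static_cast<std::streamsize>(length));
    return s;
}

// Helper for trying the pair on an in-memory stream.
inline std::string round_trip(const std::string& s) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    write_string(buffer, s);
    return read_string(buffer);
}
```

The point is only that the writer and the reader name the same type, whatever the width of size_t on the platform.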

Alberto Barbati





[boost] Re: Serialization Library Review

2002-11-12 Thread Alberto Barbati
rams with two different compilers. That was not 
a deep test with live data, although I plan to do it in the next few 
days. I tested both polymorphic and non-polymorphic types, but no templates.

On VC++ 7.0 (.NET) it all went well at the first try. However, the 
restriction, cited in the documentation, of having to add /OPT:NOREF to 
the linker options is too hard to swallow. In a typical setting, this 
can increase the executable size by up to 100% or even more, as such is 
the typical size of unreferenced data. This problem will have to be 
addressed explicitly.

On Metrowerks CodeWarrior I had a few problems. A few of them I already 
described above. Another one comes from the fact that the library uses 
(correctly) the trait is_base_and_derived in the implementation of 
template class base_object. is_base_and_derived depends on 
is_convertible, which is broken on CodeWarrior. I had to comment out the 
static assert line to let the program compile.

- How much effort did you put into your evaluation? A glance? A quick 
reading? In-depth study?

I read the documentation quickly, then wrote those test programs. I 
traced the execution in the debugger to discover the seekp(0) issue and 
to have a glance at the library internals, but that does not amount to 
an in-depth study.

- Are you knowledgeable about the problem domain?

I can say that I am a bit knowledgeable. I worked on (and know all 
quirks of) the MFC implementation and tried myself at least two 
different approaches, although they were limited in the scope of the 
respective applications and not general-purpose components.

- Do you think the library should be accepted as a Boost library?

My opinion is that the library should not be accepted as it is, but has 
huge potential. There is indeed little room for improvement, but a few 
features, such as the registration of polymorphic types, should 
definitely be addressed before prime time.

Alberto Barbati





[boost] config missing #define for Metrowerks compiler EH support

2002-11-11 Thread Alberto Barbati
Hi,

would it be possible to add the following piece of code somewhere in 
boost\config\Metrowerks.hpp:

- begin code

#if !__option(exceptions)
#   define BOOST_NO_EXCEPTIONS
#endif

- end code

I guess the code is self-explanatory.

Alberto Barbati


