RE: [Haskell-cafe] Re: Writing binary files?

2004-09-17 Thread MR K P SCHUPKE

You wouldn't want to have to accumulate the
entire body as a single byte string

Ever heard of laziness? Haskell does it quite well... Accumulating
the entire body doesn't actually force it all into memory, because
Haskell is lazy. You don't need a more complex interface in Haskell!
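
For instance, a minimal sketch (the file names are just placeholders):
the whole body is never held in memory at once, because readFile
produces its result lazily and writeFile consumes it as it is produced.

    import Data.Char (toUpper)

    -- Copy a file, upper-casing it on the way through.  The lazy
    -- String from readFile is generated only as writeFile demands it,
    -- so the entire contents are never accumulated in memory.
    copyUppercased :: FilePath -> FilePath -> IO ()
    copyUppercased src dst = do
        body <- readFile src
        writeFile dst (map toUpper body)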

Keean.


RE: [Haskell-cafe] Re: Writing binary files?

2004-09-17 Thread Glynn Clements

MR K P SCHUPKE wrote:

 You wouldn't want to have to accumulate the
 entire body as a single byte string
 
 Ever heard of laziness? Haskell does it quite well... Accumulating
 the entire body doesn't actually force it all into memory, because
 Haskell is lazy. You don't need a more complex interface in Haskell!

Are you sure that will work in the general case? Or are you assuming
lazy I/O?
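
If lazy I/O is what's being assumed, the usual caveat applies: the
handle has to stay open until the contents have actually been
consumed. A minimal sketch of the pitfall (the function name is made
up):

    import System.IO

    -- hGetContents is lazy: nothing is read until the string is
    -- demanded.  Closing the handle first silently truncates it.
    lazyPitfall :: FilePath -> IO String
    lazyPitfall path = do
        h <- openFile path ReadMode
        s <- hGetContents h   -- no data read yet
        hClose h              -- closes before s is forced
        return s              -- s may be empty or truncated later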

-- 
Glynn Clements [EMAIL PROTECTED]


Re: [Haskell-cafe] Interoperability with other languages and haskell in industry

2004-09-17 Thread Vincenzo aka Nick Name
On Thursday 16 September 2004 20:27, Andy Moran wrote:
 I'd like to say that this approach has worked for us time and time
 again, but, to date, we've never had to rewrite a slow component in C
 :-)  For us, C interoperability has always been a case of linking to
 third party software, or for writing test harnesses to test generated
 C.


The point is that perhaps we will not have a prototype but a single
implementation (not that I think it's a good idea in the general case,
but we will write a relatively simple bookkeeping application). However,
I realize that one can write a great part of the software in a single
language. The point is providing an escape to Java, C++, C#, Python or
other in-vogue languages in case we find that it's difficult to
interface with legacy systems, or we can't find a coder to hire in the
future. So the point is not to rewrite something in C for efficiency,
but rather to be able to say "OK, this component is written in Haskell
and will stay that way, but the rest of the system won't be Haskell
anymore". However:

 Things are different if your application is multi-process and/or
 distributed, and you're not going to be using an established protocol
 (like HTTP, for instance).  In that case, you might want to look at
 HDirect (giving access to CORBA, COM, DCOM), if you need to talk to
 CORBA/COM/DCOM objects.  There are many simple solutions to RPC
 available too, if that's all you need.

I see that there is, for example, XML-RPC, which should fit my little
interoperability needs, and I would have liked to hear some experience
on that route. Your reply is encouraging, though, since you didn't need
any other language at all. That's my hope, too.
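
For the record, the kind of thing I have in mind is only a few lines
with an XML-RPC client library such as HaXR (Network.XmlRpc.Client);
the endpoint and method name below are of course invented:

    import Network.XmlRpc.Client (remote)

    -- Call a hypothetical "examples.add" method on a local server.
    addRemote :: Int -> Int -> IO Int
    addRemote x y = remote "http://localhost:8080/RPC2" "examples.add" x y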

Bye, and waiting for that other famous Haskell-using company that I
didn't mention to attend this discussion :)

Vincenzo


[Haskell-cafe] Proofs for program testing

2004-09-17 Thread Henning Thielemann

I saw that many introductions to Haskell contain proofs of properties of
implemented functions. This is probably due to the fact that pure
functions can be reasoned about more easily than imperative programs with
hidden state. I wondered whether one can use such proofs of Haskell
functions for testing.

I found QuickCheck
  http://www.cs.chalmers.se/~rjmh/QuickCheck/
which, as far as I understand, relies on random inputs, and I found
  http://homepages.inf.ed.ac.uk/wadler/realworld/era.html
which sounds like some GUI-driven program.
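
For illustration, a property-based test of this kind is just a Haskell
predicate handed to QuickCheck, which then throws random inputs at it
(the property below is only an example):

    import Test.QuickCheck

    -- Reversing a list twice should give back the original list.
    prop_reverseTwice :: [Int] -> Bool
    prop_reverseTwice xs = reverse (reverse xs) == xs

    main :: IO ()
    main = quickCheck prop_reverseTwice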

Is there something that can be used for automated testing, e.g. for
darcs' tests to check patch integrity?



Re: [Haskell-cafe] Writing binary files?

2004-09-17 Thread Marcin 'Qrczak' Kowalczyk
Glynn Clements [EMAIL PROTECTED] writes:

 What I'm suggesting in the above is to sidestep the encoding issue
 by keeping filenames as byte strings wherever possible.

OK, but let it be in addition to, not instead of, treating them as
character strings.

 And program-generated email notifications frequently include text with
 no known encoding (i.e. binary data).

No, programs don't dump binary data among diagnostic messages. If they
output binary data to stdout, it's their only output and it's redirected
to a file or another process.

 Or are you going to demand that anyone who tries to hack into your
 system only sends it UTF-8 data so that the alert messages are
 displayed correctly in your mail program?

The email protocol is text-only. It may mangle newlines, it has
a maximum line length, and some text may be escaped during transport
(e.g. "From " at the beginning of a line). Arbitrary binary data
should be put in base64-or-otherwise-encoded attachments.

If the cron program embeds the output as email body, the cron job
should not dump arbitrary binary data to stdout. Encoding is not the
only problem.

 Processing data in their original byte encodings makes supporting
 multiple languages harder. Filenames which are inexpressible as
 character strings get in the way of clean APIs. When considering only
 filenames, using bytes would be sufficient, but overall it's more
 convenient to Unicodize them like other strings.

 It also harms reliability. Depending upon the encoding, two distinct
 byte strings may have the same Unicode representation.

Such encodings are not suitable for filenames.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg00376.html

| ISO-2022-JP will never be a satisfactory terminal encoding (like
| ISO-8859-*, EUC-*, UTF-8, Shift_JIS) because
|
| 1) It is a stateful encoding. What happens when a program starts some
| terminal output and then is interrupted using Ctrl-C or Ctrl-Z? The
| terminal will remain in the shifted state, while other programs start
| doing output. But these programs expect that when they start, the
| terminal is in the initial state. The net result will be garbage on
| the screen.
|
| 2) ISO-2022-JP is not filesystem safe. Therefore filenames will never
| be able to carry Japanese characters in this encoding.
|
| Robert Brady writes:
|  Does ISO-2022 see much/any use as the locale encoding, or is it just used
|  for interchange?
|
| Just for interchange.
|
| Paul Eggert searched for uses of ISO-2022-JP as locale encodings (in
| order to convince me), and only came up with a handful of questionable
| URLs. He didn't convince me. And there are no plans to support
| ISO-2022-JP as a locale encoding in glibc - because of 1) and 2) above.

For me ISO-2022 is a brain-damaged concept and should die. Almost
nothing supports it anyway.

 Such tarballs are not portable across systems using different encodings.

 Well, programs which treat filenames as byte strings to be read from
 argv[] and passed directly to open() won't have any problems with this.

The OS itself may have problems with this; only some filesystems
accept arbitrary bytes apart from '\0' and '/' (and with the special
meaning for '.'). Exotic characters in filenames are not very
portable.

 A Haskell program in my world can do that too. Just set the encoding
 to Latin1.

 But programs should handle this by default, IMHO.

IMHO it's more important to make them compatible with the
representation of strings used in other parts of the program.

 Filenames are, for the most part, just tokens to be passed around.

Filenames are often stored in text files, whose bytes are interpreted
as characters. Applying QP (quoted-printable) to non-ASCII parts of
filenames is suitable only if humans won't edit these files by hand.

  My specific point is that the Haskell98 API has a very big problem due
  to the assumption that the encoding is always known. Existing
  implementations work around the problem by assuming that the encoding
  is always ISO-8859-1.
 
 The API is incomplete and needs to be enhanced. Programs written using
 the current API will be limited to using the locale encoding.

 That just adds unnecessary failure modes.

But otherwise programs would continuously have bugs in handling text
which is not ISO-8859-1, especially with multibyte encoding where
pretending that ISO-8859-2 is ISO-8859-1 too often doesn't work.

I can't switch my environment to UTF-8 yet precisely because too many
programs were written with the attitude you are promoting: they don't
care about the encoding, they just pass bytes around.

Bugs range from small annoyances like tabular output which doesn't
line up, through mangled characters on a graphical display, to
full-screen interactive programs being unusable on a UTF-8 terminal.

 This encoding would be incompatible with most other texts seen by the
 program. In particular reading a filename from a file would not work
 without manual recoding.

 We already have that problem; you can't read non-Latin1 

Re: [Haskell-cafe] Writing binary files?

2004-09-17 Thread Glynn Clements

Marcin 'Qrczak' Kowalczyk wrote:

  What I'm suggesting in the above is to sidestep the encoding issue
  by keeping filenames as byte strings wherever possible.
 
 OK, but let it be in addition to, not instead of, treating them as
 character strings.

Provided that you know the encoding, nothing stops you converting them
to strings, should you have a need to do so.
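
For the simplest case the conversion is trivial; here is a sketch for
the Latin-1 convention that current implementations effectively assume
(a real program would dispatch on whichever encoding is actually known):

    import Data.Char (chr, ord)
    import Data.Word (Word8)

    -- Under Latin-1, each byte is exactly the Unicode code point with
    -- the same value, and vice versa.
    bytesToString :: [Word8] -> String
    bytesToString = map (chr . fromIntegral)

    stringToBytes :: String -> [Word8]
    stringToBytes = map (fromIntegral . ord)  -- assumes every Char < 256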

  Processing data in their original byte encodings makes supporting
  multiple languages harder. Filenames which are inexpressible as
  character strings get in the way of clean APIs. When considering only
  filenames, using bytes would be sufficient, but overall it's more
  convenient to Unicodize them like other strings.
 
  It also harms reliability. Depending upon the encoding, two distinct
  byte strings may have the same Unicode representation.
 
 Such encodings are not suitable for filenames.

Regardless of whether they are suitable, they are used.

 For me ISO-2022 is a brain-damaged concept and should die.

Well, it isn't likely to.

I haven't addressed any of the other stuff about ISO-2022, as it isn't
really relevant. Whether ISO-2022 is good or bad doesn't matter; what
matters is that it is likely to remain in use for the foreseeable
future.

  Such tarballs are not portable across systems using different encodings.
 
  Well, programs which treat filenames as byte strings to be read from
  argv[] and passed directly to open() won't have any problems with this.
 
 The OS itself may have problems with this; only some filesystems
 accept arbitrary bytes apart from '\0' and '/' (and with the special
 meaning for '.'). Exotic characters in filenames are not very
 portable.

No, but most Unix programs manage to handle them without problems.

  A Haskell program in my world can do that too. Just set the encoding
  to Latin1.
 
  But programs should handle this by default, IMHO.
 
 IMHO it's more important to make them compatible with the
 representation of strings used in other parts of the program.

Why?

  Filenames are, for the most part, just tokens to be passed around.
 
 Filenames are often stored in text files,

True.

 whose bytes are interpreted as characters.

Sometimes true, sometimes not.

Where filenames occur in data files, e.g. configuration files, the
program which reads the configuration file typically passes the bytes
directly to the OS without interpretation.
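
A sketch of that pattern, with a made-up helper: the bytes of the
configured name are handed straight to the OS, with no attempt to
interpret their encoding.

    import System.IO

    -- Open whatever file the first line of a config file names.
    openFromConfig :: FilePath -> IO Handle
    openFromConfig cfg = do
        s <- readFile cfg
        let target = takeWhile (/= '\n') s
        openFile target ReadMode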

 Applying QP to non-ASCII parts of filenames is suitable
 only if humans won't edit these files by hand.

Who said anything about QP?

   My specific point is that the Haskell98 API has a very big problem due
   to the assumption that the encoding is always known. Existing
   implementations work around the problem by assuming that the encoding
   is always ISO-8859-1.
  
  The API is incomplete and needs to be enhanced. Programs written using
  the current API will be limited to using the locale encoding.
 
  That just adds unnecessary failure modes.
 
 But otherwise programs would continuously have bugs in handling text
 which is not ISO-8859-1, especially with multibyte encoding where
 pretending that ISO-8859-2 is ISO-8859-1 too often doesn't work.

Why?

 I can't switch my environment to UTF-8 yet precisely because too many
 programs were written with the attitude you are promoting: they don't
 care about the encoding, they just pass bytes around.

That's all that many programs should be doing.

 Bugs range from small annoyances like tabular output which doesn't
 line up, through mangled characters on a graphical display, to
 full-screen interactive programs being unusable on a UTF-8 terminal.

IOW:

1. display doesn't work correctly,
2. display doesn't work correctly, and
3. display doesn't work correctly.

You keep citing cases involving graphical display as a reason why all
programs should be working with characters all of the time.

I haven't suggested that programs should never deal with characters,
yet you keep insinuating that is my argument, then proceed to attack
it.

-- 
Glynn Clements [EMAIL PROTECTED]