Re: Encode-InCharset-0.01 Released

2002-05-03 Thread Roman Vasicek

> I have just released Encode-InCharset-0.01, available as
>
>   http://www.dan.co.jp/~dankogai/Encode-InCharset-0.01.tar.gz and CPAN.
>
> I have developed this module primarily to implement ISO-2022-JP-3 and
> ISO-2022-CN in the future.  To implement encode() in these, you have to
> know which character set a given character belongs to.  But this module
> can also be used to check whether a string can safely be encoded
> (though fallback is much faster).
>
> Dan the Encode Maintainer

Great! Good work.

I have one, maybe off-topic, question. Is there a module which provides
the same functionality for languages? I mean something like IsGerman,
IsCzech, etc.

-- 
 best regards
  Ing. Roman Vasicek

 software developer
 PetaMem s.r.o., Drahobejlova 27/1019, 190 00 Praha 9 - Liben, Czech republic
 http://www.petamem.com/




InLanguage properties? [Was Re: Encode-InCharset-0.01 Released]

2002-05-03 Thread Dan Kogai

On Friday, May 3, 2002, at 04:33 , Roman Vasicek wrote:
>> On Friday, May 3, 2002, at 02:41 , Dan Kogai wrote:
>>
>> I have just released Encode-InCharset-0.01, available as
>>
>>  http://www.dan.co.jp/~dankogai/Encode-InCharset-0.01.tar.gz and CPAN.
>>
>> I have developed this module primarily to implement ISO-2022-JP-3 and
>> ISO-2022-CN in the future.  To implement encode() in these, you have to
>> know which character set a given character belongs to.  But this module
>> can also be used to check whether a string can safely be encoded
>> (though fallback is much faster).
>>
> Great! Good work.
>
> I have one, maybe off-topic, question. Is there a module which provides
> the same functionality for languages? I mean something like IsGerman,
> IsCzech, etc.

   Be our guest ;)  To my knowledge there is none, but it won't be too 
hard to implement -- for Roman script languages.  You just start with 
the ISO-8859 variants and subtract the characters you don't need.
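
   As a rough illustration of what an IsCzech-style test could look like,
here is a minimal sketch that uses a Perl user-defined character property.
Note that it swaps in a simpler technique than the one described above: it
lists the Czech letters directly rather than deriving them from an ISO-8859
table. The names IsCzechLetter and is_czech_word are made up for this
example and do not come from any existing module.

```perl
use strict;
use warnings;

# Hypothetical user-defined property (the name is an assumption).
# Perl resolves \p{IsCzechLetter} by calling this sub, which returns
# hex code points, or tab-separated ranges, one per line.
sub IsCzechLetter {
    my @items = (
        "0041\t005A",    # A-Z
        "0061\t007A",    # a-z
        # Accented Czech letters, upper and lower case
        qw(00C1 00E1 010C 010D 010E 010F 00C9 00E9 011A 011B
           00CD 00ED 0147 0148 00D3 00F3 0158 0159 0160 0161
           0164 0165 00DA 00FA 016E 016F 00DD 00FD 017D 017E),
    );
    return join("\n", @items) . "\n";
}

# Returns 1 if every character of $str is a Czech letter, 0 otherwise.
sub is_czech_word {
    my ($str) = @_;
    return $str =~ /\A\p{IsCzechLetter}+\z/ ? 1 : 0;
}
```

Such a test only says that a word could be written with the Czech alphabet.
It cannot tell Czech from any other language whose letters are a subset of
that set, such as unaccented English.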

   I consider this to be one of the problems of Unicode (as of now).  When 
you aggregate anything, the origin is usually lost.  It is just the same 
as being unable to retrieve 1+1 back from 2 (it could be 0+2 or -1+3 or 
anything).
   To overcome this shortcoming Unicode does have character properties and 
you can get which I

[Encode] 1.68 Released

2002-05-03 Thread Dan Kogai

I am delighted to have added the first female contributor to AUTHORS in 
this release of Encode, available as follows:

Whole:
http://www.dan.co.jp/~dankogai/Encode-1.68.tar.gz
Diff against current: 106 lines
http://www.dan.co.jp/~dankogai/current-1.68.diff.gz

Changes is just one paragraph long.

$Revision: 1.68 $ $Date: 2002/05/03 12:20:13 $
! lib/Encode/Alias.pm lib/Encode/Supported.pod t/Alias.t AUTHORS
   UCS-4 added to aliases of UTF-32 by Elizabeth Mattijsen.  Alias.t
   and Supported.pod modified to reflect the change.  Elizabeth added
   to Authors.  And H.M. is also added for forwarding her patch among
   other contributions (I was rather surprised to find his name was not
   there yet!)
   Message-Id: <[EMAIL PROTECTED]>
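
The effect of the new alias can be checked with a few lines of Perl. This
is a minimal sketch, assuming Encode 1.68 or later is installed; the helper
name canonical_name is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: resolve an encoding name or alias to its
# canonical name, or return undef if Encode does not know it.
sub canonical_name {
    my ($name) = @_;
    my $enc = find_encoding($name);
    return defined $enc ? $enc->name : undef;
}

# With the alias in place, both names should resolve to the same encoding.
print "UCS-4  -> ", canonical_name('UCS-4')  // 'unknown', "\n";
print "UTF-32 -> ", canonical_name('UTF-32') // 'unknown', "\n";
```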

If there is one kind of diversity that is lacking in Perl, it is 
definitely the sex ratio.  In terms of the sheer number of sexes it is 
already more diverse than the ordinary world, for Perl mongers have 
female, male, and the Borg :P

Dan the Encode Maintainer / the Equal Opportunity Whippee




Re: Encode-InCharset-0.01 Released

2002-05-03 Thread Jarkko Hietaniemi

On Fri, May 03, 2002 at 09:33:17AM +0200, Roman Vasicek wrote:
> > I have just released Encode-InCharset-0.01, available as
> >
> >   http://www.dan.co.jp/~dankogai/Encode-InCharset-0.01.tar.gz and CPAN.
> >
> > I have developed this module primarily to implement ISO-2022-JP-3 and
> > ISO-2022-CN in the future.  To implement encode() in these, you have to
> > know which character set a given character belongs to.  But this module
> > can also be used to check whether a string can safely be encoded
> > (though fallback is much faster).
> >
> > Dan the Encode Maintainer
> 
> Great! Good work.
> 
> I have one, maybe off-topic, question. Is there a module which provides
> the same functionality for languages? I mean something like IsGerman,
> IsCzech, etc.

The mapping between charsets and languages is very complex.  The best
database I know of can be seen at http://www.eki.ee/

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: Encode-InCharset-0.01 Released

2002-05-03 Thread Jarkko Hietaniemi

> The mapping between charsets and languages is very complex.  The best
> database I know of can be seen at http://www.eki.ee/

http://www.eki.ee/letter/

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



[ANNOUNCE] Apache::GuessCharset

2002-05-03 Thread Tatsuhiko Miyagawa

Apache::GuessCharset is a PerlFixupHandler that demonstrates the powerful
encoding detection in bleeding-edge perl, thanks to the Encode module.

It is now on its way to CPAN, and is also available at
http://bulknews.net/lib/archives/

NAME
Apache::GuessCharset - adds HTTP charset by guessing file's encoding

SYNOPSIS
  PerlModule Apache::GuessCharset
  SetHandler perl-script
  PerlFixupHandler Apache::GuessCharset

  # how many bytes to read for guessing (default 512)
  PerlSetVar GuessCharsetBufferSize 1024

  # list of encoding suspects
  PerlSetVar GuessCharsetSuspects euc-jp
  PerlAddVar GuessCharsetSuspects shiftjis
  PerlAddVar GuessCharsetSuspects 7bit-jis

DESCRIPTION
Apache::GuessCharset is an Apache handler which adds the HTTP charset
attribute by automatically guessing the file's encoding via Encode::Guess.

CONFIGURATION
This module uses the following configuration variables.

GuessCharsetSuspects
a list of encodings for "Encode::Guess" to check. See the
Encode::Guess manpage for details.

GuessCharsetBufferSize
specifies how many bytes this module reads from the source file in
order to guess its encoding. The default is 512.

AUTHOR
Tatsuhiko Miyagawa <[EMAIL PROTECTED]>

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

SEE ALSO
the Encode::Guess manpage, the Apache::File manpage
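
EXAMPLE (ILLUSTRATIVE)
The guessing step described above can be sketched outside Apache with
Encode::Guess alone. This is not the handler's actual code; the helper
name guess_charset is an assumption made for this example, and it takes a
buffer of bytes plus a list of suspects, mirroring the two configuration
variables.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper, not part of Apache::GuessCharset.
# Returns the canonical encoding name, or undef if the guess fails.
sub guess_charset {
    my ($bytes, @suspects) = @_;
    my $enc = guess_encoding($bytes, @suspects);
    # On success guess_encoding returns an encoding object;
    # on failure it returns a plain error string instead.
    return ref $enc ? $enc->name : undef;
}

# Bytes of a short Japanese string encoded as EUC-JP.
my $buffer  = "\xC6\xFC\xCB\xDC\xB8\xEC";
my $charset = guess_charset($buffer, 'euc-jp');
print defined $charset ? "charset=$charset\n" : "could not guess\n";
```

Listing several similar suspects at once (for example euc-jp and shiftjis)
can leave the guess ambiguous on short input, which is one reason to read
a reasonably large buffer.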




-- 
Tatsuhiko Miyagawa <[EMAIL PROTECTED]>