Re: [Python-Dev] Unicode database

2007-08-10 Thread Nick Maclaren
"Martin v. Löwis" [EMAIL PROTECTED] wrote:

> Sure. But (again): you don't need to have the mappings at all for
> what you want to achieve. So there is no point in downloading them

Sigh.  No, I don't.  But, if I want to be able to merge anything
back into the main Python source, it is a VERY good idea to use the
existing mechanisms and not invent new ones.

The easiest thing would have been to hack re.py to create a Unicode
table using unicodedata.py directly, and that would indeed be a rather
cleaner solution in the long term.  But it would have meant that there
were now multiple different ways of generating the Unicode data for
_sre.c, and that would have led to inconsistencies.
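
As an illustration of the idea only, here is a minimal sketch of deriving a per-code-point table from the unicodedata module at run time instead of from separately generated files. It is written for current Python 3; the helper name build_category_table, the code point range, and the choice of the general category property are all assumptions made for this example, not what re.py or _sre.c actually does.

```python
import unicodedata


def build_category_table(start=0, stop=0x250):
    """Map each code point in [start, stop) to its Unicode general category.

    Hypothetical helper: the range and the property chosen here are
    illustrative and are not taken from re.py or _sre.c.
    """
    return {cp: unicodedata.category(chr(cp)) for cp in range(start, stop)}


table = build_category_table()

# Code points whose general category starts with "L" are letters.
letters = [cp for cp, cat in table.items() if cat.startswith("L")]

# unidata_version reports which Unicode database the module was built from.
print(unicodedata.unidata_version, len(letters))
```

A table built this way always reflects whatever database version the running interpreter ships, which is the consistency property at issue in this message.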

As I pointed out, there is already a problem where upgrading the data
needs a complete rebuild to get all of the Unicode data back in step;
'make all' in itself does not work.  That is precisely the sort of
problem that is caused by having duplicate update mechanisms.


Now, IF I can work out how the _sre.c engine works enough to put
atomic/possessive quantifiers in, this problem will return.  My
question would be how best to make a suitable proposal that, inter
alia, includes changes that can't be made by the normal building
mechanisms.

And I still don't have a clue about that one.
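
For readers unfamiliar with the feature, the following is a minimal sketch of what an atomic group does. It emulates the behaviour with a lookahead and a backreference, a well-known idiom that needs no engine support; it is not the _sre.c change discussed here, and the sample pattern and text are assumptions chosen for brevity.

```python
import re

text = "aaa"

# Ordinary greedy quantifier: a+ first takes all three characters, then
# gives one back so that the trailing "a" in the pattern can match.
greedy = re.match(r"a+a", text)

# Emulated atomic group: the lookahead captures the longest run of "a"
# and the backreference consumes exactly that run. The engine does not
# backtrack into a completed lookahead, so nothing is left for the
# trailing "a" and the overall match fails.
atomic = re.match(r"(?=(a+))\1a", text)

print(greedy.group(0) if greedy else None)
print(atomic)
```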


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Unicode database

2007-08-10 Thread Martin v. Löwis
>> Sure. But (again): you don't need to have the mappings at all for
>> what you want to achieve. So there is no point in downloading them
>
> Sigh.  No, I don't.  But, if I want to be able to merge anything
> back into the main Python source, it is a VERY good idea to use the
> existing mechanisms and not invent new ones.

I think you still don't understand. What I keep calling mappings
is *unrelated* to unicodedata. unicodedata is a different database, and
not related at all to the makefile. It never was.

> As I pointed out, there is already a problem where upgrading the data
> needs a complete rebuild to get all of the Unicode data back in step;
> 'make all' in itself does not work.  That is precisely the sort of
> problem that is caused by having duplicate update mechanisms.

Right. Downloading the necessary files is a completely manual process,
not supported at all by make all, which is designed to do something
entirely different.

> Now, IF I can work out how the _sre.c engine works enough to put
> atomic/possessive quantifiers in, this problem will return.  My
> question would be how best to make a suitable proposal that, inter
> alia, includes changes that can't be made by the normal building
> mechanisms.
>
> And I still don't have a clue about that one.

You lost me somewhere. What are changes that can't be made by the
normal building process, and what is this problem that will
return?

Regards,
Martin


Re: [Python-Dev] Unicode database

2007-08-09 Thread Nick Maclaren
"Martin v. Löwis" [EMAIL PROTECTED] wrote:

>> I think that you will find that you are using a non-standard
>> environment and set of Python sources.

> Please trust me that I didn't. See below.

I always trust people as much as I trust myself, but I do tend to
check up.  See below.

> Ah, the makefile. I don't think you use it to create the Unicode database.
>
> It's only good for generating the codecs (Lib/encodings)

Yes, but it DOES attempt to download the mappings, and is the ONLY
script which attempts to do so.

beelzebub$ find Python-2.5.1 -type f | wc
   3458    3460  135981
beelzebub$ find Python-2.5.1 -type f | xargs grep ftp.unicode.org
Python-2.5.1/Doc/lib/libunicodedata.tex:4.1.0 which is publicly available from \url{ftp://ftp.unicode.org/}.
grep: Python-2.5.1/Mac/Icons/Disk: No such file or directory
grep: Image.icns: No such file or directory
grep: Python-2.5.1/Mac/Icons/Python: No such file or directory
grep: Folder.icns: No such file or directory
Python-2.5.1/Misc/NEWS:  at ftp.unicode.org and contain a few updates (e.g. the Mac OS
Python-2.5.1/Tools/unicode/Makefile:# files available at ftp://ftp.unicode.org/
Python-2.5.1/Tools/unicode/Makefile:ncftpget -R ftp.unicode.org . Public/MAPPINGS
Python-2.5.1/Tools/unicode/gencodec.py:site (ftp://ftp.unicode.org/Public/MAPPINGS/) and creates Python codec
Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT the
Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT
Python-2.5.1/Tools/unicode/python-mappings/KOI8-U.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
Python-2.5.1/Tools/unicode/python-mappings/CP1140.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT
Python-2.5.1/Modules/unicodedata.c:4.1.0 which is publically available from ftp://ftp.unicode.org/.\n
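
The "No such file or directory" lines in the output above come from xargs splitting two file names that contain spaces, so those two files were never searched. A minimal sketch of an equivalent search that is not affected by spaces in names follows; it is written for current Python 3, and the function name files_containing is a hypothetical helper, not part of the Python source tree.

```python
import os


def files_containing(root, needle):
    """Yield paths of regular files under root whose bytes include needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if needle in handle.read():
                        yield path
            except OSError:
                # Unreadable file: skip it rather than abort the search.
                continue


# Same search as the grep above; prints nothing if the tree is absent.
for path in files_containing("Python-2.5.1", b"ftp.unicode.org"):
    print(path)
```

Unlike grep, this prints each matching file once rather than each matching line, which is enough to see which scripts refer to the FTP site.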

> AFAICT, the mappings are still where they always were: at the
> location given in the Makefile. (e.g.
> ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT
> )

Then you DEFINITELY are using a non-standard set of files.  That
above was from the source of Python 2.5.1 that I have just downloaded.

> Did you really believe the Unicode consortium doesn't have the
> old versions of the character database online? Do you think
> they are complete fools?

Please don't be offensive.  I said that I had failed to find them,
after searching the Unicode Web site.  Now that you have given me
the actual file name, I can find them, but searching on the version
and request for that database leads to unhelpful files.

> Googling for unicode 3.2 ucd gives me
>
> http://unicode.org/Public/3.2-Update/
>
> as the top hit (of course, you have to know that they call
> the character database ucd to invoke that query).

Generally, I distrust Google for such things, as it is as likely
to lead you to the wrong information as to the right.  For example,
that hit you found was on a different logical server, and could
well be an incorrect version of the database.  It is VERY common
for such things to 'escape' into Google.

Have you checked whether or not that file is correct with the
Unicode consortium?


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Unicode database

2007-08-09 Thread M.-A. Lemburg
Nick Maclaren wrote:
>> Ah, the makefile. I don't think you use it to create the Unicode database.
>>
>> It's only good for generating the codecs (Lib/encodings)
>
> Yes, but it DOES attempt to download the mappings, and is the ONLY
> script which attempts to do so.

Of course it does. The Tools/unicode/Makefile is meant to simplify
recreating the codecs from the (possibly updated) mapping on the Unicode
site.

If it doesn't work for you, that may well be possible, since I wrote
the Makefile and the other related stuff in that directory to help me
with updating the codecs from the mappings. It's only checked in for
convenience.

> beelzebub$ find Python-2.5.1 -type f | wc
>    3458    3460  135981
> beelzebub$ find Python-2.5.1 -type f | xargs grep ftp.unicode.org
> Python-2.5.1/Doc/lib/libunicodedata.tex:4.1.0 which is publicly available from \url{ftp://ftp.unicode.org/}.
> grep: Python-2.5.1/Mac/Icons/Disk: No such file or directory
> grep: Image.icns: No such file or directory
> grep: Python-2.5.1/Mac/Icons/Python: No such file or directory
> grep: Folder.icns: No such file or directory
> Python-2.5.1/Misc/NEWS:  at ftp.unicode.org and contain a few updates (e.g. the Mac OS
> Python-2.5.1/Tools/unicode/Makefile:# files available at ftp://ftp.unicode.org/
> Python-2.5.1/Tools/unicode/Makefile:ncftpget -R ftp.unicode.org . Public/MAPPINGS
> Python-2.5.1/Tools/unicode/gencodec.py:site (ftp://ftp.unicode.org/Public/MAPPINGS/) and creates Python codec
> Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT the
> Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT
> Python-2.5.1/Tools/unicode/python-mappings/KOI8-U.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
> Python-2.5.1/Tools/unicode/python-mappings/CP1140.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT
> Python-2.5.1/Modules/unicodedata.c:4.1.0 which is publically available from ftp://ftp.unicode.org/.\n
>
>> AFAICT, the mappings are still where they always were: at the
>> location given in the Makefile. (e.g.
>> ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT
>> )
>
> Then you DEFINITELY are using a non-standard set of files.  That
> above was from the source of Python 2.5.1 that I have just downloaded.

No idea where you get that impression from, but then I'm not really
sure what you're after anyway ;-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Unicode database

2007-08-09 Thread Martin v. Löwis
>> Ah, the makefile. I don't think you use it to create the Unicode database.
>>
>> It's only good for generating the codecs (Lib/encodings)
>
> Yes, but it DOES attempt to download the mappings, and is the ONLY
> script which attempts to do so.

Sure. But (again): you don't need to have the mappings at all for
what you want to achieve. So there is no point in downloading them

> beelzebub$ find Python-2.5.1 -type f | xargs grep ftp.unicode.org
> Python-2.5.1/Doc/lib/libunicodedata.tex:4.1.0 which is publicly available from \url{ftp://ftp.unicode.org/}.
> grep: Python-2.5.1/Mac/Icons/Disk: No such file or directory
> grep: Image.icns: No such file or directory
> grep: Python-2.5.1/Mac/Icons/Python: No such file or directory
> grep: Folder.icns: No such file or directory
> Python-2.5.1/Misc/NEWS:  at ftp.unicode.org and contain a few updates (e.g. the Mac OS
> Python-2.5.1/Tools/unicode/Makefile:# files available at ftp://ftp.unicode.org/
> Python-2.5.1/Tools/unicode/Makefile:ncftpget -R ftp.unicode.org . Public/MAPPINGS
> Python-2.5.1/Tools/unicode/gencodec.py:site (ftp://ftp.unicode.org/Public/MAPPINGS/) and creates Python codec
> Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT the
> Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT
> Python-2.5.1/Tools/unicode/python-mappings/KOI8-U.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
> Python-2.5.1/Tools/unicode/python-mappings/CP1140.TXT:#   ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT
> Python-2.5.1/Modules/unicodedata.c:4.1.0 which is publically available from ftp://ftp.unicode.org/.\n
>
>> AFAICT, the mappings are still where they always were: at the
>> location given in the Makefile. (e.g.
>> ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT
>> )
>
> Then you DEFINITELY are using a non-standard set of files.  That
> above was from the source of Python 2.5.1 that I have just downloaded.

I don't understand. Why does this follow? What should I read out
of the grep lines above, and why does my citing of a URL prove
that I did something to my build environment?

Regards,
Martin