style
The style tag doesn't seem to be handled in my Plucker (version 1.1, I think). I think it needs a start_style and an end_style in TextParser.py, where each function behaves like the corresponding one for head (i.e. turn off visibility on start, turn it back on at end). I checked out what I believe is the latest CVS version and didn't see it in there either. I may have missed something, though.

Bill
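A minimal sketch of the start_style/end_style idea, written against Python's standard html.parser rather than Plucker's actual TextParser.py. The class name, the chunks list and the _hidden_depth counter are hypothetical; only the start_<tag>/end_<tag> naming follows the message above.

```python
from html.parser import HTMLParser


class VisibleTextSketch(HTMLParser):
    """Hypothetical stand-in for a TextParser-like class (not Plucker's code)."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # greater than 0 while inside head or style
        self.chunks = []        # visible text collected so far

    # Dispatch to start_<tag>/end_<tag> methods when they exist.
    def handle_starttag(self, tag, attrs):
        handler = getattr(self, "start_" + tag, None)
        if handler is not None:
            handler(attrs)

    def handle_endtag(self, tag):
        handler = getattr(self, "end_" + tag, None)
        if handler is not None:
            handler()

    def handle_data(self, data):
        # Keep text only while visibility is on.
        if self._hidden_depth == 0 and data.strip():
            self.chunks.append(data.strip())

    def start_head(self, attrs):
        self._hidden_depth += 1   # turn visibility off

    def end_head(self):
        self._hidden_depth = max(0, self._hidden_depth - 1)  # turn it back on

    # style behaves exactly like head.
    start_style = start_head
    end_style = end_head


parser = VisibleTextSketch()
parser.feed("<head><title>T</title></head><style>p {color: red}</style><p>Hello</p>")
parser.close()
print(parser.chunks)
```

A depth counter is used instead of a plain flag so that a style element nested inside head does not switch visibility back on too early.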
Re: character sets in HTML files?
Bill Janssen [EMAIL PROTECTED] writes:
> I've been reading the HTTP and HTML specs about character sets.

Shouldn't you be using the XHTML specs now?

-- MJR
Re: character sets in HTML files?
> > I've been reading the HTTP and HTML specs about character sets.
> Shouldn't you be using the XHTML specs now? -- MJR

As soon as we add an XML component to the parser... It's on my list.

Actually, if you read the XHTML specs, you'll see that they refer you back to the HTML specs for many, even most, things.

Bill
Re: making plucker-build easier to use?
Bill == Bill Janssen [EMAIL PROTECTED] writes:

Is there a way to tell Python to output this in binary mode?

Bill> If you want to try it, change line 494 in parser/python/PyPlucker/Writer.py from
Bill>     self._pdb_file = prc.File (sys.stdout, read=0, write=1)
Bill> to
Bill>     self._pdb_file = prc.File (os.fdopen(sys.stdout.fileno(), 'wb'), read=0, write=1)
Bill> Seems to work for me.

Sorry, it does not work :-(

BTW: Calling

    plucker-build file://C:\test\textfile.txt out

results in the DB name file://C:\test\textfile.txt, and if it's too long it will be shortened from the _beginning_.

Calling

    plucker-build -H file://C:\test\textfile.txt -f out

names the DB out. (Should it not be named textfile?)

(By database name I mean the name on the Palm.)

cu, Dirk

--
Permanent URLs to the latest Version (1.1.13) of the Plucker Windows installer
- For the Webpage: http://www.dirk-heiser.de/plucker
- Direct Download: http://www.dirk-heiser.de/plucker/plucker.exe [2.08MB]
Re: added charset info to plucker doc; metadata record type
Bill == Bill Janssen [EMAIL PROTECTED] writes:

If I call locale.py I get this output:

    Language: de_DE
    Encoding: cp1252

Bill> I've just looked at the base Python code in 1.5.2 for locale.py so

FYI: This is the output I get from Python 2.0:

    ---
    Converted plucker:/home.html
    Default charset is 1252 1252 0
    Converted plucker:/~special~/index
    --

and 04E4 (1252) is written in the Plucker DB's metadata record. But http://www.iana.org/assignments/character-sets says:

Bill> I've tried to duplicate this, but just get an error saying that 'cp1252'
Bill> doesn't name a charset.

I guess it's because you treat every purely numeric charset as a MIBenum.

BTW: why do you expect the return value of getlocale to be a MIBenum? And what if the user specifies 427: is it the charset name or the MIBenum? Do we need a way to specify the MIBenum, or is it enough to allow specifying the charset name?

And finally one idea: what about writing a parser that parses http://www.iana.org/assignments/character-sets and creates a complete NamedCharsets array? It seems to be easy and I think I could do this.

Bill> Great idea!

I have checked this in. Please take a look to see if it is OK.

cu, Dirk
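A rough sketch of the kind of parser proposed above, not the code that was actually checked in. It assumes the registry text uses "Name:", "MIBenum:" and "Alias:" lines per entry; the SAMPLE excerpt is typed in by hand for illustration, and parse_charsets is a hypothetical name.

```python
import re

# Hand-typed excerpt, assumed to mirror the registry's per-entry layout.
SAMPLE = """\
Name: ANSI_X3.4-1968                      [RFC1345,KXS2]
MIBenum: 3
Source: ECMA registry
Alias: iso-ir-6
Alias: US-ASCII

Name: ISO_8859-1:1987                     [RFC1345,KXS2]
MIBenum: 4
Source: ECMA registry
Alias: iso-ir-100
Alias: latin1

Name: windows-1252
MIBenum: 2252
Source: Microsoft
Alias: None
"""

FIELD = re.compile(r"^(Name|MIBenum|Alias):\s*(\S+)")


def parse_charsets(text):
    """Return a dict mapping lowercased charset names and aliases to MIBenum."""
    table = {}
    name = None  # primary name of the entry being read
    mib = None   # its MIBenum, once seen
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # Source lines, blank lines, free text
        key, value = match.groups()
        if key == "Name":
            name, mib = value.lower(), None  # a new entry starts here
        elif key == "MIBenum" and name is not None:
            mib = int(value)
            table[name] = mib
        elif key == "Alias" and mib is not None and value.lower() != "none":
            table[value.lower()] = mib
    return table


charsets = parse_charsets(SAMPLE)
print(charsets.get("windows-1252"), charsets.get("cp1252"))
```

With this sample, windows-1252 resolves to 2252 while cp1252 is not found at all, which is the sort of mismatch a generated table makes easy to spot.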
more charset work checked in
I've updated the CVS with my more-or-less final charset work. Please check it out and try it. Note that this does not involve any viewer changes, just parser stuff. Bill
Re: making plucker-build easier to use?
> BTW: Calling: plucker-build file://C:\test\textfile.txt out
> results in the DB name file://C:\test\textfile.txt, and if it's too long
> it will be shortened from the _beginning_.

Yes. What would you suggest? I thought that shortening it from the front made more sense than cutting off the end. I don't see another reasonable way to construct a name for the DB, either, but am open to suggestions.

Of course, --doc-name can still be used to specify a document name:

    plucker-build --doc-name=textfile file://C:\test\textfile.txt out

> Calling: plucker-build -H file://C:\test\textfile.txt -f out
> names the DB out. (Should it not be named textfile?)

Well, the way it's always worked is that the DB filename is used, unless a doc-name is explicitly specified. So I'd say it's still working correctly.

Bill
Re: added charset info to plucker doc; metadata record type
Looks great, Dirk. I'll switch over the code in TextParser.py to use it. Bill
Re: making plucker-build easier to use?
Bill == Bill Janssen [EMAIL PROTECTED] writes:

BTW: Calling: plucker-build file://C:\test\textfile.txt out results in the DB name file://C:\test\textfile.txt, and if it's too long it will be shortened from the _beginning_.

Bill> Yes. What would you suggest? I thought that shortening it from the
Bill> front made more sense than cutting off the end. I don't see another

What about using only the filename (cutting off the drive, path and extension), and if it is still too long, cutting from the end? At the very least, both ways of calling the parser (-f filename or filename) should end up with the _same_ DB name.

BTW: since writing to stdout does not work on Windows (and I guess also not on OS/2 or Mac), what about wrapping this stdout code in a sys.platform == 'linux' check, so that the parser does not create broken DBs on non-Linux systems?

cu, Dirk
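The naming rule suggested above could look roughly like this. It is only a sketch: db_name_from_url is a hypothetical helper, not part of plucker-build, and the 31-character limit is an assumption about the Palm database name field.

```python
import ntpath

MAX_DB_NAME = 31  # assumption: Palm DB names hold 31 characters plus a NUL


def db_name_from_url(url, limit=MAX_DB_NAME):
    """Derive a DB name from a file URL or a plain Windows-style path."""
    path = url
    if path.lower().startswith("file:"):
        # Drop the scheme and the slashes that follow it.
        path = path[len("file:"):].lstrip("/")
    # ntpath understands both backslashes and forward slashes,
    # so the drive and all directories are removed here.
    base = ntpath.basename(path)
    stem, _ext = ntpath.splitext(base)
    # If the name is still too long, cut from the end.
    return stem[:limit]


print(db_name_from_url(r"file://C:\test\textfile.txt"))
print(db_name_from_url(r"C:\test\textfile.txt"))
```

Both calls yield the same name, which is the property asked for above; the cost is that two documents with the same filename in different directories would collide.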
Re: more charset work checked in
OK, I've now updated the code to use Dirk's generated table of charset names. Bill
Re: making plucker-build easier to use?
> BTW: since writing to stdout does not work on Windows (and I guess
> also not on OS/2 or Mac), what about wrapping this stdout code in a
> sys.platform == 'linux' check, so that the parser does not create
> broken DBs on non-Linux systems?

I'm going to pursue this a bit further on Windows, first. There must be some way to write binary output.

> What about using only the filename (cutting off the drive, path and
> extension), and if it is still too long, cutting from the end?

I suppose we could do that. I'd just as soon save as much of the earlier info as possible.

Bill
Re: making plucker-build easier to use?
> I'm going to pursue this a bit further on Windows, first. There must
> be some way to write binary output.

And indeed, the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python) has the answer. Here it is:

    import sys
    if sys.platform == "win32":
        # Switch stdout to binary mode so newlines are not translated.
        import os, msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

I've incorporated that into the code. Try it out!

Bill