/*COMMENTS ON TOKENISER MODULE*/ (PDF Reference, version 1.7)
UNIT TEST:
/**************************/
/*function pdf_token_read */
/**************************/
1- COMMENTS:
test1: comments are ignored (treated like white space) (already done)
test2: macro "PDF_TOKEN_RET_COMMENT" has to be defined if we need them returned
test3: two exceptions: -%PDF-n.m
-%%EOF
should we return these every time, whether or not the macro is defined? Or is
that an issue left to the caller?
test4: test long comments
question 1: "The comment consists of all characters between the percent sign
and the end of the line", but in the code (handle_char()), when we detect '%',
we store it. Is that intended?
question 2: Do we need to store every character when comments are ignored?
Would it be more efficient to decide whether to ignore or keep them in the
handle_char function instead of in flush_token?
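To make test3 concrete, here is a minimal sketch of how a caller could tell
the two special comments apart from ordinary ones. The enum and
classify_comment() are hypothetical names made up for this mail, not part of
pdf-token.c, and the sketch assumes the comment data excludes both the leading
'%' and the end-of-line marker (which is exactly what question 1 asks about).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification; none of these names exist in pdf-token.c. */
enum comment_kind { COMMENT_PLAIN, COMMENT_HEADER, COMMENT_EOF };

/* `data` holds the comment bytes after the leading '%', without the
   end-of-line marker (an assumption); `len` is the number of bytes. */
static enum comment_kind
classify_comment (const char *data, size_t len)
{
  /* "%%EOF": once the first '%' is dropped, the body is exactly "%EOF". */
  if (len == 4 && memcmp (data, "%EOF", 4) == 0)
    return COMMENT_EOF;

  /* "%PDF-n.m": the body is "PDF-", one digit, '.', one digit. */
  if (len == 7 && memcmp (data, "PDF-", 4) == 0
      && isdigit ((unsigned char) data[4])
      && data[5] == '.'
      && isdigit ((unsigned char) data[6]))
    return COMMENT_HEADER;

  return COMMENT_PLAIN;
}
```

With something like this the tokeniser could keep ignoring comments by default
and leave the header/EOF decision to the caller. The match is deliberately
strict (no trailing white space allowed), which may or may not be what we want.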
2- BOOLEAN:
test1: keyword true and false
3- INTEGER:
test 1: one or more digits with optional sign
test 2: Limits [-2^31 ; +2^31-1]
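For test 2, a boundary check could look like the sketch below.
parse_pdf_integer() is a hypothetical helper written for this mail, not a
function from pdf-token.c; it only reports whether the text fits the range
given in test 2 and says nothing about what the tokeniser should do when it
does not.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out when `text`
   is an optionally signed run of decimal digits within [-2^31, 2^31-1];
   returns 0 otherwise. */
static int
parse_pdf_integer (const char *text, int32_t *out)
{
  const char *p = text;
  char *end;
  long long value;

  if (*p == '+' || *p == '-')
    p++;
  /* At least one digit is required.  This also rejects the leading
     white space and doubled signs that strtoll() would tolerate. */
  if (*p < '0' || *p > '9')
    return 0;

  errno = 0;
  value = strtoll (text, &end, 10);
  if (errno == ERANGE || *end != '\0')
    return 0;
  if (value < INT32_MIN || value > INT32_MAX)
    return 0;

  *out = (int32_t) value;
  return 1;
}
```

The interesting unit-test inputs are then the two limits themselves and the
values just outside them.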
4- REAL:
test 1: one or more digits with optional sign and a leading, trailing, or
embedded decimal point
test 2: Limits [-3.403 x 10^38 ; +3.403 x 10^38]
test 3: 5 is the number of significant decimal digits of precision in the
fractional part
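The three placements of the decimal point in test 1 can be enumerated with a
small syntax check. is_pdf_real_syntax() below is a hypothetical helper, only
meant to show which inputs the unit test should accept and reject; it does not
look at the limits of test 2 or the precision of test 3.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when `text` is an optional sign followed
   by digits containing exactly one decimal point (leading, trailing or
   embedded) and at least one digit; returns 0 otherwise. */
static int
is_pdf_real_syntax (const char *text)
{
  size_t digits = 0, points = 0;
  const char *p = text;

  if (*p == '+' || *p == '-')
    p++;

  for (; *p != '\0'; p++)
    {
      if (*p >= '0' && *p <= '9')
        digits++;
      else if (*p == '.')
        points++;
      else
        return 0;   /* e.g. the 'E' of an exponent, which PDF does not allow */
    }

  return digits >= 1 && points == 1;
}
```

A token without any decimal point is rejected here on purpose, since it would
be an INTEGER rather than a REAL.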
5- STRING:
*literal characters enclosed with "()"
test 1: unbalanced parentheses forbidden
test 2: In a string, if the character immediately following a REVERSE SOLIDUS
(\) is not one of n, r, t, b, f, (, ), \ or numbers specifying an octal value,
the REVERSE SOLIDUS should be ignored. (already done)
test 3: In a string, an end-of-line marker appearing within a literal string
without a preceding REVERSE SOLIDUS shall be treated as a byte value of (0Ah),
irrespective of whether the end-of-line marker was a CARRIAGE RETURN (0Dh), a
LINE FEED (0Ah), or both. (almost done; \n alone is left to be tested)
test 4: "\LF", "\CR" and "\CR+LF" are not considered part of the string ("\LF"
and "\CR" are left to be tested)
test 5: High-order overflow in an octal character representation \ddd in a
string should be ignored by the tokeniser. (done)
test 6: In an octal character representation \ddd in a string, three octal
digits shall be used, with leading zeros as needed, if the next character of
the string is also a digit. Otherwise it can use one or two octal digits. (can
only be tested on pdf-token-write())
question 1: would it be useful to differentiate hexadecimal and literal strings
as token types? That way we could check that there are no unbalanced
parentheses in literal strings (test 1).
question 2: The limit fixed at 32767 characters is valid only inside content
streams. Couldn't it be longer? (see Appendix C). Should we introduce a
continuation as in comments?
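To pin down the expected results for tests 2 to 5, here is a sketch of the
decoding rules as I read them. decode_literal_string() is a hypothetical
stand-alone function, not the code in pdf-token.c; it takes the bytes between
the outer parentheses and does not check parenthesis balance (test 1).

```c
#include <stddef.h>

/* Hypothetical decoder.  Writes at most `in_len` bytes to `out` (decoding
   never grows the data) and returns the decoded length. */
static size_t
decode_literal_string (const unsigned char *in, size_t in_len,
                       unsigned char *out)
{
  size_t i = 0, n = 0;

  while (i < in_len)
    {
      unsigned char c = in[i++];

      if (c == '\r')
        {
          /* Test 3: a bare CR or CR+LF becomes a single LF. */
          if (i < in_len && in[i] == '\n')
            i++;
          out[n++] = '\n';
        }
      else if (c != '\\')
        out[n++] = c;           /* ordinary byte, including a bare LF */
      else if (i < in_len)
        {
          c = in[i++];
          switch (c)
            {
            case 'n': out[n++] = '\n'; break;
            case 'r': out[n++] = '\r'; break;
            case 't': out[n++] = '\t'; break;
            case 'b': out[n++] = '\b'; break;
            case 'f': out[n++] = '\f'; break;
            case '\r':
              /* Test 4: backslash + end-of-line emits nothing. */
              if (i < in_len && in[i] == '\n')
                i++;
              break;
            case '\n':
              break;
            default:
              if (c >= '0' && c <= '7')
                {
                  /* Up to three octal digits in total. */
                  unsigned int v = c - '0';
                  int k;
                  for (k = 0; k < 2 && i < in_len
                              && in[i] >= '0' && in[i] <= '7'; k++)
                    v = v * 8 + (in[i++] - '0');
                  /* Test 5: high-order overflow is ignored. */
                  out[n++] = (unsigned char) (v & 0xFF);
                }
              else
                out[n++] = c;   /* Test 2: drop the backslash, keep c */
            }
        }
    }
  return n;
}
```

For example, the four input bytes \777 decode to the single byte FFh, and
\0053 decodes to the byte 05h followed by the character '3'.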
*hexadecimal characters enclosed with "<>"
test 1: In a hexadecimal string, SPACE, HORIZONTAL TAB, CARRIAGE RETURN, LINE
FEED and FORM FEED shall be ignored by the tokeniser.
test 2: In a hexadecimal string, if there is an odd number of digits, the final
digit shall be assumed to be 0. (already done)
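Both hexadecimal rules fit in a few lines. The sketch below uses hypothetical
names (hex_value(), decode_hex_string()) and works on the bytes between '<'
and '>'; treating any other character as an error is my assumption, not
something taken from pdf-token.c.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if `c` is not one. */
static int
hex_value (unsigned char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical decoder.  `out` must hold (in_len + 1) / 2 bytes.  Returns
   the decoded length, or (size_t) -1 on an invalid character. */
static size_t
decode_hex_string (const unsigned char *in, size_t in_len,
                   unsigned char *out)
{
  size_t i, n = 0;
  int high = -1;                /* pending high nibble, -1 if none */

  for (i = 0; i < in_len; i++)
    {
      unsigned char c = in[i];
      int v;

      /* Test 1: SPACE, TAB, CR, LF and FF are skipped. */
      if (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f')
        continue;

      v = hex_value (c);
      if (v < 0)
        return (size_t) -1;

      if (high < 0)
        high = v;
      else
        {
          out[n++] = (unsigned char) (high * 16 + v);
          high = -1;
        }
    }

  /* Test 2: an odd digit count means the missing last digit is 0. */
  if (high >= 0)
    out[n++] = (unsigned char) (high * 16);

  return n;
}
```

So <901FA> and <90 1f A> should both give the three bytes 90h 1Fh A0h.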
6- NAMES:
test 1: In a name, a NUMBER SIGN (#) shall (MUST) be written by using its
2-digit hexadecimal code (23), preceded by a NUMBER SIGN.
test 2: In a name, any character that is a regular character (other than NUMBER
SIGN) shall be written as itself or by using its 2-digit hexadecimal code,
preceded by the NUMBER SIGN. (it would be useful to automatically test every
possible regular character and its octal equivalent).
Do you mean checking that every 2-digit hexadecimal code gives the right
regular character? Why do you talk about octal values?
test 3: In a name, any character that is not a regular character shall (MUST)
be written using its 2-digit hexadecimal code, preceded by the NUMBER SIGN
only. (test negative cases with non-regular characters directly included in the
name).
this test only concerns pdf-token-write, right? Because in pdf-token-read, any
non-regular character (white space or delimiter) ends the NAME token.
test 4: In a name, regular characters that are outside the range EXCLAMATION
MARK(21h) to TILDE (7Eh) should (RECOMMENDED) be written using the hexadecimal
notation. (test negative cases)
I don't see what I should do here that is not already covered by test 2.
test 5: The token SOLIDUS (a slash followed by no regular characters)
introduces a unique valid name defined by the empty sequence of characters.
test 6: the null character is forbidden (test pdf_token_name_new()), as is #00
(test pdf-token-read())
Question 1: The check that Name tokens don't contain null characters, done when
the token is created in pdf_token_name_new, is redundant, since this is already
verified when reading the stream (pdf_token_read). We could instead let only
the pdf_token_read and pdf_token_write functions verify that. This question
also applies to COMMENTS (eol characters) and KEYWORD tokens (non-regular
characters).
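For the read side of tests 1, 2, 5 and 6, the expected behaviour can be
written down as the sketch below. name_hex_digit() and decode_name() are
hypothetical names, not functions from pdf-token.c, and rejecting a '#' that
is not followed by two hexadecimal digits is my assumption.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if `c` is not one. */
static int
name_hex_digit (char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical decoder for the characters following the SOLIDUS of a name.
   `out` must hold `in_len` bytes.  Returns the decoded length, or
   (size_t) -1 if a '#' escape is malformed or the name would contain a
   null byte (test 6). */
static size_t
decode_name (const char *in, size_t in_len, unsigned char *out)
{
  size_t i = 0, n = 0;

  while (i < in_len)
    {
      unsigned char c = (unsigned char) in[i++];

      if (c == '#')
        {
          int hi, lo;

          if (in_len - i < 2)
            return (size_t) -1;
          hi = name_hex_digit (in[i]);
          lo = name_hex_digit (in[i + 1]);
          if (hi < 0 || lo < 0)
            return (size_t) -1;
          c = (unsigned char) (hi * 16 + lo);
          i += 2;
        }

      if (c == 0)
        return (size_t) -1;     /* covers both a raw null byte and #00 */
      out[n++] = c;
    }

  return n;                     /* 0 for the empty name of test 5 */
}
```

A loop over every regular character, comparing the result for the character
written as itself with the result for its #xx form, would cover test 2
automatically.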
GENERAL QUESTIONS:
question 1: in pdf-token.c, why do we add a null character at the end of some
tokens (Names) and not others (Comments, Strings)? (see pdf_token_buffer_new
and its pdf_bool_t nullterm)
question 2: should we create a test function (START_TEST) for each case (test1,
test2...), or one per token type (COMMENTS, BOOLEAN...), or can we group them
inside the same function, as has already been done in
torture/unit/base/token/pdf-token-read.c?
Thanks in advance
/Pierre