Re: Normalizing space and special spaces

Maria2009 Wed, 25 Mar 2009 18:30:42 -0700

Hi Andreas,

I had no success yesterday either. I thought I should search more
intensively on Sunday, but since you had the same experience despite knowing
better where to search, I think I will forget about this idea for a while.

Andreas Delmelle-2 wrote:
> 
> ...
> 
> Some follow-up on this:
> Hope you have better luck than I did. I could have sworn I had read  
> something like that...  :-/
> 
> Maybe I'm mixing up things, and the reference I remember is not  
> Unicode at all.
> 
> ...
> 
> At any rate, my original intention was not so much to 'correct' as to  
> 'inform'. XSL-FO mentions absolutely nothing about non-XML white-space  
> characters. The only references to be found are to the four codepoints  
> I mentioned earlier. As far as XSL-FO is concerned, a zero-width space  
> is just another character.
> On the other hand, things like Unicode line-separators or
> paragraph-breaks aren't spoken about either, but FOP does deal with them
> as an end-user would expect.
> 

This information is highly appreciated!

Andreas Delmelle-2 wrote:
> 
> 
> In the meantime, I have re-read your original post, and:
> 
>> Now, for some of these nested elements I have to create special
>> whitespace (like non-breaking small spaces), so I do not need the
>> whitespace from the normalization. In other words: Instead of
>> "text_text-from-element_text" (with _ for normal spaces between the
>> elements), I want "text*text-from-element_text" (with * for non breaking
>> whitespaces) but get "text_*text-from-element_text".
> 
> What does the generated FO look like *exactly*? If there is any
> indentation or a linefeed, one regular space-character will always
> survive default settings for "white-space-collapse",
> "white-space-treatment" and "linefeed-treatment".
> You could experiment by setting the property
> 'linefeed-treatment="ignore"' on the surrounding block. If that relieves
> the issue, then it means that an unnecessary linefeed is present in the
> FO. For example the FO looks like (literally):
> 
> <fo:block>
>    text
>      <fo:inline>*text-from-element</fo:inline>
> ...
> 
> The white-space between the word 'text' and the fo:inline would yield  
> one space that survives, since from the point-of-view of the processor  
> (FOP) it just appears in between two characters.
> 
> If you were to make that:
> 
> <fo:block>text<fo:inline>*text-from-element</fo:inline>
> ...
> 
> then the output would be precisely what you seem to expect. Could be  
> that this particular issue is solvable by adhering to best practices  
> and simply producing processor-friendly (as opposed to human-readable)  
> XML.
> 
> In other words: make sure the FO result does not contain ANY
> white-space that you don't want there, and FOP will behave. It treats
> regular spaces as XSL-FO prescribes, and for the rest, it only assumes
> that every other type of space is there because it was intended.
> 
> 
> HTH!
> 
> Andreas
> 
> 

It helps! Setting the XSLT to strip whitespace and ignore linefeeds at
least produced a predictable and reasonable outcome similar to what you
presented above. It took me some time to get to this point...
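
As a rough illustration of why a stray linefeed leaves one space behind, the
default handling described above can be modelled in a few lines of Python.
This is only a simplified sketch, not FOP's implementation: the function name
simulate_default_whitespace is made up for this example, only plain spaces and
linefeeds are modelled, and * again stands for the special space.

```python
import re

def simulate_default_whitespace(chars, linefeed_treatment="treat-as-space"):
    """Rough model of XSL-FO white-space handling for one block's characters.

    Hypothetical helper: only U+0020 and U+000A are modelled; any other
    character (including special spaces) is passed through untouched.
    """
    # white-space-treatment="ignore-if-surrounding-linefeed":
    # drop the spaces directly before or after a linefeed.
    s = re.sub(r" *\n *", "\n", chars)
    # linefeed-treatment: turn each remaining linefeed into a space, or drop it.
    s = s.replace("\n", " " if linefeed_treatment == "treat-as-space" else "")
    # white-space-collapse="true": runs of spaces become a single space.
    s = re.sub(r" {2,}", " ", s)
    # Spaces at the start or end of a line are suppressed; approximated here
    # by trimming the whole block.
    return s.strip(" ")

# Character content of the indented FO and of the compact FO from the quote.
indented = "\n   text\n     *text-from-element"
compact = "text*text-from-element"

print(repr(simulate_default_whitespace(indented)))
print(repr(simulate_default_whitespace(compact)))
print(repr(simulate_default_whitespace(indented, "ignore")))
```

In this model the indented variant keeps exactly one regular space between
"text" and the special space, while the compact variant and the
linefeed-treatment="ignore" variant do not.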

I also came to the conclusion that I have to offer the FO processor a
document that it can reliably transform, and this means that the original
document has to follow some rules. My first idea was to "clean up" the
original XML document and then transform it to the FO document, but then I
thought that maybe the whitespace could be dealt with in FOP. I think that
was stupid :(, never mind.

My problem is that in the document, the elements look like this (_ and * for
normal and small nonbreaking whitespace at the crucial part): 

<p>The text_<note>to which the note refers_<item>the note
itself</item></note>.</p>

This produces good results as long as the note is not a footnote. For a
footnote I have to add a number in the text, and that number always follows
a small whitespace. The text will then look like this:

The text to which the note refers_*1.

It should look like this:

The text to which the note refers*1.

To achieve this result, some of my notes (marked up with certain attributes)
have to be positioned differently in the text:

<p>The text_<note>to which the note refers<item>the note
itself</item></note>.</p>

In these cases I should try to get rid of the normal space before the item
element tag in the original document.
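
A targeted clean-up of this kind could look roughly like the following Python
sketch, which uses the standard xml.etree.ElementTree module. The attribute
name type and its value footnote are placeholders, since the real attributes
are not given here, and strip_space_before_item is a made-up helper name.

```python
import re
import xml.etree.ElementTree as ET

# Trailing XML white-space only (space, tab, CR, LF); special spaces such as
# U+00A0 are deliberately not matched.
XML_WS_AT_END = re.compile(r"[ \t\r\n]+$")

def strip_space_before_item(xml_source, attr="type", value="footnote"):
    """Remove regular white-space directly before <item> in selected notes.

    `attr` and `value` are hypothetical stand-ins for the real markup.
    """
    root = ET.fromstring(xml_source)
    for note in root.iter("note"):
        if note.get(attr) != value:
            continue  # leave all other notes exactly as written
        previous = None  # sibling whose tail sits right before the child
        for child in note:
            if child.tag == "item":
                if previous is None and note.text:
                    note.text = XML_WS_AT_END.sub("", note.text)
                elif previous is not None and previous.tail:
                    previous.tail = XML_WS_AT_END.sub("", previous.tail)
            previous = child
    return ET.tostring(root, encoding="unicode")

source = ('<p>The text <note type="footnote">to which the note refers '
          '<item>the note itself</item></note>.</p>')
print(strip_space_before_item(source))
```

Notes without the marker attribute pass through unchanged, so the step can be
run over the whole document before the transformation to FO.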

Thinking about this problem and your advice about producing a document that
the processor can handle, it becomes clear that a clean-up step which makes
the original document follow some formal rules is very important before
the transformation to the FO document can start.

The original document is academic text, and the aim of the XML language I am
working on is to offer a tool for academic writers (me in the first place)
where we only have to think about content, not about style, tag names
and formal rules. I am trying to achieve this in several areas of text
creation, but one important factor is that it should not make a difference in
the writing stage whether there is a whitespace character before or after an
element tag.

Like this:

<p>The text_<note>_to which the note refers_<item>_the note
itself</item></note>.</p>

should produce results identical to this:

<p>The text<note>to which the note refers<item>the note
itself</item></note>.</p>

If anybody has a hint on how to create a unified original document from
these different options, I will be happy. I thought of addressing these
points (spaces before and after element tags) with XPointer or RegEx, so if
there is any advice about which one to use, I will start by learning that
technique. Or anything else if necessary!
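
One possible shape for such a unifying step, sketched in Python with the
standard xml.etree.ElementTree and re modules: an XML parser finds the text
around the tags, and a small regular expression normalizes each piece. This
is an assumption about how the clean-up might be done, not an established
recipe, and the names normalize and unify are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Only the four XML white-space characters; special spaces stay untouched.
XML_WS_RUN = re.compile(r"[ \t\r\n]+")

def normalize(text):
    """Collapse runs of XML white-space and trim them at both ends."""
    if text is None:
        return None
    return XML_WS_RUN.sub(" ", text).strip(" ")

def unify(xml_source):
    """Return the document with white-space next to element tags removed."""
    root = ET.fromstring(xml_source)
    for element in root.iter():
        element.text = normalize(element.text)  # text after the start tag
        element.tail = normalize(element.tail)  # text after the end tag
    return ET.tostring(root, encoding="unicode")

spaced = ("<p>The text <note> to which the note refers <item> the note\n"
          "itself</item></note>.</p>")
tight = ("<p>The text<note>to which the note refers<item>the note "
         "itself</item></note>.</p>")

print(unify(spaced) == unify(tight))
```

The explicit character class is used instead of Python's str.split() or
str.strip() without arguments, because those also treat characters such as
U+00A0 (no-break space) as white-space and would remove the special spaces
that are supposed to survive.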

Thanks for listening, 
Maria

-- 
View this message in context: 
http://www.nabble.com/Normalizing-space-and-special-spaces-tp22678087p22714087.html
Sent from the FOP - Users mailing list archive at Nabble.com.
