[ 
https://issues.apache.org/jira/browse/NIFI-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656209#comment-17656209
 ] 

Daniel Stieglitz edited comment on NIFI-10582 at 1/10/23 5:27 PM:
------------------------------------------------------------------

[~gkonar] I believe I see what the issue is, but I am not enough of an XSLT 
guru to solve the exact problem. I believe you are experiencing the issue 
described in the following Stack Overflow post: [Why is xsl:value-of behaving completely different 
depending on the xsl:stylesheet 
version|https://stackoverflow.com/questions/73497698/why-is-xslvalue-of-behaving-completely-different-depending-on-the-xslstyleshee].
 Please note that under the hood TransformXml uses Saxon HE 10.6, which conforms 
to the W3C Recommendations for XSLT 3.0, XPath 3.1, and XQuery 3.1, while 
xsltproc is an XSLT 1.0 processor, as stated 
[here|https://stackoverflow.com/questions/25061696/xsltproc-doesnt-recognize-xslt-2-0].
 Hence you are seeing a difference in how xsl:value-of is interpreted on 
line 182 of your XSLT:

{code:xml}
<xsl:value-of select="text()" />
{code}

Please note that the new line is inserted in both files where the node contains 
no text but only a sequence of whitespace characters.
In 14_R01.svg (lines 49-50):
{code:xml}
<text id="Text21" font-family="Arial" font-size="14" x="255" y="82" 
xml:space="preserve" style="dominant-baseline: auto;" text-anchor="start" 
fill="rgb(32,32,32)">
    </text>
{code}

In 21_R01.svg (lines 43-44):
{code:xml}
<text id="Text18" font-family="Arial" font-size="14" x="171" y="422" 
xml:space="preserve" style="dominant-baseline: auto;" text-anchor="start" 
fill="rgb(32,32,32)">
    </text>
{code}

Based on this [documentation|https://xsltdev.com/xslt/xsl-value-of/], XSLT 1.0 
and 2.0 (and I believe the same goes for 3.0) differ in how xsl:value-of behaves 
when the select expression evaluates to a sequence containing more than one item. 
Without a separator specified (and svgTest.xsl does not specify one), XSLT 1.0 
considers only the first item, whereas XSLT 2.0 (and 3.0) outputs every item in 
the sequence, separated by the default separator, a single space. With xsltproc, 
which implements XSLT 1.0, there is no new line because only the first item, a 
blank space, is chosen, while with TransformXml a "new line" is inserted because 
the output is the sequence of spaces found in the node, separated by single 
spaces. So in a sense it is part of that data line. It looks like you need some 
sort of trim function to get rid of the sequence of spaces when capturing the 
text. As for a backwards-compatible mode for XSLT 1.0, which seemed possible 
based on the first article I quoted, it seems there is none for XSLT 3.0 based on 
the following [support ticket|https://saxonica.plan.io/issues/4266]. 
Hence I believe what you have observed is not a bug but rather a consequence of 
using XSLT 3.0. Please let me know if you concur with this conclusion.
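
As a rough, untested sketch of the "trim" idea: the standard XPath function normalize-space() strips leading and trailing whitespace and collapses internal runs of whitespace to a single space, so a node containing only whitespace yields an empty string. The snippet assumes that line 182 of svgTest.xsl is evaluated with the <text> element as the context node, which I have not verified. It also uses "." rather than "text()", so the text of any child elements would be included as well.

{code:xml}
<!-- Hypothetical replacement for line 182 of svgTest.xsl (assumption, not tested
     against the attached files). normalize-space(.) takes the string value of the
     context node, trims it, and collapses whitespace runs, so a whitespace-only
     <text> element produces an empty string instead of a line break and spaces. -->
<xsl:value-of select="normalize-space(.)" />
{code}

normalize-space() is available in XSLT 1.0 as well as 2.0 and 3.0, so the same line should behave the same way under both xsltproc and TransformXml.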




> <xsl:strip-space> XSLT element does not work in NiFi 1.15.3
> -----------------------------------------------------------
>
>                 Key: NIFI-10582
>                 URL: https://issues.apache.org/jira/browse/NIFI-10582
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.15.3
>         Environment: Windows 10 Pro, 21H2, 64-bit O/S, 64GB RAM
> Apache NiFi 1.15.3
> openjdk version "11" 2018-09-25
> OpenJDK Runtime Environment 18.9 (build 11+28)
> OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
>            Reporter: Greg Konar
>            Priority: Major
>         Attachments: 14_R01.csv, 14_R01.csv_err, 14_R01.svg, 21_R01.csv, 
> 21_R01.csv_err, 21_R01.svg, Reproduce_NiFi_Error.xml, svgTest.xsl
>
>
> I was using NiFi to convert SVG files to pipe delimited format so I can load 
> and convert them to a proprietary XML structure required by our application.
> One client sent us files that contained rogue newline characters which caused 
> nearly 25% of the files to fail to load.  Using *<xsl:strip-space>* in my 
> XSLT file, I was able to manually "repair" the files.
> When the _*TransformXML*_ processor was pointed to my svgTest.xsl XSLT file, 
> the files still failed to load.
> To prove the file was good, I created a bash shell script which applied the 
> XML transformation to my SVG files, then I injected the delimited files at a 
> later point in my flow.  ALL FILES LOADED.
> Please find my _*svgTest.xsl*_ file attached.
> Please fix this bug in NiFi and let me know which version contains the fix as 
> I am our company's "NiFi Champion".
> If you have any questions or need additional information, please let me know.
> Thank you in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
