[ 
https://issues.apache.org/jira/browse/NIFI-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656209#comment-17656209
 ] 

Daniel Stieglitz edited comment on NIFI-10582 at 1/9/23 7:04 PM:
-----------------------------------------------------------------

[~gkonar] I believe I see what the issue is, although I am not enough of an 
XSLT expert to give you an exact fix. I believe you are running into the 
behavior described in the following Stack Overflow post: [Why is xsl:value-of 
behaving completely different depending on the xsl:stylesheet 
version|https://stackoverflow.com/questions/73497698/why-is-xslvalue-of-behaving-completely-different-depending-on-the-xslstyleshee].
 Please note that under the hood TransformXml uses Saxon HE 10.6, which conforms 
to the W3C Recommendations for XSLT 3.0, XPath 3.1, and XQuery 3.1, while 
xsltproc is an XSLT 1.0 processor, as stated 
[here|https://stackoverflow.com/questions/25061696/xsltproc-doesnt-recognize-xslt-2-0].
 Hence you are seeing a difference in how xsl:value-of is interpreted on 
line 182 of your XSLT:

{code:xml}
<xsl:value-of select="text()" />
{code}

Please note that the new line is inserted in both files where the node 
contains no text.
In 14_R01.svg (lines 49-50):
{code:xml}
<text id="Text21" font-family="Arial" font-size="14" x="255" y="82" 
xml:space="preserve" style="dominant-baseline: auto;" text-anchor="start" 
fill="rgb(32,32,32)">
    </text>
{code}

In 21_R01.svg (lines 43-44):
{code:xml}
<text id="Text18" font-family="Arial" font-size="14" x="171" y="422" 
xml:space="preserve" style="dominant-baseline: auto;" text-anchor="start" 
fill="rgb(32,32,32)">
    </text>
{code}

Based on this [documentation|https://xsltdev.com/xslt/xsl-value-of/], XSLT 1.0 
and XSLT 2.0 (and, I believe, 3.0 as well) differ in how they handle a select 
expression that evaluates to a sequence containing more than one item. 
Without a separator specified (which is your case), XSLT 1.0 considers only the 
first item, which is a single space, whereas XSLT 2.0 (and 3.0) outputs every 
item in the sequence, joined by the default separator, a single space. I 
believe the "new line" we are seeing is the sequence of spaces found in the 
node, separated by single spaces, so in a sense it is part of that data line. It 
looks like you need some sort of trim function to remove that whitespace when 
capturing the text. As for a backwards-compatible mode for XSLT 1.0, which 
seemed possible based on the first article I quoted, there appears to be none 
for XSLT 3.0, based on the following [support 
ticket|https://saxonica.plan.io/issues/4266]. 
Hence I believe this is not a bug but rather a consequence of using XSLT 3.0. 
Please let me know if you concur with this conclusion.
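
For what it is worth, here is a minimal sketch of the "trim" idea, not a tested fix for svgTest.xsl. It assumes the text elements are in the standard SVG namespace, that only the id and the text content are wanted, and that collapsing internal runs of whitespace is acceptable. The template and the pipe-delimited output layout are hypothetical and only illustrate the technique. normalize-space() is a standard XPath function that strips leading and trailing whitespace and collapses internal whitespace to single spaces, so a node holding only a newline and indentation yields an empty string:

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:svg="http://www.w3.org/2000/svg">

  <xsl:output method="text"/>

  <!-- Hypothetical template: one output line per SVG text element. -->
  <xsl:template match="svg:text">
    <xsl:value-of select="@id"/>
    <xsl:text>|</xsl:text>
    <!-- normalize-space(.) works on the string value of the element,
         so whitespace-only content becomes an empty string instead of
         a newline followed by indentation. -->
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress the built-in rule that would copy other text nodes. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
{code}

Note that normalize-space(.) is used rather than normalize-space(text()), because in XSLT 2.0 and later the function accepts at most one item and text() may select several text nodes. The string value of the element also includes text from any child elements, which may or may not be what you want for your files.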





> <xsl:strip-space> XSLT element does not work in NiFi 1.15.3
> -----------------------------------------------------------
>
>                 Key: NIFI-10582
>                 URL: https://issues.apache.org/jira/browse/NIFI-10582
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.15.3
>         Environment: Windows 10 Pro, 21H2, 64-bit O/S, 64GB RAM
> Apache NiFi 1.15.3
> openjdk version "11" 2018-09-25
> OpenJDK Runtime Environment 18.9 (build 11+28)
> OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
>            Reporter: Greg Konar
>            Priority: Major
>         Attachments: 14_R01.csv, 14_R01.csv_err, 14_R01.svg, 21_R01.csv, 
> 21_R01.csv_err, 21_R01.svg, Reproduce_NiFi_Error.xml, svgTest.xsl
>
>
> I was using NiFi to convert SVG files to pipe delimited format so I can load 
> and convert them to a proprietary XML structure required by our application.
> One client sent us files that contained rogue newline characters which caused 
> nearly 25% of the files to fail to load.  Using *<xsl:strip-space>* in my 
> XSLT file, I was able to manually "repair" the files.
> When the _*TransformXML*_ processor was pointed to my svgTest.xsl XSLT file, 
> the files still failed to load.
> To prove the file was good, I created a bash shell script which applied the 
> XML transformation to my SVG files, then I injected the delimited files at a 
> later point in my flow.  ALL FILES LOADED.
> Please find my _*svgTest.xsl*_ file attached.
> Please fix this bug in NiFi and let me know which version contains the fix as 
> I am our company's "NiFi Champion".
> If you have any questions or need additional information, please let me know.
> Thank you in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
