[ 
https://issues.apache.org/jira/browse/RANGER-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caijialiang updated RANGER-4166:
--------------------------------
    Description: 
This description explains how to reason about this build error and how to 
reproduce it reliably.

environment
[root@gs-server-12223 ~]# locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=zh_CN.UTF-8

lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core

uname -a
Linux gs-server-12223 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 
x86_64 x86_64 x86_64 GNU/Linux

maven version 3.6.3

description:

There are compilation errors when building Ranger 2.3 and Ranger 2.4 in a Linux 
environment.

Compilation command:
mvn -Pall clean compile package install -Dmaven.test.skip=true -DskipTests=true 
-Dfindbugs.skip=true -Dcheckstyle.skip=true -Djacoco.skip=true -Dpmd.skip=true 
-Drat.skip=true -Dspotbugs.skip=true -Dhadoop.version=3.3.4 
-Dhbase.version=2.4.13 -Dhive.version=3.1.3 -Dkafka.version=2.8.1 
-Dsolr.version=8.11.2 -Dzookeeper.version=3.6.4

The following two patches were applied to ranger2.3 in order to compile 
successfully.
git apply ../ranger/patch1-RANGER-3818.diff
git apply ../patch0-RANGER-3373.diff

*The compilation of ranger 2.3 fails with the following error:*
{code:java}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:2.6:single (default) on project 
ranger-distro: Failed to create assembly: Error creating assembly archive 
schema-registry-plugin: Problem creating jar: 
jar:file:/home/jialiang/prjs/ranger/distro/target/ranger-distro-2.3.0.jar!/META-INF/maven/org.apache.ranger/ranger-distro/pom.xml:
 JAR entry META-INF/maven/org.apache.ranger/ranger-distro/pom.xml not found in 
/home/jialiang/prjs/ranger/distro/target/ranger-distro-2.3.0.jar -> [Help 1] 
{code}
 

*No patches were applied to ranger 2.4; its build fails with the following error:*
{code:java}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:2.6:single (default) on project 
ranger-distro: Failed to create assembly: Error creating assembly archive 
schema-registry-plugin: IOException when zipping 
rMETA-INF/maven/org.apache.ranger/ranger-distro/pom.properties: invalid code 
lengths set -> [Help 1]{code}
Based on the ranger 2.4 error message, I suspected that the issue is related 
to encoding. Checking the encoding of the corresponding file shows that it is 
ASCII, while the Linux locale defaults to UTF-8:

 

file ./distro/target/maven-archiver/pom.properties

./distro/target/maven-archiver/pom.properties: ASCII text

 

Therefore, this may be an encoding problem. In addition, the error message 
mentions "Error creating assembly archive." The Maven Assembly Plugin runs 
during Maven's package phase, after compilation, testing, and other steps have 
completed, and packs the build artifacts into archive files for distribution.

This error occurs when the Assembly Plugin is creating a distributable archive, 
such as a zip or tar.gz format, from the build artifacts. Therefore, it is 
related to how the archive tool used by Maven Assembly Plugin handles encoding.

Both ranger 2.3 and ranger 2.4 use 
<assembly.plugin.version>2.6</assembly.plugin.version>. Hence, it is necessary 
to investigate the code of this version of the Assembly Plugin.

[https://github.com/apache/maven-assembly-plugin]

[https://github.com/apache/maven-assembly-plugin/blob/maven-assembly-plugin-2.6/pom.xml]

From the pom file and the compression logic in the code, it can be concluded 
that the compression tool used is plexus-archiver, version 3.0.1.
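The same check can be done from the command line instead of the browser. The following is a minimal sketch, not part of the original investigation: the function name archiver_version_in_pom is hypothetical, and it assumes that the plugin pom declares the plexus-archiver version inline, directly after the artifactId.

```shell
# Print the version that a POM declares for plexus-archiver.
# Assumption: the <version> element follows the <artifactId> element inline.
# A version managed elsewhere (for example in a parent POM) will not be found.
archiver_version_in_pom() {
  awk '
    /<artifactId>plexus-archiver<\/artifactId>/ { found = 1; next }
    found && /<version>/ {
      # Strip everything except the text between <version> and </version>.
      gsub(/.*<version>|<\/version>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Hypothetical usage, assuming the default local repository layout:
# archiver_version_in_pom \
#   "$HOME/.m2/repository/org/apache/maven/plugins/maven-assembly-plugin/2.6/maven-assembly-plugin-2.6.pom"
```

If the version is expressed as a property, the function prints the property reference rather than the resolved number.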

!image-2023-04-04-10-29-41-802.png!

The release notes for plexus-archiver are here:
[https://github.com/codehaus-plexus/plexus-archiver/blob/master/ReleaseNotes.md]

Searching for the keyword 'encod' in the release note reveals that many 
encoding-related issues have been fixed since version 3.0, including
 * [Issue #37|https://github.com/codehaus-plexus/plexus-archiver/issues/37] - 
Deprecate Manifest(Reader) and update all related Implementation does not 
properly map characters to map and makes assumptions about character encoding 
which might lead to failures. Deprecate and rely on Java Manifest reader to do 
the right thing.
 * [Issue #39|https://github.com/codehaus-plexus/plexus-archiver/issues/39] - 
Updated to stop falling back to the unicode path extra field policy 
NOT_ENCODEABLE. If a name is not encodeable in UTF-8, it also is not encodeable 
in the extra field. Updated to always add the Info-ZIP Unicode Path Extra Field 
when creating an archive using an encoding different from UTF-8 instead of only 
when a name is not encodeable. Additionally support that extra field when 
unarchiving.
 * [Pull Request 
#73|https://github.com/codehaus-plexus/plexus-archiver/pull/73] - Symbolic 
links not properly encoded in ZIP archives

Next, I downloaded the plexus-archiver code and searched the source for the 
error message 'IOException when zipping'.

!image-2023-04-04-10-29-26-077.png!

!image-2023-04-04-10-29-20-811.png!

By reading the plexus-archiver code, it was found that setting encoding is 
necessary when creating a jar file using plexus-archiver, because the jar file 
contains text files such as the manifest file, which may have non-ASCII 
characters and need to be correctly encoded to avoid potential issues. 
Therefore, setting the encoding ensures that the text files in the jar file are 
properly encoded.

However, when creating a tar.gz file using plexus-archiver, there is no need 
for the setEncoding() method, because tar.gz files do not have a text encoding 
format. They are binary files that contain compressed data.

At this point, we can explain why only the schema-registry in the distro 
packaging will have an error. The descriptor of the schema-registry is 
specified as follows:

<descriptor>src/main/assembly/plugin-schema-registry.xml</descriptor>, and the 
format specified in that descriptor is jar.

!image-2023-04-04-10-29-56-998.png!

All other assembly descriptors specify tar.gz as the format.

!image-2023-04-04-10-30-06-393.png!

We can use the file command to check the encoding format of all files generated 
during the compilation of all modules:
file ./xxx/target/maven-archiver/pom.properties
All of them are encoded in ASCII. This explains why, although every 
pom.properties file is ASCII, only the assembly packaging of schema-registry 
(the only jar-format assembly) results in an error.
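To run this check over every module at once, a small helper can be used. This is only a sketch: the function name non_ascii_pom_properties is hypothetical, and instead of calling file on each path it lists the pom.properties files that contain any byte outside printable ASCII, so an empty result corresponds to "all files are ASCII text".

```shell
# List every target/maven-archiver/pom.properties below a directory that
# contains at least one byte outside printable ASCII and whitespace.
# Hypothetical helper; prints nothing when all files are pure ASCII.
non_ascii_pom_properties() {
  # grep exits non-zero when nothing matches, which is the expected case
  # here, so the exit status is deliberately ignored.
  find "${1:-.}" -type f -path '*/target/maven-archiver/pom.properties' \
    -exec env LC_ALL=C grep -l '[^[:print:][:space:]]' {} + || true
}

# Usage from the ranger source root after a build:
# non_ascii_pom_properties .
```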

Based on the above inference, I modified the 'format' in 
plugin-schema-registry.xml from 'jar' to 'tar' and it passed the compilation 
smoothly. Adding the line '<encoding>UTF-8</encoding>' in the distro's pom file 
also allowed it to pass the compilation.
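The first workaround can be scripted as follows. This is a sketch under assumptions: the function name set_descriptor_format is hypothetical, the descriptor is assumed to contain a literal <format>jar</format> element, and the path in the usage comment assumes the descriptor lives under the distro module.

```shell
# Rewrite <format>OLD</format> to <format>NEW</format> in an assembly
# descriptor. Hypothetical helper; assumes the element is written on one line.
set_descriptor_format() {
  descriptor=$1
  old=$2
  new=$3
  # Write to a temporary file first so the original is only replaced
  # when sed succeeds.
  sed "s|<format>${old}</format>|<format>${new}</format>|" "$descriptor" \
    > "$descriptor.tmp" && mv "$descriptor.tmp" "$descriptor"
}

# Usage (path assumed, not verified against the repository):
# set_descriptor_format distro/src/main/assembly/plugin-schema-registry.xml jar tar
```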

!image-2023-04-04-10-18-02-975.png!

However, these are not the fundamental solutions. The root cause is a bug in 
plexus-archiver that re-encodes when packaging jars. This bug has been fixed in 
the latest version of plexus-archiver. Our assembly plugin was using an older 
version of plexus-archiver, causing the issue. Therefore, upgrading to the 
latest version can solve the problem.

By checking the pom file of the assembly plugin, I found that the 
maven-assembly-plugin-3.4.2 uses plexus-archiver 4.4. Therefore, I updated the 
ranger's <assembly.plugin.version>2.6</assembly.plugin.version> to 
<assembly.plugin.version>3.4.2</assembly.plugin.version> and the compilation 
problem was also solved.
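The property change can also be applied from the shell. The following is a minimal sketch: the function name set_assembly_plugin_version is hypothetical, and it assumes the property is defined on a single line in the pom file passed as the first argument.

```shell
# Set the <assembly.plugin.version> property in a POM to the given version.
# Hypothetical helper; assumes the property element is written on one line.
set_assembly_plugin_version() {
  pom=$1
  version=$2
  sed "s|<assembly\.plugin\.version>[^<]*</assembly\.plugin\.version>|<assembly.plugin.version>${version}</assembly.plugin.version>|" \
    "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}

# Usage from the ranger source root:
# set_assembly_plugin_version pom.xml 3.4.2
```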

!image-2023-04-04-10-18-31-532.png!

I have tested both ranger 2.3 and ranger 2.4, and upgrading the assembly plugin 
and modifying the encoding can solve the compilation issue on Linux.

https://issues.apache.org/jira/browse/RANGER-2721

Therefore, RANGER-2721 does not fix the build error. It only avoids invoking 
the assembly goal explicitly, so that the error is not triggered on every 
build. In practice, even with assembly removed, many environments still hit the 
error in the final step.

How to reproduce and test stably: We use ranger2.4 for testing because it does 
not require a patch to be applied. Before testing, clear the ranger directory 
installed in the Maven M2 repository.
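Clearing the installed ranger artifacts can be wrapped in a small helper. This is a sketch: the function name purge_ranger_artifacts is hypothetical, and the default repository path ~/.m2/repository is an assumption (the machine used for the test below keeps its repository in a different directory).

```shell
# Remove all org.apache.ranger artifacts from a local Maven repository.
# Hypothetical helper; the default path assumes a standard Maven setup.
purge_ranger_artifacts() {
  repo=${1:-"$HOME/.m2/repository"}
  rm -rf "$repo/org/apache/ranger"
}

# Usage with an explicit repository directory:
# purge_ranger_artifacts /path/to/local/repository
```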

ranger2.4

1. To reproduce the error, compile using the following command without making 
any modifications.
{code:java}
[root@gs-server-12223 ranger]# git branch -vv
  master               460a176 [origin/master] RANGER-4085: Search filter hint is not available where you search for policy
* ranger-2.4           50ad9c1 [origin/ranger-2.4] RANGER-4155 : Structure of resource(UI) hierarchy in policy form not proper formatted for multiple values.
  release-ranger-2.3.0 ce3339c RANGER-3730: use reload4j to replace log4j-1.2
[root@gs-server-12223 ranger]# git diff
[root@gs-server-12223 ranger]# rm -rf /home/jzhou/m2/org/apache/ranger
[root@gs-server-12223 ranger]# /usr/local/src/apache-maven-3.6.3/bin/mvn -Pall \
    clean compile package install assembly:single -Dmaven.test.skip=true \
    -DskipTests=true -Dfindbugs.skip=true -Dcheckstyle.skip=true -Djacoco.skip=true \
    -Dpmd.skip=true -Drat.skip=true -Dspotbugs.skip=true -Dhadoop.version=3.3.4 \
    -Dhbase.version=2.4.13 -Dhive.version=3.1.3 -Dkafka.version=2.8.1 \
    -Dsolr.version=8.11.2 -Dzookeeper.version=3.6.4
{code}
 

!image-2023-04-04-10-21-17-574.png!

2. Upgrade the assembly.plugin.version in the ranger project to 3.4.2, and 
compile again using the above command. The error disappears and the 
compilation proceeds smoothly.

!image-2023-04-04-10-21-38-104.png!

!image-2023-04-04-10-21-50-064.png!

3. After reverting the changes, the build fails again.

!image-2023-04-04-10-22-07-056.png!
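When repeating the three steps above, it helps to save each build log and search it for the two failure messages quoted earlier in this description. The following is a sketch: the function name assembly_failure_lines and the log file name are hypothetical.

```shell
# Print the lines of a saved Maven log that contain one of the two
# assembly failure messages seen in this issue.
# Hypothetical helper; exit status is 0 when a failure line was found
# and non-zero otherwise (the status of grep).
assembly_failure_lines() {
  grep -E 'IOException when zipping|Problem creating jar' "$1"
}

# Usage, assuming the build output was saved with "mvn ... 2>&1 | tee build.log":
# assembly_failure_lines build.log
```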

Unfortunately, I have not yet determined which line of code causes the build 
error and under what circumstances, nor why the issue cannot be reproduced 
reliably without adding assembly:single. Anyone interested can dig deeper; the 
answer probably lies in the maven-assembly-plugin, plexus-archiver, or 
commons-compress libraries.

[https://github.com/apache/maven-assembly-plugin]

[https://github.com/codehaus-plexus/plexus-archiver/|https://github.com/codehaus-plexus/plexus-archiver/blob/master/ReleaseNotes.md]

[https://github.com/apache/commons-compress]

> ranger2.3 build failed
> ----------------------
>
>                 Key: RANGER-4166
>                 URL: https://issues.apache.org/jira/browse/RANGER-4166
>             Project: Ranger
>          Issue Type: Bug
>          Components: Ranger
>    Affects Versions: 2.3.0
>            Reporter: caijialiang
>            Priority: Major
>         Attachments: image-2023-04-01-18-31-58-091.png, 
> image-2023-04-01-18-33-29-756.png, image-2023-04-04-10-28-23-029.png, 
> image-2023-04-04-10-29-20-811.png, image-2023-04-04-10-29-26-077.png, 
> image-2023-04-04-10-29-41-802.png, image-2023-04-04-10-29-56-998.png, 
> image-2023-04-04-10-30-06-393.png, image-2023-04-04-10-30-48-140.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
