DickJC123 opened a new issue #20738:
URL: https://github.com/apache/incubator-mxnet/issues/20738


   ## Description
   Here are two independent PRs that hit the failure:
   
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-20635/38/pipeline
   
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-20734/5/pipeline
   
   The failure has been reported as a problem with the mirrors serving oneAPI:
https://community.intel.com/t5/Registration-Download-Licensing/OneAPI-apt-repository-broken/m-p/1329104
   
   I suspect there may be more to it, based on two observations:
   
   1. The oneDNN library is installed by a `RUN` command in `Dockerfile.build.ubuntu`. This creates an intermediate Docker image that is pulled from the cache in the failing builds:
   ```
   [2021-11-11T23:00:22.939Z] Step 5/20 : RUN export DEBIAN_FRONTEND=noninteractive ...
   [2021-11-11T23:00:23.196Z]  ---> Using cache
   [2021-11-11T23:00:23.196Z]  ---> 1a09ef0af63e
   ```
   The image ID is the same one we've seen for a week or more, well before the apparent changes to the mirrors. So are we not handling cached Docker images properly?
   
   2. The actual error comes from an `apt-get update` performed by a later `RUN` command that installs TensorRT and cuDNN. Perhaps the Intel repo used to install oneDNN in the earlier `RUN` command should be removed from the container in that same step, since the installation is complete by then. The command `add-apt-repository -r "deb https://apt.repos.intel.com/oneapi all main"` might perform that removal. If the Intel repo were no longer in `/etc/apt/sources.list`, the currently failing `apt-get update` would presumably succeed.
   
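Regarding observation 2: the following is a minimal sketch of the proposed cleanup. It assumes the Intel entry is a single line in a plain-text sources file. `drop_apt_source` is a hypothetical helper that filters out every line containing the repository URL. `add-apt-repository -r`, as suggested above, may be the simpler option if it handles the entry.

```shell
# Hypothetical helper: delete every line mentioning REPO_URL from SOURCES_FILE.
# Meant for the end of the same RUN step that installs oneDNN, so later
# `apt-get update` calls no longer contact the Intel repository.
drop_apt_source() {
  local sources_file="$1" repo_url="$2" tmp status
  tmp=$(mktemp)
  grep -vF -- "$repo_url" "$sources_file" > "$tmp"
  status=$?
  # grep exits 1 when it prints nothing (the file held only this entry),
  # which is fine here; 2 or more means a real error, so leave the file alone.
  if [ "$status" -gt 1 ]; then
    rm -f "$tmp"
    return "$status"
  fi
  cat "$tmp" > "$sources_file"   # rewrite in place, keeping the file's permissions
  rm -f "$tmp"
}

# Demo on a scratch file instead of /etc/apt/sources.list (placeholder contents).
demo=$(mktemp)
printf '%s\n' \
  'deb http://archive.ubuntu.com/ubuntu focal main' \
  'deb https://apt.repos.intel.com/oneapi all main' > "$demo"
drop_apt_source "$demo" "https://apt.repos.intel.com/oneapi"
cat "$demo"   # prints: deb http://archive.ubuntu.com/ubuntu focal main
```

Running this (or `add-apt-repository -r`) at the end of the oneDNN `RUN` step would keep the Intel repository out of the later TensorRT/cuDNN `apt-get update`. This assumes the entry is not also kept in a separate file under `/etc/apt/sources.list.d/`.
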
   ### Error Message
   ```
   [2021-11-11T23:00:39.105Z] Err:9 https://apt.repos.intel.com/oneapi all/main all Packages
   [2021-11-11T23:00:39.105Z]   Hash Sum mismatch
   [2021-11-11T23:00:39.105Z]   Hashes of expected file:
   [2021-11-11T23:00:39.105Z]    - Filesize:21072 [weak]
   [2021-11-11T23:00:39.105Z]    - SHA512:7082767f95f6e40ad31deb8a9df205fa726ef3f4821ff6982d507f2f91adb57c282d1fbe3253f610b3e07f77a0c3c2320ed2c78b8d4b5b648928dd5c1fea271e
   [2021-11-11T23:00:39.105Z]    - SHA256:7e91d4ace2815407f999e88e5296f678447b9577e1f84af4addc7212c8eb32b0
   [2021-11-11T23:00:39.105Z]    - SHA1:53e523680f4f09015f82673434772a6ec112e8f2 [weak]
   [2021-11-11T23:00:39.105Z]    - MD5Sum:3f125fa13d509dd4e66fa49ae3d5af96 [weak]
   [2021-11-11T23:00:39.105Z]   Hashes of received file:
   [2021-11-11T23:00:39.105Z]    - SHA512:5af0e2266d2ef7cfd42b907c68d21b020e8e1f6c516e9fb35c7affcd52d047ffedec885f14685eaf6539edfc23c0da8e9c7035bcede483a331d9c66e5dce8c54
   [2021-11-11T23:00:39.105Z]    - SHA256:97bb376982553d6f5ae07c29a79fd653295caf7599cd6deb3c051c90a0290af1
   [2021-11-11T23:00:39.105Z]    - SHA1:9e1ac9d3f961d4e376cbc55758a334cc158a9603 [weak]
   [2021-11-11T23:00:39.105Z]    - MD5Sum:db23233f3ef8572c745ff537a2b2fdb8 [weak]
   [2021-11-11T23:00:39.105Z]    - Filesize:21072 [weak]
   [2021-11-11T23:00:39.105Z]   Last modification reported: Tue, 05 Oct 2021 04:38:36 +0000
   ```
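
For context on the error, apt verifies each downloaded index file (here the oneAPI `Packages` file) against the size and hashes listed for it in the repository's `Release`/`InRelease` file. In the log above, the sizes agree (21072 bytes) but all four hashes differ, so the `Packages` file received is not the one the repository metadata describes. The following is a minimal sketch of that comparison. It uses a hypothetical `check_index_sha256` helper that checks SHA256 only, whereas apt's real verification covers more than this:

```shell
# Hypothetical helper mimicking the idea behind apt's "Hash Sum mismatch"
# check: hash the downloaded index file and compare it with the value the
# repository's Release file advertises. Only SHA256 is checked here.
check_index_sha256() {
  local file="$1" expected="$2" actual
  actual=$(sha256sum -- "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "ok"
  else
    echo "Hash Sum mismatch: expected $expected, received $actual"
    return 1
  fi
}

# Example with a scratch file standing in for the downloaded Packages index.
index=$(mktemp)
printf 'Package: example\n' > "$index"
good=$(sha256sum -- "$index" | cut -d' ' -f1)
check_index_sha256 "$index" "$good"   # prints: ok
# The expected SHA256 from the log above; it does not match this scratch file,
# so this call prints a mismatch message and returns 1.
check_index_sha256 "$index" 7e91d4ace2815407f999e88e5296f678447b9577e1f84af4addc7212c8eb32b0 || true
```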
   
   ## To Reproduce
   I have not reproduced the failure outside of CI runs.
   
   ### Steps to reproduce
   
   ## What have you tried to solve it?
   
   I was not able to reproduce the failure using the recipe posted on the Intel site; it worked fine for me.

   ## Environment
   
   

