[ https://issues.apache.org/jira/browse/SPARK-40954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628904#comment-17628904 ]

Anton Ippolitov commented on SPARK-40954:
-----------------------------------------

Thank you for the suggestion [~dcoliversun]!

The hyperkit driver is not supported on M1 
([https://github.com/kubernetes/minikube/issues/11885]), but I managed to 
run the Spark on Kubernetes integration tests against Minikube with the 
experimental [qemu2|https://github.com/kubernetes/minikube/pull/13639] driver. 
I also had to use the experimental 
[socket_vmnet|https://minikube.sigs.k8s.io/docs/drivers/qemu/#networking] 
network in order for the [minikube 
service|https://github.com/apache/spark/blob/01014aa99fa851411262a6719058dde97319bbb3/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala#L111-L113]
 call to work:
{noformat}
minikube start --driver qemu2 --network socket_vmnet{noformat}
The tests pass now.

I think it would be good to document this as a workaround for running the 
Minikube integration tests on M1. There is also the experimental 
[podman|https://minikube.sigs.k8s.io/docs/drivers/podman/] Minikube driver, 
which is supported on M1, but I haven't tried it.
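For completeness, the full setup I used looks roughly like the sketch below. This is only a sketch for Apple silicon: it assumes Homebrew is installed, and the socket_vmnet installation may involve additional steps described in the minikube QEMU driver docs.

```shell
# Sketch of the qemu2 + socket_vmnet setup on an M1 Mac (assumes Homebrew).
# The authoritative steps are in the minikube QEMU driver documentation.
brew install qemu socket_vmnet

# Remove any existing profile (e.g. one created with the Docker driver)
# before switching drivers.
minikube delete

# Start Minikube with the experimental qemu2 driver and socket_vmnet network.
minikube start --driver qemu2 --network socket_vmnet
```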

 

> Kubernetes integration tests stuck forever on Mac M1 with Minikube + Docker
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-40954
>                 URL: https://issues.apache.org/jira/browse/SPARK-40954
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Tests
>    Affects Versions: 3.3.1
>         Environment: MacOS 12.6 (Mac M1)
> Minikube 1.27.1
> Docker 20.10.17
>            Reporter: Anton Ippolitov
>            Priority: Minor
>         Attachments: TestProcess.scala
>
>
> h2. Description
> I tried running Kubernetes integration tests with the Minikube backend (+ 
> Docker driver) from commit c26d99e3f104f6603e0849d82eca03e28f196551 on 
> Spark's master branch. I ran them with the following command:
>  
> {code:java}
> mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
>                         -Pkubernetes -Pkubernetes-integration-tests \
>                         -Phadoop-3 \
>                         -Dspark.kubernetes.test.imageTag=MY_IMAGE_TAG_HERE \
>                         -Dspark.kubernetes.test.imageRepo=docker.io/kubespark 
> \
>                         -Dspark.kubernetes.test.namespace=spark \
>                         -Dspark.kubernetes.test.serviceAccountName=spark \
>                         -Dspark.kubernetes.test.deployMode=minikube  {code}
> However, the test suite got stuck for hours on my machine. 
>  
> h2. Investigation
> I ran {{jstack}} on the process that was running the tests and saw that it 
> was stuck here:
>  
> {noformat}
> "ScalaTest-main-running-KubernetesSuite" #1 prio=5 os_prio=31 
> tid=0x00007f78d580b800 nid=0x2503 runnable [0x0000000304749000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.FileInputStream.readBytes(Native Method)
>     at java.io.FileInputStream.read(FileInputStream.java:255)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>     - locked <0x000000076c0b6f40> (a 
> java.lang.UNIXProcess$ProcessPipeInputStream)
>     at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>     - locked <0x000000076c0bb410> (a java.io.InputStreamReader)
>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>     at java.io.BufferedReader.fill(BufferedReader.java:161)
>     at java.io.BufferedReader.readLine(BufferedReader.java:324)
>     - locked <0x000000076c0bb410> (a java.io.InputStreamReader)
>     at java.io.BufferedReader.readLine(BufferedReader.java:389)
>     at 
> scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2(ProcessUtils.scala:45)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2$adapted(ProcessUtils.scala:45)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$$$Lambda$322/20156341.apply(Unknown
>  Source)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.Utils$.tryWithResource(Utils.scala:49)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.executeProcess(ProcessUtils.scala:45)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.executeMinikube(Minikube.scala:103)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.minikubeServiceAction(Minikube.scala:112)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$getServiceUrl$1(DepsTestsSuite.scala:281)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$611/1461360262.apply(Unknown
>  Source)
>     at 
> org.scalatest.enablers.Retrying$$anon$4.makeAValiantAttempt$1(Retrying.scala:184)
>     at 
> org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:196)
>     at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:313)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:312)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.getServiceUrl(DepsTestsSuite.scala:278)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.tryDepsTest(DepsTestsSuite.scala:325)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$178/1750286943.apply$mcV$sp(Unknown
>  Source)
> [...]{noformat}
>  So the issue is coming from {{DepsTestsSuite}} when it is setting up 
> {{minio}}. After [creating the minio StatefulSet and 
> Service|https://github.com/apache/spark/blob/5ea2b386eb866e20540660cdb6ed43792cb29969/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L85],
>  it 
> [executes|https://github.com/apache/spark/blob/5ea2b386eb866e20540660cdb6ed43792cb29969/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L280-L281]
>  the {{minikube service -n spark minio-s3 --url}} command. It then gets 
> stuck in {{ProcessUtils}} while reading {{minikube}}'s stdout 
> [here|https://github.com/apache/spark/blob/c8b7a09d39bdbda1502a7580fe2b54b7cb0ac4e3/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ProcessUtils.scala#L44-L50].
> I then ran the same command from my shell and confirmed that it never returns 
> until a CTRL+C:
> {noformat}
> $ minikube service -n spark minio-s3 --url
> http://127.0.0.1:63114
> ❗  Because you are using a Docker driver on darwin, the terminal needs to be 
> open to run it.
> <COMMAND IS STILL RUNNING HERE>{noformat}
> So it looks like this is the normal behaviour of the {{minikube service}} 
> command on Mac with the Docker driver: it needs to keep an open tunnel. I 
> had a quick look at Minikube's source code and it seems to be happening here: 
> [https://github.com/kubernetes/minikube/blob/abed8b7d347ae15fe9c0acd91b5b49b3b6494a53/cmd/minikube/cmd/service.go#L154]
> It also seems to be confirmed by the docs: 
> [https://minikube.sigs.k8s.io/docs/handbook/accessing/] 
> So the code which reads from stdout hangs indefinitely because of that. I was 
> able to reproduce with a self-contained example as well, see attached 
> {{TestProcess.scala}} file (it assumes that there is a {{minio-s3}} Service 
> in the {{spark}} Namespace).
>  
> I am not sure what the best solution would be here. Ideally, I think we 
> should run the {{minikube service}} command, retrieve the URL from its 
> output without blocking until the process exits, and leave the command 
> running in the background. When the {{DepsTestsSuite}} terminates, we 
> should also make sure to terminate the minikube process.
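A minimal sketch of that approach on the JVM, written in Java with a stand-in command in place of {{minikube service}} (the class, the helper name, and the stand-in command are hypothetical, not Spark code): read exactly one line of stdout to get the URL, keep the process alive while the tests run, and destroy it during cleanup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ServiceUrlSketch {

    // Reads a single line from the process's stdout (the service URL) and
    // returns it WITHOUT waiting for the process to exit, so the tunnel
    // that `minikube service` holds open stays up. The caller is
    // responsible for destroying the process when the suite finishes.
    static String readUrlKeepTunnel(Process tunnel) throws IOException {
        BufferedReader stdout = new BufferedReader(
            new InputStreamReader(tunnel.getInputStream(), StandardCharsets.UTF_8));
        // Reading stdout to EOF is what hangs today; one readLine() suffices.
        return stdout.readLine();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for `minikube service -n spark minio-s3 --url`:
        // prints a URL on the first line, then keeps running like the tunnel.
        Process tunnel = new ProcessBuilder(
            "sh", "-c", "echo http://127.0.0.1:63114; sleep 30").start();

        String url = readUrlKeepTunnel(tunnel);
        System.out.println("service URL: " + url);
        // ... run the tests against `url` while the tunnel process is alive ...

        tunnel.destroy();   // tear the tunnel down during suite cleanup
        tunnel.waitFor();
    }
}
```

In the actual suite, the returned {{Process}} would need to be tracked somewhere (for example, torn down from an afterAll-style cleanup hook) so the tunnel is always terminated when {{DepsTestsSuite}} finishes.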



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
