[ https://issues.apache.org/jira/browse/SPARK-40954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628904#comment-17628904 ]
Anton Ippolitov commented on SPARK-40954:
-----------------------------------------

Thank you for the suggestion [~dcoliversun]! The hyperkit driver is not supported on M1 ([https://github.com/kubernetes/minikube/issues/11885]), but I managed to run the Minikube Spark on Kubernetes integration tests with the experimental [qemu2|https://github.com/kubernetes/minikube/pull/13639] driver. I also had to use the experimental [socket_vmnet|https://minikube.sigs.k8s.io/docs/drivers/qemu/#networking] network for the [minikube service|https://github.com/apache/spark/blob/01014aa99fa851411262a6719058dde97319bbb3/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala#L111-L113] call to work:

{noformat}
minikube start --driver qemu2 --network socket_vmnet{noformat}

The tests pass now. I think it would be good to document this as a workaround for running the Minikube integration tests on M1. There is also the experimental [podman|https://minikube.sigs.k8s.io/docs/drivers/podman/] Minikube driver, which is supported on M1, but I haven't tried it.


> Kubernetes integration tests stuck forever on Mac M1 with Minikube + Docker
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-40954
>                 URL: https://issues.apache.org/jira/browse/SPARK-40954
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Tests
>    Affects Versions: 3.3.1
>        Environment: MacOS 12.6 (Mac M1)
> Minikube 1.27.1
> Docker 20.10.17
>            Reporter: Anton Ippolitov
>            Priority: Minor
>        Attachments: TestProcess.scala
>
>
> h2. Description
> I tried running Kubernetes integration tests with the Minikube backend (+ Docker driver) from commit c26d99e3f104f6603e0849d82eca03e28f196551 on Spark's master branch.
> I ran them with the following command:
>
> {code:java}
> mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
>   -Pkubernetes -Pkubernetes-integration-tests \
>   -Phadoop-3 \
>   -Dspark.kubernetes.test.imageTag=MY_IMAGE_TAG_HERE \
>   -Dspark.kubernetes.test.imageRepo=docker.io/kubespark \
>   -Dspark.kubernetes.test.namespace=spark \
>   -Dspark.kubernetes.test.serviceAccountName=spark \
>   -Dspark.kubernetes.test.deployMode=minikube {code}
> However, the test suite got stuck for hours on my machine.
>
> h2. Investigation
> I ran {{jstack}} on the process that was running the tests and saw that it was stuck here:
>
> {noformat}
> "ScalaTest-main-running-KubernetesSuite" #1 prio=5 os_prio=31 tid=0x00007f78d580b800 nid=0x2503 runnable [0x0000000304749000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.FileInputStream.readBytes(Native Method)
>     at java.io.FileInputStream.read(FileInputStream.java:255)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>     - locked <0x000000076c0b6f40> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>     at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>     - locked <0x000000076c0bb410> (a java.io.InputStreamReader)
>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>     at java.io.BufferedReader.fill(BufferedReader.java:161)
>     at java.io.BufferedReader.readLine(BufferedReader.java:324)
>     - locked <0x000000076c0bb410> (a java.io.InputStreamReader)
>     at java.io.BufferedReader.readLine(BufferedReader.java:389)
>     at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2(ProcessUtils.scala:45)
>     at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2$adapted(ProcessUtils.scala:45)
>     at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$$$Lambda$322/20156341.apply(Unknown Source)
>     at org.apache.spark.deploy.k8s.integrationtest.Utils$.tryWithResource(Utils.scala:49)
>     at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.executeProcess(ProcessUtils.scala:45)
>     at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.executeMinikube(Minikube.scala:103)
>     at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.minikubeServiceAction(Minikube.scala:112)
>     at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$getServiceUrl$1(DepsTestsSuite.scala:281)
>     at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$611/1461360262.apply(Unknown Source)
>     at org.scalatest.enablers.Retrying$$anon$4.makeAValiantAttempt$1(Retrying.scala:184)
>     at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:196)
>     at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:313)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:312)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457)
>     at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.getServiceUrl(DepsTestsSuite.scala:278)
>     at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.tryDepsTest(DepsTestsSuite.scala:325)
>     at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160)
>     at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$178/1750286943.apply$mcV$sp(Unknown Source)
>     [...]{noformat}
>
> So the issue is coming from {{DepsTestsSuite}} when it is setting up
> {{minio}}. After [creating the minio StatefulSet and Service|https://github.com/apache/spark/blob/5ea2b386eb866e20540660cdb6ed43792cb29969/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L85], it [executes|https://github.com/apache/spark/blob/5ea2b386eb866e20540660cdb6ed43792cb29969/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L280-L281] the {{minikube service -n spark minio-s3 --url}} command. It then gets stuck in {{ProcessUtils}} while reading {{minikube}}'s stdout [here|https://github.com/apache/spark/blob/c8b7a09d39bdbda1502a7580fe2b54b7cb0ac4e3/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ProcessUtils.scala#L44-L50].
>
> I then ran the same command from my shell and confirmed that it never returns until a CTRL+C:
>
> {noformat}
> $ minikube service -n spark minio-s3 --url
> http://127.0.0.1:63114
> ❗  Because you are using a Docker driver on darwin, the terminal needs to be open to run it.
> <COMMAND IS STILL RUNNING HERE>{noformat}
>
> So this looks like the normal behaviour of the {{minikube service}} command on Mac with the Docker driver: it needs to keep an open tunnel. I had a quick look at Minikube's source code and it seems to be happening here: [https://github.com/kubernetes/minikube/blob/abed8b7d347ae15fe9c0acd91b5b49b3b6494a53/cmd/minikube/cmd/service.go#L154]
> This also seems to be confirmed by the docs: [https://minikube.sigs.k8s.io/docs/handbook/accessing/]
> Because of that, the code which reads from stdout hangs indefinitely. I was able to reproduce this with a self-contained example as well, see the attached {{TestProcess.scala}} file (it assumes that there is a {{minio-s3}} Service in the {{spark}} Namespace).
>
> I am not sure what would be the best solution here.
> I think ideally, we should run the {{minikube service}} command and retrieve the URL without blocking, while making sure to leave the command running. When the {{DepsTestsSuite}} terminates, we should also remember to terminate the {{minikube}} process.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
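The approach suggested in the last paragraph could be sketched roughly as follows. This is a minimal, hypothetical Java sketch (the {{readFirstLine}} helper is an assumption, not Spark's actual {{ProcessUtils}} API): read only the first line of the command's stdout with a timeout instead of draining the stream until EOF, keep the tunnel process alive while the tests run, and destroy it at teardown. A stand-in shell command simulates {{minikube service --url}} printing a URL and then blocking.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ServiceUrlReader {

    // Hypothetical helper: read only the FIRST stdout line of a long-running
    // process, with a timeout, instead of reading until EOF (EOF never comes
    // while the minikube tunnel stays open, which is what caused the hang).
    static String readFirstLine(Process proc, long timeoutSec) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> firstLine = pool.submit(() -> {
                BufferedReader reader = new BufferedReader(
                    new InputStreamReader(proc.getInputStream()));
                return reader.readLine(); // returns after one line, not at EOF
            });
            return firstLine.get(timeoutSec, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for `minikube service -n spark minio-s3 --url`: prints a
        // URL and then keeps running, like the Docker-driver tunnel does.
        Process proc = new ProcessBuilder("sh", "-c",
            "echo http://127.0.0.1:63114; exec sleep 1000").start();

        String url = readFirstLine(proc, 10);
        System.out.println(url);

        // In the real suite the process would be kept alive until the
        // DepsTestsSuite teardown; destroy it here to clean up the demo.
        proc.destroy();
    }
}
```

The key design point is that the process handle outlives the URL read, so the tunnel stays open for the duration of the tests and can be explicitly terminated afterwards.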