andygrove commented on issue #1411: URL: https://github.com/apache/datafusion-comet/issues/1411#issuecomment-2663580349

Hi @Noah-FetchRewards. It looks like you are trying to do two things at once: run TPC-H on Spark on Kubernetes, and run Comet. If your requirement is to run on k8s, then I would recommend getting regular Spark running on k8s first, and only then enabling Comet. Spark has very good documentation on running on Kubernetes: https://spark.apache.org/docs/3.5.4/running-on-kubernetes.html

> however, the sample data to run the data at, conf spark.kubernetes.executor.volumes.hostPath.tpcdata.options.path=/mnt/bigdata/tpcds/sf100/ is obviously not in the image and I gave up pretty quickly, as I am unsure how to get the data loaded into the container itself.

I would not recommend loading data into the container. If you want to store data directly on host machines, you would typically use a PersistentVolumeClaim; more commonly, you would read the data from an S3 bucket or similar (rough sketches of both approaches are below). This isn't specific to Comet.

If you want to run locally on a single node, i.e. a laptop/desktop, then I would recommend running Spark in standalone mode.

I hope these suggestions are helpful. I'm happy to provide more guidance if needed.
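To make the PersistentVolumeClaim suggestion concrete, here is a minimal sketch of the relevant spark-submit configuration. Everything in angle brackets, the volume name `tpcdata`, the claim name `tpc-data`, and the mount path `/data` are placeholders rather than values from this issue; the full set of volume options is in the Spark k8s docs linked above:

```shell
# Sketch: mount a pre-created PVC (holding the generated TPC data) into each
# executor, instead of baking the data into the image or using a hostPath.
# All names, paths, and the container image below are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name tpch-benchmark \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.tpcdata.options.claimName=tpc-data \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.tpcdata.mount.path=/data \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.tpcdata.mount.readOnly=true \
  local:///opt/spark/examples/<your-benchmark>.jar
```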
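Reading from S3 avoids volumes entirely, which is usually the simpler option. A sketch, assuming a hypothetical bucket and the default AWS credentials chain; note that the hadoop-aws version must match the Hadoop version bundled with your Spark build:

```shell
# Sketch: read the benchmark data over s3a instead of mounting a volume.
# Bucket name, credentials setup, and hadoop-aws version are assumptions.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --packages org.apache.hadoop:hadoop-aws:3.3.4 \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
  local:///opt/spark/examples/<your-benchmark>.jar

# In the job itself, point the reader at the bucket, e.g.:
#   spark.read.parquet("s3a://<your-bucket>/tpch/sf100/lineitem")
```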
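And for the single-node case, a sketch of a run with Comet enabled (using `local[*]` for brevity; a proper standalone cluster would use `--master spark://<host>:7077` instead). The jar path is a placeholder, and the plugin/shuffle/off-heap settings follow the Comet installation docs but can vary by version, so verify them against the docs for the Comet build you use:

```shell
# Sketch: enable Comet on a single-node run. The jar path is a placeholder
# and the exact config keys may differ across Comet versions.
export COMET_JAR=/path/to/comet-spark-spark3.5_2.12-<version>.jar
spark-submit \
  --master "local[*]" \
  --jars $COMET_JAR \
  --conf spark.driver.extraClassPath=$COMET_JAR \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.exec.shuffle.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=8g \
  <your-benchmark>.jar
```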
Ho @Noah-FetchRewards. It looks like you are trying to do two things at once - run TPC-H on Spark on k8s, and run Comet. If your requirement is to run in k8s then I would recommend getting regular Spark running in k8s first, and then enable Comet. Spark has very good documentation on running with k8s: https://spark.apache.org/docs/3.5.4/running-on-kubernetes.html > however, the sample data to run the data at, conf spark.kubernetes.executor.volumes.hostPath.tpcdata.options.path=/mnt/bigdata/tpcds/sf100/ is obviously not in the image and I gave up pretty quickly, as I am unsure how to get the data loaded into the container itself. I would not recommend loading data into the container. You would typically want to use a PersistentVolumeClaim if you want to store data directly on host machines, or more typically, you could read the data from an S3 bucket or similar. This isn't specific to Comet. If you want to run locally on a single node i.e. laptop/desktop then I would recommend running Spark in standalone mode. I hope these are helpful suggestions. I'm happy to help provide more guidance if needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org