andygrove commented on issue #1411: URL: https://github.com/apache/datafusion-comet/issues/1411#issuecomment-2663580349

Hi @Noah-FetchRewards. It looks like you are trying to do two things at once: run TPC-H on Spark on Kubernetes, and run Comet. If your requirement is to run on k8s, then I would recommend getting regular Spark running on k8s first, and only then enabling Comet. Spark has very good documentation on running on Kubernetes: https://spark.apache.org/docs/3.5.4/running-on-kubernetes.html

> however, the sample data to run the data at, conf spark.kubernetes.executor.volumes.hostPath.tpcdata.options.path=/mnt/bigdata/tpcds/sf100/ is obviously not in the image and I gave up pretty quickly, as I am unsure how to get the data loaded into the container itself.

I would not recommend loading data into the container. If you want to store data directly on host machines, you would typically use a PersistentVolumeClaim; more commonly, you would read the data from an S3 bucket or similar (rough sketches of both approaches are below). This isn't specific to Comet.

If you want to run locally on a single node, i.e. a laptop/desktop, then I would recommend running Spark in standalone mode.

I hope these suggestions are helpful. I'm happy to provide more guidance if needed.
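To make the PersistentVolumeClaim suggestion concrete, here is a minimal sketch of the relevant spark-submit configuration. Everything in angle brackets, the volume name `tpcdata`, the claim name `tpc-data`, and the mount path `/data` are placeholders rather than values from this issue; the full set of volume options is in the Spark k8s docs linked above:

```shell
# Sketch: mount a pre-created PVC (holding the generated TPC data) into each
# executor, instead of baking the data into the image or using a hostPath.
# All names, paths, and the container image below are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name tpch-benchmark \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.tpcdata.options.claimName=tpc-data \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.tpcdata.mount.path=/data \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.tpcdata.mount.readOnly=true \
  local:///opt/spark/examples/<your-benchmark>.jar
```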
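Reading from S3 avoids volumes entirely, which is usually the simpler option. A sketch, assuming a hypothetical bucket and the default AWS credentials chain; note that the hadoop-aws version must match the Hadoop version bundled with your Spark build:

```shell
# Sketch: read the benchmark data over s3a instead of mounting a volume.
# Bucket name, credentials setup, and hadoop-aws version are assumptions.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --packages org.apache.hadoop:hadoop-aws:3.3.4 \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
  local:///opt/spark/examples/<your-benchmark>.jar

# In the job itself, point the reader at the bucket, e.g.:
#   spark.read.parquet("s3a://<your-bucket>/tpch/sf100/lineitem")
```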
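And for the single-node case, a sketch of a run with Comet enabled (using `local[*]` for brevity; a proper standalone cluster would use `--master spark://<host>:7077` instead). The jar path is a placeholder, and the plugin/shuffle/off-heap settings follow the Comet installation docs but can vary by version, so verify them against the docs for the Comet build you use:

```shell
# Sketch: enable Comet on a single-node run. The jar path is a placeholder
# and the exact config keys may differ across Comet versions.
export COMET_JAR=/path/to/comet-spark-spark3.5_2.12-<version>.jar
spark-submit \
  --master "local[*]" \
  --jars $COMET_JAR \
  --conf spark.driver.extraClassPath=$COMET_JAR \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.exec.shuffle.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=8g \
  <your-benchmark>.jar
```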
Ho @Noah-FetchRewards. It looks like you are trying to do two things at once - run TPC-H on Spark on k8s, and run Comet. If your requirement is to run in k8s then I would recommend getting regular Spark running in k8s first, and then enable Comet. Spark has very good documentation on running with k8s: https://spark.apache.org/docs/3.5.4/running-on-kubernetes.html > however, the sample data to run the data at, conf spark.kubernetes.executor.volumes.hostPath.tpcdata.options.path=/mnt/bigdata/tpcds/sf100/ is obviously not in the image and I gave up pretty quickly, as I am unsure how to get the data loaded into the container itself. I would not recommend loading data into the container. You would typically want to use a PersistentVolumeClaim if you want to store data directly on host machines, or more typically, you could read the data from an S3 bucket or similar. This isn't specific to Comet. If you want to run locally on a single node i.e. laptop/desktop then I would recommend running Spark in standalone mode. I hope these are helpful suggestions. I'm happy to help provide more guidance if needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org