Re: running pyspark on kubernetes - no space left on device
Hi George,

You can try mounting a larger PersistentVolume to the work directory, as described here, instead of using the local dir, which may have site-specific size constraints:
https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes

A rough spark-submit sketch is at the bottom of this message.

-Matt

> On Sep 1, 2022, at 09:16, Manoj GEORGE wrote:
>
> Hi Team,
>
> I am new to Spark, so please excuse my ignorance.
>
> Currently we are trying to run PySpark on a Kubernetes cluster. The setup is working fine for some jobs, but when we are processing a large file (36 GB), we run into "no space left on device" issues.
>
> Based on what we found on the internet, we have mapped the local dir to a persistent volume. This still doesn't solve the issue.
>
> I am not sure if it is still writing to the /tmp folder on the pod. Is there some other setting which needs to be changed for this to work?
>
> Thanks in advance.
>
> Thanks,
> Manoj George
> Manager Database Architecture
> M: +1 3522786801
> manoj.geo...@amadeus.com
> www.amadeus.com
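For reference, a minimal sketch of what that could look like at submit time, assuming cluster mode and a pre-created PVC that supports ReadWriteMany (the PVC name, mount path, container image, and application path below are placeholders, not taken from your setup). Naming the volume spark-local-dir-1 tells Spark to use the mounted path as its local/scratch directory, so shuffle and spill data land on the PVC instead of the pod's default emptyDir scratch space:

  ./bin/spark-submit \
    --master k8s://https://<k8s-apiserver>:<port> \
    --deploy-mode cluster \
    --conf spark.kubernetes.container.image=<your-spark-image> \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=<your-pvc-name> \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data/spark-scratch \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=<your-pvc-name> \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data/spark-scratch \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false \
    local:///opt/spark/app/your_job.py

Note that a ReadWriteOnce PVC can only be attached to one pod at a time; on Spark 3.1+ you can instead set options.claimName=OnDemand (together with options.storageClass and options.sizeLimit) so each driver/executor pod gets its own dynamically provisioned claim.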
Re: Spark 3 + Delta 0.7.0 Hive Metastore Integration Question
Hi Jay,

Some things to check:

1. Do you have the following set in your Spark SQL config:
   "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension"
   "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
2. Is the JAR for the package delta-core_2.12:0.7.0 available on both your driver and executor classpaths? (More info: https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake)
3. Since you are using a non-default metastore version, have you set the config spark.sql.hive.metastore.version? (More info: https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore)
4. Finally, are you able to read/write Delta tables outside of Hive? A rough sketch for checking this is at the bottom of this message.

-Matt

> On Dec 19, 2020, at 13:03, Jay wrote:
>
> Hi All -
>
> I have currently set up a Spark 3.0.1 cluster with Delta version 0.7.0, which is connected to an external Hive metastore.
>
> I run the below set of commands:
>
> val tableName = "tblname_2"
> spark.sql(s"CREATE TABLE $tableName(col1 INTEGER) USING delta options(path='GCS_PATH')")
>
> 20/12/19 17:30:52 WARN org.apache.spark.sql.hive.HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table `default`.`tblname_2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
>
> spark.sql(s"INSERT OVERWRITE $tableName VALUES 5, 6, 7, 8, 9")
> res51: org.apache.spark.sql.DataFrame = []
>
> spark.sql(s"SELECT * FROM $tableName").show()
> org.apache.spark.sql.AnalysisException: Table does not support reads: default.tblname_2;
>
> I see a warning about the Hive metastore integration, which essentially says that this table cannot be queried via Hive or Presto. That is fine, but when I try to read the data from the same Spark session I get an error. Can someone suggest what the problem might be?
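For point 4, a minimal sketch of a session launch with all of the above in place, followed by a path-based round trip that bypasses the metastore entirely. The metastore version placeholder and the reuse of your GCS_PATH placeholder are assumptions, so adjust them to your environment (if your external metastore is on the built-in 2.3.x line for Spark 3.0, you can leave the metastore settings at their defaults):

  ./bin/spark-shell \
    --packages io.delta:delta-core_2.12:0.7.0 \
    --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
    --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
    --conf spark.sql.hive.metastore.version=<your-metastore-version> \
    --conf spark.sql.hive.metastore.jars=maven

Then, at the scala> prompt:

  // write and read the Delta table by path, without touching the Hive metastore
  spark.range(5).write.format("delta").mode("overwrite").save("GCS_PATH")
  spark.read.format("delta").load("GCS_PATH").show()

If the path-based round trip works but SELECT through the catalog still fails, that points at the extension/catalog configs not being applied to the session rather than at Delta itself.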