Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-07 Thread ayan guha
Just a consideration: is there value in backing up/restoring metadata within Spark? I would strongly argue that if the metadata is valuable enough and persistent enough, why not just use an external metastore? It is a fairly straightforward process. Also, regardless of whether you are in the cloud or not, database backup is a
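As a sketch of ayan's suggestion: an external Hive metastore can be attached with ordinary Spark configuration. A minimal example, assuming a Thrift metastore (the URI below is a hypothetical placeholder):

    from pyspark.sql import SparkSession

    # Point Spark at an external Hive metastore; the Thrift URI is a
    # hypothetical placeholder for a real metastore service.
    spark = (
        SparkSession.builder
        .appName("external-metastore-sketch")
        .config("hive.metastore.uris", "thrift://metastore-host:9083")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Metadata created here lives in the external metastore, so it can
    # be backed up with ordinary database tooling, independently of any
    # single Spark deployment.
    spark.sql("SHOW DATABASES").show()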

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Mich Talebzadeh
By YARN mode I meant dealing with issues raised cluster-wide. From personal experience, I find it easier to trace these sorts of errors when I run the code in local mode, as the problem could be related to the set-up, and it is easier to track where things go wrong in local mode. This

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Sean Owen
I don't see any reason to think this is related to YARN. You haven't shown the actual error, @rajat, so I am not sure there is anything to say. On Fri, May 7, 2021 at 3:08 PM Mich Talebzadeh wrote: > I have a suspicion that this may be caused by your cluster as it appears > that you are running this in

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-07 Thread Tianchen Zhang
For now we are thinking about adding two methods to the Catalog API, not SQL commands:
1. spark.catalog.backup, which backs up the current catalog.
2. spark.catalog.restore(file), which reads the DFS file and recreates the entities described in that file.
Can you please give an example of exposing
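These methods are only a proposal and do not exist in Spark's Catalog API. Purely to illustrate the shape under discussion (the backup destination argument and the path are assumptions, since the thread does not give backup's full signature):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("catalog-backup-sketch").getOrCreate()

    # Hypothetical: serialize the current catalog (databases, tables,
    # functions, etc.) to a file on a distributed file system. The
    # destination argument is an assumption for illustration.
    spark.catalog.backup("hdfs:///backups/catalog-2021-05-07")

    # Hypothetical: read that file back and recreate the entities it
    # describes, per the proposal's description of restore(file).
    spark.catalog.restore("hdfs:///backups/catalog-2021-05-07")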

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Mich Talebzadeh
I have a suspicion that this may be caused by your cluster, as it appears that you are running this in YARN mode, like below:

    spark-submit --master yarn --deploy-mode client xyx.py

What happens if you try running it in local mode?

    spark-submit --master local[2] xyx.py

Is this run in a managed

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-07 Thread Wenchen Fan
If a catalog implements backup/restore, it can easily expose some client APIs to the end-users (e.g. a REST API), so I don't see a strong reason to expose the APIs through Spark. Do you plan to add new SQL commands in Spark to backup/restore a catalog? On Tue, May 4, 2021 at 2:39 AM Tianchen Zhang wrote:

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread rajat kumar
Thanks Mich and Sean for the response. Yes, Sean is right. This is a batch job. I have only 10 records in the dataframe, yet it still throws this exception. Following are the full logs:

    File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 584, in foreach

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Sean Owen
foreach definitely works :) This is not a streaming question. The error says that the JVM worker died for some reason. You'd have to look at its logs to see why. On Fri, May 7, 2021 at 11:03 AM Mich Talebzadeh wrote: > Hi, > > I am not convinced foreach works even in 3.1.1 > Try doing the same

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Mich Talebzadeh
Hi, I am not convinced foreach works even in 3.1.1. Try doing the same with foreachBatch:

    foreachBatch(sendToSink). \
        trigger(processingTime='2 seconds'). \

and see if it works. HTH. View my LinkedIn profile
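For context, Mich's fragment belongs to a Structured Streaming writeStream chain. A minimal self-contained sketch of foreachBatch, assuming a rate source and a hypothetical sendToSink and checkpoint path (note Sean's reply above: the original question is a batch job, where foreachBatch does not apply):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("foreachBatch-sketch").getOrCreate()

    # Hypothetical streaming source; any streaming DataFrame works here.
    stream_df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

    # foreachBatch hands each micro-batch to a function that receives a
    # plain DataFrame plus the batch id.
    def sendToSink(batch_df, batch_id):
        print(f"batch {batch_id}: {batch_df.count()} rows")

    query = (
        stream_df.writeStream
        .foreachBatch(sendToSink)
        .trigger(processingTime="2 seconds")
        .option("checkpointLocation", "/tmp/foreachBatch-checkpoint")  # hypothetical path
        .start()
    )
    query.awaitTermination()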

Issue while calling foreach in Pyspark

2021-05-07 Thread rajat kumar
Hi Team, I am using Spark 2.4.4 with Python. While using the line below:

    dataframe.foreach(lambda record: process_logs(record))

My use case is: process_logs will download the file from cloud storage using Python code and then save the processed data. I am getting the following error
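For reference, DataFrame.foreach runs a Python function once per Row on the executors. A minimal runnable sketch, with process_logs reduced to a stand-in for rajat's downloader (which is not shown in the thread):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("foreach-sketch").getOrCreate()

    df = spark.createDataFrame(
        [("logs/a.gz",), ("logs/b.gz",)],
        ["path"],
    )

    # Stand-in for the real process_logs, which downloads from cloud
    # storage and saves the processed data. It must be picklable, since
    # it is shipped to the Python workers.
    def process_logs(record):
        print(f"processing {record.path}")

    # Runs process_logs once per Row on the executors; on a real
    # cluster the print output appears in executor logs, not the driver.
    df.foreach(process_logs)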

Re: Updating spark-env.sh per application

2021-05-07 Thread Mich Talebzadeh
Hi, environment variables are read in when spark-submit kicks off. What exactly do you need to refresh at the application level? HTH. On Fri, 7 May 2021 at 11:34, Renu Yadav wrote: > Hi Team, > > Is it possible to override the variables of spark-env.sh at the application > level? > > Thanks &
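For what it's worth, per-application environment values can be supplied through Spark configuration rather than by editing spark-env.sh. A sketch using the spark.executorEnv.* and (on YARN) spark.yarn.appMasterEnv.* properties, with MY_VAR as a hypothetical variable name:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("per-app-env-sketch")
        # Sets MY_VAR in each executor's environment for this app only.
        .config("spark.executorEnv.MY_VAR", "per-app-value")
        # On YARN, also sets it for the application master.
        .config("spark.yarn.appMasterEnv.MY_VAR", "per-app-value")
        .getOrCreate()
    )

The same properties can be passed on the command line via spark-submit --conf, leaving spark-env.sh untouched for other applications.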

Re: Updating spark-env.sh per application

2021-05-07 Thread Renu Yadav
Hi Team, Is it possible to override the variables of spark-env.sh at the application level? Thanks & Regards, Renu Yadav