Just a consideration:
Is there value in backup/restore of metadata within Spark? I would strongly
argue that if the metadata is valuable enough and persistent enough, why not
just use an external metastore? It is a fairly straightforward process. Also,
regardless of whether you are in the cloud or not, database backup is a
By YARN mode I meant dealing with issues raised cluster-wide.
From personal experience, I find it easier to trace these sorts of errors
when I run the code in local mode, as the problem could be related to the
set-up, and it is easier to track where things go wrong when one is dealing
with local mode.
I don't see any reason to think this is related to YARN.
You haven't shown the actual error, @rajat, so I'm not sure there is anything
to say.
On Fri, May 7, 2021 at 3:08 PM Mich Talebzadeh wrote:
> I have a suspicion that this may be caused by your cluster, as it appears
> that you are running this in
For now we are thinking about adding two methods to the Catalog API, not SQL
commands, sketched below:
1. spark.catalog.backup, which backs up the current catalog.
2. spark.catalog.restore(file), which reads the DFS file and recreates the
entities described in that file.
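A hypothetical illustration of how these two proposed methods might be called
from PySpark; neither method exists in any released Spark version, and the
backup path here is made up:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("catalog-backup-demo").getOrCreate()

# Back up the current catalog (databases, tables, functions, ...) to a DFS file.
spark.catalog.backup("hdfs:///backups/catalog/2021-05-04")

# Later, possibly against a fresh metastore: read the DFS file and recreate
# the entities described in it.
spark.catalog.restore("hdfs:///backups/catalog/2021-05-04")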
Can you please give an example of exposing
I have a suspicion that this may be caused by your cluster, as it appears
that you are running this in YARN mode like below:
spark-submit --master yarn --deploy-mode client xyx.py
What happens if you try running it in local mode?
spark-submit --master local[2] xyx.py
Is this run in a managed
If a catalog implements backup/restore, it can easily expose some client
APIs to the end-users (e.g. a REST API), so I don't see a strong reason to
expose the APIs through Spark. Do you plan to add new SQL commands in Spark
to back up/restore a catalog?
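To illustrate the alternative being suggested here: a catalog implementation
could own backup/restore itself, with no new Spark API. Everything in this
sketch (class name, helper methods, file format) is hypothetical:

import json

class MyExternalCatalog:
    """Hypothetical catalog that exposes backup/restore directly to clients."""

    def __init__(self, store):
        self.store = store  # whatever backs the metadata

    def backup(self, dest_path):
        # dump_all() is a placeholder for serializing databases/tables/functions
        with open(dest_path, "w") as f:
            json.dump(self.store.dump_all(), f)

    def restore(self, src_path):
        with open(src_path) as f:
            for entity in json.load(f):
                self.store.recreate(entity)  # placeholder helper

A REST endpoint (or any client tool) of the catalog service could call these
methods directly, without Spark ever being involved.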
On Tue, May 4, 2021 at 2:39 AM Tianchen Zhang wrote:
Thanks Mich and Sean for the response. Yes, Sean is right. This is a batch
job.
I have only 10 records in the dataframe, yet it is still giving this
exception.
The following are the full logs:
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line
584, in foreach
foreach definitely works :)
This is not a streaming question.
The error says that the JVM worker died for some reason. You'd have to look
at its logs to see why.
On Fri, May 7, 2021 at 11:03 AM Mich Talebzadeh wrote:
> Hi,
>
> I am not convinced foreach works even in 3.1.1
> Try doing the same
Hi,
I am not convinced foreach works even in 3.1.1
Try doing the same with foreachBatch:
foreachBatch(sendToSink). \
trigger(processingTime='2 seconds'). \
and see if it works.
HTH
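A self-contained version of that foreachBatch fragment might look as follows;
sendToSink, the rate source, and the paths are assumptions for illustration,
not taken from the thread:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("foreachbatch-demo").getOrCreate()

def sendToSink(batch_df, batch_id):
    # batch_df is a plain DataFrame here, so any batch API is available
    batch_df.write.mode("append").parquet("/tmp/processed")

stream_df = spark.readStream.format("rate").load()  # toy streaming source

query = stream_df.writeStream. \
    foreachBatch(sendToSink). \
    trigger(processingTime='2 seconds'). \
    option("checkpointLocation", "/tmp/checkpoint"). \
    start()

query.awaitTermination()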
Hi Team,
I am using Spark 2.4.4 with Python.
While using the line below:
dataframe.foreach(lambda record: process_logs(record))
My use case is: process_logs will download the file from cloud storage using
Python code and then save the processed data.
I am getting the following error
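For context, the pattern described above looks roughly like this sketch; the
process_logs body, column name, and input path are placeholders, not the
poster's actual code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("process-logs").getOrCreate()

def process_logs(record):
    # Runs on the executors, not the driver: everything referenced here
    # (imports included) must be available on the worker nodes.
    path = record["log_path"]  # placeholder column name
    # ... download `path` from cloud storage, process it, save the output ...

df = spark.read.json("/tmp/logs_metadata.json")  # placeholder input
df.foreach(process_logs)  # the lambda wrapper in the original is unnecessary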
Hi,
Environment variables are read in when spark-submit kicks off. What exactly
do you need to refresh at the application level?
HTH
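If the goal is per-application values rather than editing spark-env.sh, one
option is to pass them as Spark confs; spark.executorEnv.* and
spark.yarn.appMasterEnv.* are standard Spark settings, though the variable
name and values here are just examples:

from pyspark.sql import SparkSession

spark = SparkSession.builder. \
    appName("env-override-demo"). \
    config("spark.executorEnv.MY_VAR", "app-specific-value"). \
    config("spark.yarn.appMasterEnv.MY_VAR", "app-specific-value"). \
    getOrCreate()

The same confs can also be passed on the spark-submit command line with --conf.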
On Fri, 7 May 2021 at 11:34, Renu Yadav wrote:
> Hi Team,
>
> Is it possible to override the variables of spark-env.sh at the application
> level?
>
> Thanks & Regards,
Hi Team,
Is it possible to override the variables of spark-env.sh at the application
level?
Thanks & Regards,
Renu Yadav