Is Spark SQL able to auto update partition stats like hive by setting hive.stats.autogather=true

2020-12-17 Thread 疯狂的哈丘
`spark.sql.statistics.size.autoUpdate.enabled` only works for table-level stats updates. For partition stats, I can only update them with `ANALYZE TABLE tablename PARTITION(part) COMPUTE STATISTICS`. So is Spark SQL able to auto-update partition stats the way Hive does with hive.stats.autogather=true?
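The two mechanisms the question contrasts can be sketched as follows (a minimal sketch; the table name `sales` and partition column `dt` are hypothetical):

```sql
-- Table-level size stats can be kept fresh automatically:
SET spark.sql.statistics.size.autoUpdate.enabled=true;

-- Partition-level stats still require an explicit ANALYZE, either for
-- one partition:
ANALYZE TABLE sales PARTITION (dt='2020-12-17') COMPUTE STATISTICS;

-- or, in newer Spark versions, for all partitions of the column at once:
ANALYZE TABLE sales PARTITION (dt) COMPUTE STATISTICS;
```

Unlike Hive with hive.stats.autogather=true, Spark does not gather partition stats as a side effect of writes, so the ANALYZE statement has to be issued explicitly (e.g. scheduled after each load).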

[Spark Structured Streaming] Not working while worker node is on different machine

2020-12-17 Thread bannya
Hi, I have a Spark Structured Streaming application that reads data from a Kafka topic (16 partitions). I am using standalone mode. I have two worker nodes: one is on the same machine as the master, and the other is on a different machine. Both worker nodes have 8 cores and 16G

Re: Issue while installing dependencies Python Spark

2020-12-17 Thread Artemis User
A wheel is used for package management and for setting up your virtual environment; it is not used as a library package. To run spark-submit in a virtual env, use the --py-files option instead.  Usage: --py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place on the
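The suggested --py-files approach might look like this (a sketch; the file names and paths are hypothetical, not from the thread):

```shell
# Ship local Python code alongside the job rather than pip-installing it
# on the cluster; the files are distributed to executors and put on the
# Python path automatically.
spark-submit \
  --py-files deps/mylib.zip,deps/helper.py \
  main.py
```

Zip archives work well for whole packages; individual .py files for small helpers.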

Re: Getting error message

2020-12-17 Thread Vikas Garg
Thanks

Re: Getting error message

2020-12-17 Thread Patrick McCarthy
Possibly. In that case maybe you should step back from Spark and see if there are OS-level tools to understand what's going on, like looking for evidence of the OOM killer: https://docs.memset.com/other/linux-s-oom-process-killer
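Checking for OOM-killer evidence usually means grepping the kernel log. A sketch (the sample log line below is illustrative; in practice you would pipe real `dmesg -T` or `journalctl -k` output into the same grep):

```shell
# Look for kernel messages indicating the OOM killer terminated a process.
# A sample line stands in here for real `dmesg -T` output:
echo "Out of memory: Killed process 1234 (java) total-vm:16250280kB" \
  | grep -i -E "killed process|out of memory"
```

On a live system: `dmesg -T | grep -i -E "killed process|out of memory"` (may require root), or `journalctl -k | grep -i oom` on systemd machines.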

Re: Getting error message

2020-12-17 Thread Vikas Garg
I am running the code on a local, single-node machine. Looking at the logs, it appears the host was killed. This is happening very frequently, and I am unable to find the reason for it. Could low memory be the reason?

Re: Getting error message

2020-12-17 Thread Patrick McCarthy
'Job aborted due to stage failure: Task 1 in stage 39.0 failed 1 times' You may want to change the number of allowed failures to a higher number like 4. A single task failure should be tolerable, especially if you're on a shared cluster where resources can be preempted. It seems that a
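Raising the retry count could be sketched like this (the "failed 1 times" in the log suggests local mode, where only one attempt is made per task unless the master URL says otherwise; the application name below is hypothetical):

```shell
# On a cluster, the number of task attempts is controlled by
# spark.task.maxFailures (default 4):
spark-submit --conf spark.task.maxFailures=4 main.py

# In local mode, plain local[N] allows only one attempt; the retry count
# is the second value in the master URL (here 4 threads, 4 attempts):
spark-submit --master "local[4,4]" main.py
```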

Re: Getting error message

2020-12-17 Thread Vikas Garg
"Mydomain" is a name I substituted myself while pasting the logs. Also, there are multiple class files in my project; if I run any one or two at a time, they run fine, though sometimes they too give this error. But running all the classes at the same time always gives this error. Once this error comes, I can't run any

Re: Issue while installing dependencies Python Spark

2020-12-17 Thread Patrick McCarthy
I'm not very familiar with the environments on cloud clusters, but in general I'd be reluctant to lean on setuptools or other Python install mechanisms. In the worst case, you might find /usr/bin/pip lacking permissions to install new packages, or, even if you can install, a package might require

Issue while installing dependencies Python Spark

2020-12-17 Thread Sachit Murarka
Hi Users, I have a wheel file; while creating it, I listed the dependencies in the setup.py file. Now I have two virtual envs: one was already there, and another I created just now. I have switched to the new virtual env, and I want Spark to download the dependencies during spark-submit using the wheel.
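One common pattern for this situation can be sketched as follows (names and paths are hypothetical). Note that spark-submit itself will not pip-install the dependencies declared in the wheel's setup.py on the cluster; those typically have to be pre-installed on the workers or shipped alongside the job:

```shell
# Build the wheel from the project source:
pip wheel . -w dist/

# Ship the wheel with the job so executors can import the package;
# transitive dependencies must already be present on the workers:
spark-submit \
  --py-files dist/myproject-0.1-py3-none-any.whl \
  main.py
```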