JDBCConnectionProvider in Spark

2022-01-05 Thread Artemis User
Could someone provide some insight/examples on the usage of this API? https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/jdbc/JdbcConnectionProvider.html Why is it needed, since this is an abstract class and there isn't any concrete implementation of it? Thanks a lot in
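For context: JdbcConnectionProvider is a developer API, introduced in Spark 3.1, that is the extension point for plugging custom connection/authentication logic into the JDBC data source. Spark's own concrete implementations (a basic provider plus Kerberos-aware providers for Postgres, Oracle, MSSQL, DB2, and MariaDB) are internal classes, which is why none show up in the public Scaladoc; custom implementations are registered through Java's ServiceLoader. A minimal PySpark sketch of selecting a registered provider by name, assuming the connectionProvider JDBC option documented for Spark 3.2+ and placeholder URL/table values:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-provider-demo").getOrCreate()

    # Placeholder URL and table; "postgres" names one of the built-in providers.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/mydb")
          .option("dbtable", "public.my_table")
          # Disambiguates when more than one registered provider can handle
          # the given driver and options (Spark 3.2+ JDBC option).
          .option("connectionProvider", "postgres")
          .load())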

Re: Newbie pyspark memory mgmt question

2022-01-05 Thread Andrew Davidson
Thanks Sean. Andy

Re: Newbie pyspark memory mgmt question

2022-01-05 Thread Sean Owen
There is no memory leak, no. You can .cache() or .persist() DataFrames, and that can use memory until you .unpersist(), but you're not doing that, and they are garbage collected anyway. Hard to say what's running out of memory without knowing more about your data size, partitions, cluster size, etc.
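A minimal sketch of the lifecycle Sean describes, with a hypothetical input file:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cache-demo").getOrCreate()

    # Hypothetical CSV; cache only if the DataFrame is reused several times.
    df = spark.read.csv("data.csv", header=True)
    df.cache()                    # marks df for storage in executor memory
    print(df.count())             # first action materializes the cache
    print(df.distinct().count())  # reuses the cached data
    df.unpersist()                # releases the storage memory explicitly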

Newbie pyspark memory mgmt question

2022-01-05 Thread Andrew Davidson
Hi, I am running into OOM problems. My cluster should be much bigger than I need. I wonder if it has to do with the way I am writing my code. Below are three style cases. I wonder if they cause memory to be leaked?

Case 1:
df1 = spark.read.load(csv file)
df1 = df1.someTransform()
df1 =
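For reference, a runnable version of the Case 1 style, with a hypothetical file and transforms standing in for someTransform(): reassigning df1 only rebinds the Python name, so the earlier DataFrame objects become unreachable and are garbage collected; no executor memory is pinned unless something is explicitly cached.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("reassign-demo").getOrCreate()

    # Hypothetical CSV and transforms.
    df1 = spark.read.csv("example.csv", header=True)
    df1 = df1.withColumn("flag", F.lit(1))  # old df1 is now unreferenced
    df1 = df1.filter(F.col("flag") == 1)    # and eligible for Python GC
    df1.show()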

Re: Spark 3.2 - ReusedExchange not present in join execution plan

2022-01-05 Thread Abdeali Kothari
Just thought I'd do a quick bump and add the dev mailing list, in case there is some insight there. Feels like this should be categorized as a bug for Spark 3.2.0. On Wed, Dec 29, 2021 at 5:25 PM Abdeali Kothari wrote: > Hi, I am using pyspark for some projects. And one of the things we are
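For readers unfamiliar with the node in question: ReusedExchange appears in a physical plan when Spark detects two identical shuffle or broadcast exchanges and executes only one of them. A small self-contained way to inspect this, using made-up data (whether the node appears depends on the Spark version, which is the subject of this thread):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("reused-exchange-demo").getOrCreate()

    df = spark.range(1_000_000)
    agg = df.groupBy((df.id % 10).alias("k")).count()

    # Self-join of the same aggregate: the plan can reuse a single exchange
    # for both sides; look for ReusedExchange nodes in the printed plan.
    joined = agg.alias("a").join(agg.alias("b"), "k")
    joined.explain()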

Re: pyspark

2022-01-05 Thread Mich Talebzadeh
hm, If I understand correctly

from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext
import sys

def spark_session(appName):
    return SparkSession.builder \
        .appName(appName) \
        .enableHiveSupport() \
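The preview cuts off mid-chain; under the usual builder pattern it would end in getOrCreate(). A sketch of the complete shape (not necessarily the original code):

    from pyspark.sql import SparkSession

    def spark_session(appName):
        # getOrCreate() returns the existing session if one is already running.
        return SparkSession.builder \
            .appName(appName) \
            .enableHiveSupport() \
            .getOrCreate()

    spark = spark_session("demo")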

Re: pyspark

2022-01-05 Thread Artemis User
Did you install and configure the proper Spark kernel (SparkMagic) on your Jupyter Lab or Hub? See https://github.com/jupyter/jupyter/wiki/Jupyter-kernels for more info... On 1/5/22 4:01 AM, 流年以东 wrote: In the process of using pyspark, there is no spark context when opening jupyter and
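If installing a dedicated kernel isn't an option, a session can also be created in a stock Python kernel; a minimal sketch, assuming pyspark is pip-installed in the kernel's environment (names are illustrative):

    # Run inside a plain Jupyter Python kernel; assumes `pip install pyspark`.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .master("local[*]") \
        .appName("jupyter-demo") \
        .getOrCreate()
    sc = spark.sparkContext  # the SparkContext that was reported missing
    print(sc.version)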