Re: PyCharm IDE throws Spark error

2020-11-13 Thread Wim Van Leuven
No Java installed? Or the process can't find it? JAVA_HOME not set? On Fri, 13 Nov 2020 at 23:24, Mich Talebzadeh wrote: > Hi, > > This is basically a simple module > > from pyspark import SparkContext > from pyspark.sql import SQLContext > from pyspark.sql import HiveContext > from pyspark.sql
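One way to rule out the environment, sketched below with hypothetical paths (the JDK and Spark locations will differ per machine): set JAVA_HOME and SPARK_HOME inside the script before importing pyspark, since a PyCharm run configuration may not inherit the exports from your shell profile.

import os

# Hypothetical JDK/Spark locations; adjust to the local installation.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ.setdefault("SPARK_HOME", "/opt/spark")

from pyspark.sql import SparkSession  # imported after the environment is set

spark = SparkSession.builder.appName("pycharm-smoke-test").getOrCreate()
print(spark.version)
spark.stop()

If this runs from PyCharm but the original module still fails, the problem is likely elsewhere in the configuration rather than the Java install itself.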

PyCharm IDE throws Spark error

2020-11-13 Thread Mich Talebzadeh
Hi,

This is basically a simple module

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import HiveContext
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.types import StringType, ArrayType
from pyspark.sql.functions import

Re: Refreshing Data in Spark Memory (DataFrames)

2020-11-13 Thread Lalwani, Jayesh
* When you say the refresh happens only for batch or non-streaming sources, I am assuming all kinds of DB sources (RDBMS, distributed data stores, file systems, etc.) count as batch sources. Please correct me if needed. It depends on how you read the DataFrame. Any DataFrame that you get by doing
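A minimal sketch of the stream-static pattern under discussion, with hypothetical paths, schema, and column names; the static side of the join is re-read from its source for every micro-batch, which is the refresh behaviour described in this thread:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("stream-static-refresh").getOrCreate()

# Static reference side (hypothetical path): re-read from storage for
# every micro-batch of the stream-static join below.
rates = spark.read.parquet("/data/reference/interest_rates")

# Streaming side (hypothetical path and schema).
loan_schema = StructType([
    StructField("loan_id", StringType()),
    StructField("product", StringType()),
    StructField("principal", DoubleType()),
])
loans = spark.readStream.schema(loan_schema).json("/data/incoming/loans")

# Stream-static join: updates to the reference files are picked up
# as the stream runs, one micro-batch at a time.
enriched = loans.join(rates, on="product")

query = enriched.writeStream.format("console").outputMode("append").start()
query.awaitTermination()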

Re: Refreshing Data in Spark Memory (DataFrames)

2020-11-13 Thread Arti Pande
Thanks for the quick response. This is a batch use case in the as-is world. We are redesigning it and intend to use streaming. Good to know that Spark streaming will refresh the data for every micro-batch. When you say the refresh happens only for batch or non-streaming sources, I am assuming all kinds of DB

Re: Refreshing Data in Spark Memory (DataFrames)

2020-11-13 Thread Lalwani, Jayesh
Is this a streaming application or a batch application? Normally, for batch applications, you want to keep the data consistent. If you have a portfolio of mortgages that you are computing payments for, and the interest rate changes mid-computation, you don’t want to compute half
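For the batch case, one way to get that consistency, sketched here with a hypothetical JDBC source: cache the reference data once at the start of the job, so a mid-run change in the source cannot leak into later stages (as long as the cached blocks stay resident).

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("consistent-batch").getOrCreate()

# Hypothetical JDBC source; cache() plus an action pins the snapshot
# taken at read time for the rest of the long-running job.
rates = spark.read.jdbc(
    url="jdbc:postgresql://db:5432/refdata",  # hypothetical URL
    table="interest_rates",
    properties={"user": "spark", "password": "..."},
)
rates.cache()
rates.count()  # materialize the cache before the heavy computation starts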

Refreshing Data in Spark Memory (DataFrames)

2020-11-13 Thread Arti Pande
Hi,

In the financial systems world, if some data is updated very frequently, and that data is to be used as reference data by a Spark job that runs for 6-7 hours, most likely the Spark job will read that data at the beginning, keep it in memory as a DataFrame, and keep running for the remaining
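For a job that stays batch, one common workaround (a sketch with hypothetical paths; not something proposed in the replies above) is to drop the cached reference DataFrame and re-read it when a refresh is wanted:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("manual-refresh").getOrCreate()

ref = spark.read.parquet("/data/reference")  # hypothetical path
ref.cache()

# ... hours of processing ...

# To pick up upstream changes partway through the job, drop the cached
# copy and read the source again; later stages then see the fresh data.
ref.unpersist()
ref = spark.read.parquet("/data/reference")
ref.cache()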

Spark on Kubernetes

2020-11-13 Thread Arti Pande
Hi,

Is it recommended to use Spark on K8s in production? The Spark operator for Kubernetes seems to be in a beta state.
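For anyone evaluating the native Kubernetes scheduler rather than the operator, a minimal sketch of the configuration involved, with a hypothetical API server address and container image; in practice this is usually packaged into a spark-submit in cluster mode, and the builder form below is only meant to show the relevant keys.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://k8s-apiserver.example.com:6443")  # hypothetical API server
    .config("spark.kubernetes.container.image", "repo/spark-py:3.0.1")  # hypothetical image
    .config("spark.executor.instances", "2")
    .appName("k8s-smoke-test")
    .getOrCreate()
)
print(spark.sparkContext.master)
spark.stop()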