Hello All,
We have a requirement to run PySpark in standalone cluster mode and also to
reference Python libraries (egg/wheel) that are not local but placed in
distributed storage such as HDFS. From the code it looks like neither case is
supported.
My questions are:
1. Why is PySpark
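For concreteness, this is the kind of submission we would like to work (host names, ports, and the egg path are all hypothetical): standalone master, cluster deploy mode, and Python dependencies served from HDFS rather than local disk.

```shell
# Sketch of what we are attempting; all hosts/paths are placeholders.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --py-files hdfs://namenode:8020/libs/mylib-0.1.egg \
  hdfs://namenode:8020/jobs/main.py
```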
Hello Everyone,
I have my Parquet files stored on HDFS, and I am trying to create a table in
the Hive Metastore from Spark SQL. I have an Avro schema file from which I
generated the Parquet files.
I am doing the following to create the table.
1) First, create a dummy Avro table from the schema
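For reference, this is the kind of DDL I am running for step 1 (table name and schema URL are placeholders); the Avro table picks up its columns from the external schema file via the `avro.schema.url` table property. Whether step 2's `LIKE ... STORED AS` form is accepted depends on the Hive version, so treat it as a sketch.

```sql
-- Step 1: dummy Avro table whose columns come from the external schema file
CREATE TABLE avro_dummy
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url' = 'hdfs:///schemas/events.avsc');

-- Step 2 (sketch): a Parquet table with the same layout, pointed at the files
CREATE EXTERNAL TABLE events_parquet LIKE avro_dummy
STORED AS PARQUET
LOCATION 'hdfs:///data/events_parquet';
```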
Hi,
While reading the Spark sources, I came across the type BDV:
breeze.linalg.{DenseVector => BDV}
It is used when calculating IDF from term frequencies. What is it exactly?
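For anyone else wondering: this is a Scala renaming import, not a separate type. BDV is just a local alias for Breeze's DenseVector, used so it does not clash with Spark's own DenseVector, which has the same short name. A minimal illustration:

```scala
// Scala lets an import rename a symbol; BDV is purely a local alias.
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.linalg.DenseVector // Spark's class, same short name

// Both names can now coexist in one file without ambiguity:
val breezeVec: BDV[Double] = BDV(1.0, 2.0, 3.0)
val sparkVec = new DenseVector(Array(1.0, 2.0, 3.0))
```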
Hi Spark Community,
I am reaching out to see whether there are current large-scale production or
pre-production deployments of Spark on k8s for batch and micro-batch jobs.
Large scale means running hundreds of thousands of Spark jobs daily, thousands
of concurrent Spark jobs on a single k8s cluster, and tens of
Could you please give some feedback?
Thanks, this is great news.
Can you please let me know if dynamic resource allocation is available in
Spark 2.4?
I'm using Spark 2.3.2 on Kubernetes. Do I still need to provide executor
memory options as part of the spark-submit command, or will Spark manage the
required executor memory based on the Spark job?
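My understanding (others please correct me) is that Spark on Kubernetes does not size executors from the job; if you omit the settings it falls back to the defaults (e.g. 1g for spark.executor.memory), so you still pass them explicitly. A sketch of such a submission, where the master URL, image, and class are placeholders:

```shell
# Sketch of a Spark-on-K8s submission; all names are placeholders.
# Without --conf spark.executor.memory, Spark uses its default (1g)
# rather than deriving a size from the job itself.
spark-submit \
  --master k8s://https://k8s-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=myrepo/spark:2.3.2 \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=4g \
  --class com.example.MyJob \
  local:///opt/spark/jars/my-job.jar
```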