[no subject]

2023-08-07 Thread Bode, Meikel
unsubscribe


Unsubscribe

2023-07-10 Thread Bode, Meikel
Unsubscribe


RE: Conda Python Env in K8S

2021-12-06 Thread Bode, Meikel, NMA-CFD
Hi Mich,

Thanks for your response. Yes, the --py-files option works; I also tested it.
The question is why the --archives option doesn't.

From Jira I can see that it should be available since 3.1.0:

https://issues.apache.org/jira/browse/SPARK-33530
https://issues.apache.org/jira/browse/SPARK-33615
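
For reference, a minimal sketch (my assumption, not taken from this thread) of the same two options expressed as Spark configs rather than spark-submit flags; the HDFS host, port and paths are placeholders:

from pyspark.sql import SparkSession

# Sketch only: spark.submit.pyFiles and spark.archives are the config
# equivalents of --py-files and --archives; replace host, port and paths.
spark = (SparkSession.builder
         .config("spark.submit.pyFiles",
                 "hdfs://hdfs-host:9000/minikube/codes/DSBQ.zip")
         .config("spark.archives",
                 "hdfs://hdfs-host:9000/minikube/codes/pyspark_venv.zip#pyspark_venv")
         .config("spark.pyspark.python", "./pyspark_venv/bin/python")
         .getOrCreate())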

Best,
Meikel


From: Mich Talebzadeh 
Sent: Saturday, 4 December 2021 18:36
To: Bode, Meikel, NMA-CFD 
Cc: dev ; u...@spark.apache.org
Subject: Re: Conda Python Env in K8S



Hi Meikel



In the past I tried with


   --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip \
   --archives hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/pyspark_venv.zip#pyspark_venv \


which is basically what you are doing. The first line (--py-files) works, but the 
second one fails.



It tries to unpack them:



Unpacking an archive hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.zip#pyspark_venv 
from /tmp/spark-502a5b57-0fe6-45bd-867d-9738e678e9a3/pyspark_venv.zip to 
/opt/spark/work-dir/./pyspark_venv



But it failed.



This could be due to the virtual environment being created inside the Docker 
work-dir, or to there not being enough memory available to gunzip and untar the 
archive, especially if your executors are built on cluster nodes with less 
memory than the driver node.
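
One quick check (a sketch on my part, assuming the #pyspark_venv fragment from the example above) is to run a small task on the executors and see whether the unpacked directory is actually there:

import os
import socket
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

def probe(_):
    # Runs inside an executor task; archives are unpacked into the task working dir.
    return socket.gethostname(), os.path.exists("pyspark_venv/bin/python")

print(sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism)
        .map(probe)
        .distinct()
        .collect())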



However, the most convenient way to add additional packages to the Docker image 
is to add them directly when the image is built. External packages are then 
bundled as part of my Docker image: the image is fixed, and if an application 
requires that set of dependencies every time, they are already there. Also note 
that every RUN statement creates an intermediate container and hence increases 
build time, so it is advisable to install all packages in one line, as follows:

RUN pip install pyyaml numpy cx_Oracle --no-cache-dir

The --no-cache-dir option tells pip not to keep the downloaded packages in its 
cache, which reduces the image size.

Log in to the Docker image and check the Python packages installed:

docker run -u 0 -it spark/spark-py:3.1.1-scala_2.12-8-jre-slim-buster_java8PlusPackages bash

root@5bc049af7278:/opt/spark/work-dir# pip list
Package    Version
---------- -------
cx-Oracle  8.3.0
numpy      1.21.4
pip        21.3.1
PyYAML     6.0
setuptools 59.4.0
wheel      0.34.2
HTH

 
   view my Linkedin profile: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.




On Sat, 4 Dec 2021 at 07:52, Bode, Meikel, NMA-CFD 
<meikel.b...@bertelsmann.de> wrote:
Hi Mich,

Sure, that's possible. But distributing the complete env would be more practical.
Our current workaround is to build different environments, store them in a PV, 
mount that into the pods, and refer from the SparkApplication resource to the 
desired env.

But these options do exist, and I want to understand what the issue is...
Any hints on that?

Best,
Meikel

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Sent: Friday, 3 December 2021 13:27
To: Bode, Meikel, NMA-CFD <meikel.b...@bertelsmann.de>
Cc: dev <dev@spark.apache.org>; u...@spark.apache.org
Subject: Re: Conda Python Env in K8S

Build Python packages into the Docker image itself first with pip install:

RUN pip install pandas . . --no-cache-dir

HTH

On Fri, 3 Dec 2021 at 11:58, Bode, Meikel, NMA-CFD 
<meikel.b...@bertelsmann.de> wrote:
Hello,

I am trying to run spark jobs using Spark Kubernetes Operator ...

RE: Conda Python Env in K8S

2021-12-03 Thread Bode, Meikel, NMA-CFD
Hi Mich,

Sure, that's possible. But distributing the complete env would be more practical.
Our current workaround is to build different environments, store them in a PV, 
mount that into the pods, and refer from the SparkApplication resource to the 
desired env.

But these options do exist, and I want to understand what the issue is...
Any hints on that?

Best,
Meikel

From: Mich Talebzadeh 
Sent: Friday, 3 December 2021 13:27
To: Bode, Meikel, NMA-CFD 
Cc: dev ; u...@spark.apache.org
Subject: Re: Conda Python Env in K8S

Build Python packages into the Docker image itself first with pip install:

RUN pip install pandas . . --no-cache-dir

HTH

On Fri, 3 Dec 2021 at 11:58, Bode, Meikel, NMA-CFD 
<meikel.b...@bertelsmann.de> wrote:
Hello,

I am trying to run Spark jobs using the Spark Kubernetes Operator.
But when I try to bundle a conda Python environment using the following 
resource description, the Python interpreter is only unpacked on the driver and 
not on the executors.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: ...
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  mainApplicationFile: local:///path/script.py
..
  sparkConf:
"spark.archives": "local:///path/conda-env.tar.gz#environment"
"spark.pyspark.python": "./environment/bin/python"
"spark.pyspark.driver.python": "./environment/bin/python"


The driver unpacks the archive and the Python script gets executed.
On the executors there is no log message indicating that the archive gets unpacked.
The executors then fail because they can't find the Python executable at the given 
location "./environment/bin/python".

Any hint?

Best,
Meikel
--




 
   view my Linkedin profile: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.




Conda Python Env in K8S

2021-12-03 Thread Bode, Meikel, NMA-CFD
Hello,

I am trying to run Spark jobs using the Spark Kubernetes Operator.
But when I try to bundle a conda Python environment using the following 
resource description, the Python interpreter is only unpacked on the driver and 
not on the executors.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: ...
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  mainApplicationFile: local:///path/script.py
..
  sparkConf:
"spark.archives": "local:///path/conda-env.tar.gz#environment"
"spark.pyspark.python": "./environment/bin/python"
"spark.pyspark.driver.python": "./environment/bin/python"


The driver unpacks the archive and the Python script gets executed.
On the executors there is no log message indicating that the archive gets unpacked.
The executors then fail because they can't find the Python executable at the given 
location "./environment/bin/python".
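
For comparison, the conda-pack pattern from the PySpark packaging guide looks roughly like the sketch below; as an assumption on my side, the packed environment is read from shared storage (the s3a path is a placeholder) instead of a local:// URI, so that both driver and executors can fetch and unpack it:

import os
from pyspark.sql import SparkSession

# Point the Python workers at the interpreter inside the unpacked archive.
os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"

spark = (SparkSession.builder
         .config("spark.archives",
                 "s3a://my-bucket/envs/conda-env.tar.gz#environment")
         .getOrCreate())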

Any hint?

Best,
Meikel


RE: HiveThrift2 ACID Transactions?

2021-11-11 Thread Bode, Meikel, NMA-CFD
Hi all,

I now have some more input related to the issues I face at the moment:

When I try to UPDATE an external table via a JDBC connection to the HiveThrift2 
server, I get the following exception:

java.lang.UnsupportedOperationException: UPDATE TABLE is not supported 
temporarily.

When doing a DELETE I see:

org.apache.spark.sql.AnalysisException: DELETE is only supported with v2 tables.

INSERT is working as expected.

We are using Spark 3.1.2 with Hadoop 3.2.0 and an external Hive 3.0.0 metastore 
on K8S.
The warehouse dir is located on AWS S3, attached using the s3a protocol.

What I have learned so far is that we need to use an ACID-compatible file format 
for external tables, such as ORC or Delta.
In addition, we would need to set some ACID-related properties, either as the 
first commands after session creation or via the appropriate configuration files:

SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.enforce.sorting=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

Now, when I try to create the following table:

create external table acidtab (id string, val string)
stored as ORC location '/data/acidtab.orc'
tblproperties ('transactional'='true');

I see the following exception:

org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:The 
table must be stored using an ACID compliant format (such as ORC): 
default.acidtab)

Even when I create the table in ORC format, the exception still suggests using 
ORC, as it is required for ACID compliance.

Another point is that external tables are not getting deleted via the DROP TABLE 
command. They are only removed from the metastore but remain physically present 
in their S3 bucket.

I tried with:

SET `hive.metastore.thrift.delete-files-on-drop`=true;

And also by setting:

TBLPROPERTIES ('external.table.purge'='true')
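
For context, a minimal sketch (an assumption on my side, not something verified in this thread) of the "v2 table" path that the DELETE error message points at, using Delta Lake; it assumes a delta-core jar matching the Spark version is on the classpath, and the table name and s3a location are placeholders:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# Create an external Delta table, then row-level operations work through SQL.
spark.sql("""
    CREATE TABLE IF NOT EXISTS acidtab (id STRING, val STRING)
    USING delta
    LOCATION 's3a://my-bucket/data/acidtab'
""")

spark.sql("UPDATE acidtab SET val = 'x' WHERE id = '1'")
spark.sql("DELETE FROM acidtab WHERE id = '2'")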


Any help on these issues would be very appreciated!

Many thanks,
Meikel Bode

From: Bode, Meikel, NMA-CFD 
Sent: Wednesday, 10 November 2021 08:23
To: user ; dev 
Subject: HiveThrift2 ACID Transactions?

Hi all,

We want to apply INSERT, UPDATE, and DELETE operations on tables based on 
Parquet or ORC files served by Thrift2.
It is currently unclear whether we can enable them, and where.

At the moment, UPDATE and DELETE operations are getting blocked when executed.

Is anyone out there using ACID transactions in combination with Thrift2?

Best,
Meikel


HiveThrift2 ACID Transactions?

2021-11-09 Thread Bode, Meikel, NMA-CFD
Hi all,

We want to apply INSERT, UPDATE, and DELETE operations on tables based on 
Parquet or ORC files served by Thrift2.
It is currently unclear whether we can enable them, and where.

At the moment, UPDATE and DELETE operations are getting blocked when executed.

Is anyone out there using ACID transactions in combination with Thrift2?

Best,
Meikel


RE: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Bode, Meikel, NMA-CFD
Many thanks! 

From: Gengliang Wang 
Sent: Tuesday, 19 October 2021 16:16
To: dev ; user 
Subject: [ANNOUNCE] Apache Spark 3.2.0

Hi all,

Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous 
contribution from the open-source community, this release managed to resolve in 
excess of 1,700 Jira tickets.

We'd like to thank our contributors and users for their contributions and early 
feedback to this release. This release would not have been possible without you.

To download Spark 3.2.0, head over to the download page: 
https://spark.apache.org/downloads.html

To view the release notes: 
https://spark.apache.org/releases/spark-release-3-2-0.html


RE: [VOTE][RESULT] Release Spark 3.2.0 (RC7)

2021-10-12 Thread Bode, Meikel, NMA-CFD
Yes. Gengliang, many thanks.

From: Mich Talebzadeh 
Sent: Tuesday, 12 October 2021 09:25
To: Gengliang Wang 
Cc: dev 
Subject: Re: [VOTE][RESULT] Release Spark 3.2.0 (RC7)

great work Gengliang. Thanks for your tremendous contribution!




 
   view my Linkedin profile



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.




On Tue, 12 Oct 2021 at 08:15, Gengliang Wang 
<ltn...@gmail.com> wrote:
The vote passes with 28 +1s (10 binding +1s).
Thanks to all who helped with the release!

(* = binding)
+1:
- Gengliang Wang
- Michael Heuer
- Mridul Muralidharan *
- Sean Owen *
- Ruifeng Zheng
- Dongjoon Hyun *
- Yuming Wang
- Reynold Xin *
- Cheng Su
- Peter Toth
- Mich Talebzadeh
- Maxim Gekk
- Chao Sun
- Xinli Shang
- Huaxin Gao
- Kent Yao
- Liang-Chi Hsieh *
- Kousuke Saruta *
- Ye Zhou
- Cheng Pan
- Angers Zhu
- Wenchen Fan *
- Holden Karau *
- Yi Wu
- Ricardo Almeida
- DB Tsai *
- Thomas Graves *
- Terry Kim

+0: None

-1: None


RE: Time to start publishing Spark Docker Images?

2021-08-13 Thread Bode, Meikel, NMA-CFD
Hi all,

I am Meikel Bode and only an interested reader of the dev and user lists. Anyway, I 
would appreciate having official Docker images available.
Maybe one could take inspiration from the Jupyter Docker stacks and provide a 
hierarchy of different images like this:

https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#image-relationships

Having a core image that supports only Java, an extended one that adds Python and/or 
R, etc.

Looking forward to the discussion.

Best,
Meikel

From: Mich Talebzadeh 
Sent: Friday, 13 August 2021 08:45
Cc: dev 
Subject: Re: Time to start publishing Spark Docker Images?

I concur this is a good idea and certainly worth exploring.

In practice, preparing Docker images as deployables will throw up some challenges, 
because a Docker image for Spark is not really a singular modular unit in the way 
that, say, a Docker image for Jenkins is. It involves different versions and 
different images for Spark and PySpark, and will most likely end up as part of a 
Kubernetes deployment.



Individuals and organisations will deploy it as a first cut. Great, but I equally 
feel that good documentation on how to build a consumable, deployable image would 
be more valuable. From my own experience the current documentation should be 
enhanced, for example on how to deploy working directories and additional Python 
packages, or how to build with different Java versions (version 8 or version 11), 
etc.



HTH


 
   view my Linkedin profile



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.




On Fri, 13 Aug 2021 at 01:54, Holden Karau 
<hol...@pigscanfly.ca> wrote:
Awesome, I've filed an INFRA ticket to get the ball rolling.

On Thu, Aug 12, 2021 at 5:48 PM John Zhuge 
<jzh...@apache.org> wrote:
+1

On Thu, Aug 12, 2021 at 5:44 PM Hyukjin Kwon 
<gurwls...@gmail.com> wrote:
+1, I think we generally agreed upon having it. Thanks Holden for the heads-up and 
for driving this.

+@Dongjoon Hyun FYI

On Thu, 22 Jul 2021 at 12:22 PM, Kent Yao 
<yaooq...@gmail.com> wrote:
+1

Bests,

Kent Yao
@ Data Science Center, Hangzhou Research Institute, NetEase Corp.
a spark enthusiast
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing 
and analytics, built on top of Apache Spark.
spark-authorizer: A Spark SQL extension which provides SQL Standard Authorization 
for Apache Spark.

RE: Recursive Queries or Recursive UDF?

2021-05-01 Thread Bode, Meikel, NMA-CFD
Hi all,

I created a running example of my data set and describe what I want to achieve. 
The idea is to create a view over the resulting table and use it for later joins, 
instead of applying a UDF to a column using a dict with 20+ (growing) million 
records.

Example data set:

spark.createDataFrame(
    [
        ("inquiry1", "quotation1"),

        ("inquiry2", "quotation2"),
        ("quotation2", "order2"),
        ("order2", "invoice2"),

        ("order3", "invoice3")
    ],
    ['parent', 'child']
).createOrReplaceTempView("hierarchy")

We see several hierarchies in the df above but we don’t have records indicating 
that e.g. inquiry1 is the root of one of the hierarchies.
So we have:

1: inquiry1 > quotation1
2: inquiry2 > quotation2 > order2 > invoice2
3: order3 > invoice3

What I need is the following. For every child I need the level 0 parent like 
this:

child, lvl-0-parent
quotation1, inquiry1
quotation2, inquiry2
order2, inquiry2
invoice2, inquiry2
invoice3, order3

It would be perfect to also see which entries actually are roots, indicated like 
this:
child, lvl-0-parent
inquiry1, null
inquiry2, null
order3, null

Actually, that's what I implemented with the recursive UDF I put into the initial 
post.

Thank you for any hints on that issue! Any hints on the UDF solution are also 
very welcome; one possible non-UDF direction is sketched below.
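
A minimal sketch (not from the original thread) of such an approach: climb the hierarchy with iterative self-joins until nothing changes. It assumes the 'hierarchy' view from the example above, that each child has at most one parent, and that there are no cycles:

spark.sql("SELECT child, parent AS lvl0_parent FROM hierarchy").createOrReplaceTempView("roots")

while True:
    climbed = spark.sql("""
        SELECT r.child,
               COALESCE(h.parent, r.lvl0_parent) AS lvl0_parent
        FROM roots r
        LEFT JOIN hierarchy h
          ON r.lvl0_parent = h.child
    """)
    # Stop once no row climbed another level (fixed point reached).
    if climbed.exceptAll(spark.table("roots")).count() == 0:
        break
    climbed.createOrReplaceTempView("roots")

spark.table("roots").show()

For deep hierarchies the growing plan may need a persist or checkpoint per iteration, and the root rows themselves (e.g. inquiry1 with a null lvl-0-parent) could be added with an extra anti-join if needed.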

Thx and best,
Meikel

From: Bode, Meikel, NMA-CFD
Sent: Friday, 30 April 2021 12:16
To: user @spark 
Subject: Recursive Queries or Recursive UDF?

Hi all,

I implemented a recursive UDF that tries to find a document number in a long 
list of predecessor documents. This can be a multi-level hierarchy:
C is the successor of B, which is the successor of A (but many more levels are possible).

As input to that UDF I prepare a dict that contains the complete document flow, 
reduced to the fields required to follow the path back to the originating 
document.
The dict is broadcast and then used by the UDF. This approach is very slow and 
now, as the data grows, it regularly kills my executors, so RDDs get lost and 
tasks fail. Sometimes the workers (Docker containers) also become unresponsive 
and get killed.

Here is the coding of the methods:

1.: Prepare and define the UDF, broadcast dict.

# Define function for recursive lookup of root document
def __gen_caseid_udf_sales_document_flow(self):
    global bc_document_flow, udf_sales_document_flow

    # Prepare docflow for broadcasting by only selecting the required fields
    df_subset = self.spark.table("FLOWTABLE").select(
        "clnt", "predecessor_head", "predecessor_item", "doc_num", "doc_item")

    # Prepare dictionary for broadcast
    document_flow_dic = {}
    for clnt, predecessor_head, predecessor_item, doc_num, doc_item in df_subset.rdd.collect():
        document_flow_dic[(clnt, doc_num, doc_item)] = (predecessor_head, predecessor_item)

    # Broadcast dictionary to workers
    bc_document_flow = self.spark.sparkContext.broadcast(document_flow_dic)

    # Register new user defined function (UDF)
    udf_sales_document_flow = func.udf(gen_caseid_udf_sale_root_lookup)


2.: The recursive function used in the UDF
# Find root document
def gen_caseid_udf_sale_get_root_doc(lt, clnt, docnr, posnr):
    if not clnt or not docnr or not posnr:
        return None, None

    key = (clnt, docnr, posnr)

    if key in lt:
        docnr_tmp, item_tmp = lt[key]
        if docnr_tmp == docnr and item_tmp == posnr:
            return docnr, posnr
        else:
            return gen_caseid_udf_sale_get_root_doc(lt, clnt, docnr_tmp, item_tmp)
    else:
        return docnr, posnr

3: The UDF
# Define udf function to look up the root document
def gen_caseid_udf_sale_root_lookup(clnt, doc_num, posnr):
    global bc_document_flow  # name of the broadcast variable

    lt = bc_document_flow.value
    h, p = gen_caseid_udf_sale_get_root_doc(lt, clnt, doc_num, posnr)
    return str(clnt) + str(h) + str(p)

##
4. Usage of the UDF on a DF that might contain several tens of thousands of rows:

# Look up the root document from the document flow
documents = documents.withColumn(
    "root_doc",
    udf_sales_document_flow(func.col('clnt'),
                            func.col('document_number'),
                            func.col('item_number')))

Do you have any hints on my coding, or are there any ideas on how to implement a 
recursive select without implementing a potentially unoptimizable UDF?
I came across 
https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL, which 
might be an option; does Spark support this kind of construct?

Thanks and all the best,
Meikel



AW: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-02 Thread Bode, Meikel, NMA-CFD
Congrats!

From: Hyukjin Kwon 
Sent: Wednesday, 3 March 2021 02:41
To: user @spark ; dev 
Subject: [ANNOUNCE] Announcing Apache Spark 3.1.1

We are excited to announce Spark 3.1.1 today.

Apache Spark 3.1.1 is the second release of the 3.x line. This release adds
Python type annotations and Python dependency management support as part of 
Project Zen.
Other major updates include improved ANSI SQL compliance support, history 
server support
in structured streaming, the general availability (GA) of Kubernetes and node 
decommissioning
in Kubernetes and Standalone. In addition, this release continues to focus on 
usability, stability,
and polish while resolving around 1500 tickets.

We'd like to thank our contributors and users for their contributions and early 
feedback to
this release. This release would not have been possible without you.

To download Spark 3.1.1, head over to the download page:
http://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-1-1.html