Things I did: I changed the Dockerfile from jupyter/docker-stacks to
https://github.com/bjornjorgensen/docker-stacks/blob/master/pyspark-notebook/Dockerfile,
then built, tagged, and pushed the image. I start it with docker-compose like this:
version: '2.1'
services:
  jupyter:
    image: bjornjorgensen/spark-notebook:spark-3.2.1RC-1
    restart: 'no'
    volumes:
      - ./notebooks:/home/jovyan/notebooks
    ports:
      - "8881:8888"
      - "8181:8080"
      - "7077:7077"
      - "4040:4040"
    environment:
      NB_UID: ${UID}
      NB_GID: ${GID}

1. If I change the Java version to 17 I get an error, which I did not copy.
Have you built this with Java 11 or Java 17? I have noticed that we test with
Java 17, so I was hoping to update Java to version 17.

2. In a notebook I start Spark with:

import os
import re

import numpy as np
# import pandas as pd
from pyspark import pandas as ps
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

def get_spark_session(app_name: str, conf: SparkConf):
    conf.setMaster('local[*]')
    conf \
        .set('spark.driver.memory', '64g') \
        .set("fs.s3a.access.key", "minio") \
        .set("fs.s3a.secret.key", "KEY") \
        .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
        .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .set("spark.hadoop.fs.s3a.path.style.access", "true") \
        .set("spark.sql.repl.eagerEval.enabled", "True") \
        .set("spark.sql.adaptive.enabled", "True") \
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .set("spark.sql.repl.eagerEval.maxNumRows", "10000")
    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

spark = get_spark_session("Falk", SparkConf())

Then I run this code:

f06 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/f06.json")
pf06 = f06.to_pandas_on_spark()
pf06.info()

and I did not get any errors or warnings. But according to
https://github.com/apache/spark/commit/bc7d55fc1046a55df61fdb380629699e9959fcc6
(Spark)DataFrame.to_pandas_on_spark is deprecated.
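As a side note on why no warning may appear: by default, CPython's warning filters suppress DeprecationWarning raised from library code, so a deprecated call can stay silent unless the filters are relaxed. A minimal stdlib sketch of that behavior (the `to_pandas_on_spark_stub` function below is a hypothetical stand-in for the deprecated method, not Spark's actual implementation):

```python
import warnings

def to_pandas_on_spark_stub():
    # Hypothetical stand-in for the deprecated DataFrame.to_pandas_on_spark;
    # it warns the way a deprecated library method typically does.
    warnings.warn(
        "to_pandas_on_spark is deprecated; use pandas_api instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "pandas-on-Spark frame"

# With DeprecationWarning filtered out (as it is by default for
# warnings raised outside __main__), the call emits nothing visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore", DeprecationWarning)
    to_pandas_on_spark_stub()
print(len(caught))  # 0 - the warning was suppressed

# Opting in makes the deprecation visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    to_pandas_on_spark_stub()
print(caught[0].category.__name__)  # DeprecationWarning
```

If Spark instead raises a FutureWarning (shown by default), another possibility is simply that the deprecation commit is not included in this RC.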
So I expected to get a hint to change to pandas_api, which I did not get.

On Fri, Jan 14, 2022 at 07:04, huaxin gao <huaxin.ga...@gmail.com> wrote:

> The two regressions have been fixed. I will cut RC2 tomorrow late
> afternoon.
>
> Thanks,
> Huaxin
>
> On Wed, Jan 12, 2022 at 9:11 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>
>> Thank you all for testing and voting!
>>
>> I will -1 this RC because
>> https://issues.apache.org/jira/browse/SPARK-37855 and
>> https://issues.apache.org/jira/browse/SPARK-37859 are regressions. These
>> are not blockers but I think it's better to fix them in 3.2.1. I will
>> prepare for RC2.
>>
>> Thanks,
>> Huaxin
>>
>> On Wed, Jan 12, 2022 at 2:03 AM Kent Yao <y...@apache.org> wrote:
>>
>>> +1 (non-binding).
>>>
>>> On Wed, Jan 12, 2022 at 16:10, Chao Sun <sunc...@apache.org> wrote:
>>>
>>>> +1 (non-binding). Thanks Huaxin for driving the release!
>>>>
>>>> On Tue, Jan 11, 2022 at 11:56 PM Ruifeng Zheng <ruife...@foxmail.com>
>>>> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Thanks, ruifeng zheng
>>>>>
>>>>> ------------------ Original ------------------
>>>>> *From:* "Cheng Su" <chen...@fb.com.INVALID>;
>>>>> *Date:* Wed, Jan 12, 2022 02:54 PM
>>>>> *To:* "Qian Sun" <qian.sun2...@gmail.com>; "huaxin gao"
>>>>> <huaxin.ga...@gmail.com>;
>>>>> *Cc:* "dev" <dev@spark.apache.org>;
>>>>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>>
>>>>> +1 (non-binding). Checked commit history and ran some local tests.
>>>>>
>>>>> Thanks,
>>>>> Cheng Su
>>>>>
>>>>> *From: *Qian Sun <qian.sun2...@gmail.com>
>>>>> *Date: *Tuesday, January 11, 2022 at 7:55 PM
>>>>> *To: *huaxin gao <huaxin.ga...@gmail.com>
>>>>> *Cc: *dev <dev@spark.apache.org>
>>>>> *Subject: *Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>>
>>>>> +1
>>>>>
>>>>> Looks good. All integration tests passed.
>>>>> Qian
>>>>>
>>>>> On Jan 11, 2022, at 2:09 AM, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 3.2.1.
>>>>>
>>>>> The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes if
>>>>> a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 3.2.1
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> There are currently no issues targeting 3.2.1 (try project = SPARK AND
>>>>> "Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In
>>>>> Progress"))
>>>>>
>>>>> The tag to be voted on is v3.2.1-rc1 (commit
>>>>> 2b0ee226f8dd17b278ad11139e62464433191653):
>>>>> https://github.com/apache/spark/tree/v3.2.1-rc1
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/
>>>>>
>>>>> Signatures used for Spark RCs can be found in this file:
>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1395/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/
>>>>>
>>>>> The list of bug fixes going into 3.2.1 can be found at the following
>>>>> URL:
>>>>> https://s.apache.org/7tzik
>>>>>
>>>>> This release is using the release script of the tag v3.2.1-rc1.
>>>>>
>>>>> FAQ
>>>>>
>>>>> =========================
>>>>> How can I help test this release?
>>>>> =========================
>>>>>
>>>>> If you are a Spark user, you can help us test this release by taking
>>>>> an existing Spark workload and running on this release candidate, then
>>>>> reporting any regressions.
>>>>>
>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>> the current RC and see if anything important breaks; in Java/Scala,
>>>>> you can add the staging repository to your project's resolvers and test
>>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>>> you don't end up building with an out-of-date RC going forward).
>>>>>
>>>>> ===========================================
>>>>> What should happen to JIRA tickets still targeting 3.2.1?
>>>>> ===========================================
>>>>>
>>>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>>> Version/s" = 3.2.1
>>>>>
>>>>> Committers should look at those and triage. Extremely important bug
>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>> be worked on immediately. Everything else please retarget to an
>>>>> appropriate release.
>>>>>
>>>>> ==================
>>>>> But my bug isn't fixed?
>>>>> ==================
>>>>>
>>>>> In order to make timely releases, we will typically not hold the
>>>>> release unless the bug in question is a regression from the previous
>>>>> release. That being said, if there is something which is a regression
>>>>> that has not been correctly targeted please ping me or a committer to
>>>>> help target the issue.

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297