Things I did: I changed the Dockerfile from jupyter/docker-stacks to
https://github.com/bjornjorgensen/docker-stacks/blob/master/pyspark-notebook/Dockerfile,
then built, tagged, and pushed the image. I start it with docker-compose like this:
version: '2.1'
services:
  jupyter:
    image: bjornjorgensen/spark-notebook:spark-3.2.1RC-1
    restart: 'no'
    volumes:
      - ./notebooks:/home/jovyan/notebooks
    ports:
      - "8881:8888"
      - "8181:8080"
      - "7077:7077"
      - "4040:4040"
    environment:
      NB_UID: ${UID}
      NB_GID: ${GID}

1. If I change the Java version to 17 I get an error, which I did not copy.
Have you built this with Java 11 or Java 17? I have noticed that we test with
Java 17, so I was hoping to update Java to version 17.

2. In a notebook I start Spark with:

import os
import re

import numpy as np
# import pandas as pd
from pyspark import pandas as ps
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

def get_spark_session(app_name: str, conf: SparkConf):
    conf.setMaster('local[*]')
    conf \
        .set('spark.driver.memory', '64g') \
        .set("fs.s3a.access.key", "minio") \
        .set("fs.s3a.secret.key", "KEY") \
        .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
        .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .set("spark.hadoop.fs.s3a.path.style.access", "true") \
        .set("spark.sql.repl.eagerEval.enabled", "True") \
        .set("spark.sql.adaptive.enabled", "True") \
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .set("spark.sql.repl.eagerEval.maxNumRows", "10000")
    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

spark = get_spark_session("Falk", SparkConf())

Then I run this code:

f06 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/f06.json")
pf06 = f06.to_pandas_on_spark()
pf06.info()

and I did not get any errors or warnings. But according to
https://github.com/apache/spark/commit/bc7d55fc1046a55df61fdb380629699e9959fcc6
(Spark)DataFrame.to_pandas_on_spark is deprecated.
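As a side note on why no warning may appear: by default, CPython's warning filters suppress DeprecationWarning raised from library code, so a deprecated call can stay silent unless the filters are relaxed. A minimal stdlib sketch of that behavior (the `to_pandas_on_spark_stub` function below is a hypothetical stand-in for the deprecated method, not Spark's actual implementation):

```python
import warnings

def to_pandas_on_spark_stub():
    # Hypothetical stand-in for the deprecated DataFrame.to_pandas_on_spark;
    # it warns the way a deprecated library method typically does.
    warnings.warn(
        "to_pandas_on_spark is deprecated; use pandas_api instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "pandas-on-Spark frame"

# With DeprecationWarning filtered out (as it is by default for
# warnings raised outside __main__), the call emits nothing visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore", DeprecationWarning)
    to_pandas_on_spark_stub()
print(len(caught))  # 0 - the warning was suppressed

# Opting in makes the deprecation visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    to_pandas_on_spark_stub()
print(caught[0].category.__name__)  # DeprecationWarning
```

If Spark instead raises a FutureWarning (shown by default), another possibility is simply that the deprecation commit is not included in this RC.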
So I expected to get a hint to change to pandas_api, which I did not get.

On Fri, Jan 14, 2022 at 07:04, huaxin gao <huaxin.ga...@gmail.com> wrote:

> The two regressions have been fixed. I will cut RC2 tomorrow late
> afternoon.
>
> Thanks,
> Huaxin
>
> On Wed, Jan 12, 2022 at 9:11 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>
>> Thank you all for testing and voting!
>>
>> I will -1 this RC because
>> https://issues.apache.org/jira/browse/SPARK-37855 and
>> https://issues.apache.org/jira/browse/SPARK-37859 are regressions. These
>> are not blockers but I think it's better to fix them in 3.2.1. I will
>> prepare for RC2.
>>
>> Thanks,
>> Huaxin
>>
>> On Wed, Jan 12, 2022 at 2:03 AM Kent Yao <y...@apache.org> wrote:
>>
>>> +1 (non-binding).
>>>
>>> On Wed, Jan 12, 2022 at 16:10, Chao Sun <sunc...@apache.org> wrote:
>>>
>>>> +1 (non-binding). Thanks Huaxin for driving the release!
>>>>
>>>> On Tue, Jan 11, 2022 at 11:56 PM Ruifeng Zheng <ruife...@foxmail.com>
>>>> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Thanks, ruifeng zheng
>>>>>
>>>>> ------------------ Original ------------------
>>>>> *From:* "Cheng Su" <chen...@fb.com.INVALID>;
>>>>> *Date:* Wed, Jan 12, 2022 02:54 PM
>>>>> *To:* "Qian Sun" <qian.sun2...@gmail.com>; "huaxin gao"
>>>>> <huaxin.ga...@gmail.com>;
>>>>> *Cc:* "dev" <dev@spark.apache.org>;
>>>>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>>
>>>>> +1 (non-binding). Checked commit history and ran some local tests.
>>>>>
>>>>> Thanks,
>>>>> Cheng Su
>>>>>
>>>>> *From: *Qian Sun <qian.sun2...@gmail.com>
>>>>> *Date: *Tuesday, January 11, 2022 at 7:55 PM
>>>>> *To: *huaxin gao <huaxin.ga...@gmail.com>
>>>>> *Cc: *dev <dev@spark.apache.org>
>>>>> *Subject: *Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>>
>>>>> +1
>>>>>
>>>>> Looks good. All integration tests passed.
>>>>> Qian
>>>>>
>>>>> On Jan 11, 2022, at 2:09 AM, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 3.2.1.
>>>>>
>>>>> The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes if
>>>>> a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 3.2.1
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> There are currently no issues targeting 3.2.1 (try project = SPARK AND
>>>>> "Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In
>>>>> Progress"))
>>>>>
>>>>> The tag to be voted on is v3.2.1-rc1 (commit
>>>>> 2b0ee226f8dd17b278ad11139e62464433191653):
>>>>> https://github.com/apache/spark/tree/v3.2.1-rc1
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/
>>>>>
>>>>> Signatures used for Spark RCs can be found in this file:
>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1395/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/
>>>>>
>>>>> The list of bug fixes going into 3.2.1 can be found at the following
>>>>> URL:
>>>>> https://s.apache.org/7tzik
>>>>>
>>>>> This release is using the release script of the tag v3.2.1-rc1.
>>>>>
>>>>> FAQ
>>>>>
>>>>> =========================
>>>>> How can I help test this release?
>>>>> =========================
>>>>>
>>>>> If you are a Spark user, you can help us test this release by taking
>>>>> an existing Spark workload and running on this release candidate, then
>>>>> reporting any regressions.
>>>>>
>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>> the current RC and see if anything important breaks; in Java/Scala,
>>>>> you can add the staging repository to your project's resolvers and test
>>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>>> you don't end up building with an out-of-date RC going forward).
>>>>>
>>>>> ===========================================
>>>>> What should happen to JIRA tickets still targeting 3.2.1?
>>>>> ===========================================
>>>>>
>>>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>>> Version/s" = 3.2.1
>>>>>
>>>>> Committers should look at those and triage. Extremely important bug
>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>> be worked on immediately. Everything else please retarget to an
>>>>> appropriate release.
>>>>>
>>>>> ==================
>>>>> But my bug isn't fixed?
>>>>> ==================
>>>>>
>>>>> In order to make timely releases, we will typically not hold the
>>>>> release unless the bug in question is a regression from the previous
>>>>> release. That being said, if there is something which is a regression
>>>>> that has not been correctly targeted please ping me or a committer to
>>>>> help target the issue.

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297