Let's continue on the ticket - I am not sure this is established as a bug. We would block a release for critical problems even when they are not regressions, but this is not a data-loss / "deleting data" issue even if it is valid. You're welcome to provide feedback, but votes are for the PMC.
On Fri, Jan 21, 2022 at 5:24 PM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:

> Ok, but deleting users' data without them knowing it is never a good idea.
> That's why I give this RC a -1.
>
> On Sat, Jan 22, 2022 at 12:16 AM Sean Owen <sro...@gmail.com> wrote:
>
>> (Bjorn - unless this is a regression, it would not block a release, even
>> if it's a bug)
>>
>> On Fri, Jan 21, 2022 at 5:09 PM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>>
>>> [x] -1 Do not release this package because it deletes all my columns
>>> that contain only nulls.
>>>
>>> I have opened https://issues.apache.org/jira/browse/SPARK-37981 for
>>> this bug.
>>>
>>> On Fri, Jan 21, 2022 at 9:45 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> (Are you suggesting this is a regression, or is it a general question?
>>>> Here we're trying to figure out whether there are critical bugs
>>>> introduced in 3.2.1 vs 3.2.0.)
>>>>
>>>> On Fri, Jan 21, 2022 at 1:58 PM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>>>>
>>>>> Hi, I am wondering whether this is a bug or not.
>>>>>
>>>>> I have a lot of JSON files in which some columns are all "null".
>>>>>
>>>>> I start Spark with:
>>>>>
>>>>> from pyspark import pandas as ps
>>>>> import re
>>>>> import numpy as np
>>>>> import os
>>>>> import pandas as pd
>>>>>
>>>>> from pyspark import SparkContext, SparkConf
>>>>> from pyspark.sql import SparkSession
>>>>> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
>>>>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>>>>>
>>>>> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>>>>>
>>>>> def get_spark_session(app_name: str, conf: SparkConf):
>>>>>     conf.setMaster('local[*]')
>>>>>     conf \
>>>>>         .set('spark.driver.memory', '64g') \
>>>>>         .set("fs.s3a.access.key", "minio") \
>>>>>         .set("fs.s3a.secret.key", "") \
>>>>>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>>>>>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>>>>>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>>>>>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>>>>>         .set("spark.sql.adaptive.enabled", "True") \
>>>>>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>>>>>         .set("spark.sql.repl.eagerEval.maxNumRows", "10000") \
>>>>>         .set("sc.setLogLevel", "error")
>>>>>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>>>>>
>>>>> spark = get_spark_session("Falk", SparkConf())
>>>>>
>>>>> d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
>>>>>
>>>>> import pyspark
>>>>> def sparkShape(dataFrame):
>>>>>     return (dataFrame.count(), len(dataFrame.columns))
>>>>> pyspark.sql.dataframe.DataFrame.shape = sparkShape
>>>>> print(d3.shape())
>>>>>
>>>>> (653610, 267)
>>>>>
>>>>> d3.write.json("d3.json")
>>>>>
>>>>> d3 = spark.read.json("d3.json/*.json")
>>>>>
>>>>> print(d3.shape())
>>>>>
>>>>> (653610, 186)
>>>>>
>>>>> So Spark is dropping 81 columns. I think all 81 of the dropped
>>>>> columns contain only null values.
>>>>>
>>>>> Is this a bug, or is it intentional?
>>>>>
>>>>> On Fri, Jan 21, 2022 at 4:59 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 3.2.1. The vote is open until 8:00pm Pacific time January 25
>>>>>> and passes if a majority of +1 PMC votes are cast, with a minimum of
>>>>>> 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 3.2.1
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v3.2.1-rc2
>>>>>> (commit 4f25b3f71238a00508a356591553f2dfa89f8290):
>>>>>> https://github.com/apache/spark/tree/v3.2.1-rc2
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1398/
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
>>>>>>
>>>>>> The list of bug fixes going into 3.2.1 can be found at the following URL:
>>>>>> https://s.apache.org/yu0cy
>>>>>>
>>>>>> This release is using the release script of the tag v3.2.1-rc2.
>>>>>>
>>>>>> FAQ
>>>>>>
>>>>>> =========================
>>>>>> How can I help test this release?
>>>>>> =========================
>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>> an existing Spark workload and running it on this release candidate,
>>>>>> then reporting any regressions.
>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>> the current RC, and see if anything important breaks; in Java/Scala,
>>>>>> you can add the staging repository to your project's resolvers and
>>>>>> test with the RC (make sure to clean up the artifact cache
>>>>>> before/after so you don't end up building with an out-of-date RC
>>>>>> going forward).
>>>>>>
>>>>>> ===========================================
>>>>>> What should happen to JIRA tickets still targeting 3.2.1?
>>>>>> ===========================================
>>>>>> The current list of open tickets targeted at 3.2.1 can be found at
>>>>>> https://issues.apache.org/jira/projects/SPARK by searching for
>>>>>> "Target Version/s" = 3.2.1. Committers should look at those and
>>>>>> triage. Extremely important bug fixes, documentation, and API tweaks
>>>>>> that impact compatibility should be worked on immediately.
>>>>>> Everything else, please retarget to an appropriate release.
>>>>>>
>>>>>> ==================
>>>>>> But my bug isn't fixed?
>>>>>> ==================
>>>>>> In order to make timely releases, we will typically not hold the
>>>>>> release unless the bug in question is a regression from the previous
>>>>>> release. That being said, if there is something which is a
>>>>>> regression that has not been correctly targeted, please ping me or a
>>>>>> committer to help target the issue.
>>>>>
>>>>> --
>>>>> Bjørn Jørgensen
>>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>>> Norge
>>>>>
>>>>> +47 480 94 297
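[Editor's note] The round-trip column loss reported above is most likely the JSON writer's null handling rather than data corruption: by default (the `spark.sql.jsonGenerator.ignoreNullFields` SQL config, or the JSON data source's `ignoreNullFields` write option, both `true` by default in Spark 3.x), null fields are omitted from the written JSON, so a column that is null in every row leaves no key behind, and schema inference on re-read cannot recover it. The following is a minimal stdlib sketch of that mechanism (plain Python standing in for Spark; the helper names `write_json_lines` and `infer_columns` are illustrative, not Spark APIs):

```python
import json

# A "DataFrame" of two rows where column "b" is null in every row.
rows = [{"a": 1, "b": None}, {"a": 2, "b": None}]

def write_json_lines(records):
    """Mimic the JSON writer's default ignoreNullFields=true behavior:
    null fields are dropped entirely from each output record."""
    return [json.dumps({k: v for k, v in r.items() if v is not None})
            for r in records]

def infer_columns(lines):
    """Mimic schema inference on read: the schema is the union of keys
    that appear at least once, so an all-null column leaves no trace."""
    cols = set()
    for line in lines:
        cols.update(json.loads(line).keys())
    return sorted(cols)

lines = write_json_lines(rows)
print(lines)                 # ['{"a": 1}', '{"a": 2}']
print(infer_columns(lines))  # ['a']  -- column "b" is gone after the round trip
```

Under this reading, writing with `d3.write.option("ignoreNullFields", "false").json(...)` should keep the null fields in the output (at a file-size cost), and explicitly passing the original schema to `spark.read.schema(...).json(...)` avoids depending on inference at all.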