Hi Chao & DB,

Actually, I cut RC2 yesterday, before you posted about the Parquet issue:
https://github.com/apache/spark/tree/v3.2.0-rc2

It has been 11 days since RC1. I think we can have RC2 today so that the community can test it and find potential issues earlier. As for the Parquet issue, we can treat it as a known blocker. If it takes more than one week (which is not likely to happen), we will have to consider reverting Parquet 1.12 and the related features from branch-3.2.
Gengliang

On Wed, Sep 1, 2021 at 5:40 AM DB Tsai <dbt...@dbtsai.com.invalid> wrote:

> Hello Xiao, there are multiple patches in Spark 3.2 that depend on Parquet 1.12, so it might be easier to wait for the fix in the Parquet community instead of reverting all the related changes. The fix in the Parquet community is very trivial, and we hope that it will not take too long. Thanks.
>
> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>
> On Tue, Aug 31, 2021 at 1:09 PM Chao Sun <sunc...@apache.org> wrote:
>
>> Hi Xiao, I'm still checking with the Parquet community on this. Since the fix is already +1'd, I'm hoping this won't take long. The delta in the parquet-1.12.x branch is also small, with just 2 commits so far.
>>
>> Chao
>>
>> On Tue, Aug 31, 2021 at 12:03 PM Xiao Li <lix...@databricks.com> wrote:
>>
>>> Hi Chao,
>>>
>>> How long will it take? Normally, in the RC stage, we always revert an upgrade made in the current release. We reverted the Parquet upgrade multiple times in previous releases to avoid major delays in the Spark release.
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>> On Tue, Aug 31, 2021 at 11:03 AM Chao Sun <sunc...@apache.org> wrote:
>>>
>>>> The Apache Parquet community found an issue [1] in 1.12.0 which could cause an incorrect file offset to be written, and subsequent reads of the same file to fail. A fix has been proposed in the same JIRA, and we may have to wait until a new release is available so that we can upgrade Spark with the hotfix.
>>>>
>>>> [1]: https://issues.apache.org/jira/browse/PARQUET-2078
>>>>
>>>> On Fri, Aug 27, 2021 at 7:06 AM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Maybe; I'm just confused why it's needed at all. Other profiles that add a dependency seem OK, but something's different here.
>>>>>
>>>>> One thing we can/should change is to simply remove the <dependencyManagement> block in the profile.
>>>>> It should always be a direct dependency in Scala 2.13 (which lets us take out the profiles in submodules, which just repeat it). We can also update the version, by the by.
>>>>>
>>>>> I tried this and the resulting POM still doesn't look like what I expect, though.
>>>>>
>>>>> (The binary release is OK, FWIW - it gets pulled in as a JAR as expected.)
>>>>>
>>>>> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy <s...@infomedia.com.au> wrote:
>>>>>
>>>>>> Hi Sean,
>>>>>>
>>>>>> I think that maybe https://www.mojohaus.org/flatten-maven-plugin/ will help you out here.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Steve C
>>>>>>
>>>>>> On 27 Aug 2021, at 12:29 pm, Sean Owen <sro...@gmail.com> wrote:
>>>>>>
>>>>>> OK right, you would have seen a different error otherwise.
>>>>>>
>>>>>> Yes, profiles are only a compile-time thing, but they should affect the effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows scala-parallel-collections as a dependency in the POM as expected (not in a profile). However, I see what you see in the .pom in the release repo, and in my local repo after building - it's just sitting there as a profile, as if it weren't activated or something.
>>>>>>
>>>>>> I'm confused then; that shouldn't be what happens. I'd say maybe there is a problem with the release script, but it seems to affect a simple local build as well. Anyone else more expert in this see the problem, while I try to debug more?
>>>>>>
>>>>>> The binary distro may actually be fine, I'll check; it may even not matter much for users who generally just treat Spark as a compile-time-only dependency. But I can see it would break exactly your case, something like a self-contained test job.
>>>>>>
>>>>>> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy <s...@infomedia.com.au> wrote:
>>>>>>
>>>>>>> I did indeed.
>>>>>>>
>>>>>>> The generated spark-core_2.13-3.2.0.pom that is created alongside the jar file in the local repo contains:
>>>>>>>
>>>>>>> <profile>
>>>>>>>   <id>scala-2.13</id>
>>>>>>>   <dependencies>
>>>>>>>     <dependency>
>>>>>>>       <groupId>org.scala-lang.modules</groupId>
>>>>>>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>>>>>     </dependency>
>>>>>>>   </dependencies>
>>>>>>> </profile>
>>>>>>>
>>>>>>> which means this dependency will be missing for unit tests that create SparkSessions from library code only, a technique inspired by Spark’s own unit tests.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Steve C
>>>>>>>
>>>>>>> On 27 Aug 2021, at 11:33 am, Sean Owen <sro...@gmail.com> wrote:
>>>>>>>
>>>>>>> Did you run ./dev/change-scala-version.sh 2.13? That's required first, to update the POMs. It works fine for me.
>>>>>>>
>>>>>>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <s...@infomedia.com.au.invalid> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Being adventurous, I have built the RC1 code with:
>>>>>>>>
>>>>>>>> -Pyarn -Phadoop-3.2 -Phadoop-cloud -Phive-thriftserver -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>>>>>>>>
>>>>>>>> and then attempted to build my Java-based Spark application.
>>>>>>>>
>>>>>>>> However, I found a number of our unit tests were failing with:
>>>>>>>>
>>>>>>>> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>>>>>>>>   at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>>>>>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>>>>>>>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>>>>>>>>   at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
>>>>>>>>   at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
>>>>>>>>   at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
>>>>>>>>   …
>>>>>>>>
>>>>>>>> I tracked this down to a missing dependency:
>>>>>>>>
>>>>>>>> <dependency>
>>>>>>>>   <groupId>org.scala-lang.modules</groupId>
>>>>>>>>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>>>>>> </dependency>
>>>>>>>>
>>>>>>>> which unfortunately appears only in a profile in the POM files associated with the various Spark dependencies.
>>>>>>>>
>>>>>>>> As far as I know, it is not possible to activate profiles in dependencies in Maven builds.
>>>>>>>>
>>>>>>>> Therefore, I suspect that right now a Scala 2.13 migration is not quite as seamless as we would like.
>>>>>>>>
>>>>>>>> I stress that this is only an issue for developers who write unit tests for their applications, as the Spark runtime environment will always have the necessary dependencies available to it.
>>>>>>>>
>>>>>>>> (You might consider upgrading the org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to 1.0.3, though!)
>>>>>>>>
>>>>>>>> Cheers, and thanks for the great work!
>>>>>>>>
>>>>>>>> Steve Coy
>>>>>>>>
>>>>>>>> On 21 Aug 2021, at 3:05 am, Gengliang Wang <ltn...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Please vote on releasing the following candidate as Apache Spark version 3.2.0.
>>>>>>>>
>>>>>>>> The vote is open until 11:59 pm Pacific time Aug 25 and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>>
>>>>>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>
>>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>>
>>>>>>>> The tag to be voted on is v3.2.0-rc1 (commit 6bb3523d8e838bd2082fb90d7f3741339245c044):
>>>>>>>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>>>>>>>
>>>>>>>> The release files, including signatures, digests, etc.
>>>>>>>> can be found at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>>>>>>>
>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>
>>>>>>>> The staging repository for this release can be found at:
>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>>>>>>>
>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>>>>>>>
>>>>>>>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>>>>>
>>>>>>>> This release is using the release script of the tag v3.2.0-rc1.
>>>>>>>>
>>>>>>>> FAQ
>>>>>>>>
>>>>>>>> =========================
>>>>>>>> How can I help test this release?
>>>>>>>> =========================
>>>>>>>> If you are a Spark user, you can help us test this release by taking an existing Spark workload and running it on this release candidate, then reporting any regressions.
>>>>>>>>
>>>>>>>> If you're working in PySpark, you can set up a virtual env and install the current RC to see if anything important breaks. In Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).
>>>>>>>>
>>>>>>>> ===========================================
>>>>>>>> What should happen to JIRA tickets still targeting 3.2.0?
>>>>>>>> ===========================================
>>>>>>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 3.2.0
>>>>>>>>
>>>>>>>> Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else, please retarget to an appropriate release.
>>>>>>>>
>>>>>>>> ==================
>>>>>>>> But my bug isn't fixed?
>>>>>>>> ==================
>>>>>>>> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted, please ping me or a committer to help target the issue.
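[Editor's note] For projects hit by the NoClassDefFoundError discussed in this thread, one workaround is to declare the dependency explicitly in the consuming project's POM, since Maven cannot activate a dependency's profiles. This is only a sketch: the `test` scope and the 1.0.3 version (the release Stephen suggests upgrading to) are assumptions; adjust both to match the Scala 2.13 Spark build you are actually testing against.

```xml
<!-- Hypothetical fragment for a consumer's pom.xml. Declares
     scala-parallel-collections directly so unit tests that create a
     SparkSession can resolve scala/collection/parallel/TaskSupport.
     Scope is test because, per the thread, the Spark runtime already
     provides this jar; widen the scope if your app runs Spark locally. -->
<dependency>
  <groupId>org.scala-lang.modules</groupId>
  <artifactId>scala-parallel-collections_2.13</artifactId>
  <version>1.0.3</version>
  <scope>test</scope>
</dependency>
```

With this in place, `mvn test` should load the parallel collections classes even though the Spark artifacts only reference them inside an inactive `scala-2.13` profile.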