Hi Xiao,

I'm still checking with the Parquet community on this. Since the fix is already +1'd, I'm hoping this won't take long. The delta in the parquet-1.12.x branch is also small, with just 2 commits so far.

Chao

On Tue, Aug 31, 2021 at 12:03 PM Xiao Li <lix...@databricks.com> wrote:

> Hi, Chao,
>
> How long will it take? Normally, in the RC stage, we always revert the
> upgrade made in the current release. We did the parquet upgrade multiple
> times in the previous releases for avoiding the major delay in our Spark
> release
>
> Thanks,
>
> Xiao
>
>
> On Tue, Aug 31, 2021 at 11:03 AM Chao Sun <sunc...@apache.org> wrote:
>
>> The Apache Parquet community found an issue [1] in 1.12.0 which could
>> cause incorrect file offset being written and subsequently reading of the
>> same file to fail. A fix has been proposed in the same JIRA and we may have
>> to wait until a new release is available so that we can upgrade Spark with
>> the hot fix.
>>
>> [1]: https://issues.apache.org/jira/browse/PARQUET-2078
>>
>> On Fri, Aug 27, 2021 at 7:06 AM Sean Owen <sro...@gmail.com> wrote:
>>
>>> Maybe, I'm just confused why it's needed at all. Other profiles that add
>>> a dependency seem OK, but something's different here.
>>>
>>> One thing we can/should change is to simply remove the
>>> <dependencyManagement> block in the profile. It should always be a direct
>>> dep in Scala 2.13 (which lets us take out the profiles in submodules, which
>>> just repeat that)
>>> We can also update the version, by the by.
>>>
>>> I tried this and the resulting POM still doesn't look like what I expect
>>> though.
>>>
>>> (The binary release is OK, FWIW - it gets pulled in as a JAR as expected)
>>>
>>> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy <s...@infomedia.com.au>
>>> wrote:
>>>
>>>> Hi Sean,
>>>>
>>>> I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will
>>>> help you out here.
>>>>
>>>> Cheers,
>>>>
>>>> Steve C
>>>>
>>>> On 27 Aug 2021, at 12:29 pm, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>> OK right, you would have seen a different error otherwise.
>>>>
>>>> Yes profiles are only a compile-time thing, but they should affect the
>>>> effective POM for the artifact.
>>>> mvn -Pscala-2.13 help:effective-pom shows
>>>> scala-parallel-collections as a dependency in the POM as expected (not in a
>>>> profile). However I see what you see in the .pom in the release repo, and
>>>> in my local repo after building - it's just sitting there as a profile as
>>>> if it weren't activated or something.
>>>>
>>>> I'm confused then, that shouldn't be what happens. I'd say maybe there
>>>> is a problem with the release script, but seems to affect a simple local
>>>> build. Anyone else more expert in this see the problem, while I try to
>>>> debug more?
>>>> The binary distro may actually be fine, I'll check; it may even not
>>>> matter much for users who generally just treat Spark as a compile-time-only
>>>> dependency either. But I can see it would break exactly your case,
>>>> something like a self-contained test job.
>>>>
>>>> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy <s...@infomedia.com.au>
>>>> wrote:
>>>>
>>>>> I did indeed.
>>>>>
>>>>> The generated spark-core_2.13-3.2.0.pom that is created alongside the
>>>>> jar file in the local repo contains:
>>>>>
>>>>> <profile>
>>>>>   <id>scala-2.13</id>
>>>>>   <dependencies>
>>>>>     <dependency>
>>>>>       <groupId>org.scala-lang.modules</groupId>
>>>>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>>>     </dependency>
>>>>>   </dependencies>
>>>>> </profile>
>>>>>
>>>>> which means this dependency will be missing for unit tests that create
>>>>> SparkSessions from library code only, a technique inspired by Spark’s own
>>>>> unit tests.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Steve C
>>>>>
>>>>> On 27 Aug 2021, at 11:33 am, Sean Owen <sro...@gmail.com> wrote:
>>>>>
>>>>> Did you run ./dev/change-scala-version.sh 2.13 ? that's required first
>>>>> to update POMs. It works fine for me.
>>>>>
>>>>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
>>>>> s...@infomedia.com.au.invalid> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Being adventurous I have built the RC1 code with:
>>>>>>
>>>>>> -Pyarn -Phadoop-3.2 -Pyarn -Phadoop-cloud -Phive-thriftserver
>>>>>> -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>>>>>>
>>>>>>
>>>>>> And then attempted to build my Java based spark application.
>>>>>>
>>>>>> However, I found a number of our unit tests were failing with:
>>>>>>
>>>>>> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>>>>>>
>>>>>> at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>>>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>>>>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>>>>>> at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
>>>>>> at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
>>>>>> at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
>>>>>> …
>>>>>>
>>>>>>
>>>>>> I tracked this down to a missing dependency:
>>>>>>
>>>>>> <dependency>
>>>>>>   <groupId>org.scala-lang.modules</groupId>
>>>>>>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>>>> </dependency>
>>>>>>
>>>>>>
>>>>>> which unfortunately appears only in a profile in the pom files
>>>>>> associated with the various spark dependencies.
>>>>>>
>>>>>> As far as I know it is not possible to activate profiles in
>>>>>> dependencies in maven builds.
>>>>>>
>>>>>> Therefore I suspect that right now a Scala 2.13 migration is not
>>>>>> quite as seamless as we would like.
>>>>>>
>>>>>> I stress that this is only an issue for developers that write unit
>>>>>> tests for their applications, as the Spark runtime environment will
>>>>>> always have the necessary dependencies available to it.
>>>>>>
>>>>>> (You might consider upgrading the
>>>>>> org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2
>>>>>> to 1.0.3 though!)
>>>>>>
>>>>>> Cheers and thanks for the great work!
>>>>>>
>>>>>> Steve Coy
>>>>>>
>>>>>>
>>>>>> On 21 Aug 2021, at 3:05 am, Gengliang Wang <ltn...@gmail.com> wrote:
>>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 3.2.0.
>>>>>>
>>>>>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>>>>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v3.2.0-rc1 (commit
>>>>>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>>>>>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>> at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>>>>>
>>>>>> The list of bug fixes going into 3.2.0 can be found at the following
>>>>>> URL:
>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>>>
>>>>>> This release is using the release script of the tag v3.2.0-rc1.
>>>>>>
>>>>>>
>>>>>> FAQ
>>>>>>
>>>>>> =========================
>>>>>> How can I help test this release?
>>>>>> =========================
>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>> an existing Spark workload and running on this release candidate,
>>>>>> then reporting any regressions.
>>>>>>
>>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>>> the current RC and see if anything important breaks, in the
>>>>>> Java/Scala you can add the staging repository to your projects
>>>>>> resolvers and test with the RC (make sure to clean up the artifact
>>>>>> cache before/after so you don't end up building with a out of date RC
>>>>>> going forward).
>>>>>>
>>>>>> ===========================================
>>>>>> What should happen to JIRA tickets still targeting 3.2.0?
>>>>>> ===========================================
>>>>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>>>>> https://issues.apache.org/jira/projects/SPARK and
>>>>>> search for "Target Version/s" = 3.2.0
>>>>>>
>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>> appropriate release.
>>>>>>
>>>>>> ==================
>>>>>> But my bug isn't fixed?
>>>>>> ==================
>>>>>> In order to make timely releases, we will typically not hold the
>>>>>> release unless the bug in question is a regression from the previous
>>>>>> release. That being said, if there is something which is a regression
>>>>>> that has not been correctly targeted please ping me or a committer to
>>>>>> help target the issue.
>>>>>>
>>>>>>
>>>>>> This email contains confidential information of and is the copyright
>>>>>> of Infomedia. It must not be forwarded, amended or disclosed without
>>>>>> consent of the sender. If you received this message by mistake, please
>>>>>> advise the sender and delete all copies. Security of transmission on the
>>>>>> internet cannot be guaranteed, could be infected, intercepted, or
>>>>>> corrupted and you should ensure you have suitable antivirus protection
>>>>>> in place.
>>>>>> By sending us your or any third party personal details, you consent to (or
>>>>>> confirm you have obtained consent from such third parties) to Infomedia’s
>>>>>> privacy policy. http://www.infomedia.com.au/privacy-policy/
>>>>>>
>>>>>
>>>>>
>>>>
>
> --
>
>