The Apache Parquet community found an issue [1] in 1.12.0 that could cause incorrect file offsets to be written, and subsequent reads of the same file to fail. A fix has been proposed in the same JIRA, so we may have to wait until a new Parquet release is available before we can upgrade Spark with the hot fix.
[1]: https://issues.apache.org/jira/browse/PARQUET-2078

On Fri, Aug 27, 2021 at 7:06 AM Sean Owen <sro...@gmail.com> wrote:
> Maybe I'm just confused why it's needed at all. Other profiles that add a dependency seem OK, but something's different here.
>
> One thing we can/should change is to simply remove the <dependencyManagement> block in the profile. It should always be a direct dep in Scala 2.13 (which lets us take out the profiles in submodules, which just repeat that).
> We can also update the version, by the by.
>
> I tried this and the resulting POM still doesn't look like what I expect, though.
>
> (The binary release is OK, FWIW - it gets pulled in as a JAR as expected.)
>
> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy <s...@infomedia.com.au> wrote:
>
>> Hi Sean,
>>
>> I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will help you out here.
>>
>> Cheers,
>>
>> Steve C
>>
>> On 27 Aug 2021, at 12:29 pm, Sean Owen <sro...@gmail.com> wrote:
>>
>> OK right, you would have seen a different error otherwise.
>>
>> Yes, profiles are only a compile-time thing, but they should affect the effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows scala-parallel-collections as a dependency in the POM as expected (not in a profile). However, I see what you see in the .pom in the release repo, and in my local repo after building - it's just sitting there as a profile, as if it weren't activated or something.
>>
>> I'm confused then; that shouldn't be what happens. I'd say maybe there is a problem with the release script, but it seems to affect a simple local build too. Does anyone more expert in this see the problem while I try to debug more?
>> The binary distro may actually be fine, I'll check; it may not even matter much for users who generally treat Spark as a compile-time-only dependency. But I can see it would break exactly your case, something like a self-contained test job.
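Stephen's flatten-maven-plugin suggestion targets the mismatch Sean describes: help:effective-pom shows scala-parallel-collections as a plain dependency, while the installed .pom still carries it inside the scala-2.13 profile. A rough sketch of what that could look like in Spark's parent POM, assuming the plugin's `embedBuildProfileDependencies` option copies dependencies from profiles active during the build (here `-Pscala-2.13`) into the flattened POM's main `<dependencies>` section. The plugin version and phases are illustrative and this has not been tried against the Spark build.

```xml
<!-- Sketch only: untested against Spark; version 1.2.7 is an assumption. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>flatten-maven-plugin</artifactId>
  <version>1.2.7</version>
  <configuration>
    <!-- Embed dependencies contributed by profiles that are active in this
         build (e.g. scala-2.13) into the flattened, published POM. -->
    <embedBuildProfileDependencies>true</embedBuildProfileDependencies>
  </configuration>
  <executions>
    <!-- Write the flattened POM early so install/deploy pick it up. -->
    <execution>
      <id>flatten</id>
      <phase>process-resources</phase>
      <goals>
        <goal>flatten</goal>
      </goals>
    </execution>
    <!-- Remove the generated flattened POM on mvn clean. -->
    <execution>
      <id>flatten-clean</id>
      <phase>clean</phase>
      <goals>
        <goal>clean</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

Sean's alternative of declaring the module as a direct dependency for Scala 2.13 avoids adding a plugin at all, which may be the simpler route.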
>>
>> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy <s...@infomedia.com.au> wrote:
>>
>>> I did indeed.
>>>
>>> The generated spark-core_2.13-3.2.0.pom that is created alongside the jar file in the local repo contains:
>>>
>>> <profile>
>>>   <id>scala-2.13</id>
>>>   <dependencies>
>>>     <dependency>
>>>       <groupId>org.scala-lang.modules</groupId>
>>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>     </dependency>
>>>   </dependencies>
>>> </profile>
>>>
>>> which means this dependency will be missing for unit tests that create SparkSessions from library code only, a technique inspired by Spark’s own unit tests.
>>>
>>> Cheers,
>>>
>>> Steve C
>>>
>>> On 27 Aug 2021, at 11:33 am, Sean Owen <sro...@gmail.com> wrote:
>>>
>>> Did you run ./dev/change-scala-version.sh 2.13? That's required first to update the POMs. It works fine for me.
>>>
>>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <s...@infomedia.com.au.invalid> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Being adventurous, I have built the RC1 code with:
>>>>
>>>> -Pyarn -Phadoop-3.2 -Pyarn -Phadoop-cloud -Phive-thriftserver -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>>>>
>>>>
>>>> And then attempted to build my Java-based Spark application.
>>>>
>>>> However, I found a number of our unit tests were failing with:
>>>>
>>>> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>>>>
>>>>   at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>>>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>>>>   at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
>>>>   at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
>>>>   at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
>>>>   …
>>>>
>>>>
>>>> I tracked this down to a missing dependency:
>>>>
>>>> <dependency>
>>>>   <groupId>org.scala-lang.modules</groupId>
>>>>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>> </dependency>
>>>>
>>>>
>>>> which unfortunately appears only in a profile in the POM files associated with the various Spark dependencies.
>>>>
>>>> As far as I know, it is not possible to activate profiles in dependencies in Maven builds.
>>>>
>>>> Therefore I suspect that right now a Scala 2.13 migration is not quite as seamless as we would like.
>>>>
>>>> I stress that this is only an issue for developers who write unit tests for their applications, as the Spark runtime environment will always have the necessary dependencies available to it.
>>>>
>>>> (You might consider upgrading the org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to 1.0.3 though!)
>>>>
>>>> Cheers and thanks for the great work!
>>>>
>>>> Steve Coy
>>>>
>>>>
>>>> On 21 Aug 2021, at 3:05 am, Gengliang Wang <ltn...@gmail.com> wrote:
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark version 3.2.0.
>>>>
>>>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v3.2.0-rc1 (commit 6bb3523d8e838bd2082fb90d7f3741339245c044):
>>>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>>>
>>>> The list of bug fixes going into 3.2.0 can be found at the following URL:
>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>
>>>> This release is using the release script of the tag v3.2.0-rc1.
>>>>
>>>>
>>>> FAQ
>>>>
>>>> =========================
>>>> How can I help test this release?
>>>> =========================
>>>> If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.
>>>>
>>>> If you're working in PySpark, you can set up a virtual env, install the current RC, and see if anything important breaks. In Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).
>>>>
>>>> ===========================================
>>>> What should happen to JIRA tickets still targeting 3.2.0?
>>>> ===========================================
>>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 3.2.0
>>>>
>>>> Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Please retarget everything else to an appropriate release.
>>>>
>>>> ==================
>>>> But my bug isn't fixed?
>>>> ==================
>>>> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is a regression that has not been correctly targeted, please ping me or a committer to help target the issue.
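Until the published Spark POMs declare scala-parallel-collections outside the scala-2.13 profile, a downstream Maven project that starts a SparkSession inside its own unit tests can avoid the `NoClassDefFoundError: scala/collection/parallel/TaskSupport` described in this thread by declaring the module itself. A minimal sketch for a hypothetical downstream POM, assuming a Scala 2.13 build and that 1.0.3 (the version Stephen suggests) is binary compatible with what Spark was compiled against; if not, use the version listed in Spark's own POM. The `test` scope is a choice made here, since the Spark runtime already ships the JAR.

```xml
<!-- Hypothetical downstream project fragment: place inside <dependencies>. -->
<!-- spark-core_2.13 only lists this module inside its scala-2.13 profile,
     which Maven does not activate for consumers, so declare it directly. -->
<dependency>
  <groupId>org.scala-lang.modules</groupId>
  <artifactId>scala-parallel-collections_2.13</artifactId>
  <version>1.0.3</version>
  <!-- Needed only on the test classpath; the Spark runtime already provides it. -->
  <scope>test</scope>
</dependency>
```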