The Apache Parquet community found an issue [1] in 1.12.0 which could cause
an incorrect file offset to be written, making subsequent reads of the same
file fail. A fix has been proposed in the same JIRA ticket, and we may have
to wait until a new Parquet release containing it is available so that we
can upgrade Spark with the fix.

[1]: https://issues.apache.org/jira/browse/PARQUET-2078
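
Until then, downstream builds that hit the bug could in principle pin
Parquet to the last release before the regression via dependencyManagement.
This is only a hypothetical sketch, untested against Spark 3.2, and
parquet-hadoop may not be the only Parquet module that needs pinning:

<dependencyManagement>
  <dependencies>
    <!-- Hypothetical workaround: stay on a pre-1.12.0 Parquet release
         until one containing the PARQUET-2078 fix is published. -->
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.11.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>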

On Fri, Aug 27, 2021 at 7:06 AM Sean Owen <sro...@gmail.com> wrote:

> Maybe. I'm just confused about why it's needed at all. Other profiles that
> add a dependency seem OK, but something's different here.
>
> One thing we can/should change is to simply remove the
> <dependencyManagement> block in the profile. It should always be a direct
> dep in Scala 2.13 (which lets us take out the profiles in submodules, which
> just repeat that).
> We can also update the version, by the by.
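>
> Concretely, something like this as a direct dependency in the scala-2.13
> profile, with no <dependencyManagement> wrapper (a sketch only, untested;
> 1.0.3 is just the upgrade target floated elsewhere in this thread):
>
> <profile>
>   <id>scala-2.13</id>
>   <dependencies>
>     <dependency>
>       <groupId>org.scala-lang.modules</groupId>
>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>       <!-- illustrative version; set wherever we normally manage it -->
>       <version>1.0.3</version>
>     </dependency>
>   </dependencies>
> </profile>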
>
> I tried this and the resulting POM still doesn't look like what I expect
> though.
>
> (The binary release is OK, FWIW - it gets pulled in as a JAR as expected)
>
> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy <s...@infomedia.com.au>
> wrote:
>
>> Hi Sean,
>>
>> I think that maybe the flatten-maven-plugin
>> (https://www.mojohaus.org/flatten-maven-plugin/) will help you out here.
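>>
>> Something along these lines in the parent POM, perhaps (a sketch only; I
>> have not tried it against the Spark build, and whether the default
>> flattenMode resolves profile-activated dependencies into the published
>> POM is exactly what would need verifying):
>>
>> <plugin>
>>   <groupId>org.codehaus.mojo</groupId>
>>   <artifactId>flatten-maven-plugin</artifactId>
>>   <version>1.2.7</version>
>>   <executions>
>>     <execution>
>>       <!-- replaces the installed/deployed POM with a flattened one -->
>>       <id>flatten</id>
>>       <phase>process-resources</phase>
>>       <goals>
>>         <goal>flatten</goal>
>>       </goals>
>>     </execution>
>>   </executions>
>> </plugin>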
>>
>> Cheers,
>>
>> Steve C
>>
>> On 27 Aug 2021, at 12:29 pm, Sean Owen <sro...@gmail.com> wrote:
>>
>> OK right, you would have seen a different error otherwise.
>>
>> Yes, profiles are only a compile-time thing, but they should affect the
>> effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom shows
>> scala-parallel-collections as a dependency in the POM as expected (not in a
>> profile). However, I see what you see in the .pom in the release repo, and
>> in my local repo after building - it's just sitting there in a profile as
>> if the profile weren't activated or something.
>>
>> I'm confused then; that shouldn't be what happens. I'd say maybe there is
>> a problem with the release script, but it seems to affect a simple local
>> build too. Anyone else more expert in this see the problem, while I try to
>> debug more?
>> The binary distro may actually be fine, I'll check; it may not even
>> matter much for users who generally treat Spark as a compile-time-only
>> dependency. But I can see it would break exactly your case,
>> something like a self-contained test job.
>>
>> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy <s...@infomedia.com.au>
>> wrote:
>>
>>> I did indeed.
>>>
>>> The generated spark-core_2.13-3.2.0.pom that is created alongside the
>>> jar file in the local repo contains:
>>>
>>> <profile>
>>>   <id>scala-2.13</id>
>>>   <dependencies>
>>>     <dependency>
>>>       <groupId>org.scala-lang.modules</groupId>
>>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>     </dependency>
>>>   </dependencies>
>>> </profile>
>>>
>>> which means this dependency will be missing for unit tests that create
>>> SparkSessions from library code only, a technique inspired by Spark’s own
>>> unit tests.
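>>>
>>> (A possible consumer-side workaround, untested, is to declare the module
>>> directly in the application's own POM so the test classpath gets it
>>> regardless of what the Spark POM says:
>>>
>>> <dependency>
>>>   <groupId>org.scala-lang.modules</groupId>
>>>   <artifactId>scala-parallel-collections_2.13</artifactId>
>>>   <!-- assumed version; 1.0.3 is the current release mentioned below -->
>>>   <version>1.0.3</version>
>>> </dependency>
>>>
>>> though obviously the real fix belongs in the Spark POMs.)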
>>>
>>> Cheers,
>>>
>>> Steve C
>>>
>>> On 27 Aug 2021, at 11:33 am, Sean Owen <sro...@gmail.com> wrote:
>>>
>>> Did you run ./dev/change-scala-version.sh 2.13? That's required first
>>> to update the POMs. It works fine for me.
>>>
>>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
>>> s...@infomedia.com.au.invalid> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Being adventurous, I have built the RC1 code with:
>>>>
>>>> -Pyarn -Phadoop-3.2 -Phadoop-cloud -Phive-thriftserver
>>>> -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>>>>
>>>>
>>>> And then attempted to build my Java-based Spark application.
>>>>
>>>> However, I found a number of our unit tests were failing with:
>>>>
>>>> java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
>>>>
>>>>   at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>>>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>>>>   at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
>>>>   at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
>>>>   at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
>>>>   …
>>>>
>>>>
>>>> I tracked this down to a missing dependency:
>>>>
>>>> <dependency>
>>>>   <groupId>org.scala-lang.modules</groupId>
>>>>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>> </dependency>
>>>>
>>>>
>>>> which unfortunately appears only inside a profile in the POM files
>>>> associated with the various Spark dependencies.
>>>>
>>>> As far as I know, a Maven build cannot activate profiles declared in
>>>> the POMs of its dependencies.
>>>>
>>>> Therefore I suspect that right now a Scala 2.13 migration is not quite
>>>> as seamless as we would like.
>>>>
>>>> I stress that this is only an issue for developers who write unit
>>>> tests for their applications, as the Spark runtime environment will always
>>>> have the necessary dependencies available to it.
>>>>
>>>> (You might consider upgrading the
>>>> org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to
>>>> 1.0.3 though!)
>>>>
>>>> Cheers and thanks for the great work!
>>>>
>>>> Steve Coy
>>>>
>>>>
>>>> On 21 Aug 2021, at 3:05 am, Gengliang Wang <ltn...@gmail.com> wrote:
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>>  version 3.2.0.
>>>>
>>>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v3.2.0-rc1 (commit
>>>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>>>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>>>
>>>> The list of bug fixes going into 3.2.0 can be found at the following
>>>> URL:
>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>
>>>> This release is using the release script of the tag v3.2.0-rc1.
>>>>
>>>>
>>>> FAQ
>>>>
>>>> =========================
>>>> How can I help test this release?
>>>> =========================
>>>> If you are a Spark user, you can help us test this release by taking
>>>> an existing Spark workload, running it on this release candidate, and
>>>> reporting any regressions.
>>>>
>>>> If you're working in PySpark you can set up a virtual env, install
>>>> the current RC, and see if anything important breaks. In the Java/Scala
>>>> world, you can add the staging repository to your project's resolvers and
>>>> test with the RC (make sure to clean up the artifact cache before/after so
>>>> you don't end up building with an out-of-date RC going forward).
>>>>
>>>> ===========================================
>>>> What should happen to JIRA tickets still targeting 3.2.0?
>>>> ===========================================
>>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>>> https://issues.apache.org/jira/projects/SPARK
>>>> and search for "Target Version/s" = 3.2.0
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>> be worked on immediately. Everything else please retarget to an
>>>> appropriate release.
>>>>
>>>> ==================
>>>> But my bug isn't fixed?
>>>> ==================
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from the previous
>>>> release. That being said, if there is something which is a regression
>>>> that has not been correctly targeted please ping me or a committer to
>>>> help target the issue.