Re: Welcome Yikun Jiang as a Spark committer

2022-10-08 Thread Martin Grigorov
Congratulations, Yikun!

On Sat, Oct 8, 2022 at 7:41 AM Hyukjin Kwon  wrote:

> Hi all,
>
> The Spark PMC recently added Yikun Jiang as a committer on the project.
> Yikun is a major contributor to the infrastructure and GitHub Actions in
> Apache Spark, as well as to Kubernetes and PySpark.
> He has put a lot of effort into stabilizing and optimizing the builds
> so we all can work together in Apache Spark more
> efficiently and effectively. He's also driving the SPIP for a Docker
> official image of Apache Spark for users and developers.
> Please join me in welcoming Yikun!
>
>


Re: [VOTE] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Martin Grigorov
+1

On Thu, Sep 22, 2022, 04:42 Hyukjin Kwon  wrote:

> Hi all,
>
> I would like to start a vote for SPIP: "Support Docker Official Image for
> Spark"
>
> The goal of the SPIP is to add a Docker Official Image (DOI)
> to ensure the Spark Docker images meet the quality standards for
> Docker images, and to provide these Docker images for users
> who want to use Apache Spark via a Docker image.
>
> Please also refer to:
>
> - Previous discussion in dev mailing list: [DISCUSS] SPIP: Support Docker
> Official Image for Spark
> 
> - SPIP doc: SPIP: Support Docker Official Image for Spark
> 
> - JIRA: SPARK-40513 
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
>


Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-19 Thread Martin Grigorov
+1

Good idea!

Martin

On Mon, Sep 19, 2022 at 3:16 AM Yikun Jiang  wrote:

> Hi, all
>
> I would like to start the discussion for supporting Docker Official Image
> for Spark.
>
> This SPIP proposes to add a Docker Official Image (DOI)
> to ensure the Spark Docker images meet the quality standards for
> Docker images, and to provide these Docker images for users who want
> to use Apache Spark via a Docker image.
>
> There are also several Apache projects that release Docker Official
> Images, such as: flink, storm, solr, zookeeper, httpd
> (with 50M+ to 1B+ downloads each).
> The huge download statistics show the real demand from users, and the
> support from other Apache projects shows that we should be able to do
> it as well.
>
> After support:
>
>    - The Dockerfile will still be maintained by the Apache Spark community
>      and reviewed by Docker.
>    - The images will be maintained by the Docker community to ensure they
>      meet the Docker community's quality standards for Docker images.
>
> It will also reduce the extra Docker image maintenance effort (such as
> frequent rebuilding and image security updates) for the Apache Spark
> community.
>
> See more in SPIP DOC:
> https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o
>
> cc: Ruifeng (co-author) and Hyukjin (shepherd)
>
> Regards,
> Yikun
>


Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-14 Thread Martin Grigorov
Hi,

[X] +1 Release this package as Apache Spark 3.3.0

Tested:
- make local distribution from sources (with ./dev/make-distribution.sh
--tgz --name with-volcano -Pkubernetes,volcano,hadoop-3)
- create a Docker image (with JDK 11)
- run Pi example on
-- local
-- Kubernetes with default scheduler
-- Kubernetes with Volcano scheduler

On both Linux x86_64 and aarch64!
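For anyone who wants to repeat these checks, here is a rough sketch of the
commands behind the steps above. The distribution command is the one quoted
above; the image and Pi-example commands are only an illustration (values in
angle brackets are placeholders, the java_image_tag build argument and the
exact example-jar name depend on the branch/build):

./dev/make-distribution.sh --tgz --name with-volcano -Pkubernetes,volcano,hadoop-3
# build a JDK 11 based image from the unpacked distribution
./bin/docker-image-tool.sh -r <my-repo> -t 3.3.0-rc6 -b java_image_tag=11-jre-slim build
# run the Pi example on Kubernetes (add the scheduler-specific confs for the Volcano run)
./bin/spark-submit \
  --master k8s://https://<api-server>:6443 --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<my-repo>/spark:3.3.0-rc6 \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar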

Regards,
Martin

On Fri, Jun 10, 2022 at 7:28 AM Maxim Gekk
 wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.3.0.
>
> The vote is open until 11:59pm Pacific time June 14th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.3.0-rc6 (commit
> f74867bddfbcdd4d08076db36851e88b15e66556):
> https://github.com/apache/spark/tree/v3.3.0-rc6
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1407
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-docs/
>
> The list of bug fixes going into 3.3.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>
> This release is using the release script of the tag v3.3.0-rc6.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.0?
> ===
> The current list of open tickets targeted at 3.3.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>


Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-14 Thread Martin Grigorov
Hi Pralabh,

The Dockerfile defines an ARG for the JDK version:
https://github.com/apache/spark/blob/861df43e8d022f51727e0a12a7cca5e119e3c4cc/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile#L17
That means you could use --build-arg to overwrite it when building the
image. See
https://docs.docker.com/engine/reference/commandline/build/#set-build-time-variables---build-arg
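For example, a minimal sketch, assuming the ARG is still named java_image_tag
as in the linked Dockerfile and that you build from an unpacked Spark
distribution (repository and tag names are placeholders):

# docker-image-tool.sh forwards -b as --build-arg to docker build
./bin/docker-image-tool.sh -r <my-repo> -t 3.2-java8 \
  -b java_image_tag=8-jre-slim build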

On Tue, Jun 14, 2022 at 7:31 AM Pralabh Kumar 
wrote:

> Hi Steve / Dev team
>
> Thx for the help. A quick question: how can we fix the above error
> in Hadoop 3.1?
>
>    - The Spark Dockerfile has Java 11:
> https://github.com/apache/spark/blob/branch-3.2/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
>
>    - So if we build Spark 3.2, the Spark image will have Java 11. If
>      we run on a Hadoop version less than 3.2, it will throw an exception.
>
>    - Should there be a separate Dockerfile for Spark 3.2 with Java 8 for
>      Hadoop versions < 3.2? Spark 3.0.1 has Java 8 in its Dockerfile, which
>      works fine in our environment (with Hadoop 3.1).
>
>
> Regards
> Pralabh Kumar
>
>
>
> On Mon, Jun 13, 2022 at 3:25 PM Steve Loughran 
> wrote:
>
>>
>>
>> On Mon, 13 Jun 2022 at 08:52, Pralabh Kumar 
>> wrote:
>>
>>> Hi Dev team
>>>
>>> I have a Spark 3.2 image with Java 11 (running Spark on K8s). While
>>> reading a huge parquet file via spark.read.parquet("") I am getting
>>> the following error. The same error is mentioned in the Spark docs at
>>> https://spark.apache.org/docs/latest/#downloading but w.r.t. Apache
>>> Arrow.
>>>
>>>
>>>    - IMHO, I think the error is coming from Parquet 1.12.1, which is
>>>      based on Hadoop 2.10, which is not Java 11 compatible.
>>>
>>>
>> correct. see https://issues.apache.org/jira/browse/HADOOP-12760
>>
>>
>> Please let me know if this understanding is correct and is there a way to
>>> fix it.
>>>
>>
>>
>>
>> upgrade to a version of hadoop with the fix. That's any version >= hadoop
>> 3.2.0 which shipped since 2018
>>
>>>
>>>
>>> java.lang.NoSuchMethodError: 'sun.misc.Cleaner
>>> sun.nio.ch.DirectBuffer.cleaner()'
>>>
>>> at
>>> org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:41)
>>>
>>> at
>>> org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:687)
>>>
>>> at
>>> org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:320)
>>>
>>> at java.base/java.io.FilterInputStream.close(Unknown Source)
>>>
>>> at
>>> org.apache.parquet.hadoop.util.H2SeekableInputStream.close(H2SeekableInputStream.java:50)
>>>
>>> at
>>> org.apache.parquet.hadoop.ParquetFileReader.close(ParquetFileReader.java:1299)
>>>
>>> at
>>> org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:54)
>>>
>>> at
>>> org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:44)
>>>
>>> at
>>> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:467)
>>>
>>> at
>>> org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
>>>
>>> at
>>> scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>>>
>>> at scala.util.Success.$anonfun$map$1(Try.scala:255)
>>>
>>> at scala.util.Success.map(Try.scala:213)
>>>
>>> at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>>>
>>> at
>>> scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>>>
>>> at
>>> scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>>>
>>> at
>>> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>>>
>>> at
>>> java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown
>>> Source)
>>>
>>> at
>>> java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
>>>
>>> at
>>> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown
>>> Source)
>>>
>>> at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown
>>> Source)
>>>
>>> at
>>> java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
>>>
>>> at
>>> java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
>>>
>>


Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-07 Thread Martin Grigorov
Hi,

[X] +1 Release this package as Apache Spark 3.3.0

Tested:
- make local distribution from sources (with ./dev/make-distribution.sh
--tgz --name with-volcano -Pkubernetes,volcano,hadoop-3)
- create a Docker image (with JDK 11)
- run Pi example on
-- local
-- Kubernetes with default scheduler
-- Kubernetes with Volcano scheduler

On both x86_64 and aarch64!

Regards,
Martin

On Sat, Jun 4, 2022 at 5:50 PM Maxim Gekk 
wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.3.0.
>
> The vote is open until 11:59pm Pacific time June 8th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.3.0-rc5 (commit
> 7cf29705272ab8e8c70e8885a3664ad8ae3cd5e9):
> https://github.com/apache/spark/tree/v3.3.0-rc5
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1406
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-docs/
>
> The list of bug fixes going into 3.3.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>
> This release is using the release script of the tag v3.3.0-rc5.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.0?
> ===
> The current list of open tickets targeted at 3.3.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>


Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Martin Grigorov
Hi,

[X] +1 Release this package as Apache Spark 3.3.0

Tested:
- make local distribution from sources (with ./dev/make-distribution.sh
--tgz --name with-volcano -Pkubernetes,volcano,hadoop-3)
- create a Docker image (with JDK 11)
- run Pi example on
-- local
-- Kubernetes with default scheduler
-- Kubernetes with Volcano scheduler

On both x86_64 and aarch64!

Regards,
Martin


On Mon, May 16, 2022 at 3:44 PM Maxim Gekk
 wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.3.0.
>
> The vote is open until 11:59pm Pacific time May 19th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.3.0-rc2 (commit
> c8c657b922ac8fd8dcf9553113e11a80079db059):
> https://github.com/apache/spark/tree/v3.3.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1403
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
>
> The list of bug fixes going into 3.3.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>
> This release is using the release script of the tag v3.3.0-rc2.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.0?
> ===
> The current list of open tickets targeted at 3.3.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>


Re: CVE-2020-13936

2022-05-05 Thread Martin Grigorov
Hi,

On Thu, May 5, 2022 at 8:44 PM Sean Owen  wrote:

> This is a Velocity issue. Spark doesn't use it, although it looks like
> Avro does. From reading the CVE, I do not believe it would impact Avro's
> usage - velocity templates it may use for codegen aren't exposed that I
> know of. Is there a known relationship to Spark here? That is the key
> question in security questions like this.
>
> In any event, to pursue an update, it would likely have to start by
> updating Avro if it hasn't already, and if it has, pursue upgrading Avro in
> Spark -- if the supported Hadoop versions work with it.
>

Avro has used Velocity 2.3 since v1.11 (
https://github.com/apache/avro/commit/8824d6577368cf29b867efcd331151259c24e7b0
)
Spark 3.3.0 will use Avro 1.11 (
https://github.com/apache/spark/commit/132548116a0842c3db6abc99bc8298d504624abd
)

For earlier versions of Spark you will need to update Velocity in your
Maven/Sbt/Gradle/... config.
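A quick way to check what your own build actually pulls in (a sketch,
assuming a Maven project; sbt/Gradle have equivalent dependency reports):

# show where Velocity comes from in the dependency graph (typically via Avro)
mvn dependency:tree -Dincludes=org.apache.velocity
# if it resolves to a version < 2.3, pin 2.3 via dependencyManagement (Maven)
# or a dependency override in sbt/Gradle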



>
> On Thu, May 5, 2022 at 12:32 PM Pralabh Kumar 
> wrote:
>
>> Hi Dev Team
>>
>> Please let me know if there is a JIRA to track this CVE with respect to
>> Spark. I searched JIRA but couldn't find anything.
>>
>> Please help
>>
>> Regards
>> Pralabh Kumar
>>
>


Re: Is spark fair scheduler is for kubernete?

2022-04-11 Thread Martin Grigorov
Hi,

On Mon, Apr 11, 2022 at 7:43 AM Jason Jun  wrote:

> the official doc, https://spark.apache.org/docs/latest/job-scheduling.html,
> didn't mention whether it works for a Kubernetes cluster?
>

You could use the Volcano scheduler for more advanced setups on Kubernetes.
Here is an article explaining how to make use of the new integration
between Spark and Volcano in 3.3 (not yet released!) -
https://martin-grigorov.medium.com/native-integration-between-apache-spark-and-volcano-kubernetes-scheduler-488f54dbbab3
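For completeness, a sketch of what a submission against Volcano looks like
with the 3.3 integration (configuration names as described in the 3.3
Kubernetes docs; image, master URL and jar name are placeholders, so verify
the exact property names against the docs of the final release):

./bin/spark-submit \
  --master k8s://https://<api-server>:6443 --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<my-repo>/spark:3.3.0 \
  --conf spark.kubernetes.scheduler.name=volcano \
  --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  --conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar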

Regards,
Martin


>
> Can anyone quickly answer this?
>
> TIA.
> Jason
>


Re: Spark on K8s , some applications ended ungracefully

2022-04-01 Thread Martin Grigorov
Hi,

On Thu, Mar 31, 2022 at 4:18 PM Pralabh Kumar 
wrote:

> Hi Spark Team
>
> Some of my Spark applications on K8s ended with the below error. Although
> these applications completed successfully (as per the
> SparkListenerApplicationEnd event at the end of the event log),
> they still have event files with the .inprogress suffix. This causes the
> application to be shown as in-progress in SHS.
>
> Spark v : 3.0.1
>

I'd suggest you try with a newer version, e.g. 3.2.1, or even one built
from branch-3.3.
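If you want to try a build from branch-3.3 before it is released, roughly
(a sketch; adjust the profiles to what you need):

git clone --branch branch-3.3 --depth 1 https://github.com/apache/spark.git
cd spark
./dev/make-distribution.sh --tgz --name snapshot -Pkubernetes,hadoop-3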



>
>
>
> 22/03/31 08:33:34 WARN ShutdownHookManager: ShutdownHook '$anon$2'
> timeout, java.util.concurrent.TimeoutException
>
> java.util.concurrent.TimeoutException
>
> at java.util.concurrent.FutureTask.get(FutureTask.java:205)
>
> at
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:68)
>
> 22/03/31 08:33:34 WARN SparkContext: Ignoring Exception while stopping
> SparkContext from shutdown hook
>
> java.lang.InterruptedException
>
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>
> at
> java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
>
> at
> org.apache.spark.util.ThreadUtils$.shutdown(ThreadUtils.scala:348)
>
>
>
>
>
> Please let me know if there is a solution for it ..
>
> Regards
>
> Pralabh Kumar
>


Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Martin Grigorov
I've found the problem!
It was indeed a local thingy!

$ cat ~/.mavenrc
MAVEN_OPTS='-XX:+TieredCompilation -XX:TieredStopAtLevel=1'

I added this some time ago to speed up the build, but it seems it
also overrides the MAVEN_OPTS environment variable...
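For reference, a sketch of a ~/.mavenrc that keeps the local speed-up
without clobbering an externally provided MAVEN_OPTS:

# ~/.mavenrc - append to whatever MAVEN_OPTS is already set in the environment
MAVEN_OPTS="$MAVEN_OPTS -XX:+TieredCompilation -XX:TieredStopAtLevel=1"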

Now it fails with:

[INFO] --- scala-maven-plugin:4.3.0:compile (scala-compile-first) @
spark-catalyst_2.12 ---
[INFO] Using incremental compilation using Mixed compile order
[INFO] Compiler bridge file:
/home/martin/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.12-1.3.1-bin_2.12.15__52.0-1.3.1_20191012T045515.jar
[INFO] compiler plugin:
BasicArtifact(com.github.ghik,silencer-plugin_2.12.15,1.7.6,null)
[INFO] Compiling 372 Scala sources and 171 Java sources to
/home/martin/git/apache/spark/sql/catalyst/target/scala-2.12/classes ...

[ERROR] [Error] : error writing
/home/martin/git/apache/spark/sql/catalyst/target/scala-2.12/classes/org/apache/spark/sql/catalyst/analysis/Analyzer$ResolveGroupingAnalytics$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveGroupingAnalytics$$replaceGroupingFunc$1.class:
java.nio.file.FileSystemException
/home/martin/git/apache/spark/sql/catalyst/target/scala-2.12/classes/org/apache/spark/sql/catalyst/analysis/Analyzer$ResolveGroupingAnalytics$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveGroupingAnalytics$$replaceGroupingFunc$1.class:
File name too long
but this is well documented:
https://spark.apache.org/docs/latest/building-spark.html#encrypted-filesystems

All works now!
Thank you, Sean!


On Thu, Feb 10, 2022 at 10:13 PM Sean Owen  wrote:

> I think it's another occurrence that I had to change or had to set
> MAVEN_OPTS. I think this occurs in a way that this setting doesn't affect,
> though I don't quite understand it. Try the stack size in test runner
> configs
>
> On Thu, Feb 10, 2022, 2:02 PM Martin Grigorov 
> wrote:
>
>> Hi Sean,
>>
>> On Thu, Feb 10, 2022 at 5:37 PM Sean Owen  wrote:
>>
>>> Yes I've seen this; the JVM stack size needs to be increased. I'm not
>>> sure if it's env specific (though you and I at least have hit it, I think
>>> others), or whether we need to change our build script.
>>> In the pom.xml file, find "-Xss..." settings and make them something
>>> like "-Xss4m", see if that works.
>>>
>>
>> It is already a much bigger value - 128m (
>> https://github.com/apache/spark/blob/50256bde9bdf217413545a6d2945d6c61bf4cfff/pom.xml#L2845
>> )
>> I've tried smaller and bigger values for all jvmArgs next to this one.
>> None helped!
>> I also have the feeling it is something in my environment that overrides
>> these values but so far I cannot identify anything.
>>
>>
>>
>>>
>>> On Thu, Feb 10, 2022 at 8:54 AM Martin Grigorov 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am not able to build Spark due to the following error :
>>>>
>>>> [ERROR] ## Exception when compiling 543 sources to
>>>> /home/martin/git/apache/spark/sql/catalyst/target/scala-2.12/classes
>>>> java.lang.BootstrapMethodError: call site initialization exception
>>>> java.lang.invoke.CallSite.makeSite(CallSite.java:341)
>>>>
>>>> java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:307)
>>>>
>>>> java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:297)
>>>> scala.tools.nsc.typechecker.Typers$Typer.typedBlock(Typers.scala:2504)
>>>>
>>>> scala.tools.nsc.typechecker.Typers$Typer.$anonfun$typed1$103(Typers.scala:5711)
>>>>
>>>> scala.tools.nsc.typechecker.Typers$Typer.typedOutsidePatternMode$1(Typers.scala:500)
>>>> scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:5746)
>>>> scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5781)
>>>> ...
>>>> Caused by: java.lang.StackOverflowError
>>>> at java.lang.ref.Reference.<init> (Reference.java:303)
>>>> at java.lang.ref.WeakReference.<init> (WeakReference.java:57)
>>>> at
>>>> java.lang.invoke.MethodType$ConcurrentWeakInternSet$WeakEntry.<init>
>>>> (MethodType.java:1269)
>>>> at java.lang.invoke.MethodType$ConcurrentWeakInternSet.get
>>>> (MethodType.java:1216)
>>>> at java.lang.invoke.MethodType.makeImpl (MethodType.java:302)
>>>> at java.lang.invoke.MethodType.dropParameterTypes
>>>> (MethodType.java:573)
>>>> at java.lang.invoke.MethodType.replaceParameterTypes
>>>> (MethodType.java:467)
>>>> at java.lang.invoke.MethodHandle.asSpreader (MethodHandle.java:875)

Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Martin Grigorov
Hi Sean,

On Thu, Feb 10, 2022 at 5:37 PM Sean Owen  wrote:

> Yes I've seen this; the JVM stack size needs to be increased. I'm not sure
> if it's env specific (though you and I at least have hit it, I think
> others), or whether we need to change our build script.
> In the pom.xml file, find "-Xss..." settings and make them something like
> "-Xss4m", see if that works.
>

It is already a much bigger value - 128m (
https://github.com/apache/spark/blob/50256bde9bdf217413545a6d2945d6c61bf4cfff/pom.xml#L2845
)
I've tried smaller and bigger values for all jvmArgs next to this one. None
helped!
I also have the feeling it is something in my environment that overrides
these values but so far I cannot identify anything.



>
> On Thu, Feb 10, 2022 at 8:54 AM Martin Grigorov 
> wrote:
>
>> Hi,
>>
>> I am not able to build Spark due to the following error :
>>
>> [ERROR] ## Exception when compiling 543 sources to
>> /home/martin/git/apache/spark/sql/catalyst/target/scala-2.12/classes
>> java.lang.BootstrapMethodError: call site initialization exception
>> java.lang.invoke.CallSite.makeSite(CallSite.java:341)
>>
>> java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:307)
>>
>> java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:297)
>> scala.tools.nsc.typechecker.Typers$Typer.typedBlock(Typers.scala:2504)
>>
>> scala.tools.nsc.typechecker.Typers$Typer.$anonfun$typed1$103(Typers.scala:5711)
>>
>> scala.tools.nsc.typechecker.Typers$Typer.typedOutsidePatternMode$1(Typers.scala:500)
>> scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:5746)
>> scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5781)
>> ...
>> Caused by: java.lang.StackOverflowError
>> at java.lang.ref.Reference.<init> (Reference.java:303)
>> at java.lang.ref.WeakReference.<init> (WeakReference.java:57)
>> at
>> java.lang.invoke.MethodType$ConcurrentWeakInternSet$WeakEntry.<init>
>> (MethodType.java:1269)
>> at java.lang.invoke.MethodType$ConcurrentWeakInternSet.get
>> (MethodType.java:1216)
>> at java.lang.invoke.MethodType.makeImpl (MethodType.java:302)
>> at java.lang.invoke.MethodType.dropParameterTypes
>> (MethodType.java:573)
>> at java.lang.invoke.MethodType.replaceParameterTypes
>> (MethodType.java:467)
>> at java.lang.invoke.MethodHandle.asSpreader (MethodHandle.java:875)
>> at java.lang.invoke.Invokers.spreadInvoker (Invokers.java:158)
>> at java.lang.invoke.CallSite.makeSite (CallSite.java:324)
>> at java.lang.invoke.MethodHandleNatives.linkCallSiteImpl
>> (MethodHandleNatives.java:307)
>> at java.lang.invoke.MethodHandleNatives.linkCallSite
>> (MethodHandleNatives.java:297)
>> at scala.tools.nsc.typechecker.Typers$Typer.typedBlock
>> (Typers.scala:2504)
>> at scala.tools.nsc.typechecker.Typers$Typer.$anonfun$typed1$103
>> (Typers.scala:5711)
>> at scala.tools.nsc.typechecker.Typers$Typer.typedOutsidePatternMode$1
>> (Typers.scala:500)
>> at scala.tools.nsc.typechecker.Typers$Typer.typed1 (Typers.scala:5746)
>> at scala.tools.nsc.typechecker.Typers$Typer.typed (Typers.scala:5781)
>>
>> I have played a lot with the scala-maven-plugin jvmArg settings at [1]
>> but so far nothing helps.
>> Same error for Scala 2.12 and 2.13.
>>
>> The command I use is: ./build/mvn install -Pkubernetes -DskipTests
>>
>> I need to create a distribution from master branch.
>>
>> Java: 1.8.0_312
>> Maven: 3.8.4
>> OS: Ubuntu 21.10
>>
>> Any hints ?
>> Thank you!
>>
>> 1.
>> https://github.com/apache/spark/blob/50256bde9bdf217413545a6d2945d6c61bf4cfff/pom.xml#L2845-L2849
>>
>


Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Martin Grigorov
Hi,

I am not able to build Spark due to the following error :

[ERROR] ## Exception when compiling 543 sources to
/home/martin/git/apache/spark/sql/catalyst/target/scala-2.12/classes
java.lang.BootstrapMethodError: call site initialization exception
java.lang.invoke.CallSite.makeSite(CallSite.java:341)
java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:307)
java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:297)
scala.tools.nsc.typechecker.Typers$Typer.typedBlock(Typers.scala:2504)
scala.tools.nsc.typechecker.Typers$Typer.$anonfun$typed1$103(Typers.scala:5711)
scala.tools.nsc.typechecker.Typers$Typer.typedOutsidePatternMode$1(Typers.scala:500)
scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:5746)
scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5781)
...
Caused by: java.lang.StackOverflowError
at java.lang.ref.Reference.<init> (Reference.java:303)
at java.lang.ref.WeakReference.<init> (WeakReference.java:57)
at java.lang.invoke.MethodType$ConcurrentWeakInternSet$WeakEntry.<init>
(MethodType.java:1269)
at java.lang.invoke.MethodType$ConcurrentWeakInternSet.get
(MethodType.java:1216)
at java.lang.invoke.MethodType.makeImpl (MethodType.java:302)
at java.lang.invoke.MethodType.dropParameterTypes (MethodType.java:573)
at java.lang.invoke.MethodType.replaceParameterTypes
(MethodType.java:467)
at java.lang.invoke.MethodHandle.asSpreader (MethodHandle.java:875)
at java.lang.invoke.Invokers.spreadInvoker (Invokers.java:158)
at java.lang.invoke.CallSite.makeSite (CallSite.java:324)
at java.lang.invoke.MethodHandleNatives.linkCallSiteImpl
(MethodHandleNatives.java:307)
at java.lang.invoke.MethodHandleNatives.linkCallSite
(MethodHandleNatives.java:297)
at scala.tools.nsc.typechecker.Typers$Typer.typedBlock
(Typers.scala:2504)
at scala.tools.nsc.typechecker.Typers$Typer.$anonfun$typed1$103
(Typers.scala:5711)
at scala.tools.nsc.typechecker.Typers$Typer.typedOutsidePatternMode$1
(Typers.scala:500)
at scala.tools.nsc.typechecker.Typers$Typer.typed1 (Typers.scala:5746)
at scala.tools.nsc.typechecker.Typers$Typer.typed (Typers.scala:5781)

I have played a lot with the scala-maven-plugin jvmArg settings at [1] but
so far nothing helps.
Same error for Scala 2.12 and 2.13.

The command I use is: ./build/mvn install -Pkubernetes -DskipTests

I need to create a distribution from master branch.

Java: 1.8.0_312
Maven: 3.8.4
OS: Ubuntu 21.10

Any hints ?
Thank you!

1.
https://github.com/apache/spark/blob/50256bde9bdf217413545a6d2945d6c61bf4cfff/pom.xml#L2845-L2849


Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Martin Grigorov
On Wed, Apr 7, 2021 at 3:41 PM Hyukjin Kwon  wrote:

> Hi Greg,
>
> I raised this thread to figure out a way that we can work together to
> resolve this issue, gather feedback, and to understand how other projects
> work around.
> Several projects I observed, as far as I can tell, have made enough efforts
> to save the resources in GitHub Actions but still suffer from the lack of
> resources.
>

And it will get even worse because:
1) more and more Apache projects are migrating from TravisCI to GitHub
Actions (GA)
2) new projects join the ASF and many of them already use GA

What was your reason to migrate from Apache Jenkins to GitHub Actions?
If you want dedicated resources then you will need to manage the CI
yourself.
You could use Apache Jenkins/Buildbot with dedicated agents for your
project.
Or you could set up your own CI infrastructure with Jenkins, DroneIO,
ConcourseCI, ...

Yet another option is to move to CircleCI or Cirrus. They are similar to
TravisCI / GA and less crowded (for now).

Martin

I appreciate the resources provided to us but that does not resolve the
> issue of the development being slowed down.
>
>
> On Wed, Apr 7, 2021 at 5:52 PM Greg Stein wrote:
>
> > On Wed, Apr 7, 2021 at 12:25 AM Hyukjin Kwon 
> wrote:
> >
> >> Hi all,
> >>
> >> I am an Apache Spark PMC,
> >
> >
> > You are a member of the Apache Spark PMC. You are *not* a PMC. Please
> stop
> > with that terminology. The Foundation has about 200 PMCs, and you are a
> > member of one of them. You are NOT a "PMC" .. you're a person. A PMC is a
> > construct of the Foundation.
> >
> > >...
> >
> >> I am aware of the limited GitHub Actions resources that are shared
> >> across all projects in ASF,
> >> and many projects suffer from it. This issue significantly slows down
> the
> >> development cycle of
> >>  other projects, at least Apache Spark.
> >>
> >
> > And the Foundation gets those build minutes for GitHub Actions provided
> to
> > us from GitHub and Microsoft, and we are thankful that they provide them
> to
> > the Foundation. Maybe it isn't all the build minutes that every group
> > wants, but that is what we have. So it is incumbent upon all of us to
> > figure out how to build more, with fewer minutes.
> >
> > Say "thank you" to GitHub, please.
> >
> > Regards,
> > -g
> >
> >
>