Re: Spark master build hangs using parallel build option in maven

2020-01-18 Thread Sean Owen
I think we can remove that note from the README, then. I'll do that.


Re: Spark master build hangs using parallel build option in maven

2020-01-17 Thread Dongjoon Hyun
Hi, Saurabh.

It seems that you are hitting
https://issues.apache.org/jira/browse/SPARK-26095 .

We disabled the parallel build in 3.0.0 via
https://github.com/apache/spark/pull/23061 .

According to the stack trace in the JIRA and the PR description,
`maven-shade-plugin` seems to be the root cause.

For now, I'd recommend disabling the parallel build, because `Maven` itself
already warns you about it. (You know that, right?)

[INFO] --------------------------------[ pom ]---------------------------------
[WARNING] *****************************************************************
[WARNING] * Your build is requesting parallel execution, but project      *
[WARNING] * contains the following plugin(s) that have goals not marked   *
[WARNING] * as @threadSafe to support parallel building.                  *
[WARNING] * While this /may/ work fine, please look for plugin updates    *
[WARNING] * and/or request plugins be made thread-safe.                   *
[WARNING] * If reporting an issue, report it against the plugin in        *
[WARNING] * question, not against maven-core                              *
[WARNING] *****************************************************************
[WARNING] The following plugins are not marked @threadSafe in Spark Project Parent POM:
[WARNING] org.scalatest:scalatest-maven-plugin:2.0.0
[WARNING] Enable debug to see more precisely which goals are not marked @threadSafe.
[WARNING] *****************************************************************


I respect `Maven` warnings.

Bests,
Dongjoon.
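
For anyone who wants to check a saved build log for this warning, here is a
minimal sketch in Python. It assumes the log was captured as plain text with
each Maven message on a single line; the function name
`non_thread_safe_plugins` and the regular expression are illustrative
assumptions, not part of Maven or Spark.

```python
import re

# Matches a log line that holds only plugin coordinates, for example
# "[WARNING] org.scalatest:scalatest-maven-plugin:2.0.0".
PLUGIN_LINE = re.compile(r"^\[WARNING\]\s+([\w.\-]+:[\w.\-]+:[\w.\-]+)\s*$")

# Start of the line that introduces the plugin list in Maven's warning.
HEADER = "The following plugins are not marked @threadSafe"


def non_thread_safe_plugins(log_text):
    """Return the plugin coordinates listed under the @threadSafe warning."""
    plugins = []
    in_list = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if HEADER in line:
            in_list = True
            continue
        if in_list:
            match = PLUGIN_LINE.match(line)
            if match:
                # Parallel builds repeat the warning per module; keep one copy.
                if match.group(1) not in plugins:
                    plugins.append(match.group(1))
            else:
                # Any other line ends the current plugin list.
                in_list = False
    return plugins
```

Reading the log file and passing its text to the function returns the
coordinates in the order they first appear; an empty list means the warning
was not found in that log.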



Re: Spark master build hangs using parallel build option in maven

2020-01-17 Thread Saurabh Chawla
Hi Sean,

Thanks for checking this.

I can see the parallel build note in the README:
https://github.com/apache/spark#building-spark

"
You can build Spark using more than one thread by using the -T option with
Maven, see "Parallel builds in Maven 3".
More detailed documentation is available from the project site, at "Building
Spark".
"

This used to work when building older versions of Spark (2.4.3, 2.3.2, etc.):
build/mvn -Duse.zinc.server=false -DuseZincForJdk8=false
-Dmaven.javadoc.skip=true -DskipSource=true -Phive -Phive-thriftserver
-Phive-provided -Pyarn -Phadoop-provided -Dhadoop.version=2.8.5
-DskipTests=true -T 4 clean package

I also noticed that the Maven version changed from 3.5.4 in Spark 2.4.3 to
3.6.3 on the master branch.
I'm not sure whether the hang is due to a bug in the Maven version used on
master or to a new change on master that prevents the parallel build.

Regards
Saurabh Chawla



Re: Spark master build hangs using parallel build option in maven

2020-01-17 Thread Sean Owen
Indeed, I don't believe you can use a parallel build. Some things
collide with each other. Some of the suites already run in parallel
inside the build, though.
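
If the build command is assembled by a script, one way to fall back to a
sequential build is to strip Maven's thread option before invoking it. The
following is only a sketch: the helper name `drop_parallel_flag` is made up
for illustration, and it assumes the option is given as `-T <n>`,
`--threads <n>`, or the attached form `-T<n>`.

```python
def drop_parallel_flag(args):
    """Return a copy of args without Maven's -T/--threads option."""
    cleaned = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This token is the thread count that followed -T/--threads.
            skip_next = False
            continue
        if arg in ("-T", "--threads"):
            skip_next = True
            continue
        if arg.startswith("-T"):
            # Attached form such as -T4 or -T1C.
            continue
        cleaned.append(arg)
    return cleaned
```

Applied to the argument list of the hanging command from this thread, it
leaves the profiles and properties untouched and removes only `-T 4`.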

On Fri, Jan 17, 2020 at 1:23 PM Saurabh Chawla  wrote:
>
> Hi All,
>
> The Spark master build hangs when using Maven's parallel build option.
> Running the build sequentially on Spark master with Maven does not hang.
> The issue occurs when the hadoop-provided option (-Phadoop-provided
> -Dhadoop.version=2.8.5) is given. The same command works fine for building
> spark-2.4.3 in parallel.
>
> Command to build Spark master sequentially (the build works fine):
> build/mvn  -Duse.zinc.server=false -DuseZincForJdk8=false 
> -Dmaven.javadoc.skip=true -DskipSource=true -Phive -Phive-thriftserver 
> -Phive-provided -Pyarn -Phadoop-provided -Dhadoop.version=2.8.5 
> -DskipTests=true  clean package
>
> Command to build Spark master in parallel (the build hangs):
> build/mvn -X -Duse.zinc.server=false -DuseZincForJdk8=false 
> -Dmaven.javadoc.skip=true -DskipSource=true -Phive -Phive-thriftserver 
> -Phive-provided -Pyarn -Phadoop-provided -Dhadoop.version=2.8.5 
> -DskipTests=true -T 4 clean package
>
> This is the trace that keeps repeating in the Maven logs:
>
> [DEBUG] building maven31 dependency graph for org.apache.spark:spark-network-yarn_2.12:jar:3.0.0-SNAPSHOT with Maven31DependencyGraphBuilder
> [DEBUG] Dependency collection stats: {ConflictMarker.analyzeTime=60583, ConflictMarker.markTime=23750, ConflictMarker.nodeCount=419, ConflictIdSorter.graphTime=41262, ConflictIdSorter.topsortTime=9704, ConflictIdSorter.conflictIdCount=105, ConflictIdSorter.conflictIdCycleCount=0, ConflictResolver.totalTime=632542, ConflictResolver.conflictItemCount=193, DefaultDependencyCollector.collectTime=1020759, DefaultDependencyCollector.transformTime=775495}
> [DEBUG] org.apache.spark:spark-network-yarn_2.12:jar:3.0.0-SNAPSHOT
> [DEBUG] org.apache.spark:spark-network-shuffle_2.12:jar:3.0.0-SNAPSHOT:compile
> [DEBUG] org.apache.spark:spark-network-common_2.12:jar:3.0.0-SNAPSHOT:compile
> [DEBUG] io.netty:netty-all:jar:4.1.42.Final:compile (version managed from 4.1.42.Final)
> [DEBUG] org.apache.commons:commons-lang3:jar:3.9:compile (version managed from 3.9)
> [DEBUG] org.fusesource.leveldbjni:leveldbjni-all:jar:1.8:compile (version managed from 1.8)
> [DEBUG] com.fasterxml.jackson.core:jackson-databind:jar:2.10.0:compile (version managed from 2.10.0)
> [DEBUG] com.fasterxml.jackson.core:jackson-core:jar:2.10.0:compile (version managed from 2.10.0)
> [DEBUG] com.fasterxml.jackson.core:jackson-annotations:jar:2.10.0:compile (version managed from 2.10.0)
> [DEBUG] com.google.code.findbugs:jsr305:jar:3.0.0:compile (version managed from 3.0.0)
> [DEBUG] com.google.guava:guava:jar:14.0.1:provided (scope managed from compile) (version managed from 14.0.1)
> [DEBUG] org.apache.commons:commons-crypto:jar:1.0.0:compile (version managed from 1.0.0) (exclusions managed from [net.java.dev.jna:jna:*:*])
> [DEBUG] io.dropwizard.metrics:metrics-core:jar:4.1.1:compile (version managed from 4.1.1)
> [DEBUG] org.apache.spark:spark-tags_2.12:jar:3.0.0-SNAPSHOT:test
> [DEBUG] org.scala-lang:scala-library:jar:2.12.10:compile (version managed from 2.12.10)
> [DEBUG] org.apache.spark:spark-tags_2.12:jar:tests:3.0.0-SNAPSHOT:test
> [DEBUG] org.apache.hadoop:hadoop-client:jar:2.8.5:provided (exclusions managed from [org.fusesource.leveldbjni:leveldbjni-all:*:*, asm:asm:*:*, org.codehaus.jackson:jackson-mapper-asl:*:*, org.ow2.asm:asm:*:*, org.jboss.netty:netty:*:*, io.netty:netty:*:*, commons-beanutils:commons-beanutils-core:*:*, commons-logging:commons-logging:*:*, org.mockito:mockito-all:*:*, org.mortbay.jetty:servlet-api-2.5:*:*, javax.servlet:servlet-api:*:*, junit:junit:*:*, com.sun.jersey:*:*:*, com.sun.jersey.jersey-test-framework:*:*:*, com.sun.jersey.contribs:*:*:*, net.java.dev.jets3t:jets3t:*:*, javax.ws.rs:jsr311-api:*:*, org.eclipse.jetty:jetty-webapp:*:*])
> [DEBUG] org.apache.hadoop:hadoop-common:jar:2.8.5:provided
> [DEBUG] com.hadoop.gplcompression:hadoop-lzo:jar:0.4.19:provided
> [DEBUG] commons-cli:commons-cli:jar:1.2:provided
> [DEBUG] org.apache.commons:commons-math3:jar:3.4.1:provided (version managed from 3.1.1)
> [DEBUG] org.apache.httpcomponents:httpclient:jar:4.5.6:provided (version managed from 4.5.2)
> [DEBUG] org.apache.httpcomponents:httpcore:jar:4.4.12:provided (version managed from 4.4.10)
> [DEBUG] commons-codec:commons-codec:jar:1.10:provided (version managed from 1.11)
> [DEBUG] commons-io:commons-io:jar:2.4:provided (version managed from 2.5)
> [DEBUG] commons-net:commons-net:jar:3.1:provided (version managed from 3.6)
> [DEBUG]