Re: Spark build is failing in amplab Jenkins

2017-11-05 Thread Frank Austin Nothaft
Hi Xin!

Alyssa and I chatted just now and both reviewed the mango build scripts. We 
don’t see anything in them that looks concerning. To give a bit more context, 
Mango is a Spark-based application for visualizing genomics data that is built 
in Scala but has Python language bindings and a node.js frontend. During CI, 
the mango build runs the following steps:

• Creates a temp directory
• Runs Maven to build the Java artifacts
• Copies the built artifacts into the temp directory and cd’s into it. Inside 
the temp directory, we:
  • Create a temporary conda environment and install node.js into it
  • Pull down a pre-built distribution of Spark
  • Run our Python build from inside the temp directory
• Once this is done, we:
  • Deactivate and remove the conda environment
  • Delete the temp directory
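
A rough sketch of these steps as a single Jenkins shell step is below. The 
paths, versions, environment name, Python entry point, and Spark download URL 
are illustrative placeholders, not the actual mango build script:

    #!/usr/bin/env bash
    set -euo pipefail

    # Build the Java/Scala artifacts with Maven.
    mvn package -DskipTests

    # Stage the built artifacts in a temp directory and work from there.
    WORKDIR=$(mktemp -d)
    cp target/*.jar "$WORKDIR"
    cd "$WORKDIR"

    # Create a throwaway conda environment and install node.js into it.
    conda create -y -n mango-build python=2.7 nodejs
    source activate mango-build

    # Pull down a pre-built Spark distribution.
    curl -LO https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
    tar xzf spark-1.6.1-bin-hadoop2.6.tgz
    export SPARK_HOME="$PWD/spark-1.6.1-bin-hadoop2.6"

    # Run the Python build from inside the temp directory
    # (the entry point here is hypothetical; the real target may differ).
    python setup.py test

    # Deactivate and remove the conda environment, then clean up.
    source deactivate
    conda env remove -y -n mango-build
    cd / && rm -rf "$WORKDIR"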

This is very similar to the ADAM build, which has been running Python builds 
since mid-summer. We don’t manipulate any Python dependencies outside of the 
conda environment, which we delete at the end of the build, so we are pretty 
confident that we’re not doing anything that should break the PySpark builds.

To help us debug, could you provide the path to the Python executables that 
get run during both a good and a bad build, as well as the Python versions? 
From our side (mango/ADAM), we’ve seen some oddness over the last few months 
with the environment on some of the Jenkins executors (things like JAVA_HOME 
getting changed), but we haven’t been able to root-cause those issues.
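
Something as simple as the following, added at the top of the Spark build’s 
shell step, would capture what we’re after (a sketch; adjust to however the 
build actually invokes Python):

    # Record which Python the build will pick up, and its version.
    echo "PATH=$PATH"
    which -a python python2.7 || true
    python --version 2>&1
    # Surface any conda/virtualenv or JAVA_HOME oddities in the environment.
    env | grep -iE 'conda|virtualenv|java_home' || true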

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On Nov 4, 2017, at 10:50 PM, Frank A Nothaft <fnoth...@berkeley.edu> wrote:
> 
> Hi Xin!
> 
> Mango does install Python dependencies, but they should all be inside of a 
> conda environment. My guess is that somewhere in the mango Jenkins build, 
> something is getting installed outside of the conda environment. I'll be 
> looking into this shortly.
> 
> Regards,
> 
> Frank Austin Nothaft
> 
> On Nov 4, 2017, at 9:25 PM, Xin Lu <x...@salesforce.com> wrote:
> 
>> I'm not entirely sure if it's the cause because I can't see the build 
>> configurations, but just looking at the build logs it looks like they share 
>> a pool and those mango builds run some setup with python.  
>> 
>> On Sat, Nov 4, 2017 at 9:19 PM, Frank Austin Nothaft 
>> <fnoth...@berkeley.edu> wrote:
>> Hi folks,
>> 
>> Alyssa (cc’ed) and I manage the mango build on the AMPLab Jenkins. I will 
>> start looking into this to see what the connection between the mango builds 
>> and the failing Spark builds is.
>> 
>> Regards,
>> 
>> Frank Austin Nothaft
>> fnoth...@berkeley.edu
>> fnoth...@eecs.berkeley.edu
>> 202-340-0466
>> 
>>> On Nov 4, 2017, at 9:15 PM, Xin Lu <x...@salesforce.com> wrote:
>>> 
>>> Sorry, mango wasn't added recently, but it looks like after successful 
>>> builds of this specific configuration the workers break:
>>> 
>>> https://amplab.cs.berkeley.edu/jenkins/job/mango/HADOOP_VERSION=2.6.0,SCALAVER=2.11,SPARK_VERSION=1.6.1,label=centos/
>>> 
>>> And then after another configuration runs it recovers.
>>> 
>>> Xin
>>> 
>>> On Sat, Nov 4, 2017 at 9:09 PM, Xin Lu <x...@salesforce.com> wrote:
>>> It has happened with other workers as well, namely 3 and 4, and then 
>>> recovered. Looking at the build history, it looks like a project called 
>>> mango has been added to this pool of machines recently:
>>> 
>>> https://amplab.cs.berkeley.edu/jenkins/job/mango/
>>> 
>>> It looks like the slaves start to fail Spark pull request builds after 
>>> some runs of mango.
>>> 
>>> Xin
>>> 
>>> On Sat, Nov 4, 2017 at 1:23 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> I assume it is as it says:
>>> 
>>> Python versions prior to 2.7 are not supported.
>>> 
>>> Looks like this happens on workers 2, 6 and 7, from my observation.

Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Frank Austin Nothaft
Hi folks,

Alyssa (cc’ed) and I manage the mango build on the AMPLab Jenkins. I will start 
looking into this to see what the connection between the mango builds and the 
failing Spark builds is.

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On Nov 4, 2017, at 9:15 PM, Xin Lu <x...@salesforce.com> wrote:
> 
> Sorry, mango wasn't added recently, but it looks like after successful builds 
> of this specific configuration the workers break:
> 
> https://amplab.cs.berkeley.edu/jenkins/job/mango/HADOOP_VERSION=2.6.0,SCALAVER=2.11,SPARK_VERSION=1.6.1,label=centos/
> 
> And then after another configuration runs it recovers.
> 
> Xin
> 
> On Sat, Nov 4, 2017 at 9:09 PM, Xin Lu <x...@salesforce.com> wrote:
> It has happened with other workers as well, namely 3 and 4, and then 
> recovered. Looking at the build history, it looks like a project called mango 
> has been added to this pool of machines recently:
> 
> https://amplab.cs.berkeley.edu/jenkins/job/mango/
> 
> It looks like the slaves start to fail Spark pull request builds after some 
> runs of mango.
> 
> Xin
> 
> On Sat, Nov 4, 2017 at 1:23 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
> I assume it is as it says:
> 
> Python versions prior to 2.7 are not supported.
> 
> Looks like this happens on workers 2, 6 and 7, from my observation.
> 
> 
> On 4 Nov 2017 5:15 pm, "Sean Owen" <so...@cloudera.com> wrote:
> Agree, seeing this somewhat regularly on the pull request builder. Do some 
> machines inadvertently have Python 2.6? Some builds succeed, so it may just 
> be one or a few. CC Shane.
> 
> 
> On Thu, Nov 2, 2017 at 5:39 PM Pralabh Kumar <pralabhku...@gmail.com> wrote:
> Hi Dev
> 
> Spark build is failing in Jenkins
> 
> 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83353/consoleFull
> 
> Python versions prior to 2.7 are not supported.
> Build step 'Execute shell' marked build as failure
> Archiving artifacts
> Recording test results
> ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
> were found. Configuration error?
> 
> Please help
> 
> 
> Regards
> Pralabh Kumar
> 
> 
> 



Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-05-01 Thread Frank Austin Nothaft
Hi Ryan,

IMO, the problem is that the Spark Avro version conflicts with the Parquet Avro 
version. As discussed upthread, I don’t think there’s a way to reliably make 
sure that Avro 1.8 is on the classpath first while using spark-submit. 
Relocating Avro in our project wouldn’t solve the problem, because the 
NoSuchMethodError is thrown from the internals of the 
ParquetAvroOutputFormat, not from code in our project.
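
For reference, the classpath-first knobs one would try look like this (a 
sketch; the application class and jar name are hypothetical placeholders, and 
in our experience these flags have not reliably solved the problem):

    # Attempt to load the application's Avro 1.8 ahead of Spark's Avro 1.7.7.
    # Both flags are marked experimental in Spark's configuration docs.
    spark-submit \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --class org.example.MyApp \
      my-app-assembly.jar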

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On May 1, 2017, at 12:33 PM, Ryan Blue <rb...@netflix.com> wrote:
> 
> Michael, I think that the problem is with your classpath.
> 
> Spark has a dependency on Avro 1.7.7, which can't be changed. Your project is what 
> pulls in parquet-avro and transitively Avro 1.8. Spark has no runtime 
> dependency on Avro 1.8. It is understandably annoying that using the same 
> version of Parquet for your parquet-avro dependency is what causes your 
> project to depend on Avro 1.8, but Spark's dependencies aren't a problem 
> because its Parquet dependency doesn't bring in Avro.
> 
> There are a few ways around this:
> 1. Make sure Avro 1.8 is found in the classpath first
> 2. Shade Avro 1.8 in your project (assuming Avro classes aren't shared)
> 3. Use parquet-avro 1.8.1 in your project, which I think should work with 
> 1.8.2 and avoid the Avro change
> 
> The work-around in Spark is for tests, which do use parquet-avro. We can look 
> at a Parquet 1.8.3 that avoids this issue, but I think this is reasonable for 
> the 2.2.0 release.
> 
> rb
> 
> On Mon, May 1, 2017 at 12:08 PM, Michael Heuer <heue...@gmail.com> wrote:
> Please excuse me if I'm misunderstanding -- the problem is not with our 
> library or our classpath.
> 
> There is a conflict within Spark itself, in that Parquet 1.8.2 expects to 
> find Avro 1.8.0 on the runtime classpath and sees 1.7.7 instead.  Spark 
> already has to work around this for unit tests to pass.
> 
> 
> 
> On Mon, May 1, 2017 at 2:00 PM, Ryan Blue <rb...@netflix.com> wrote:
> Thanks for the extra context, Frank. I agree that it sounds like your problem 
> comes from the conflict between your Jars and what comes with Spark. It's the 
> same concern that makes everyone shudder when anything has a public 
> dependency on Jackson. :)
> 
> What we usually do to get around situations like this is to relocate the 
> problem library inside the shaded Jar. That way, Spark uses its version of 
> Avro and your classes use a different version of Avro. This works if you 
> don't need to share classes between the two. Would that work for your 
> situation?
> 
> rb
> 
> On Mon, May 1, 2017 at 11:55 AM, Koert Kuipers <ko...@tresata.com> wrote:
> Sounds like you are running into the fact that you cannot really put your 
> classes before Spark's on the classpath? Spark's switches to support this 
> never really worked for me either.
> 
> Inability to control the classpath + inconsistent jars => trouble?
> 
> On Mon, May 1, 2017 at 2:36 PM, Frank Austin Nothaft 
> <fnoth...@berkeley.edu> wrote:
> Hi Ryan,
> 
> We do set Avro to 1.8 in our downstream project. We also set Spark as a 
> provided dependency and build an überjar. We run via spark-submit, which 
> builds the classpath with our überjar and all of the Spark deps. This leads 
> to Avro 1.7.7 getting picked off the classpath at runtime, which causes the 
> NoSuchMethodError to occur.
> 
> Regards,
> 
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
>> On May 1, 2017, at 11:31 AM, Ryan Blue <rb...@netflix.com> wrote:
>> 
>> Frank,
>> 
>> The issue you're running into is caused by using parquet-avro with Avro 1.7. 
>> Can't your downstream project set the Avro dependency to 1.8? Spark can't 
>> update Avro because it is a breaking change that would force users to 
>> rebuild specific Avro classes in some cases. But you should be free to use 
>> Avro 1.8 to avoid the problem.
>> 
>> On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft 
>> <fnoth...@berkeley.edu> wrote:
>> Hi Ryan et al,
>> 
>> The issue we’ve seen using a build of the Spark 2.2.0 branch from a 
>> downstream project is that parquet-avro uses one of the new Avro 1.8.0 
>> methods, and you get a NoSuchMethodError since Spark puts Avro 1.7.7 as a 
>> dependency.

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-05-01 Thread Frank Austin Nothaft
Hi Ryan!

I think relocating the Avro dependency inside of Spark would make a lot of 
sense. Otherwise, we’d need Spark to move to Avro 1.8.0, or Parquet to cut a 
new 1.8.3 release that either reverts to Avro 1.7.7 or eliminates the code 
that is binary incompatible between Avro 1.7.7 and 1.8.0.
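
To see the conflict concretely, one can compare what each side of the 
dependency graph resolves Avro to (a sketch, run from a downstream project 
checkout that depends on both Spark 2.2.0 and parquet-avro 1.8.2):

    # Show which Avro versions are pulled in, and which one Maven keeps.
    # parquet-avro 1.8.2 declares Avro 1.8.0, while Spark 2.2.0 pins Avro
    # 1.7.7; nearest-wins resolution keeps exactly one of the two.
    mvn dependency:tree -Dincludes=org.apache.avro:avro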

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On May 1, 2017, at 12:00 PM, Ryan Blue <rb...@netflix.com> wrote:
> 
> Thanks for the extra context, Frank. I agree that it sounds like your problem 
> comes from the conflict between your Jars and what comes with Spark. It's the 
> same concern that makes everyone shudder when anything has a public 
> dependency on Jackson. :)
> 
> What we usually do to get around situations like this is to relocate the 
> problem library inside the shaded Jar. That way, Spark uses its version of 
> Avro and your classes use a different version of Avro. This works if you 
> don't need to share classes between the two. Would that work for your 
> situation?
> 
> rb
> 
> On Mon, May 1, 2017 at 11:55 AM, Koert Kuipers <ko...@tresata.com> wrote:
> Sounds like you are running into the fact that you cannot really put your 
> classes before Spark's on the classpath? Spark's switches to support this 
> never really worked for me either.
> 
> Inability to control the classpath + inconsistent jars => trouble?
> 
> On Mon, May 1, 2017 at 2:36 PM, Frank Austin Nothaft 
> <fnoth...@berkeley.edu> wrote:
> Hi Ryan,
> 
> We do set Avro to 1.8 in our downstream project. We also set Spark as a 
> provided dependency and build an überjar. We run via spark-submit, which 
> builds the classpath with our überjar and all of the Spark deps. This leads 
> to Avro 1.7.7 getting picked off the classpath at runtime, which causes the 
> NoSuchMethodError to occur.
> 
> Regards,
> 
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
>> On May 1, 2017, at 11:31 AM, Ryan Blue <rb...@netflix.com> wrote:
>> 
>> Frank,
>> 
>> The issue you're running into is caused by using parquet-avro with Avro 1.7. 
>> Can't your downstream project set the Avro dependency to 1.8? Spark can't 
>> update Avro because it is a breaking change that would force users to 
>> rebuild specific Avro classes in some cases. But you should be free to use 
>> Avro 1.8 to avoid the problem.
>> 
>> On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft 
>> <fnoth...@berkeley.edu> wrote:
>> Hi Ryan et al,
>> 
>> The issue we’ve seen using a build of the Spark 2.2.0 branch from a 
>> downstream project is that parquet-avro uses one of the new Avro 1.8.0 
>> methods, and you get a NoSuchMethodError since Spark puts Avro 1.7.7 as a 
>> dependency. My colleague Michael (who posted earlier on this thread) 
>> documented this in SPARK-19697 
>> <https://issues.apache.org/jira/browse/SPARK-19697>. I know that Spark has 
>> unit tests that check this compatibility issue, but it looks like there was 
>> a recent change that sets a test-scope dependency on Avro 1.8.0 
>> <https://github.com/apache/spark/commit/0077bfcb93832d93009f73f4b80f2e3d98fd2fa4>,
>>  which masks this issue in the unit tests. With this error, you can’t use 
>> the ParquetAvroOutputFormat from an application running on Spark 2.2.0.
>> 
>> Regards,
>> 
>> Frank Austin Nothaft
>> fnoth...@berkeley.edu <mailto:fnoth...@berkeley.edu>
>> fnoth...@eecs.berkeley.edu <mailto:fnoth...@eecs.berkeley.edu>
>> 202-340-0466 <tel:(202)%20340-0466>
>> 
>>> On May 1, 2017, at 10:02 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>> 
>>> I agree with Sean. Spark only pulls in parquet-avro for tests. For 
>>> execution, it implements the record materialization APIs in Parquet to go 
>>> directly to Spark SQL rows. This doesn't actually leak an Avro 1.8 
>>> dependency into Spark as far as I can tell.
>>> 
>>> rb
>>> 
>>> On Mon, May 1, 2017 at 8:34 AM, Sean Owen <so...@cloudera.com> wrote:
>>> See discussion at https://github.com/apache/spark/pull/17163 -- I think 
>>> the issue is that fixing this trades one problem for a slightly bigger one.

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-05-01 Thread Frank Austin Nothaft
Hi Ryan,

We do set Avro to 1.8 in our downstream project. We also set Spark as a 
provided dependency and build an überjar. We run via spark-submit, which 
builds the classpath with our überjar and all of the Spark deps. This leads to 
Avro 1.7.7 getting picked off the classpath at runtime, which causes the 
NoSuchMethodError to occur.
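
An easy way to confirm which copy wins at runtime is to ask the JVM where the 
Avro classes were loaded from (a sketch, assuming spark-shell from the same 
Spark distribution is on the PATH):

    # Print the jar that org.apache.avro.Schema was actually loaded from.
    # Under spark-submit/spark-shell this resolves to Spark's bundled
    # avro-1.7.7.jar rather than the Avro 1.8 inside the application überjar.
    echo 'println(classOf[org.apache.avro.Schema].getProtectionDomain.getCodeSource.getLocation)' \
      | spark-shell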

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On May 1, 2017, at 11:31 AM, Ryan Blue <rb...@netflix.com> wrote:
> 
> Frank,
> 
> The issue you're running into is caused by using parquet-avro with Avro 1.7. 
> Can't your downstream project set the Avro dependency to 1.8? Spark can't 
> update Avro because it is a breaking change that would force users to rebuild 
> specific Avro classes in some cases. But you should be free to use Avro 1.8 
> to avoid the problem.
> 
> On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft 
> <fnoth...@berkeley.edu> wrote:
> Hi Ryan et al,
> 
> The issue we’ve seen using a build of the Spark 2.2.0 branch from a 
> downstream project is that parquet-avro uses one of the new Avro 1.8.0 
> methods, and you get a NoSuchMethodError since Spark puts Avro 1.7.7 as a 
> dependency. My colleague Michael (who posted earlier on this thread) 
> documented this in SPARK-19697 
> <https://issues.apache.org/jira/browse/SPARK-19697>. I know that Spark has 
> unit tests that check this compatibility issue, but it looks like there was a 
> recent change that sets a test-scope dependency on Avro 1.8.0 
> <https://github.com/apache/spark/commit/0077bfcb93832d93009f73f4b80f2e3d98fd2fa4>,
>  which masks this issue in the unit tests. With this error, you can’t use the 
> ParquetAvroOutputFormat from an application running on Spark 2.2.0.
> 
> Regards,
> 
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
> 
>> On May 1, 2017, at 10:02 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>> 
>> I agree with Sean. Spark only pulls in parquet-avro for tests. For 
>> execution, it implements the record materialization APIs in Parquet to go 
>> directly to Spark SQL rows. This doesn't actually leak an Avro 1.8 
>> dependency into Spark as far as I can tell.
>> 
>> rb
>> 
>> On Mon, May 1, 2017 at 8:34 AM, Sean Owen <so...@cloudera.com> wrote:
>> See discussion at https://github.com/apache/spark/pull/17163 -- I think 
>> the issue is that 
>> fixing this trades one problem for a slightly bigger one.
>> 
>> 
>> On Mon, May 1, 2017 at 4:13 PM Michael Heuer <heue...@gmail.com> wrote:
>> Version 2.2.0 bumps the dependency version for parquet to 1.8.2 but does not 
>> bump the dependency version for avro (currently at 1.7.7).  Though perhaps 
>> not clear from the issue I reported [0], this means that Spark is internally 
>> inconsistent, in that a call through parquet (which depends on avro 1.8.0 
>> [1]) may throw errors at runtime when it hits avro 1.7.7 on the classpath.  
>> Avro 1.8.0 is not binary compatible with 1.7.7.
>> 
>> [0] - https://issues.apache.org/jira/browse/SPARK-19697
>> [1] - https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96
>> 
>> On Sun, Apr 30, 2017 at 3:28 AM, Sean Owen <so...@cloudera.com> wrote:
>> I have one more issue that, if it needs to be fixed, needs to be fixed for 
>> 2.2.0.
>> 
>> I'm fixing build warnings for the release and noticed that checkstyle 
>> actually complains there are some Java methods named in TitleCase, like 
>> `ProcessingTimeTimeout`:
>> 
>> https://github.com/apache/spark/pull/17803/files#r113934080
>> 
>> Easy enough to fix, and the warning is right: that's not conventional. 
>> However, I wonder if it was done on purpose to match a class name?
>> 
>> I think this is one for @tdas
>> 
>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <mich...@databricks.com> wrote:
>> Please vote on releasing the following candidate as Apache Spark version 
>> 2.2.0.