Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Frank A Nothaft
Hi Xin!

Mango does install Python dependencies, but they should all be inside of a 
conda environment. My guess is that somewhere in the mango Jenkins build, 
something is getting installed outside of the conda environment. 
I'll be looking into this shortly.

Regards,

Frank Austin Nothaft

> On Nov 4, 2017, at 9:25 PM, Xin Lu  wrote:
> 
> I'm not entirely sure if it's the cause because I can't see the build 
> configurations, but just looking at the build logs it looks like they share a 
> pool and those mango builds run some setup with python.  
> 
>> On Sat, Nov 4, 2017 at 9:19 PM, Frank Austin Nothaft  
>> wrote:
>> Hi folks,
>> 
>> Alyssa (cc’ed) and I manage the mango build on the AMPLab Jenkins. I will 
>> start to look into this to see what the connection between the mango builds 
>> and the failing Spark builds is.
>> 
>> Regards,
>> 
>> Frank Austin Nothaft
>> fnoth...@berkeley.edu
>> fnoth...@eecs.berkeley.edu
>> 202-340-0466
>> 
>>> On Nov 4, 2017, at 9:15 PM, Xin Lu  wrote:
>>> 
>>> Sorry, mango wasn't added recently, but it looks like after successful 
>>> builds of this specific configuration the workers break:
>>> 
>>> https://amplab.cs.berkeley.edu/jenkins/job/mango/HADOOP_VERSION=2.6.0,SCALAVER=2.11,SPARK_VERSION=1.6.1,label=centos/
>>> 
>>> And then after another configuration runs it recovers.
>>> 
>>> Xin
>>> 
>>>> On Sat, Nov 4, 2017 at 9:09 PM, Xin Lu wrote:
>>>> It has happened with other workers as well, namely 3 and 4 and then 
>>>> recovered. Looking at the build history it looks like a project called 
>>>> mango has been added to this pool of machines recently:
>>>> 
>>>> https://amplab.cs.berkeley.edu/jenkins/job/mango/
>>>> 
>>>> It looks like the slaves start to fail spark pull request builds after 
>>>> some runs of mango.
>>>> 
>>>> Xin
>>>> 
>>>>> On Sat, Nov 4, 2017 at 1:23 AM, Hyukjin Kwon wrote:
>>>>> I assume it is as it says:
>>>>> 
>>>>> Python versions prior to 2.7 are not supported.
>>>>> 
>>>>> Looks this happens in worker 2, 6 and 7 given my observation.
>>>>> 
>>>>> 
>>>>> On 4 Nov 2017 5:15 pm, "Sean Owen" wrote:
>>>>> Agree, seeing this somewhat regularly on the pull request builder. Do 
>>>>> some machines inadvertently have Python 2.6? some builds succeed, so may 
>>>>> just be one or a few. CC Shane.
>>>>> 
>>>>> 
>>>>>> On Thu, Nov 2, 2017 at 5:39 PM Pralabh Kumar wrote:
>>>>>> Hi Dev
>>>>>> 
>>>>>> Spark build is failing in Jenkins
>>>>>> 
>>>>>> 
>>>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83353/consoleFull
>>>>>> 
>>>>>> Python versions prior to 2.7 are not supported.
>>>>>> Build step 'Execute shell' marked build as failure
>>>>>> Archiving artifacts
>>>>>> Recording test results
>>>>>> ERROR: Step ‘Publish JUnit test result report’ failed: No test report 
>>>>>> files were found. Configuration error?
>>>>>> 
>>>>>> Please help
>>>>>> 
>>>>>> 
>>>>>> Regards
>>>>>> Pralabh Kumar
>>>>> 
>>>> 
>>> 
>> 
> 


Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Xin Lu
I'm not entirely sure if it's the cause because I can't see the build
configurations, but just looking at the build logs it looks like they share
a pool and those mango builds run some setup with python.

On Sat, Nov 4, 2017 at 9:19 PM, Frank Austin Nothaft 
wrote:

> Hi folks,
>
> Alyssa (cc’ed) and I manage the mango build on the AMPLab Jenkins. I will
> start to look into this to see what the connection between the mango builds
> and the failing Spark builds is.
>
> Regards,
>
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
>
> On Nov 4, 2017, at 9:15 PM, Xin Lu  wrote:
>
> Sorry, mango wasn't added recently, but it looks like after successful
> builds of this specific configuration the workers break:
>
> https://amplab.cs.berkeley.edu/jenkins/job/mango/HADOOP_VERSION=2.6.0,SCALAVER=2.11,SPARK_VERSION=1.6.1,label=centos/
>
> And then after another configuration runs it recovers.
>
> Xin
>
> On Sat, Nov 4, 2017 at 9:09 PM, Xin Lu  wrote:
>
>> It has happened with other workers as well, namely 3 and 4 and then
>> recovered. Looking at the build history it looks like a project called
>> mango has been added to this pool of machines recently:
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/mango/
>>
>> It looks like the slaves start to fail spark pull request builds after
>> some runs of mango.
>>
>> Xin
>>
>> On Sat, Nov 4, 2017 at 1:23 AM, Hyukjin Kwon  wrote:
>>
>>> I assume it is as it says:
>>>
>>> Python versions prior to 2.7 are not supported.
>>>
>>>
>>> Looks this happens in worker 2, 6 and 7 given my observation.
>>>
>>>
>>> On 4 Nov 2017 5:15 pm, "Sean Owen"  wrote:
>>>
>>> Agree, seeing this somewhat regularly on the pull request builder. Do
>>> some machines inadvertently have Python 2.6? some builds succeed, so may
>>> just be one or a few. CC Shane.
>>>
>>>
>>> On Thu, Nov 2, 2017 at 5:39 PM Pralabh Kumar 
>>> wrote:
>>>
>>>> Hi Dev
>>>>
>>>> Spark build is failing in Jenkins
>>>>
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83353/consoleFull
>>>>
>>>>
>>>> Python versions prior to 2.7 are not supported.
>>>> Build step 'Execute shell' marked build as failure
>>>> Archiving artifacts
>>>> Recording test results
>>>> ERROR: Step ‘Publish JUnit test result report’ failed: No test report 
>>>> files were found. Configuration error?
>>>>
>>>>
>>>> Please help
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>> Pralabh Kumar
>>>>
>>>>
>>>
>>
>
>


Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Frank Austin Nothaft
Hi folks,

Alyssa (cc’ed) and I manage the mango build on the AMPLab Jenkins. I will start 
to look into this to see what the connection between the mango builds and the 
failing Spark builds is.

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On Nov 4, 2017, at 9:15 PM, Xin Lu  wrote:
> 
> Sorry, mango wasn't added recently, but it looks like after successful builds 
> of this specific configuration the workers break:
> 
> https://amplab.cs.berkeley.edu/jenkins/job/mango/HADOOP_VERSION=2.6.0,SCALAVER=2.11,SPARK_VERSION=1.6.1,label=centos/
>  
> 
> 
> And then after another configuration runs it recovers.
> 
> Xin
> 
> On Sat, Nov 4, 2017 at 9:09 PM, Xin Lu wrote:
> It has happened with other workers as well, namely 3 and 4 and then 
> recovered. Looking at the build history it looks like a project called mango 
> has been added to this pool of machines recently:
> 
> https://amplab.cs.berkeley.edu/jenkins/job/mango/ 
> 
> 
> It looks like the slaves start to fail spark pull request builds after some 
> runs of mango.  
> 
> Xin
> 
> On Sat, Nov 4, 2017 at 1:23 AM, Hyukjin Kwon wrote:
> I assume it is as it says:
> 
> Python versions prior to 2.7 are not supported.
> 
> Looks this happens in worker 2, 6 and 7 given my observation.
> 
> 
> On 4 Nov 2017 5:15 pm, "Sean Owen" wrote:
> Agree, seeing this somewhat regularly on the pull request builder. Do some 
> machines inadvertently have Python 2.6? some builds succeed, so may just be 
> one or a few. CC Shane.
> 
> 
> On Thu, Nov 2, 2017 at 5:39 PM Pralabh Kumar wrote:
> Hi Dev
> 
> Spark build is failing in Jenkins
> 
> 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83353/consoleFull
>  
> 
> 
> Python versions prior to 2.7 are not supported.
> Build step 'Execute shell' marked build as failure
> Archiving artifacts
> Recording test results
> ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
> were found. Configuration error?
> 
> Please help
> 
> 
> Regards
> Pralabh Kumar
> 
> 
> 



Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Xin Lu
Sorry, mango wasn't added recently, but it looks like after successful
builds of this specific configuration the workers break:

https://amplab.cs.berkeley.edu/jenkins/job/mango/HADOOP_VERSION=2.6.0,SCALAVER=2.11,SPARK_VERSION=1.6.1,label=centos/

And then after another configuration runs it recovers.

Xin

On Sat, Nov 4, 2017 at 9:09 PM, Xin Lu  wrote:

> It has happened with other workers as well, namely 3 and 4 and then
> recovered. Looking at the build history it looks like a project called
> mango has been added to this pool of machines recently:
>
> https://amplab.cs.berkeley.edu/jenkins/job/mango/
>
> It looks like the slaves start to fail spark pull request builds after
> some runs of mango.
>
> Xin
>
> On Sat, Nov 4, 2017 at 1:23 AM, Hyukjin Kwon  wrote:
>
>> I assume it is as it says:
>>
>> Python versions prior to 2.7 are not supported.
>>
>>
>> Looks this happens in worker 2, 6 and 7 given my observation.
>>
>>
>> On 4 Nov 2017 5:15 pm, "Sean Owen"  wrote:
>>
>> Agree, seeing this somewhat regularly on the pull request builder. Do
>> some machines inadvertently have Python 2.6? some builds succeed, so may
>> just be one or a few. CC Shane.
>>
>>
>> On Thu, Nov 2, 2017 at 5:39 PM Pralabh Kumar 
>> wrote:
>>
>>> Hi Dev
>>>
>>> Spark build is failing in Jenkins
>>>
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83353/consoleFull
>>>
>>>
>>> Python versions prior to 2.7 are not supported.
>>> Build step 'Execute shell' marked build as failure
>>> Archiving artifacts
>>> Recording test results
>>> ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
>>> were found. Configuration error?
>>>
>>>
>>> Please help
>>>
>>>
>>>
>>> Regards
>>>
>>> Pralabh Kumar
>>>
>>>
>>
>


Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Xin Lu
It has happened with other workers as well, namely 3 and 4 and then
recovered. Looking at the build history it looks like a project called
mango has been added to this pool of machines recently:

https://amplab.cs.berkeley.edu/jenkins/job/mango/

It looks like the slaves start to fail spark pull request builds after some
runs of mango.

Xin

On Sat, Nov 4, 2017 at 1:23 AM, Hyukjin Kwon  wrote:

> I assume it is as it says:
>
> Python versions prior to 2.7 are not supported.
>
>
> Looks this happens in worker 2, 6 and 7 given my observation.
>
>
> On 4 Nov 2017 5:15 pm, "Sean Owen"  wrote:
>
> Agree, seeing this somewhat regularly on the pull request builder. Do some
> machines inadvertently have Python 2.6? some builds succeed, so may just be
> one or a few. CC Shane.
>
>
> On Thu, Nov 2, 2017 at 5:39 PM Pralabh Kumar 
> wrote:
>
>> Hi Dev
>>
>> Spark build is failing in Jenkins
>>
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83353/consoleFull
>>
>>
>> Python versions prior to 2.7 are not supported.
>> Build step 'Execute shell' marked build as failure
>> Archiving artifacts
>> Recording test results
>> ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
>> were found. Configuration error?
>>
>>
>> Please help
>>
>>
>>
>> Regards
>>
>> Pralabh Kumar
>>
>>
>


Accessing DataFrame inside UserDefinedFunction.

2017-11-04 Thread knowsnothing
Hi Everyone,

I've been told that accessing a DataFrame object from a UserDefinedFunction is
not possible, but it turns out that the code shown below (taken from
StackOverflow) works fine. Why is that? Is it a bug, or is it expected?

Thanks in advance.

case class Target(wordListOne: Seq[String], WordListTwo: Seq[String])
val targetData = Seq(
  Target(Seq("Spark", "Wrong", "Something"), Seq("Java", "Grape", "Banana")),
  Target(Seq("Java", "Scala"), Seq("Scala", "Banana")),
  Target(Seq(""), Seq("Grape", "Banana")),
  Target(Seq(""), Seq("")))
val targets = spark.createDataset(targetData)

case class WordSimilarity(first: String, second: String, similarity: Double)
val similarityData = Seq(
  WordSimilarity("Spark", "Java", 0.8),
  WordSimilarity("Scala", "Spark", 0.9),
  WordSimilarity("Java", "Scala", 0.9),
  WordSimilarity("Apple", "Grape", 0.66),
  WordSimilarity("Scala", "Apple", -0.1),
  WordSimilarity("Gine", "Spark", 0.1))
val dict = spark.createDataset(similarityData)

// The UDF's closure captures the Dataset `dict` and filters it per row.
val countPositiveSimilarity = udf[Long, Seq[String], Seq[String]]((a, b) =>
  dict.filter(
    (($"first".isin(a: _*) && $"second".isin(b: _*)) ||
     ($"first".isin(b: _*) && $"second".isin(a: _*))) && $"similarity" > 0.7
  ).count
)

val countDF = targets.withColumn("positive_count",
  countPositiveSimilarity($"wordListOne", $"wordListTwo"))
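
For comparison, a common way to avoid referencing a Dataset inside a UDF is to
bring the small lookup table to the driver and have the function close over
plain Scala values instead. The sketch below only illustrates that idea and is
not the approach from the StackOverflow answer: the object name
LocalSimilarityLookup and the threshold parameter are hypothetical, and the
logic is written in plain Scala (no Spark dependency) so it can be checked on
its own.

```scala
// Hypothetical sketch: the same matching rule as the UDF above, applied to a
// local Seq instead of a Dataset. All names here are illustrative only.
final case class WordSimilarity(first: String, second: String, similarity: Double)

object LocalSimilarityLookup {
  // Same sample rows as similarityData in the snippet above.
  val similarityData: Seq[WordSimilarity] = Seq(
    WordSimilarity("Spark", "Java", 0.8),
    WordSimilarity("Scala", "Spark", 0.9),
    WordSimilarity("Java", "Scala", 0.9),
    WordSimilarity("Apple", "Grape", 0.66),
    WordSimilarity("Scala", "Apple", -0.1),
    WordSimilarity("Gine", "Spark", 0.1))

  // Counts entries whose words fall one in each list (in either order) and
  // whose similarity is strictly above the threshold.
  def countPositiveSimilarity(
      dict: Seq[WordSimilarity],
      a: Seq[String],
      b: Seq[String],
      threshold: Double = 0.7): Long = {
    val as = a.toSet
    val bs = b.toSet
    dict.count { w =>
      val paired = (as(w.first) && bs(w.second)) || (bs(w.first) && as(w.second))
      paired && w.similarity > threshold
    }.toLong
  }
}
```

In a Spark job, the local sequence could come from dict.collect(), optionally
wrapped with spark.sparkContext.broadcast, with the function above called
inside the udf. Note also that withColumn is lazy, so the UDF body in the
original snippet is not invoked until an action such as show runs on countDF.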




--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-04 Thread Burak Yavuz
+1

On Fri, Nov 3, 2017 at 10:02 PM, vaquar khan  wrote:

> +1
>
> On Fri, Nov 3, 2017 at 8:14 PM, Weichen Xu 
> wrote:
>
>> +1.
>>
>> On Sat, Nov 4, 2017 at 8:04 AM, Matei Zaharia 
>> wrote:
>>
>>> +1 from me too.
>>>
>>> Matei
>>>
>>> > On Nov 3, 2017, at 4:59 PM, Wenchen Fan  wrote:
>>> >
>>> > +1.
>>> >
>>> > I think this architecture makes a lot of sense: it lets executors talk
>>> > to the source/sink directly, which brings very low latency.
>>> >
>>> > On Thu, Nov 2, 2017 at 9:01 AM, Sean Owen  wrote:
>>> > +0 simply because I don't feel I know enough to have an opinion. I
>>> > have no reason to doubt the change though, from a skim through the doc.
>>> >
>>> >
>>> > On Wed, Nov 1, 2017 at 3:37 PM Reynold Xin wrote:
>>> > Earlier I sent out a discussion thread for CP in Structured Streaming:
>>> >
>>> > https://issues.apache.org/jira/browse/SPARK-20928
>>> >
>>> > It is meant to be a very small, surgical change to Structured
>>> > Streaming to enable ultra-low latency. This is great timing because we are
>>> > also designing and implementing data source API v2. If designed properly,
>>> > we can have the same data source API working for both streaming and batch.
>>> >
>>> >
>>> > Following the SPIP process, I'm putting this SPIP up for a vote.
>>> >
>>> > +1: Let's go ahead and design / implement the SPIP.
>>> > +0: Don't really care.
>>> > -1: I do not think this is a good idea for the following reasons.
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>>
>>
>
>
> --
> Regards,
> Vaquar Khan
> +1 -224-436-0783
> Greater Chicago
>


Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Hyukjin Kwon
I assume it is as it says:

Python versions prior to 2.7 are not supported.


It looks like this happens on workers 2, 6 and 7, based on my observation.


On 4 Nov 2017 5:15 pm, "Sean Owen"  wrote:

Agree, seeing this somewhat regularly on the pull request builder. Do some
machines inadvertently have Python 2.6? some builds succeed, so may just be
one or a few. CC Shane.


On Thu, Nov 2, 2017 at 5:39 PM Pralabh Kumar  wrote:

> Hi Dev
>
> Spark build is failing in Jenkins
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83353/consoleFull
>
>
> Python versions prior to 2.7 are not supported.
> Build step 'Execute shell' marked build as failure
> Archiving artifacts
> Recording test results
> ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
> were found. Configuration error?
>
>
> Please help
>
>
>
> Regards
>
> Pralabh Kumar
>
>


Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Sean Owen
Agreed; I'm seeing this somewhat regularly on the pull request builder. Do some
machines inadvertently have Python 2.6? Some builds succeed, so it may just be
one or a few. CC Shane.

On Thu, Nov 2, 2017 at 5:39 PM Pralabh Kumar  wrote:

> Hi Dev
>
> Spark build is failing in Jenkins
>
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83353/consoleFull
>
>
> Python versions prior to 2.7 are not supported.
> Build step 'Execute shell' marked build as failure
> Archiving artifacts
> Recording test results
> ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
> were found. Configuration error?
>
>
> Please help
>
>
>
> Regards
>
> Pralabh Kumar
>
>