Re: Revisiting Online serving of Spark models?

2018-05-31 Thread Chris Fregly
Hey everyone!

@Felix:  thanks for putting this together.  I sent some of you a quick calendar 
event - mostly for me, so I don’t forget!  :)

Coincidentally, this is the focus of the June 6th Advanced Spark and TensorFlow 
Meetup @ 5:30pm (same night) here in SF!

Everybody is welcome to come.  Here’s the link to the meetup that includes the 
signup link:  
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/ 


We have an awesome lineup of speakers covering a lot of deep, technical ground.

For those who can’t attend in person, we’ll be broadcasting live - and posting 
the recording afterward.  

All details are in the meetup link above…

@holden/felix/nick/joseph/maximiliano/saikat/leif:  you’re more than welcome to 
give a talk. I can move things around to make room.

@joseph:  I’d personally like an update on the direction of the Databricks 
proprietary ML Serving export format, which is similar to PMML but is not a 
standard in any way.

Also, the Databricks ML Serving Runtime is only available to Databricks 
customers.  This seems in conflict with the community efforts described here.  
Can you comment on behalf of Databricks?

Looking forward to your response, Joseph.

See you all soon!

—

Chris Fregly
Founder @ PipelineAI  (100,000 Users)
Organizer @ Advanced Spark and TensorFlow Meetup (85,000 Global Members)

San Francisco - Chicago - Austin - 
Washington DC - London - Dusseldorf

Try our PipelineAI Community Edition with GPUs and TPUs!! 



> On May 30, 2018, at 9:32 AM, Felix Cheung  wrote:
> 
> Hi!
> 
> Thank you! Let’s meet then
> 
> June 6 4pm
> 
> Moscone West Convention Center
> 800 Howard Street, San Francisco, CA 94103
> 
> Ground floor (outside of conference area - should be available for all) - we 
> will meet and decide where to go
> 
> (Would not send invite because that would be too much noise for dev@)
> 
> To paraphrase Joseph, we will use this to kick off the discussion and post 
> notes afterward and follow up online. As for Seattle, I would be very interested 
> to meet in person later and discuss ;) 
> 
> 
> _
> From: Saikat Kanjilal 
> Sent: Tuesday, May 29, 2018 11:46 AM
> Subject: Re: Revisiting Online serving of Spark models?
> To: Maximiliano Felice 
> Cc: Felix Cheung , Holden Karau 
> , Joseph Bradley , Leif Walsh 
> , dev 
> 
> 
> Would love to join but am in Seattle, thoughts on how to make this work?
> 
> Regards
> 
> Sent from my iPhone
> 
> On May 29, 2018, at 10:35 AM, Maximiliano Felice wrote:
> 
>> Big +1 to a meeting with fresh air.
>> 
>> Could anyone send the invites? I don't really know which is the place Holden 
>> is talking about.
>> 
>> 2018-05-29 14:27 GMT-03:00 Felix Cheung:
>> You had me at blue bottle!
>> 
>> _
>> From: Holden Karau <hol...@pigscanfly.ca>
>> Sent: Tuesday, May 29, 2018 9:47 AM
>> Subject: Re: Revisiting Online serving of Spark models?
>> To: Felix Cheung
>> Cc: Saikat Kanjilal <sxk1...@hotmail.com>, Maximiliano Felice, Joseph Bradley, 
>> Leif Walsh, dev
>> 
>> 
>> 
>> I'm down for that, we could all go for a walk maybe to the Mint Plaza Blue 
>> Bottle and grab coffee (if the weather holds, have our design meeting outside 
>> :p)?
>> 
>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung wrote:
>> Bump.
>> 
>> From: Felix Cheung
>> Sent: Saturday, May 26, 2018 1:05:29 PM
>> To: Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>> Cc: Leif Walsh; Holden Karau; dev
>> 
>> Subject: Re: Revisiting Online serving of Spark models?
>>  
>> Hi! How about we meet the community and discuss on June 6 at 4pm at (near) the 
>> Summit?
>> 
>> (I propose we meet at the venue entrance so we can accommodate people who 
>> might not be in the conference)
>> 
>> From: Saikat Kanjilal <sxk1...@hotmail.com>
>> Sent: Tuesday, May 22, 2018 7:47:07 AM
>> To: Maximiliano Felice
>> Cc: Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>> Subject: Re: Revisiting Online serving of Spark models?
>>  
>> I’m in the same exact boat as Maximiliano and have use cases as well for 
>> model serving and would love to join this discussion.
>> 
>> Sent from my iPhone
>> 
>> On May 22, 2018, at 6:39 AM, Maximiliano Felice wrote:
>> 
>>> Hi!
>>> 
>>> I don't usually write a lot on this list but I keep up to date with the 
>>> 

Re: Feedback on first commit + jira issue I opened

2018-05-31 Thread Bryan Cutler
Hi Andrew,

Please just go ahead and make the pull request.  It's easier to review and
give feedback, thanks!

Bryan

On Thu, May 31, 2018 at 9:44 AM, Long, Andrew  wrote:

> Hello Friends,
>
>
>
> I’m a new contributor and I’ve submitted my first patch, and I had some
> questions about documentation standards.  In my patch (JIRA below) I’ve
> added a config parameter to adjust the number of records shown when a user
> calls .show() on a DataFrame.  I was hoping someone could double-check my
> small diff to make sure I wasn’t making any rookie mistakes before I submit
> a pull request.
>
>
>
> https://issues.apache.org/jira/browse/SPARK-24442
>
>
>
> Cheers Andrew
>


Re: [VOTE] SPIP ML Pipelines in R

2018-05-31 Thread Joseph Bradley
Hossein might be slow to respond (OOO), but I just commented on the JIRA.
I'd recommend we follow the same process as the SparkR package.

+1 on this from me (and I'll be happy to help shepherd it, though Felix and
Shivaram are the experts in this area).  CRAN presents challenges, but this
is a good step towards making R a first-class citizen for ML use cases of
Spark.

On Thu, May 31, 2018 at 9:10 AM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> Hossein -- Can you clarify what the resolution was on the repository /
> release issue discussed on the SPIP?
>
> Shivaram
>
> On Thu, May 31, 2018 at 9:06 AM, Felix Cheung 
> wrote:
> > +1
> > With my concerns in the SPIP discussion.
> >
> > 
> > From: Hossein 
> > Sent: Wednesday, May 30, 2018 2:03:03 PM
> > To: dev@spark.apache.org
> > Subject: [VOTE] SPIP ML Pipelines in R
> >
> > Hi,
> >
> > I started a discussion thread for a new R package to expose MLlib pipelines
> > in R.
> >
> > To summarize, we will work on utilities to generate R wrappers for the MLlib
> > pipeline API for a new R package. This will lower the burden of exposing
> > new APIs in the future.
> >
> > Following the SPIP process, I am proposing the SPIP for a vote.
> >
> > +1: Let's go ahead and implement the SPIP.
> > +0: Don't really care.
> > -1: I do not think this is a good idea for the following reasons.
> >
> > Thanks,
> > --Hossein
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

http://databricks.com


REMINDER: Apache EU Roadshow 2018 in Berlin is less than 2 weeks away!

2018-05-31 Thread sharan

Hello Apache Supporters and Enthusiasts

This is a reminder that our Apache EU Roadshow in Berlin is less than 
two weeks away and we need your help to spread the word. Please let your 
work colleagues, friends and anyone interested in attending know 
about our Apache EU Roadshow event.


We have a great schedule including tracks on Apache Tomcat, Apache HTTP 
Server, Microservices, Internet of Things (IoT) and Cloud Technologies. 
You can find more details at the link below:


https://s.apache.org/0hnG

Ticket prices will be going up on 8th June 2018, so please make sure 
that you register soon if you want to beat the price increase. 
https://foss-backstage.de/tickets


Remember that registering for the Apache EU Roadshow also gives you 
access to FOSS Backstage so you can attend any talks and workshops from 
both conferences. And don’t forget that our Apache Lounge will be open 
throughout the whole conference as a place to meet up, hack and relax.


We look forward to seeing you in Berlin!

Thanks
Sharan Foga,  VP Apache Community Development

http://apachecon.com/
@apachecon

PLEASE NOTE: You are receiving this message because you are subscribed 
to a user@ or dev@ list of one or more Apache Software Foundation projects.


[Spark SQL Discuss] Better support for Partitioning and Bucketing when used together

2018-05-31 Thread pnpranavrao
Hello,
We use partitioned + bucketed datasets for use-cases where we can afford to
take a perf hit at write time, so that reads are optimised. But I feel Spark
could more optimally exploit the data layout in query planning. Here I
describe why this is a problem, and how it could be improved.

Why is Partitioning + Bucketing required (together)?
There is a class of common problems that can't be solved by either technique alone:
- Pure partitioning: we want to avoid a shuffle on some commonly joined
datasets, on the same few join keys. This can't be solved by partitioning alone.
- Pure bucketing: for most DataSources (on Spark and other processing
frameworks), the folder is the least granular level of identifying datasets.
The Hive metastore lets us collect arbitrary folder partitions into a logical
view, which helps in incremental ingestion and lends itself to a simple
form of MVCC.
On Spark, when you try to both partition and bucket a dataset, the format on
disk and in the metastore is correctly recorded. But this information isn't
used optimally for query planning because:
- A partitioned + bucketed dataset is read into num_buckets input RDD partitions
due to createBucketedRDD. For large datasets with a lot of partitions, this
DataFrame is now unusable because of the severely limited parallelism. We
can't have a large num_buckets, as it would lead to small-file problems,
especially in skewed partitions.
- We could manually turn bucketing off with the
spark.sql.sources.bucketing.enabled flag, but we would be losing the natural
distribution that's present in the dataset, and lose out on shuffle
elimination.
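
For concreteness, a minimal sketch of the write path described above, assuming
hypothetical column names a, b, c and a hypothetical table name (none of these
are taken from the original message):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("partition-and-bucket").getOrCreate()
    val df = spark.read.parquet("/data/events_raw")   // hypothetical input path

    // Partition by (a, b) into folders and bucket by c within each folder.
    // Bucketing metadata is only recorded via saveAsTable (the metastore), not plain save().
    df.write
      .partitionBy("a", "b")
      .bucketBy(8, "c")
      .sortBy("c")
      .saveAsTable("events_bucketed")

    // Reading the table back produces num_buckets input partitions, as described above;
    // bucketed reads can be disabled (losing shuffle elimination) with:
    spark.conf.set("spark.sql.sources.bucketing.enabled", "false")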

What could happen ideally:
- Partitioned + bucketed data actually has a well-defined distribution. It
does not fit the currently defined HashClusteredDistribution, but a new one
can be defined which takes into account both the value-based distribution of
partition values and the hash-based distribution of bucketing columns.
- Queries involving partition columns AND bucketing columns should make use of
this data distribution. E.g.: suppose you have PartitioningCols(a,b) and
BucketingCols(c); the joins we can support without a shuffle would be on keys
(a,b,c), (a,c) [coalesce values of b into a] and (a) [partition-partition
join]. A sketch of such a join follows this list.
- The number of input RDD partitions should be decided at the last possible stage
of query planning. If no join (as described above) can utilize the bucketed
data, the physical plan could fall back to a regular DataSource scan.
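
To make the (a,c) case concrete, a sketch of the join shape this proposal targets,
reusing the hypothetical tables from the previous example (with current planning this
join still incurs a shuffle, which is exactly what the new distribution would avoid):

    // Two tables written with partitionBy("a", "b") and bucketBy(8, "c"), as above.
    val left = spark.table("events_bucketed")
    val right = spark.table("clicks_bucketed")   // hypothetical second table with the same layout

    // A join on (a, c): under the proposal, the partition-value + bucket-hash layout
    // could satisfy the required distribution without shuffling either side.
    val joined = left.join(right, Seq("a", "c"))

    // Inspect the physical plan; today an Exchange (shuffle) still appears here.
    joined.explain()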
Implementation:
This will involve changes to some aspects of query planning. Here I list the
top-level changes:
- Add a new Distribution and Partitioning to describe this partition-value and
bucket-column-hash data layout.
- Change the logical plan and SparkPlan (DataSourceScanExec) to capture the above
Distribution and Partitioning.
- Just like ensureRequirements adds a shuffle to satisfy child distributions,
we could have it add a coalesce operator to club together buckets across
folder partitions if required, and at different partition hierarchies
according to the distribution required. For example: with
PartitioningCols(a,b) and BucketingCols(c), a join that involves (a,c) can
be answered by coalescing the b values within (a,c)'s partitions. This would
be a metadata-only operation.
- Account for co-clustered partitions so that RDDs can be zipped in joins;
this will have to handle partitions that are pruned out too.
There is a strong requirement for this functionality on my team (in Amazon).
I've opened a JIRA regarding this issue. I did consider DataSourcesV2, but it
looks like a better fit here.
I wanted some input regarding this. Is this approach feasible, and is it
aligned with how Spark wants to handle native data sources in the future?
Does anyone else have similar requirements?

Thanks,
Pranav.



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

Feedback on first commit + jira issue I opened

2018-05-31 Thread Long, Andrew
Hello Friends,

I’m a new contributor and I’ve submitted my first patch, and I had some questions 
about documentation standards.  In my patch (JIRA below) I’ve added a config 
parameter to adjust the number of records shown when a user calls .show() on a 
DataFrame.  I was hoping someone could double-check my small diff to make sure 
I wasn’t making any rookie mistakes before I submit a pull request.

https://issues.apache.org/jira/browse/SPARK-24442

Cheers Andrew
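
For context, a small illustration of the current behavior that the patch (as described)
would make configurable; the no-argument overload shows 20 rows, and larger counts
currently have to be passed explicitly:

    val df = spark.range(100).toDF("id")   // any DataFrame

    df.show()                        // first 20 rows
    df.show(50)                      // first 50 rows
    df.show(50, truncate = false)    // also controls column truncation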


Re: [VOTE] SPIP ML Pipelines in R

2018-05-31 Thread Shivaram Venkataraman
Hossein -- Can you clarify what the resolution was on the repository /
release issue discussed on the SPIP?

Shivaram

On Thu, May 31, 2018 at 9:06 AM, Felix Cheung  wrote:
> +1
> With my concerns in the SPIP discussion.
>
> 
> From: Hossein 
> Sent: Wednesday, May 30, 2018 2:03:03 PM
> To: dev@spark.apache.org
> Subject: [VOTE] SPIP ML Pipelines in R
>
> Hi,
>
> I started a discussion thread for a new R package to expose MLlib pipelines in
> R.
>
> To summarize, we will work on utilities to generate R wrappers for the MLlib
> pipeline API for a new R package. This will lower the burden of exposing
> new APIs in the future.
>
> Following the SPIP process, I am proposing the SPIP for a vote.
>
> +1: Let's go ahead and implement the SPIP.
> +0: Don't really care.
> -1: I do not think this is a good idea for the following reasons.
>
> Thanks,
> --Hossein

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPIP ML Pipelines in R

2018-05-31 Thread Felix Cheung
+1
With my concerns in the SPIP discussion.


From: Hossein 
Sent: Wednesday, May 30, 2018 2:03:03 PM
To: dev@spark.apache.org
Subject: [VOTE] SPIP ML Pipelines in R

Hi,

I started a discussion thread for a new R package to expose MLlib pipelines in 
R.

To summarize, we will work on utilities to generate R wrappers for the MLlib 
pipeline API for a new R package. This will lower the burden of exposing new 
APIs in the future.

Following the SPIP process, I am proposing the SPIP for a vote.

+1: Let's go ahead and implement the SPIP.
+0: Don't really care.
-1: I do not think this is a good idea for the following reasons.

Thanks,
--Hossein


Re: MatrixUDT and VectorUDT in Spark ML

2018-05-31 Thread Li Jin
Please see https://issues.apache.org/jira/browse/SPARK-24258
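
For readers skimming the quoted thread below: a minimal sketch (in Scala rather than
PySpark, with hypothetical column names) of the UDF workaround being described, given a
DataFrame df with two vector columns, since no built-in arithmetic operators exist for
VectorUDT columns today:

    import org.apache.spark.ml.linalg.{Vector, Vectors}
    import org.apache.spark.sql.functions.{col, udf}

    // Elementwise product of two VectorUDT columns has to go through a UDF.
    val elementwiseProduct = udf { (a: Vector, b: Vector) =>
      Vectors.dense(a.toArray.zip(b.toArray).map { case (x, y) => x * y })
    }

    // "vec_a" and "vec_b" are hypothetical column names.
    val result = df.withColumn("v", elementwiseProduct(col("vec_a"), col("vec_b")))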
On Wed, May 30, 2018 at 10:40 PM Dongjin Lee  wrote:

> How is this issue going? Is there any Jira ticket about this?
>
> Thanks,
> Dongjin
>
> On Sat, Mar 24, 2018 at 1:39 PM, Himanshu Mohan <
> himanshu.mo...@aexp.com.invalid> wrote:
>
>> I agree
>>
>>
>>
>>
>>
>>
>>
>> Thanks
>>
>> Himanshu
>>
>>
>>
>> *From:* Li Jin [mailto:ice.xell...@gmail.com]
>> *Sent:* Friday, March 23, 2018 8:24 PM
>> *To:* dev 
>> *Subject:* MatrixUDT and VectorUDT in Spark ML
>>
>>
>>
>> Hi All,
>>
>>
>>
>> I came across these two types, MatrixUDT and VectorUDT, in Spark ML when
>> doing feature extraction and preprocessing with PySpark. However, when
>> trying to do some basic operations, such as vector multiplication and
>> matrix multiplication, I had to drop down to Python UDFs.
>>
>>
>>
>> It seems to me that it would be very useful to have built-in operators on
>> these types, just like first-class Spark SQL types, e.g.,
>>
>>
>>
>> df.withColumn('v', df.matrix_column * df.vector_column)
>>
>>
>>
>> I wonder what are other people's thoughts on this?
>>
>>
>>
>> Li
>>
>>
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> github: github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> slideshare: www.slideshare.net/dongjinleekr
>


Re: [SQL] Purpose of RuntimeReplaceable unevaluable unary expressions?

2018-05-31 Thread Jacek Laskowski
Yay! That's right!!! Thanks Reynold. Such a short answer with so much
information. Thanks.

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Wed, May 30, 2018 at 8:10 PM, Reynold Xin  wrote:

> SQL expressions?
>
> On Wed, May 30, 2018 at 11:09 AM Jacek Laskowski  wrote:
>
>> Hi,
>>
>> I've been exploring RuntimeReplaceable expressions [1] and have been
>> wondering what their purpose is.
>>
>> Quoting the scaladoc [2]:
>>
>> > An expression that gets replaced at runtime (currently by the
>> optimizer) into a different expression for evaluation. This is mainly used
>> to provide compatibility with other databases.
>>
>> For example, the ParseToTimestamp expression is a RuntimeReplaceable
>> expression, and it is replaced by Cast(left, TimestampType)
>> or Cast(UnixTimestamp(left, format), TimestampType) per the to_timestamp
>> function (there are two variants).
>>
>> My question is why is this RuntimeReplaceable better than simply using
>> the Casts as the implementation of to_timestamp functions?
>>
>> def to_timestamp(s: Column, fmt: String): Column = withExpr {
>>   // pseudocode: build the replacement directly from the function's arguments
>>   Cast(UnixTimestamp(s.expr, Literal(fmt)), TimestampType)
>> }
>>
>> What's wrong with the above implementation compared to the current one?
>>
>> [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L275
>>
>> [2] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L266-L267
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://about.me/JacekLaskowski
>> Mastering Spark SQL https://bit.ly/mastering-spark-sql
>> Spark Structured Streaming https://bit.ly/spark-structured-streaming
>> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
>> Follow me at https://twitter.com/jaceklaskowski
>>
>
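
As I read Reynold's answer, the point is that to_timestamp must also exist as a named
SQL expression (ParseToTimestamp, registered in the FunctionRegistry) so it can be used
from SQL text and printed in plans, not only as a Scala DSL helper; RuntimeReplaceable
lets that named expression be rewritten by the optimizer into the existing Casts without
any new evaluation code. A minimal illustration, assuming an active SparkSession named
spark:

    import org.apache.spark.sql.functions.{col, to_timestamp}

    // The SQL form works because "to_timestamp" is a registered expression;
    // a DSL-only helper built directly from Cast/UnixTimestamp would not be callable here.
    spark.sql("SELECT to_timestamp('2018-05-30 20:10', 'yyyy-MM-dd HH:mm') AS ts").show()

    // The equivalent DSL form:
    spark.range(1)
      .selectExpr("'2018-05-30 20:10' AS raw")
      .select(to_timestamp(col("raw"), "yyyy-MM-dd HH:mm").as("ts"))
      .show()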


Re: Spark version for Mesos 0.27.0

2018-05-31 Thread Thodoris Zois
Ok! Thank you very much!
- Thodoris
On Thu, 2018-05-31 at 11:30 +0200, Szuromi Tamás wrote:
> I see it in the Serenity docs. Anyway, I guess you are able to use
> the newest version of Spark with Mesos 0.27 without any issues, so you
> don't have to give up newer Spark features and fixes.
> Thodoris Zois wrote (on Thu, 31 May 2018 at 11:22):
> > Hello,
> > The reason is that I want to make some tests and use the
> > oversubscription feature of Mesos along with Spark. Intel and
> > Mesosphere have built a project, called Serenity that actually
> > measures the usage slack on each Mesos agent and returns resources
> > to the cluster. 
> > Unfortunately, Serenity is not compatible with newer versions of
> > Mesos...
> > - Thodoris
> > 
> > On Thu, 2018-05-31 at 11:08 +0200, Szuromi Tamás wrote:
> > > Hey,
> > > I'm sure we used Spark 1.6 on Mesos 0.27 as well, but at that time
> > > we used it with fine-grained scheduling and not dynamic allocation.
> > > Also, newer Spark versions should work with older Mesos versions
> > > like 0.27.
> > > Why do you have Mesos 0.27 btw?
> > > 
> > > cheers,
> > > Tamas
> > > Thodoris Zois wrote (on Thu, 31 May 2018 at 1:56):
> > > > Hello,
> > > > I need Mesos 0.27 for specific purposes and unfortunately I
> > > > can’t use a newer version. Did you find anything? Could it be
> > > > Spark 1.6? 
> > > > 
> > > > Except that, from which version Spark supports dynamic
> > > > allocation on Mesos? 
> > > > 
> > > > - Thodoris
> > > > On 25 May 2018, at 16:06, Jacek Laskowski 
> > > > wrote:
> > > > 
> > > > > Hi,
> > > > > Mesos 0.27.0?! That's been a while. I'd search for the
> > > > > changes to pom.xml and see when the mesos dependency version
> > > > > changed. That'd give you the most precise answer. I think it
> > > > > could've been 1.5 or older.
> > > > > 
> > > > > Pozdrawiam,
> > > > > Jacek Laskowski
> > > > > 
> > > > > https://about.me/JacekLaskowski
> > > > > Mastering Spark SQL https://bit.ly/mastering-spark-sql
> > > > > Spark Structured Streaming https://bit.ly/spark-structured-streaming
> > > > > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> > > > > Follow me at https://twitter.com/jaceklaskowski
> > > > > 
> > > > > 
> > > > > On Fri, May 25, 2018 at 1:29 PM, Thodoris Zois wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Could you please tell me which version of Spark works with
> > > > > > Apache Mesos
> > > > > > 
> > > > > >  version 0.27.0? (I cannot find anything on docs at github)
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Thank you very much,
> > > > > > 
> > > > > > Thodoris Zois
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > -
> > > > > > 
> > > > > > 
> > > > > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > > > > > 
> > > > > > 
> > > > > > 
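
On the dynamic-allocation question quoted above, a hedged configuration sketch (the
master URL is hypothetical, and this is not a claim about which Spark version first
supported it): on Mesos, dynamic allocation also requires Spark's external shuffle
service (sbin/start-mesos-shuffle-service.sh) to be running on every agent.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("mesos-dynamic-allocation")
      .master("mesos://zk://zk1:2181,zk2:2181/mesos")     // hypothetical ZooKeeper-based master URL
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.shuffle.service.enabled", "true")    // executors fetch shuffle data from the external service
      .getOrCreate()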

Re: Spark version for Mesos 0.27.0

2018-05-31 Thread Szuromi Tamás
I see it in the Serenity docs. Anyway, I guess you are able to use the
newest version of Spark with Mesos 0.27 without any issues, so you don't
have to give up newer Spark features and fixes.

Thodoris Zois wrote (on Thu, 31 May 2018 at 11:22):

> Hello,
>
> The reason is that I want to make some tests and use the oversubscription
> feature of Mesos along with Spark. Intel and Mesosphere have built a
> project, called Serenity that actually measures the usage slack on each
> Mesos agent and returns resources to the cluster.
>
> Unfortunately, Serenity is not compatible with newer versions of Mesos...
>
> - Thodoris
>
>
> On Thu, 2018-05-31 at 11:08 +0200, Szuromi Tamás wrote:
>
> Hey,
>
> I'm sure we used Spark 1.6 on Mesos 0.27 as well, but at that time we used it
> with fine-grained scheduling and not dynamic allocation. Also, newer Spark
> versions should work with older Mesos versions like 0.27.
> Why do you have Mesos 0.27 btw?
>
> cheers,
> Tamas
>
> Thodoris Zois wrote (on Thu, 31 May 2018 at 1:56):
>
> Hello,
>
> I need Mesos 0.27 for specific purposes and unfortunately I can’t use a
> newer version. Did you find anything? Could it be Spark 1.6?
>
> Except that, from which version Spark supports dynamic allocation on
> Mesos?
>
> - Thodoris
>
> On 25 May 2018, at 16:06, Jacek Laskowski  wrote:
>
> Hi,
>
> Mesos 0.27.0?! That's been a while. I'd search for the changes to pom.xml
> and see when the mesos dependency version changed. That'd give you the most
> precise answer. I think it could've been 1.5 or older.
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
>
> On Fri, May 25, 2018 at 1:29 PM, Thodoris Zois  wrote:
>
> Hello,
>
> Could you please tell me which version of Spark works with Apache Mesos
>  version 0.27.0? (I cannot find anything on docs at github)
>
> Thank you very much,
> Thodoris Zois
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>


Re: Spark version for Mesos 0.27.0

2018-05-31 Thread Thodoris Zois
Hello,
The reason is that I want to make some tests and use the
oversubscription feature of Mesos along with Spark. Intel and
Mesosphere have built a project, called Serenity that actually measures
the usage slack on each Mesos agent and returns resources to the
cluster. 
Unfortunately, Serenity is not compatible with newer versions of
Mesos...
- Thodoris

On Thu, 2018-05-31 at 11:08 +0200, Szuromi Tamás wrote:
> Hey,
> I'm sure we used Spark 1.6 on Mesos 0.27 as well, but at that time we
> used it with fine-grained scheduling and not dynamic allocation. Also,
> newer Spark versions should work with older Mesos versions like 0.27.
> Why do you have Mesos 0.27 btw?
> 
> cheers,
> Tamas
> Thodoris Zois wrote (on Thu, 31 May 2018 at 1:56):
> > Hello,
> > I need Mesos 0.27 for specific purposes and unfortunately I can’t
> > use a newer version. Did you find anything? Could it be Spark 1.6? 
> > 
> > Except that, from which version Spark supports dynamic allocation
> > on Mesos? 
> > 
> > - Thodoris
> > On 25 May 2018, at 16:06, Jacek Laskowski  wrote:
> > 
> > > Hi,
> > > Mesos 0.27.0?! That's been a while. I'd search for the changes to
> > > pom.xml and see when the mesos dependency version changed. That'd
> > > give you the most precise answer. I think it could've been 1.5 or
> > > older.
> > > 
> > > Pozdrawiam,
> > > Jacek Laskowski
> > > 
> > > https://about.me/JacekLaskowski
> > > Mastering Spark SQL https://bit.ly/mastering-spark-sql
> > > Spark Structured Streaming https://bit.ly/spark-structured-streaming
> > > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> > > Follow me at https://twitter.com/jaceklaskowski
> > > 
> > > 
> > > On Fri, May 25, 2018 at 1:29 PM, Thodoris Zois wrote:
> > > > Hello,
> > > > 
> > > > 
> > > > 
> > > > Could you please tell me which version of Spark works with
> > > > Apache Mesos
> > > > 
> > > >  version 0.27.0? (I cannot find anything on docs at github)
> > > > 
> > > > 
> > > > 
> > > > Thank you very much,
> > > > 
> > > > Thodoris Zois
> > > > 
> > > > 
> > > > 
> > > > -
> > > > 
> > > > 
> > > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > > > 
> > > > 
> > > > 

Re: Spark version for Mesos 0.27.0

2018-05-31 Thread Szuromi Tamás
Hey,

I'm sure we used Spark 1.6 on Mesos 0.27 as well, but at that time we used it
with fine-grained scheduling and not dynamic allocation. Also, newer Spark
versions should work with older Mesos versions like 0.27.
Why do you have Mesos 0.27 btw?

cheers,
Tamas

Thodoris Zois wrote (on Thu, 31 May 2018 at 1:56):

> Hello,
>
> I need Mesos 0.27 for specific purposes and unfortunately I can’t use a
> newer version. Did you find anything? Could it be Spark 1.6?
>
> Except that, from which version Spark supports dynamic allocation on
> Mesos?
>
> - Thodoris
>
> On 25 May 2018, at 16:06, Jacek Laskowski  wrote:
>
> Hi,
>
> Mesos 0.27.0?! That's been a while. I'd search for the changes to pom.xml
> and see when the mesos dependency version changed. That'd give you the most
> precise answer. I think it could've been 1.5 or older.
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
>
> On Fri, May 25, 2018 at 1:29 PM, Thodoris Zois  wrote:
>
>> Hello,
>>
>> Could you please tell me which version of Spark works with Apache Mesos
>>  version 0.27.0? (I cannot find anything on docs at github)
>>
>> Thank you very much,
>> Thodoris Zois
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>