Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-15 Thread Ilan Filonenko
+1 (non-binding)

On Wed, Nov 15, 2023 at 12:57 PM Xiao Li  wrote:

> +1
>
> On Wed, Nov 15, 2023 at 05:55, bo yang wrote:
>
>> +1
>>
>> On Tue, Nov 14, 2023 at 7:18 PM huaxin gao 
>> wrote:
>>
>>> +1
>>>
>>> On Tue, Nov 14, 2023 at 10:45 AM Holden Karau 
>>> wrote:
>>>
 +1

 On Tue, Nov 14, 2023 at 10:21 AM DB Tsai  wrote:

> +1
>
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
> On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov <
> vakaris.bashki...@gmail.com> wrote:
>
> +1 (non-binding)
>
>
> On Tue, Nov 14, 2023 at 8:03 PM Chao Sun  wrote:
>
>> +1
>>
>> On Tue, Nov 14, 2023 at 9:52 AM L. C. Hsieh  wrote:
>> >
>> > +1
>> >
>> > On Tue, Nov 14, 2023 at 9:46 AM Ye Zhou 
>> wrote:
>> > >
>> > > +1(Non-binding)
>> > >
>> > > On Tue, Nov 14, 2023 at 9:42 AM L. C. Hsieh 
>> wrote:
>> > >>
>> > >> Hi all,
>> > >>
>> > >> I’d like to start a vote for SPIP: An Official Kubernetes
>> Operator for
>> > >> Apache Spark.
>> > >>
>> > >> The proposal is to develop an official Java-based Kubernetes
>> operator
>> > >> for Apache Spark to automate the deployment and simplify the
>> lifecycle
>> > >> management and orchestration of Spark applications and Spark
>> clusters
>> > >> on k8s at prod scale.
>> > >>
>> > >> This aims to reduce the learning curve and operation overhead for
>> > >> Spark users so they can concentrate on core Spark logic.
>> > >>
>> > >> Please also refer to:
>> > >>
>> > >>- Discussion thread:
>> > >> https://lists.apache.org/thread/wdy7jfhf7m8jy74p6s0npjfd15ym5rxz
>> > >>- JIRA ticket:
>> https://issues.apache.org/jira/browse/SPARK-45923
>> > >>- SPIP doc:
>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>> > >>
>> > >>
>> > >> Please vote on the SPIP for the next 72 hours:
>> > >>
>> > >> [ ] +1: Accept the proposal as an official SPIP
>> > >> [ ] +0
>> > >> [ ] -1: I don’t think this is a good idea because …
>> > >>
>> > >>
>> > >> Thank you!
>> > >>
>> > >> Liang-Chi Hsieh
>> > >>
>> > >>
>> > >>
>> > >
>> > >
>> > > --
>> > >
>> > > Zhou, Ye  周晔
>> >
>> >
>> >
>>
>>
>>
>


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread Ilan Filonenko
+1

On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue  wrote:

> +1
>
> On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala  wrote:
>
>> +1 for creating an official Kubernetes operator for Apache Spark
>>
>> On Fri, Nov 10, 2023 at 12:38 AM huaxin gao 
>> wrote:
>>
>>> +1
>>>
>>> On Thu, Nov 9, 2023 at 3:14 PM DB Tsai  wrote:
>>>
 +1

 To be completely transparent, I am employed in the same department as
 Zhou at Apple.

 I support this proposal, given the community adoption we witnessed
 following the release of the Flink Kubernetes operator, which streamlined
 Flink deployment on Kubernetes.

 A well-maintained official Spark Kubernetes operator is essential for
 our Spark community as well.

 DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1

 On Nov 9, 2023, at 12:05 PM, Zhou Jiang  wrote:

 Hi Spark community,
 I'm reaching out to initiate a conversation about the possibility of
 developing a Java-based Kubernetes operator for Apache Spark. Following the
 operator pattern (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/),
 Spark users may manage applications and related components seamlessly using
 native tools like kubectl. The primary goal is to simplify the Spark user
 experience on Kubernetes, minimizing the learning curve and operational
 complexities and therefore enable users to focus on the Spark application
 development.
 Although there are several open-source Spark on Kubernetes operators
 available, none of them are officially integrated into the Apache Spark
 project. As a result, these operators may lack active support and
 development for new features. Within this proposal, our aim is to introduce
 a Java-based Spark operator as an integral component of the Apache Spark
 project. This solution has been employed internally at Apple for multiple
 years, operating millions of executors in real production environments. The
 use of Java in this solution is intended to accommodate a wider user and
 contributor audience, especially those who are familiar with Scala.
 Ideally, this operator should have its dedicated repository, similar to
 Spark Connect Golang or Spark Docker, allowing it to maintain a loose
 connection with the Spark release cycle. This model is also followed by the
 Apache Flink Kubernetes operator.
 We believe that this project holds the potential to evolve into a
 thriving community project over the long run. A comparison can be drawn
 with the Flink Kubernetes Operator: Apple has open-sourced its internal
 Flink Kubernetes operator, making it a part of the Apache Flink project
 (https://github.com/apache/flink-kubernetes-operator).
 This move has gained wide industry adoption and contributions from the
 community. In a mere year, the Flink operator has garnered more than 600
 stars and has attracted contributions from over 80 contributors. This
 showcases the level of community interest and collaborative momentum that
 can be achieved in similar scenarios.
 More details can be found in the SPIP doc: Spark Kubernetes Operator
 https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
 

 Thanks,
 --
 Zhou Jiang
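
For readers new to the operator pattern referenced above, here is a
minimal, purely illustrative reconcile loop, sketched in Python. All names
are hypothetical stand-ins; the proposed operator would be Java-based and
would watch SparkApplication-style custom resources through the Kubernetes
API rather than poll in-memory state.

    # Illustrative sketch of the operator pattern's reconcile loop, not the
    # proposed implementation. SparkAppSpec, fetch_desired_apps and
    # fetch_observed_executors are hypothetical stand-ins for watching
    # custom resources and listing pods via the Kubernetes API.
    import time
    from dataclasses import dataclass

    @dataclass
    class SparkAppSpec:
        name: str
        executors: int  # desired executor count, normally declared in a CR

    def fetch_desired_apps():
        # Stand-in for a watch on SparkApplication custom resources.
        return [SparkAppSpec(name="spark-pi", executors=2)]

    def fetch_observed_executors(app):
        # Stand-in for listing the application's executor pods.
        return 0

    def reconcile(app):
        # Compare desired vs. observed state and act to close the gap.
        observed = fetch_observed_executors(app)
        if observed < app.executors:
            print(f"{app.name}: scaling executors {observed} -> {app.executors}")

    if __name__ == "__main__":
        for _ in range(3):  # a real operator reacts to watch events instead
            for app in fetch_desired_apps():
                reconcile(app)
            time.sleep(1)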

Re: Spark Issue with Istio in Distributed Mode

2022-09-03 Thread Ilan Filonenko
This must be set in Envoy (though it might be possible to pass it through via Istio):
https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/core/v3/protocol.proto#envoy-v3-api-field-config-core-v3-httpprotocoloptions-idle-timeout


On Sat, Sep 3, 2022 at 4:23 AM Deepak Sharma  wrote:

> Thanks for the reply, Ilan.
> Can we set this in the Spark conf, or does it need to go into the Istio/Envoy conf?
>
>
>
> On Sat, 3 Sept 2022 at 10:28, Ilan Filonenko  wrote:
>
>> This might be a result of the idle_timeout that is configured in envoy.
>> The default is an hour.
>>
>> On Sat, Sep 3, 2022 at 12:17 AM Deepak Sharma 
>> wrote:
>>
>>> Hi All,
>>> In one of our clusters, we enabled Istio where Spark is running in
>>> distributed mode.
>>> Spark works fine when we run it with Istio in standalone mode.
>>> In Spark distributed mode, we are seeing that every hour or so the
>>> workers get disassociated from the master, and then the master is not able
>>> to spawn any jobs on these workers until we restart the Spark REST server.
>>>
>>> Here is the error we see in the worker logs:
>>>
>>>
>>> *ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to :
>>> Driver spark-rest-service:44463 disassociated! Shutting down.*
>>>
>>> For the first hour or so (until this issue happens), Spark distributed mode
>>> works just fine.
>>>
>>>
>>> Thanks
>>> Deepak
>>>
>>


Re: Spark Issue with Istio in Distributed Mode

2022-09-02 Thread Ilan Filonenko
This might be a result of the idle_timeout that is configured in envoy. The
default is an hour.

On Sat, Sep 3, 2022 at 12:17 AM Deepak Sharma  wrote:

> Hi All,
> In one of our clusters, we enabled Istio where Spark is running in
> distributed mode.
> Spark works fine when we run it with Istio in standalone mode.
> In Spark distributed mode, we are seeing that every hour or so the
> workers get disassociated from the master, and then the master is not able
> to spawn any jobs on these workers until we restart the Spark REST server.
>
> Here is the error we see in the worker logs:
>
>
> *ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver
> spark-rest-service:44463 disassociated! Shutting down.*
>
> For the first hour or so (until this issue happens), Spark distributed mode works
> just fine.
>
>
> Thanks
> Deepak
>


Re: Thoughts on Spark 3 release, or a preview release

2019-09-13 Thread Ilan Filonenko
+1 for preview release

On Fri, Sep 13, 2019 at 9:58 AM Thomas Graves  wrote:

> +1, I think having preview release would be great.
>
> Tom
>
> On Fri, Sep 13, 2019 at 4:55 AM Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com> wrote:
>
>> +1 as a contributor and as a user. Given the amount of testing required
>> for all the new cool stuff like Java 11 support, major
>> refactorings/deprecations, etc., a preview version would help the
>> community a lot in making adoption smoother long term. I would also add
>> Scala 2.13 support (https://issues.apache.org/jira/browse/SPARK-25075)
>> to the list of issues, assuming things move forward faster in the next
>> few months.
>>
>> On Fri, Sep 13, 2019 at 11:08 AM Driesprong, Fokko 
>> wrote:
>>
>>> Michael Heuer, that's an interesting issue.
>>>
>>> 1.8.2 to 1.9.0 is almost binary compatible (94%):
>>> http://people.apache.org/~busbey/avro/1.9.0-RC4/1.8.2_to_1.9.0RC4_compat_report.html.
>>> Most of the stuff is removing the Jackson and Netty API from Avro's public
>>> API and deprecating the Joda library. I would strongly advise moving to
>>> 1.9.1 since there are some regression issues; for Java the most important is:
>>> https://jira.apache.org/jira/browse/AVRO-2400
>>>
>>> I'd love to dive into the issue that you describe and I'm curious if the
>>> issue is still there with Avro 1.9.1. I'm a bit busy at the moment but
>>> might have some time this weekend to dive into it.
>>>
>>> Cheers, Fokko Driesprong
>>>
>>>
>>> On Fri, Sep 13, 2019 at 02:32, Reynold Xin wrote:
>>>
 +1! Long overdue for a preview release.


 On Thu, Sep 12, 2019 at 5:26 PM, Holden Karau 
 wrote:

> I like the idea from the PoV of giving folks something to start
> testing against and exploring so they can raise issues with us earlier in
> the process and we have more time to make calls around this.
>
> On Thu, Sep 12, 2019 at 4:15 PM John Zhuge  wrote:
>
> +1  Like the idea as a user and a DSv2 contributor.
>
> On Thu, Sep 12, 2019 at 4:10 PM Jungtaek Lim 
> wrote:
>
> +1 (as a contributor) from me to having a preview release of Spark 3, as it
> would help to test the features. When to cut the preview release is
> debatable, as major work is ideally done before that - if we intend to
> introduce new features before the official release, that should work
> regardless of this, but if we intend to have the opportunity to test
> earlier, ideally it should.
>
> As one of the contributors in the structured streaming area, I'd like to add
> some items for Spark 3.0, both "must be done" and "better to have". For
> "better to have", I picked some new-feature items which committers reviewed
> for a couple of rounds and which were dropped without a soft reject (no
> valid reason to stop). For Spark 2.4 users, the only feature added for
> structured streaming is Kafka delegation tokens (assuming we count the
> revised Kafka consumer pool as an improvement). I hope we can provide some
> gifts for structured streaming users in the Spark 3.0 envelope.
>
> > must be done
> * SPARK-26154 Stream-stream joins - left outer join gives inconsistent
> output
> It's a correctness issue reported by multiple users, dating back to
> Nov. 2018. There's a way to reproduce it consistently, and a patch to fix
> it was submitted in Jan. 2019.
>
> > better to have
> * SPARK-23539 Add support for Kafka headers in Structured Streaming
> * SPARK-26848 Introduce new option to Kafka source - specify timestamp
> to start and end offset
> * SPARK-20568 Delete files after processing in structured streaming
>
> There are some more new feature/improvement items in SS, but given
> we're talking about ramping down, the above list might be a realistic one.
>
>
>
> On Thu, Sep 12, 2019 at 9:53 AM Jean Georges Perrin 
> wrote:
>
> As a user/non committer, +1
>
> I love the idea of an early 3.0.0 so we can test current dev against
> it. I know the final 3.x will probably need another round of testing when
> it gets out, but less for sure... I know I could check out and compile, but
> having a “packaged” pre-version is great if it does not take too much of
> the team's time...
>
> jg
>
>
> On Sep 11, 2019, at 20:40, Hyukjin Kwon  wrote:
>
> +1 from me too but I would like to know what other people think too.
>
> On Thu, Sep 12, 2019 at 9:07 AM, Dongjoon Hyun wrote:
>
> Thank you, Sean.
>
> I'm also +1 for the following three.
>
> 1. Start to ramp down (by the official branch-3.0 cut)
> 2. Apache Spark 3.0.0-preview in 2019
> 3. Apache Spark 3.0.0 in early 2020
>
> For JDK11 clean-up, it will meet the timeline and `3.0.0-preview`
> helps it a lot.
>
> After this discussion, can we have some timeline for `Spark 3.0
> Release Window` in 

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread Ilan Filonenko
+1 (non-binding). This API is versatile and flexible enough to handle
Bloomberg's internal use-cases. The ability for us to vary implementation
strategies is quite appealing. It is also worth noting the minimal changes
to Spark core needed to make it work. This is a much-needed addition to
the Spark shuffle story.
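
As noted in the reply below, the API retains the existing
spark.shuffle.manager configuration. A minimal sketch of wiring in a custom
implementation through that setting follows; the class name is the
placeholder from the reply, not a real implementation.

    # Minimal sketch: selecting a pluggable shuffle implementation via the
    # existing spark.shuffle.manager setting. The class name below is a
    # placeholder, not a real implementation.
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = (SparkConf()
            .set("spark.shuffle.manager",
                 "org.apache.spark.shuffle.YourShuffleManagerImpl"))

    spark = (SparkSession.builder
             .appName("custom-shuffle-demo")
             .config(conf=conf)
             .getOrCreate())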

On Fri, Jun 14, 2019 at 9:59 AM bo yang  wrote:

> +1 This is great work, allowing different sort-shuffle write/read
> implementations to be plugged in! Also great to see it retains the current
> Spark configuration
> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>
>
> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah  wrote:
>
>> Hi everyone,
>>
>>
>>
>> I would like to call a vote for the SPIP for SPARK-25299
>> , which proposes to
>> introduce a pluggable storage API for temporary shuffle data.
>>
>>
>>
>> You may find the SPIP document here
>> 
>> .
>>
>>
>>
>> The discussion thread for the SPIP was conducted here
>> 
>> .
>>
>>
>>
>> Please vote on whether or not this proposal is agreeable to you.
>>
>>
>>
>> Thanks!
>>
>>
>>
>> -Matt Cheah
>>
>


Re: Spark-optimized Shuffle (SOS) any update?

2018-12-19 Thread Ilan Filonenko
Recently, the community has been actively working on this. The JIRA to
follow is https://issues.apache.org/jira/browse/SPARK-25299. A group of
companies including Bloomberg and Palantir is working on a WIP solution
that implements a variant of Option #5 (which is elaborated upon in the
Google doc linked in the JIRA summary).

On Wed, Dec 19, 2018 at 5:20 AM  wrote:

> Hi everyone,
> we are facing the same problems Facebook had, where the shuffle service is
> a bottleneck. For now we have worked around that with a large task size (2g)
> to reduce shuffle I/O.
>
> I saw a very nice presentation from Brian Cho on optimizing shuffle I/O at
> large scale [1]. It is an implementation of the white paper [2].
> At the end of the talk, Brian Cho kindly mentioned plans to contribute it
> back to Spark [3]. I checked the mailing list and the Spark JIRA and
> didn't find any ticket on this topic.
>
> Does anyone have a contact at Facebook who might know more about this?
> Or are there plans to bring a similar optimization to Spark?
>
> [1] https://databricks.com/session/sos-optimizing-shuffle-i-o
> [2] https://haoyuzhang.org/publications/riffle-eurosys18.pdf
> [3]
> https://image.slidesharecdn.com/5brianchoerginseyfe-180613004126/95/sos-optimizing-shuffle-io-with-brian-cho-and-ergin-seyfe-30-638.jpg?cb=1528850545
>


Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Ilan Filonenko
+1 (non-binding), in reference to all k8s tests for 2.11 (including SparkR
tests with R version 3.4.1):

[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 202 milliseconds.
Run starting. Expected test count is: 15
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run SparkR on simple dataframe.R example
- Run in client mode.
Run completed in 6 minutes, 47 seconds.
Total number of tests run: 15
Suites: completed 2, aborted 0
Tests: succeeded 15, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

Sean, in reference to your issues, the comment you linked is correct in
that you would need to build a Kubernetes distribution, i.e.:

  dev/make-distribution.sh --pip --r --tgz -Psparkr -Phadoop-2.7 -Pkubernetes

set up minikube, i.e.:

  minikube start --insecure-registry=localhost:5000 --cpus 6 --memory 6000

and then run the appropriate tests, i.e.:

  dev/dev-run-integration-tests.sh --spark-tgz .../spark-2.4.0-bin-2.7.3.tgz

The newest PR that you linked allows us to point to the local Kubernetes
cluster deployed via docker-for-mac as opposed to minikube, which gives us
another way to test but does not change the testing workflow AFAICT.

On Tue, Oct 23, 2018 at 9:14 AM Sean Owen  wrote:

> (I should add, I only observed this with the Scala 2.12 build. It all
> seemed to work with 2.11. Therefore I'm not too worried about it. I
> don't think it's a Scala version issue, but perhaps something looking
> for a spark 2.11 tarball and not finding it. See
> https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
> a change that might address this kind of thing.)
>
> On Tue, Oct 23, 2018 at 11:05 AM Sean Owen  wrote:
> >
> > Yeah, that's maybe the issue here. This is a source release, not a git
> checkout, and it still needs to work in this context.
> >
> > I just added -Pkubernetes to my build and didn't do anything else. I
> think the ideal is that a "mvn -P... -P... install" works from a source
> release; that's a good expectation and consistent with docs.
> >
> > Maybe these tests simply don't need to run with the normal suite of
> tests, and can be considered tests run manually by developers running these
> scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
> >
> > I don't think this has to block the release even if so, just trying to
> get to the bottom of it.
>
>
>


Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Ilan Filonenko
On Erik's note, would SPARK-23257 be included in, say, 2.4.1? When would
the next RC be? I would like to propose including the Kerberos feature
sooner rather than later, as it would increase Spark-on-K8S adoption in
production workloads while bringing greater feature parity with Yarn and
Mesos. I would like to note that the feature itself is isolated from Core,
contained within the step-based architecture of the Kubernetes
Driver/Executor builders.

Furthermore, Spark users traditionally use HDFS for storage, and in
production use-cases these HDFS clusters are kerberized. At Bloomberg,
for example, all of the HDFS clusters are kerberized, and for this reason
the only thing stopping our internal Data Science Platform from adopting
Spark-on-K8S is this feature.

On Tue, Oct 16, 2018 at 10:21 AM Erik Erlandson  wrote:

>
> SPARK-23257 merged more recently than I realized. If that isn't on
> branch-2.4, then the first question is how soon in the release sequence it
> can be adopted.
>
> On Tue, Oct 16, 2018 at 9:33 AM Reynold Xin  wrote:
>
>> We shouldn’t merge new features into release branches anymore.
>>
>> On Tue, Oct 16, 2018 at 6:32 PM Rob Vesse  wrote:
>>
>>> Right now the Kerberos support for Spark on K8S is only on master AFAICT
>>> i.e. the feature is not present on branch-2.4
>>>
>>>
>>>
>>> Therefore I don’t see any point in adding the tests into branch-2.4
>>> unless the plan is to also merge the Kerberos support to branch-2.4
>>>
>>>
>>>
>>> Rob
>>>
>>>
>>>
>>> *From: *Erik Erlandson 
>>> *Date: *Tuesday, 16 October 2018 at 16:47
>>> *To: *dev 
>>> *Subject: *[DISCUSS][K8S][TESTS] Include Kerberos integration tests for
>>> Spark 2.4
>>>
>>>
>>>
>>> I'd like to propose including integration testing for Kerberos on the
>>> Spark 2.4 release:
>>>
>>> https://github.com/apache/spark/pull/22608
>>>
>>>
>>>
>>> Arguments in favor:
>>>
>>> 1) it improves testing coverage on a feature important for integrating
>>> with HDFS deployments
>>>
>>> 2) its intersection with existing code is small - it consists primarily
>>> of new testing code, with a bit of refactoring into 'main' and 'test'
>>> sub-trees. These new tests appear stable.
>>>
>>> 3) Spark 2.4 is still in RC, with outstanding correctness issues.
>>>
>>>
>>>
>>> The argument 'against' that I'm aware of would be the relatively large
>>> size of the PR. I believe this is considered above, but am soliciting
>>> community feedback before committing.
>>>
>>> Cheers,
>>>
>>> Erik
>>>
>>>
>>>
>>


Re: Python kubernetes spark 2.4 branch

2018-09-25 Thread Ilan Filonenko
Is this in reference to https://issues.apache.org/jira/browse/SPARK-24736?
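
For context, the expected behavior under test: archives passed via
--py-files should land on the driver's and executors' PYTHONPATH so the
main script can import them. A minimal sketch follows, with hypothetical
module names (deps.zip is assumed to contain deps/helper.py):

    # main.py, submitted with something like:
    #   spark-submit --py-files deps.zip main.py
    # "deps" and "helper" are hypothetical names for illustration.
    from pyspark.sql import SparkSession
    from deps import helper  # resolves only if deps.zip is on the PYTHONPATH

    spark = SparkSession.builder.getOrCreate()
    print(helper.run(spark))  # assumed helper function inside deps.zip
    spark.stop()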

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li  wrote:

> Can you give more details on how you ran your app? Did you build your own
> image, and which image are you using?
>
> On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia -
> IN/Bangalore)  wrote:
>
>> Hi,
>>
>> I am trying to run the Spark Python test cases on k8s based on tag
>> spark-2.4-rc1. When the dependent files are passed through the --py-files
>> option, they are not getting resolved by the main Python script. Please let
>> me know: is this a known issue?
>>
>>
>>
>> Regards
>>
>> Surya
>>
>>
>>
>


Re: no logging in pyspark code?

2018-08-27 Thread Ilan Filonenko
A JIRA was opened on this exact topic a few days ago: SPARK-25236, after
seeing another case of print(_, file=sys.stderr) in a recent review. I
agree that we should include logging for PySpark workers.
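
For illustration, a minimal sketch of stdlib logging in worker-side PySpark
code, replacing the print(..., file=sys.stderr) pattern; the logger name and
setup below are assumptions for this sketch, not PySpark's actual
configuration:

    # Uses only the Python standard library. Debug output goes to the same
    # stderr sink as the print() calls it replaces, but can be enabled or
    # silenced purely by configuring a level.
    import logging
    import sys

    logger = logging.getLogger("pyspark.worker.example")  # illustrative name

    def configure_logging(level=logging.INFO):
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(level)

    configure_logging(logging.DEBUG)
    logger.debug("detail that print() would have emitted unconditionally")
    logger.info("worker step completed")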

On Mon, Aug 27, 2018 at 1:29 PM, Imran Rashid 
wrote:

> Another question on pyspark code -- how come there is no logging at all?
> Does Python logging have an unreasonable overhead, or is it impossible to
> configure or something?
>
> I'm really surprised nobody has ever wanted to be able to turn on some
> debug or trace logging in pyspark just by configuring a logging level.
>
> For me, I wanted this during debugging while developing -- I'd work on
> some part of the code and drop in a bunch of print statements.  Then I'd
> rip those out when I think I'm ready to submit a patch.  But then I realize
> I forgot some case, then more debugging -- oh gotta add those print
> statements in again ...
>
> does somebody just need to set up the configuration properly, or is there a
> bigger reason to avoid logging in python?
>
> thanks,
> Imran
>


Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-16 Thread Ilan Filonenko
Okay, if there is a consensus on merging the SparkR feature and its
respective tests separately, I will split up the current PR to allow the
feature to be merged while the e2e tests are used for local testing until
we are ready to merge them (similar to PySpark). I will give it another day
to hear other opinions, but am otherwise working under the impression that
the OS upgrade will not happen before the 2.4 cut.



On Thu, Aug 16, 2018 at 12:53 PM, shane knapp  wrote:

> On Thu, Aug 16, 2018 at 9:49 AM, Erik Erlandson 
> wrote:
>
>> IMO sparkR support makes sense to merge for 2.4, as long as the release
>> wranglers agree that local integration testing is sufficiently convincing.
>>
>
> i agree w/this.  testing for this stuff specifically will happen within a
> couple of weeks after the 2.4 cut.
>
>
>> Part of the intent here is to allow this to happen without Shane having
>> to reorganize his complex upgrade schedule and make it even more
>> complicated.
>>
>> this.  exactly.  :)
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Ilan Filonenko
Correct, the OS change and updates would require more testing, from what
Shane has told me, and could potentially surface issues that would delay a
major release.

So yes, the release manager would need to run the tests manually, and after
the release we would switch to a fully integrated Jenkins that would run
those same tests on the newly updated workers.

On Wed, Aug 15, 2018 at 3:47 PM, Reynold Xin  wrote:

> Personally I'd love for R support to be in 2.4, but I don't consider
> something "Done" unless tests are running ... Is the proposal: the release
> manager manually run the R tests when preparing the release, and switch
> over to fully integrated Jenkins after 2.4.0 is released?
>
> On Wed, Aug 15, 2018 at 2:45 PM Reynold Xin  wrote:
>
>> What's the reason we don't want to do the OS updates right now? Is it due
>> to the unpredictability of potential issues that might happen and end up
>> delaying 2.4 release?
>>
>>
>> On Wed, Aug 15, 2018 at 2:33 PM Erik Erlandson 
>> wrote:
>>
>>> The SparkR support PR is finished, along with integration testing,
>>> however Shane has requested that the integration testing not be enabled
>>> until after the 2.4 release because it requires the OS updates he wants to
>>> test *after* the release.
>>>
>>> The integration testing can be run locally, and so the question at hand
>>> is: would the PMC be willing to consider inclusion of the SparkR for 2.4,
>>> based on local verification of the testing? The PySpark PR was merged under
>>> similar circumstances: the testing was verified locally and the PR was
>>> merged before the testing was enabled for jenkins.
>>>
>>> Cheers,
>>> Erik
>>>
>>


Re: [DISCUSS] SparkR support on k8s back-end for Spark 2.4

2018-08-15 Thread Ilan Filonenko
The SparkR support PR includes integration testing that can be run on a
local Minikube instance by merely building the distribution with the
appropriate flags (--r) and running the integration tests as you would for
any k8s test. Maybe some others could test this locally, if there is any
hesitation. I, for one, don't see a problem with merging as is and having
the Jenkins build be configured for R distribution support via the Ubuntu
update when Shane gets to it, similar to what we did for PySpark. The PR can
be found here <https://github.com/apache/spark/pull/21584>.

Furthermore, I am wondering if there is any desire in the community to
ensure SparkR support (for k8s) is merged by 2.4; if so, merging this
without the Jenkins build seems even more appropriate.

Best,
Ilan Filonenko

On Wed, Aug 15, 2018 at 3:33 PM, Erik Erlandson  wrote:

> The SparkR support PR is finished, along with integration testing, however
> Shane has requested that the integration testing not be enabled until after
> the 2.4 release because it requires the OS updates he wants to test *after*
> the release.
>
> The integration testing can be run locally, and so the question at hand
> is: would the PMC be willing to consider inclusion of the SparkR for 2.4,
> based on local verification of the testing? The PySpark PR was merged under
> similar circumstances: the testing was verified locally and the PR was
> merged before the testing was enabled for jenkins.
>
> Cheers,
> Erik
>


Build timeout -- continuous-integration/appveyor/pr — AppVeyor build failed

2018-05-13 Thread Ilan Filonenko
Hi dev,

I recently updated an ongoing PR
[https://github.com/apache/spark/pull/21092] with a merge that pulled in a
lot of commits from master, and I got the following error:

*continuous-integration/appveyor/pr *— AppVeyor build failed

due to:

*Build execution time has reached the maximum allowed time for your plan
(90 minutes).*

seen here:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/2300-master

As this is the first time I am seeing this, I am wondering whether it is
related to the large merge and, if it is, whether the timeout can be
increased.

Thanks!

Best,
Ilan Filonenko


Re: Welcoming some new committers

2018-03-02 Thread Ilan Filonenko
Congrats to everyone! :)

On Fri, Mar 2, 2018 at 7:34 PM Felix Cheung 
wrote:

> Congrats and welcome!
>
> --
> *From:* Dongjoon Hyun 
> *Sent:* Friday, March 2, 2018 4:27:10 PM
> *To:* Spark dev list
> *Subject:* Re: Welcoming some new committers
>
> Congrats to all!
>
> Bests,
> Dongjoon.
>
> On Fri, Mar 2, 2018 at 4:13 PM, Wenchen Fan  wrote:
>
>> Congratulations to everyone and welcome!
>>
>> On Sat, Mar 3, 2018 at 7:26 AM, Cody Koeninger 
>> wrote:
>>
>>> Congrats to the new committers, and I appreciate the vote of confidence.
>>>
>>> On Fri, Mar 2, 2018 at 4:41 PM, Matei Zaharia 
>>> wrote:
>>> > Hi everyone,
>>> >
>>> > The Spark PMC has recently voted to add several new committers to the
>>> project, based on their contributions to Spark 2.3 and other past work:
>>> >
>>> > - Anirudh Ramanathan (contributor to Kubernetes support)
>>> > - Bryan Cutler (contributor to PySpark and Arrow support)
>>> > - Cody Koeninger (contributor to streaming and Kafka support)
>>> > - Erik Erlandson (contributor to Kubernetes support)
>>> > - Matt Cheah (contributor to Kubernetes support and other parts of
>>> Spark)
>>> > - Seth Hendrickson (contributor to MLlib and PySpark)
>>> >
>>> > Please join me in welcoming Anirudh, Bryan, Cody, Erik, Matt and Seth
>>> as committers!
>>> >
>>> > Matei
>>
>