Re: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread Saisai Shao
Hi Ben and Felix, I'm also interested in this. Would you please add me to the invite? Thanks a lot. Best regards, Saisai Greg Lee wrote on Mon, Dec 2, 2019 at 11:34 PM: > Hi Felix & Ben, > > This is Li Hao from Baidu, same team with Linhong. > > As mentioned in Linhong’s email, independent disaggregated

Re: Welcoming some new committers and PMC members

2019-09-09 Thread Saisai Shao
Congratulations! Jungtaek Lim wrote on Mon, Sep 9, 2019 at 6:11 PM: > Congratulations! Well deserved! > > On Tue, Sep 10, 2019 at 9:51 AM John Zhuge wrote: > >> Congratulations! >> >> On Mon, Sep 9, 2019 at 5:45 PM Shane Knapp wrote: >> >>> congrats everyone! :) >>> >>> On Mon, Sep 9, 2019 at 5:32 PM Matei

Re: Release Spark 2.3.4

2019-08-18 Thread Saisai Shao
+1 Wenchen Fan wrote on Mon, Aug 19, 2019 at 10:28 AM: > +1 > > On Sat, Aug 17, 2019 at 3:37 PM Hyukjin Kwon wrote: > >> +1 too >> >> On Sat, Aug 17, 2019 at 3:06 PM, Dilip Biswal wrote: >> >>> +1 >>> >>> Regards, >>> Dilip Biswal >>> Tel: 408-463-4980 >>> dbis...@us.ibm.com >>> >>> >>> >>> - Original message

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-16 Thread Saisai Shao
+1 (binding) Thanks Saisai Imran Rashid wrote on Sat, Jun 15, 2019 at 3:46 AM: > +1 (binding) > > I think this is a really important feature for Spark. > > First, there is already a lot of interest in alternative shuffle storage > in the community.

Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API

2019-06-13 Thread Saisai Shao
I think maybe we could start a vote on this SPIP. It has been discussed for a while, and the current doc is pretty complete as of now. Also, we have seen a lot of demand in the community for building their own shuffle storage. Thanks Saisai Imran Rashid wrote on Tue, Jun 11, 2019 at 3:27 AM: > I would be

Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API

2019-06-10 Thread Saisai Shao
I'm currently working with MemVerge on the Splash project (one implementation of remote shuffle storage) and have followed this ticket for a while. I would like to be the shepherd if no one else has volunteered. Best regards, Saisai Matt Cheah wrote on Thu, Jun 6, 2019 at 8:33 AM: > Hi everyone, > > > > I wanted
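
For context, here is a purely illustrative sketch of what a pluggable remote shuffle storage interface might look like; the names below are made up, and the actual SPARK-25299 SPIP defines its own API:

    // Hypothetical interface only, not the SPIP's actual design.
    trait ShuffleStorage {
      // Persist one map output partition to the backing store.
      def writePartition(shuffleId: Int, mapId: Int, reduceId: Int, data: Array[Byte]): Unit
      // Read a partition back for the reduce side.
      def readPartition(shuffleId: Int, mapId: Int, reduceId: Int): Iterator[Byte]
    }

A remote implementation such as Splash would back these calls with disaggregated storage instead of executor-local disk, which is what lets shuffle data survive executor loss.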

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-06 Thread Saisai Shao
Do we have other blocker/critical issues for Spark 2.4.1, or are we waiting for something to be fixed? I roughly searched JIRA, and it seems there are no blocker/critical issues marked for 2.4.1. Thanks Saisai shane knapp wrote on Thu, Mar 7, 2019 at 4:57 AM: > i'll be popping in to the sig-big-data meeting on the 20th to talk

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-05 Thread Saisai Shao
Hi DB, I saw that we already have 6 RCs, but the latest vote I can find is for RC2; were the others all canceled? Thanks Saisai DB Tsai wrote on Fri, Feb 22, 2019 at 4:51 AM: > I am cutting a new rc4 with the fix from Felix. Thanks. > > Sincerely, > > DB Tsai >

Re: Apache Spark 2.2.3 ?

2019-01-01 Thread Saisai Shao
Agreed, we should have a new branch-2.3 release, as we have already accumulated several fixes. Thanks Saisai Xiao Li wrote on Wed, Jan 2, 2019 at 1:32 PM: > Based on the commit history, > https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.3 > contains more critical fixes. Maybe the priority

Re: What's a blocker?

2018-10-24 Thread Saisai Shao
Just my two cents from past experience. As the release manager of Spark 2.3.2, I felt significant delay during the release due to blocker issues. The vote failed several times because of one or two "blocker issues". I think during the RC period, each "blocker issue" should be carefully evaluated by the related PMC members

Re: welcome a new batch of committers

2018-10-07 Thread Saisai Shao
Congratulations to all! Jacek Laskowski wrote on Sun, Oct 7, 2018 at 1:12 AM: > Wow! That's a nice bunch of contributors. Congrats to all new committers. > I've had a tough time following all the contributions, but with this crew > it's gonna be nearly impossible. > > Regards, > Jacek Laskowski > >

Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-09-29 Thread Saisai Shao
I like this proposal. Since Kafka already provides a delegation token mechanism, we can leverage Spark's delegation token framework to add Kafka as built-in support. BTW, I think there's not much difference between supporting Structured Streaming and DStream; maybe we can set both as goals. Thanks
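
For reference, the Structured Streaming Kafka source that such token support would serve looks roughly like this (broker address and topic name are placeholders); with built-in delegation token support, authentication against a secured Kafka cluster would happen behind this same API instead of requiring keytabs on executors:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-demo").getOrCreate()

    // Standard Kafka source; delegation tokens would be obtained and renewed
    // by Spark's token framework, transparently to this code.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
      .option("subscribe", "events")                     // placeholder topic
      .load()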

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-27 Thread Saisai Shao
Only "without-hadoop" profile has 2.12 binary, is it expected? Thanks Saisai Wenchen Fan 于2018年9月28日周五 上午11:08写道: > I'm adding my own +1, since all the problems mentioned in the RC1 voting > email are all resolved. And there is no blocker issue for 2.4.0 AFAIK. > > On Fri, Sep 28, 2018 at

[ANNOUNCE] Announcing Apache Spark 2.3.2

2018-09-26 Thread Saisai Shao
We are happy to announce the availability of Spark 2.3.2! Apache Spark 2.3.2 is a maintenance release based on the branch-2.3 maintenance branch of Spark. We strongly recommend all 2.3.x users upgrade to this stable release. To download Spark 2.3.2, head over to the download page:

[VOTE][RESULT] Spark 2.3.2 (RC6)

2018-09-23 Thread Saisai Shao
The vote passes. Thanks to all who helped with the release! I'll follow up later with a release announcement once everything is published. +1 (* = binding): Sean Owen* Wenchen Fan* Saisai Shao Dongjoon Hyun Takeshi Yamamuro John Zhuge Xiao Li* Denny Lee Ryan Blue Michael Heuer +0: None -1

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-19 Thread Saisai Shao
: >> false) *** FAILED *** >> >> Thank you, Saisai. >> >> Bests, >> Dongjoon. >> >> On Mon, Sep 17, 2018 at 6:48 PM Saisai Shao >> wrote: >> >>> +1 from my own side. >>> >>> Thanks >>> Saisai >>>

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-19 Thread Saisai Shao
** FAILED *** >> >> Thank you, Saisai. >> >> Bests, >> Dongjoon. >> >> On Mon, Sep 17, 2018 at 6:48 PM Saisai Shao >> wrote: >> >>> +1 from my own side. >>> >>> Thanks >>> Saisai >>> >>> W

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-17 Thread Saisai Shao
Build from source with most profiles passed for me. >> On Mon, Sep 17, 2018 at 8:17 AM Saisai Shao >> wrote: >> > >> > Please vote on releasing the following candidate as Apache Spark >> version 2.3.2. >> > >> > The vote is open until September 21 PST a

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Saisai Shao
Hi Wenchen, I think you need to set SPHINXPYTHON to python3 before building the docs, to work around the doc issue ( https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression ). Here are the notes from the release page:

[VOTE] SPARK 2.3.2 (RC6)

2018-09-17 Thread Saisai Shao
Please vote on releasing the following candidate as Apache Spark version 2.3.2. The vote is open until September 21 PST and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.3.2 [ ] -1 Do not release this package because ...

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-09-06 Thread Saisai Shao
Hi, the PMC members asked me to hold off a bit while they're dealing with some other things. Please wait a while. Thanks Saisai zzc <441586...@qq.com> wrote on Thu, Sep 6, 2018 at 4:27 PM: > Hi Saisai: > Spark 2.4 was cut; is there any new progress on 2.3.2? > > > > -- > Sent from:

Re: Persisting driver logs in yarn client mode (SPARK-25118)

2018-08-21 Thread Saisai Shao
One issue I can think of is that this "moving the driver log" at the application end is quite time-consuming, which will significantly delay the shutdown. We already suffer from this "rename" problem for event logs on object stores; moving the driver log will make the problem more severe. For a vanilla

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Saisai Shao
There's still another one, SPARK-25114. I will wait several days in case other blockers jump in. Thanks Saisai Wenchen Fan wrote on Wed, Aug 15, 2018 at 10:19 AM: > SPARK-25051 is resolved, can we start a new RC? > > SPARK-16406 is an improvement, generally we should not backport. > > On Wed, Aug 15,

[VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Saisai Shao
Please vote on releasing the following candidate as Apache Spark version 2.3.2. The vote is open until August 20 PST and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.3.2 [ ] -1 Do not release this package because ... To

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-08-06 Thread Saisai Shao
Yes, there'll be an RC4; we're still waiting for the fix of one issue. Yuval Itzchakov wrote on Mon, Aug 6, 2018 at 6:10 PM: > Are there any plans to create an RC4? There's an important Kafka Source > leak > fix I've merged back to the 2.3 branch. > > > > -- > Sent from:

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-29 Thread Saisai Shao
83b2bc8e> >> and SPARK-24677 >> <https://github.com/apache/spark/commit/7be70e29dd92de36dbb30ce39623d588f48e4cac>, >> if anyone disagrees we could back those out but I think they would be good >> to include. >> >> Tom >> >> On Thursday, July 19, 2018, 8:13:23 PM CDT, Saisa

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Saisai Shao
The PR is to get rid of AnalysisBarrier. That is better than multiple > patches we added for AnalysisBarrier after the 2.3.0 release. We can target it > to 2.4. > > Thanks, > > Xiao > > 2018-07-19 17:48 GMT-07:00 Saisai Shao : > >> I see, thanks Reynold. >> >> Rey

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Saisai Shao
I don't think my ticket should block this release. It's a big general >> refactoring. >> >> Xiao do you have a ticket for the bug you found? >> >> >> On Thu, Jul 19, 2018 at 5:24 PM Saisai Shao >> wrote: >> >>> Hi Xiao, >>> >>> Are you referri

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Saisai Shao
Hi Xiao, Are you referring to this JIRA ( https://issues.apache.org/jira/browse/SPARK-24865)? Xiao Li wrote on Fri, Jul 20, 2018 at 2:41 AM: > dfWithUDF.cache() > dfWithUDF.write.saveAsTable("t") > dfWithUDF.write.saveAsTable("t1") > > Cached data is not being used. It causes a big performance regression.
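
A self-contained version of the reproduction quoted above might look like this (column and table names are arbitrary):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    val spark = SparkSession.builder()
      .appName("spark-24865-repro")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val plusOne = udf((x: Long) => x + 1)
    val dfWithUDF = spark.range(100).select(plusOne($"id").as("v"))

    dfWithUDF.cache()
    dfWithUDF.write.saveAsTable("t")  // under SPARK-24865 the cache was not used,
    dfWithUDF.write.saveAsTable("t1") // so the UDF was re-evaluated for each write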

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-16 Thread Saisai Shao
> should not block release. > > On Sun, Jul 15, 2018 at 9:39 PM Saisai Shao > wrote: > >> Hi Sean, >> >> I just did a clean build with mvn/sbt on 2.3.2, and I didn't hit the errors >> you pasted here. I'm not sure how it happened. >> >> S

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-15 Thread Saisai Shao
path.) > > [error] A full rebuild may help if 'MetricsSystem.class' was compiled > against an incompatible version of org.eclipse > > On Sun, Jul 15, 2018 at 3:09 AM Saisai Shao > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >

[VOTE] SPARK 2.3.2 (RC3)

2018-07-15 Thread Saisai Shao
Please vote on releasing the following candidate as Apache Spark version 2.3.2. The vote is open until July 20 PST and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.3.2 [ ] -1 Do not release this package because ... To

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-11 Thread Saisai Shao
ocs are usable or not in > this RC. They looked reasonable to me but I don't know enough to know what > the issue was. If the result is usable, then there's no problem here, even > if something could be fixed/improved later. > > On Sun, Jul 8, 2018 at 7:25 PM Saisai Shao wrote: > &g

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Saisai Shao
ache/spark/pull/21659, and fix the release doc too. > > On Mon, Jul 9, 2018 at 8:25 AM, Saisai Shao wrote: > >> Hi Sean, >> >> SPARK-24530 is not included in this RC1 release. Actually I'm not so familiar >> with this issue, so I'm still using python2 to generate docs. >

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Saisai Shao
> >> >> Otherwise nothing is open for 2.3.2, sigs and license look good, tests >> pass as last time, etc. >> >> +1 >> >> On Sun, Jul 8, 2018 at 3:30 AM Saisai Shao >> wrote: >> >>> Please vote on releasing the following candidate as A

[VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Saisai Shao
Please vote on releasing the following candidate as Apache Spark version 2.3.2. The vote is open until July 11th PST and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.3.2 [ ] -1 Do not release this package because ... To

Re: Time for 2.3.2?

2018-07-03 Thread Saisai Shao
FYI, currently we have one blocker issue ( https://issues.apache.org/jira/browse/SPARK-24535); I will start the release after it is fixed. Also, please let me know if there are other blockers or fixes that want to land in the 2.3.2 release. Thanks Saisai Saisai Shao wrote on Mon, Jul 2, 2018 at 1:16 PM: > I will st

Re: Time for 2.3.2?

2018-07-01 Thread Saisai Shao
>>>> +1, I heard some Spark users have skipped v2.3.1 because of these bugs. >>>> >>>> On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang >>>> wrote: >>>>> >>>>> +1 >>>>> >>>>> Wenchen Fan wrote on Thu, Jun 28, 2018 at 2:06 PM:

Re: Time for 2.3.2?

2018-06-27 Thread Saisai Shao
+1. As mentioned by Marcelo, these issues seem quite severe. I can work on the release if you're short of hands :). Thanks Jerry Marcelo Vanzin wrote on Thu, Jun 28, 2018 at 11:40 AM: > +1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes > for those out. > > (Those are what delayed 2.2.2 and

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-16 Thread Saisai Shao
+1, checked the new py4j-related changes. Marcelo Vanzin wrote on Thu, May 17, 2018 at 5:41 AM: > This is actually in 2.3, the jira is just missing the version. > > https://github.com/apache/spark/pull/20765 > > On Wed, May 16, 2018 at 2:14 PM, kant kodali wrote: > > I am not

Re: Hadoop 3 support

2018-04-02 Thread Saisai Shao
Yes, the main blocking issue is that the Hive version used in Spark (1.2.1.spark) doesn't support running on Hadoop 3. Hive checks the Hadoop version at runtime [1]. Besides this, I think some pom changes should be enough to support Hadoop 3. If we want to use the Hadoop 3 shaded client jar, then the
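
The check in question is roughly of this shape (an illustrative sketch, not Hive's actual shim code):

    import org.apache.hadoop.util.VersionInfo

    // Hive's shim loader inspects the Hadoop version at runtime and rejects
    // major versions it doesn't know about, which is why Hadoop 3 fails.
    VersionInfo.getVersion.split("\\.")(0) match {
      case "2" => // supported: load the corresponding shims
      case v   => throw new IllegalArgumentException(
        s"Unrecognized Hadoop major version number: $v")
    }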

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Saisai Shao
Congrats, Zhenhua! 2018-04-02 16:57 GMT+08:00 Takeshi Yamamuro : > Congrats, Zhenhua! > > On Mon, Apr 2, 2018 at 4:13 PM, Ted Yu wrote: > >> Congratulations, Zhenhua >> >> Original message >> From: 雨中漫步 <601450...@qq.com> >> Date:

Re: Welcoming some new committers

2018-03-02 Thread Saisai Shao
Congrats to everyone! Thanks Jerry 2018-03-03 15:30 GMT+08:00 Liang-Chi Hsieh : > > Congrats to everyone! > > > Kazuaki Ishizaki wrote > > Congratulations to everyone! > > > > Kazuaki Ishizaki > > > > > > > > From: Takeshi Yamamuro > > > linguin.m.s@ > > > > > To:

Re: Does anyone know how to build spark with scala12.4?

2017-11-28 Thread Saisai Shao
experimental and for > people willing to make their own build. That's why I wanted it in good > enough shape that the scala-2.12 profile produces something basically > functional. > > On Tue, Nov 28, 2017 at 8:43 PM Saisai Shao <sai.sai.s...@gmail.com> > wrote: > >

Re: Does anyone know how to build spark with scala12.4?

2017-11-28 Thread Saisai Shao
Hi Sean, Two questions about Scala 2.12 release artifacts. Are we planning to ship 2.12 artifacts for the Spark 2.3 release? If not, will we ship only 2.11 artifacts? Thanks Jerry 2017-11-28 21:51 GMT+08:00 Sean Owen : > The Scala 2.12 profile mostly works, but not all

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-07 Thread Saisai Shao
+1, looking forward to more design details of this feature. Thanks Jerry On Wed, Nov 8, 2017 at 6:40 AM, Shixiong(Ryan) Zhu wrote: > +1 > > On Tue, Nov 7, 2017 at 1:34 PM, Joseph Bradley > wrote: > >> +1 >> >> On Mon, Nov 6, 2017 at 5:11 PM,

Re: Moving Scala 2.12 forward one step

2017-08-31 Thread Saisai Shao
Hi Sean, Do we have a planned target version for Scala 2.12 support? Several other projects, like Zeppelin and Livy, which rely on the Spark REPL, also require changes to support Scala 2.12. Thanks Jerry On Thu, Aug 31, 2017 at 5:55 PM, Sean Owen wrote: > No, this doesn't let

Re: Spark 2.1.x client with 2.2.0 cluster

2017-08-10 Thread Saisai Shao
As I remember, using a Spark 2.1 driver to communicate with Spark 2.2 executors will throw RPC exceptions (I don't remember the details of the exception). On Thu, Aug 10, 2017 at 4:23 PM, Ted Yu wrote: > Hi, > Has anyone used a Spark 2.1.x client with a Spark 2.2.0 cluster ? >

Re: Spark History Server does not redirect to Yarn aggregated logs for container logs

2017-06-08 Thread Saisai Shao
Yes, currently if the log is aggregated, accessing it through the UI doesn't work. You can create a JIRA to improve this if you would like to. On Thu, Jun 8, 2017 at 1:43 PM, ckhari4u wrote: > Hey Guys, > > I am hitting the below issue when trying to access the STDOUT/STDERR logs

Re: How about the fetch the shuffle data in one same machine?

2017-05-10 Thread Saisai Shao
There is a JIRA about this ( https://issues.apache.org/jira/browse/SPARK-6521). Currently, Spark's shuffle fetch still goes through Netty even when the two executors are on the same node, but according to the test on the JIRA, the performance is close whether the network is bypassed or not. From my

Re: Spark 2.0 and Yarn

2016-08-29 Thread Saisai Shao
This archive contains all the jars required by the Spark runtime. You can zip all the jars under /jars, upload the archive to HDFS, and then configure spark.yarn.archive with the HDFS path of the archive (see the sketch below). On Sun, Aug 28, 2016 at 9:59 PM, Srikanth Sampath wrote: >
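
A minimal sketch, assuming the archive has already been built and uploaded (the path and file name are examples only):

    import org.apache.spark.SparkConf

    // Prepared beforehand, e.g.:
    //   zip spark-jars.zip $SPARK_HOME/jars/*
    //   hdfs dfs -put spark-jars.zip /apps/spark/
    val conf = new SparkConf()
      .set("spark.yarn.archive", "hdfs:///apps/spark/spark-jars.zip")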

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Saisai Shao
Using the dominant resource calculator instead of the default resource calculator will get you the vcores you expect (see the sketch below). Basically, by default YARN does not honor CPU cores as a resource, so you will always see vcores as 1 no matter what number of cores you set in Spark. On Wed, Aug 3, 2016 at 12:11 PM,
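
A minimal sketch of both sides of this, assuming YARN's CapacityScheduler; the Spark setting is standard, while the scheduler property is the piece that makes YARN count cores:

    import org.apache.spark.SparkConf

    // YARN side (capacity-scheduler.xml): set
    //   yarn.scheduler.capacity.resource-calculator
    //     = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
    // Spark side: the cores requested here then show up as container vcores.
    val conf = new SparkConf()
      .set("spark.executor.cores", "4") // example value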

Re: Issue with Spark Streaming UI

2016-05-24 Thread Saisai Shao
I think it is by design that FileInputDStream doesn't report input info: FileInputDStream has no event/record concept (it is file based), so it is hard to define how to correctly report input info. Currently, input info reporting is supported for all receiver-based InputDStreams
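
A minimal sketch of the two kinds of streams (host, port, and path are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("demo"), Seconds(10))

    // Receiver-based: records flow through a receiver, so per-batch input
    // info (e.g. record counts) can be reported to the UI.
    val socketStream = ssc.socketTextStream("localhost", 9999)

    // File-based: whole files are picked up each batch with no per-record
    // receiver, so there is no natural input-rate figure to report.
    val fileStream = ssc.textFileStream("hdfs:///tmp/streaming-input")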

Re: combinedTextFile and CombineTextInputFormat

2016-05-19 Thread Saisai Shao
0 partitions by 256MB than RDD with > 250,000+ partition all different sizes from 100KB to 128MB > > So, I see only advantages if sc.textFile() starts using CombineTextInputFormat > instead of TextInputFormat > > Alex > > On Thu, May 19, 2016 at 8:30 PM, Saisai Shao <s

Re: combinedTextFile and CombineTextInputFormat

2016-05-19 Thread Saisai Shao
From my understanding, newAPIHadoopFile or hadoopFile is generic enough to support any InputFormat you want, for example as sketched below. IMO it is not really necessary to add a new API for this. On Fri, May 20, 2016 at 12:59 AM, Alexander Pivovarov wrote: > Spark users might not know
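
A minimal sketch of that approach (the input path and split size are examples only; sc is an existing SparkContext):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat

    val hadoopConf = new Configuration(sc.hadoopConfiguration)
    // Cap each combined split, here at 256 MB.
    hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
      (256L * 1024 * 1024).toString)

    val rdd = sc.newAPIHadoopFile(
        "hdfs:///data/small-files",        // example input path
        classOf[CombineTextInputFormat],
        classOf[LongWritable],
        classOf[Text],
        hadoopConf)
      .map(_._2.toString)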

Re: HDFS as Shuffle Service

2016-04-26 Thread Saisai Shao
I'm quite curious about the benefits of using HDFS as a shuffle service; also, what's the problem with the current shuffle service? Thanks Saisai On Wed, Apr 27, 2016 at 4:31 AM, Timothy Chen wrote: > Are you suggesting to have the shuffle service persist and fetch data with > hdfs,

Re: RFC: Remove "HBaseTest" from examples?

2016-04-20 Thread Saisai Shao
+1. HBaseTest in the Spark examples is quite old and obsolete; the HBase connector in the HBase repo has evolved a lot, so it would be better to point users there rather than to the Spark example. Good to remove it. Thanks Saisai On Wed, Apr 20, 2016 at 1:41 AM, Josh Rosen

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Saisai Shao
mment is >>> added at the last without responses from the author? >>> >>> >>> IMHO, If the committers are not sure whether the patch would be useful, >>> then I think they should leave some comments why they are not sure, not >>> just ignorin

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Saisai Shao
It would be better to have a specific technical reason why a PR should be closed: either the implementation is not good, the problem is not valid, or something else. That will actually help the contributor shape their code and reopen the PR. Otherwise, reasons like "feel free to

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Saisai Shao
eliminate this. > > > On Fri, Apr 1, 2016, 7:25 PM Saisai Shao <sai.sai.s...@gmail.com> wrote: > >> Hi Michael, shuffle data (mapper output) have to be materialized into >> disk finally, no matter how large memory you have, it is the design purpose >> of Spark. In you scenari

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Saisai Shao
Hi Michael, shuffle data (mapper output) has to be materialized to disk eventually, no matter how much memory you have; that is by design in Spark. In your scenario, since you have a lot of memory, shuffle spill should not happen frequently, and most of the disk IO you see is probably the final shuffle

Re: Dynamic allocation availability on standalone mode. Misleading doc.

2016-03-07 Thread Saisai Shao
Yes, we need to fix the documentation. On Tue, Mar 8, 2016 at 9:07 AM, Mark Hamstra wrote: > Yes, it works in standalone mode. > > On Mon, Mar 7, 2016 at 4:25 PM, Eugene Morozov > wrote: > >> Hi, the feature looks like the one I'd like to use,

Re: sbt publish-local fails with 2.0.0-SNAPSHOT

2016-02-01 Thread Saisai Shao
I think it is due to our recent change that overrides the external resolvers in the sbt build profile; I just created a JIRA ( https://issues.apache.org/jira/browse/SPARK-13109) to track this. On Mon, Feb 1, 2016 at 3:01 PM, Mike Hynes <91m...@gmail.com> wrote: > Hi devs, > > I used to be able to

Re: spark with label nodes in yarn

2015-12-15 Thread Saisai Shao
SPARK-6470 only supports node label expressions for executors. SPARK-7173 adds node label expressions for the AM (it will be in 1.6). If you want to schedule your whole application through label expressions, you have to configure both the AM and the executor label expressions, as sketched below. If you only want to schedule
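
A minimal sketch of the configuration (the label name "spark-nodes" is a made-up example):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // AM placement, added by SPARK-7173 (Spark 1.6+).
      .set("spark.yarn.am.nodeLabelExpression", "spark-nodes")
      // Executor placement, added by SPARK-6470.
      .set("spark.yarn.executor.nodeLabelExpression", "spark-nodes")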

Re: spark with label nodes in yarn

2015-12-15 Thread Saisai Shao
zzq98...@alibaba-inc.com] > *Sent:* Dec 16, 2015, 9:21 > *To:* 'Ted Yu' > *Cc:* 'Saisai Shao'; 'dev' > *Subject:* Re: spark with label nodes in yarn > > > > Oops... > > > > I do use spark 1.5.0 and apache hadoop 2.6.0 (spark 1.4.1 + apache hadoop > 2.6.0 is

Re: tests blocked at "don't call ssc.stop in listener"

2015-11-26 Thread Saisai Shao
This might be related to this JIRA ( https://issues.apache.org/jira/browse/SPARK-11761); I'm not very sure about it. On Fri, Nov 27, 2015 at 10:22 AM, Nan Zhu wrote: > Hi, all > > Anyone noticed that some of the tests just blocked at the test case “don't > call ssc.stop in

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-20 Thread Saisai Shao
+1. Hadoop 2.6 would be a good choice, with many features added (like support for long-running services and label-based scheduling). Currently there's a lot of reflection code to support multiple versions of YARN, so upgrading to a newer version will really ease the pain :). Thanks Saisai On Fri, Nov

Re: Streaming Receiverless Kafka API + Offset Management

2015-11-16 Thread Saisai Shao
Kafka now has built-in support for managing offset metadata itself, besides ZK; it is easy to use and to switch to from the current ZK implementation. I think the question here is whether we need to manage offsets at the Spark Streaming level or leave that to the user. If you want to manage offsets at the user level, letting Spark
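
A minimal sketch of the user-level option with the direct Kafka API of that era (topic, broker, and offset values are made up; where the saved offsets live is entirely up to the application):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("offsets"), Seconds(5))

    // Offsets the application saved durably on its previous run.
    val fromOffsets = Map(TopicAndPartition("events", 0) -> 42L)

    val stream = KafkaUtils.createDirectStream[
        String, String, StringDecoder, StringDecoder, (String, String)](
      ssc,
      Map("metadata.broker.list" -> "broker1:9092"), // placeholder broker
      fromOffsets,
      (m: MessageAndMetadata[String, String]) => (m.key, m.message))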

Re: Spark driver reducing total executors count even when Dynamic Allocation is disabled.

2015-10-20 Thread Saisai Shao
d this is when i have not enabled > Dynamic allocation. My cluster has other DN's available, AM should request > the killed executors from yarn, and get it on some other DN's. > > Regards, > Prakhar > > > On Mon, Oct 19, 2015 at 2:47 PM, Saisai Shao <sai.sai.s...@gmail.co

Re: Spark 1.4: Python API for getting Kafka offsets in direct mode?

2015-06-12 Thread Saisai Shao
a...@yelp.com: Thanks, Jerry. That's what I suspected based on the code I looked at. Any pointers on what is needed to build in this support would be great. This is critical to the project we are currently working on. Thanks! On Thu, Jun 11, 2015 at 10:54 PM, Saisai Shao sai.sai.s

Re: Spark 1.4: Python API for getting Kafka offsets in direct mode?

2015-06-11 Thread Saisai Shao
. Spark does not support any state persistence across deployments so this is something we need to handle on our own. Hope that helps. Let me know if not. Thanks! Amit On Thu, Jun 11, 2015 at 10:02 PM, Saisai Shao sai.sai.s...@gmail.com wrote: Hi, What is your meaning of getting

Re: Spark 1.4: Python API for getting Kafka offsets in direct mode?

2015-06-11 Thread Saisai Shao
Hi, what do you mean by getting the offsets from the RDD? From my understanding, the offsetRange is a parameter you supply to KafkaRDD, so why do you still want to get back the one you set earlier? Thanks Jerry 2015-06-12 12:36 GMT+08:00 Amit Ramesh a...@yelp.com: Congratulations on the
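
For the Scala/Java direct stream, the offsets are recoverable from each batch's RDD; a minimal sketch (stream is a direct Kafka stream like the one sketched above):

    import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

    stream.foreachRDD { rdd =>
      // Each underlying KafkaRDD carries the offset ranges it was built from.
      val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { r =>
        println(s"${r.topic} [${r.partition}] ${r.fromOffset} to ${r.untilOffset}")
      }
    }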

Re: python/run-tests fails at spark master branch

2015-04-22 Thread Saisai Shao
April 2015 07:38 AM, Saisai Shao wrote: Hi Hrishikesh, Now we add Kafka unit test for python which relies on Kafka assembly jar, so you need to run `sbt assembly` or mvn `package` at first to get an assemble jar. 2015-04-22 1:15 GMT+08:00 Marcelo Vanzin van...@cloudera.com: On Tue, Apr

Re: python/run-tests fails at spark master branch

2015-04-21 Thread Saisai Shao
Hi Hrishikesh, We now add a Kafka unit test for Python which relies on the Kafka assembly jar, so you need to run `sbt assembly` or `mvn package` first to get an assembly jar. 2015-04-22 1:15 GMT+08:00 Marcelo Vanzin van...@cloudera.com: On Tue, Apr 21, 2015 at 1:30 AM, Hrishikesh Subramonian

Re: Understanding shuffle file name conflicts

2015-03-25 Thread Saisai Shao
the same. -- Kannan On Tue, Mar 24, 2015 at 7:35 PM, Saisai Shao sai.sai.s...@gmail.com wrote: Hi Kannan, As I know the shuffle Id in ShuffleDependency will be increased, so even if you run the same job twice, the shuffle dependency as well as shuffle id is different, so

Re: Understanding shuffle file name conflicts

2015-03-25 Thread Saisai Shao
On the other hand, DiskBlockManager.getFile is used to create the shuffle index and data files. -- Kannan On Tue, Mar 24, 2015 at 11:56 PM, Saisai Shao sai.sai.s...@gmail.com wrote: Yes, as Josh said, when an application is started, Spark will create a unique application-wide folder

Re: Understanding shuffle file name conflicts

2015-03-24 Thread Saisai Shao
Hi Kannan, As far as I know, the shuffle id in ShuffleDependency keeps increasing, so even if you run the same job twice, the shuffle dependency as well as the shuffle id is different; the shuffle file name, which is composed of (shuffleId + mapId + reduceId), will therefore change, so there's no name conflict even
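
A small sketch of the naming scheme described above (illustrative, not Spark's actual code):

    // Because the shuffle id keeps increasing within an application, two runs
    // of the same job never produce the same file name.
    def shuffleDataFile(shuffleId: Int, mapId: Int, reduceId: Int): String =
      s"shuffle_${shuffleId}_${mapId}_${reduceId}.data"

    shuffleDataFile(0, 3, 0) // first run:  shuffle_0_3_0.data
    shuffleDataFile(1, 3, 0) // second run: shuffle_1_3_0.data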