Re: Apache Spark git repo moved to gitbox.apache.org

2019-02-12 Thread Xiao Li
The above instruction differs from what the website documents:
https://github.com/apache/spark-website/commit/92606b2e7849b9d743ef2a8176438142420a83e5#diff-17faa4bab13b7530a3e1b627bb798ad0

Some committers are using gitbox, while others are following the website
instructions and using github.

Due to the mismatch, gitbox and github have become inconsistent. I opened an
infra ticket: https://issues.apache.org/jira/browse/INFRA-17842. Hopefully
it can be fixed soon. We should have all the committers follow the same
procedure; otherwise, it could easily break the commit history.
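
For reference, one quick way to check whether the two remotes have diverged
is to compare their branch heads, for example:

$ git ls-remote https://gitbox.apache.org/repos/asf/spark.git refs/heads/master
$ git ls-remote https://github.com/apache/spark.git refs/heads/master

If the two commands print different hashes, the mirrors are out of sync.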

Xiao




On Mon, Dec 10, 2018 at 8:30 AM, Sean Owen wrote:

> Per the thread last week, the Apache Spark repos have migrated from
> https://git-wip-us.apache.org/repos/asf to
> https://gitbox.apache.org/repos/asf
>
>
> Non-committers:
>
> This just means repointing any references to the old repository to the
> new one. It won't affect you if you were already referencing
> https://github.com/apache/spark .
>
>
> Committers:
>
> Follow the steps at https://reference.apache.org/committer/github to
> fully sync your ASF and Github accounts, and then wait up to an hour
> for it to finish.
>
> Then repoint your git-wip-us remotes to gitbox in your git checkouts.
> For our standard setup that works with the merge script, that should
> be your 'apache' remote. For example here are my current remotes:
>
> $ git remote -v
> apache https://gitbox.apache.org/repos/asf/spark.git (fetch)
> apache https://gitbox.apache.org/repos/asf/spark.git (push)
> apache-github git://github.com/apache/spark (fetch)
> apache-github git://github.com/apache/spark (push)
> origin https://github.com/srowen/spark (fetch)
> origin https://github.com/srowen/spark (push)
> upstream https://github.com/apache/spark (fetch)
> upstream https://github.com/apache/spark (push)
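>
> For example, if your existing 'apache' remote still points at git-wip-us,
> something like the following should be all that's needed (adjust the remote
> name if your setup differs):
>
> $ git remote set-url apache https://gitbox.apache.org/repos/asf/spark.git
> $ git fetch apache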
>
> In theory we also have read/write access to github.com now, but it hasn't
> worked for me yet. It may need to sync. This note just makes sure everyone
> knows how to keep pushing commits to the new ASF repo right now.
>
> Report any problems here!
>
> Sean
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] [RESULT] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-02-12 Thread Moein Hosseini
++1 from me.

On Wed, Feb 13, 2019 at 2:19 AM Xiangrui Meng  wrote:

> Hi all,
>
> The vote passed with the following +1s (* = binding) and no 0s/-1s:
>
> * Denny Lee
> * Jules Damji
> * Xiao Li*
> * Dongjoon Hyun
> * Mingjie Tang
> * Yanbo Liang*
> * Marco Gaido
> * Joseph Bradley*
> * Xiangrui Meng*
>
> Please watch SPARK-25994 and join future discussions there. Thanks!
>
> Best,
> Xiangrui
>


-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com


Re: building docker images for GPU

2019-02-12 Thread Chen Qin
Just noticed that current Spark task scheduling doesn't recognize any /device
as a constraint.
What might happen as a result is multiple tasks getting stuck racing to
acquire a GPU/FPGA (you name it).

Not sure if "multiple processes" on one GPU works the same way as on a CPU. If
not, we should consider some kind of binding in the task scheduler and
executorInfo, e.g.:
  task 0: executor 1, 2 CPUs, /device/gpu/0
  task 1: executor 1, 2 CPUs, /device/gpu/1
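
To make the binding concrete, something along these lines (the property names
are hypothetical, purely for illustration; no such API exists today):

  spark.executor.devices.gpu=2   # devices an executor advertises
  spark.task.devices.gpu=1       # devices a task must be granted before it runs

so the scheduler would only place a task on an executor that still has a free
/device/gpu/* slot.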

Chen

On Tue, Feb 12, 2019 at 11:04 AM Marcelo Vanzin 
wrote:

> I think I remember someone mentioning a thread about this on the PR
> discussion, and digging a bit I found this:
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/Toward-an-quot-API-quot-for-spark-images-used-by-the-Kubernetes-back-end-td23622.html
>
> It started a discussion but I haven't really found any conclusion.
>
> In my view here the discussion is the same: what is the contract
> between the Spark code that launches the driver / executor pods, and
> the images?
>
> Right now the contract is defined by the code, which makes it a little
> awkward for people to have their own customized images. They need to
> kinda follow what the images in the repo do and hope they get it
> right.
>
> If instead you define the contract and make the code follow it, then
> it becomes easier for people to provide whatever image they want.
>
> Matt also filed SPARK-24655, which has seen no progress nor discussion.
>
> Someone else filed SPARK-26773, which is similar.
>
> And another person filed SPARK-26597, which is also in the same vein,
> and also suggests something that in the end I agree with: Spark
> shouldn't be opinionated about the image and what it has; it should
> tell the container to run a Spark command to start the driver or
> executor, which should be in the image's path, and shouldn't require
> an entry point at all.
>
> Anyway, just wanted to point out that this discussion isn't as simple
> as "GPU vs. not GPU", but it's a more fundamental discussion about
> what should the container image look like, so that people can
> customize it easily. After all, that's one of the main points of using
> container images, right?
>
> On Mon, Feb 11, 2019 at 11:53 AM Matt Cheah  wrote:
> >
> > I will reiterate some feedback I left on the PR. It's not immediately
> > clear whether we should be opinionated about supporting GPUs in the
> > Docker image in a first-class way.
> >
> >
> >
> > First, there's the question of how we arbitrate the kinds of
> > customizations we support moving forward. For example, if we say we
> > support GPUs now, what's to say that we should not also support FPGAs?
> >
> >
> >
> > Also what kind of testing can we add to CI to ensure what we’ve provided
> in this Dockerfile works?
> >
> >
> >
> > Instead we can make the Spark images have bare minimum support for basic
> Spark applications, and then provide detailed instructions for how to build
> custom Docker images (mostly just needing to make sure the custom image has
> the right entry point).
> >
> >
> >
> > -Matt Cheah
> >
> >
> >
> > From: Rong Ou 
> > Date: Friday, February 8, 2019 at 2:28 PM
> > To: "dev@spark.apache.org" 
> > Subject: building docker images for GPU
> >
> >
> >
> > Hi spark dev,
> >
> >
> >
> > I created a JIRA issue a while ago (
> https://issues.apache.org/jira/browse/SPARK-26398) to
> add GPU support to Spark docker images, and sent a PR (
> https://github.com/apache/spark/pull/23347) that went
> through several iterations. It was suggested that it should be discussed on
> the dev mailing list, so here we are. Please chime in if you have any
> questions or concerns.
> >
> >
> >
> > A little more background. I mainly looked at running XGBoost on Spark
> using GPUs. Preliminary results have shown that there is potential for
> significant speedup in training time. This seems like a popular use case
> for Spark. In any event, it'd be nice for Spark to have better support for
> GPUs. Building gpu-enabled docker images seems like a useful first step.
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Rong
> >
> >
>
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


[VOTE] [RESULT] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-02-12 Thread Xiangrui Meng
Hi all,

The vote passed with the following +1s (* = binding) and no 0s/-1s:

* Denny Lee
* Jules Damji
* Xiao Li*
* Dongjoon Hyun
* Mingjie Tang
* Yanbo Liang*
* Marco Gaido
* Joseph Bradley*
* Xiangrui Meng*

Please watch SPARK-25994 and join future discussions there. Thanks!

Best,
Xiangrui


Re: [VOTE] [SPARK-25994] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-02-12 Thread Xiangrui Meng
+1 from myself.

The vote passed with the following +1s and no -1s:

* Denny Lee
* Jules Damji
* Xiao Li*
* Dongjoon Hyun
* Mingjie Tang
* Yanbo Liang*
* Marco Gaido
* Joseph Bradley*
* Xiangrui Meng*

I will send a result email soon. Please watch SPARK-25994 for future
discussions. Thanks!

Best,
Xiangrui


On Mon, Feb 11, 2019 at 10:14 PM Joseph Bradley 
wrote:

> +1  This will be a great long-term investment for Spark.
>
> On Wed, Feb 6, 2019 at 8:44 AM Marco Gaido  wrote:
>
>> +1 from me as well.
>>
>> On Wed, Feb 6, 2019 at 4:58 PM, Yanbo Liang wrote:
>>
>>> +1 for the proposal
>>>
>>>
>>>
>>> On Thu, Jan 31, 2019 at 12:46 PM Mingjie Tang 
>>> wrote:
>>>
 +1, this is a very very important feature.

 Mingjie

 On Thu, Jan 31, 2019 at 12:42 AM Xiao Li  wrote:

> Change my vote from +1 to ++1
>
> On Wed, Jan 30, 2019 at 6:20 AM, Xiangrui Meng wrote:
>
>> Correction: a +0 vote doesn't mean "Don't really care". Thanks Ryan for
>> the offline reminder! Below is the official Apache interpretation of
>> fractional values:
>>
>> The in-between values are indicative of how strongly the voting
>> individual feels. Here are some examples of fractional votes and ways in
>> which they might be intended and interpreted:
>> +0: 'I don't feel strongly about it, but I'm okay with this.'
>> -0: 'I won't get in the way, but I'd rather we didn't do this.'
>> -0.5: 'I don't like this idea, but I can't find any rational
>> justification for my feelings.'
>> ++1: 'Wow! I like this! Let's do it!'
>> -0.9: 'I really don't like this, but I'm not going to stand in the
>> way if everyone else wants to go ahead with it.'
>> +0.9: 'This is a cool idea and i like it, but I don't have time/the
>> skills necessary to help out.'
>>
>>
>> On Wed, Jan 30, 2019 at 12:31 AM Martin Junghanns
>>  wrote:
>>
>>> Hi Dongjoon,
>>>
>>> Thanks for the hint! I updated the SPIP accordingly.
>>>
>>> I also changed the access permissions for the SPIP and design sketch
>>> docs so that anyone can comment.
>>>
>>> Best,
>>>
>>> Martin
>>> On 29.01.19 18:59, Dongjoon Hyun wrote:
>>>
>>> Hi, Xiangrui Meng.
>>>
>>> +1 for the proposal.
>>>
>>> However, please update the following section for this vote. As we
>>> see, it seems to be inaccurate because today is Jan. 29th. (Almost
>>> February).
>>> (Since I cannot comment on the SPIP, I replied here.)
>>>
>>> Q7. How long will it take?
>>>
>>>    - If accepted by the community by the end of December 2018, we
>>>      predict to be feature complete by mid-end March, allowing for QA
>>>      during April 2019, making the SPIP part of the next major Spark
>>>      release (3.0, ETA May 2019).
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Tue, Jan 29, 2019 at 8:52 AM Xiao Li 
>>> wrote:
>>>
 +1

 On Tue, Jan 29, 2019 at 8:14 AM, Jules Damji wrote:

> +1 (non-binding)
> (Heard their proposed tech-talk at Spark + A.I summit in London.
> Well attended & well received.)
>
> —
> Sent from my iPhone
> Pardon the dumb thumb typos :)
>
> On Jan 29, 2019, at 7:30 AM, Denny Lee 
> wrote:
>
> +1
>
> yay - let's do it!
>
> On Tue, Jan 29, 2019 at 6:28 AM Xiangrui Meng 
> wrote:
>
>> Hi all,
>>
>> I want to call for a vote on SPARK-25994. It introduces a new
>> DataFrame-based component to Spark, which supports property graph
>> construction, Cypher queries, and graph algorithms. The proposal was made
>> available on user@ and dev@ to collect input. You can also find a sketch
>> design doc attached to SPARK-26028.
>>
>> The vote will be up for the next 72 hours. Please reply with your
>> vote:
>>
>> +1: Yeah, let's go forward and implement the SPIP.
>> +0: Don't really care.
>> -1: I don't think this is a good idea because of the following
>> technical reasons.
>>
>> Best,
>> Xiangrui
>>
>
>
> --
>
> Joseph 

Re: building docker images for GPU

2019-02-12 Thread Marcelo Vanzin
I think I remember someone mentioning a thread about this on the PR
discussion, and digging a bit I found this:
http://apache-spark-developers-list.1001551.n3.nabble.com/Toward-an-quot-API-quot-for-spark-images-used-by-the-Kubernetes-back-end-td23622.html

It started a discussion but I haven't really found any conclusion.

In my view here the discussion is the same: what is the contract
between the Spark code that launches the driver / executor pods, and
the images?

Right now the contract is defined by the code, which makes it a little
awkward for people to have their own customized images. They need to
kinda follow what the images in the repo do and hope they get it
right.

If instead you define the contract and make the code follow it, then
it becomes easier for people to provide whatever image they want.

Matt also filed SPARK-24655, which has seen no progress nor discussion.

Someone else filed SPARK-26773, which is similar.

And another person filed SPARK-26597, which is also in the same vein,
and also suggests something that in the end I agree with: Spark
shouldn't be opinionated about the image and what it has; it should
tell the container to run a Spark command to start the driver or
executor, which should be in the image's path, and shouldn't require
an entry point at all.
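
To make "define the contract and make the code follow it" concrete, here is a
minimal sketch of what a user-supplied image could look like if the contract
were just "a Spark distribution on the image's PATH" (base image, paths, and
package names below are illustrative, not an existing convention):

FROM ubuntu:18.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-8-jre-headless && \
    rm -rf /var/lib/apt/lists/*
# an unpacked Spark distribution; the only requirement is that bin/ is on PATH
COPY spark /opt/spark
ENV SPARK_HOME=/opt/spark
ENV PATH=$PATH:/opt/spark/bin
# note: no ENTRYPOINT; under this contract Spark itself tells the container
# which command to run for the driver or the executor

Today, by contrast, a custom image effectively has to preserve the behavior of
the stock /opt/entrypoint.sh for the Kubernetes backend to line up with it.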

Anyway, just wanted to point out that this discussion isn't as simple
as "GPU vs. not GPU", but it's a more fundamental discussion about
what should the container image look like, so that people can
customize it easily. After all, that's one of the main points of using
container images, right?

On Mon, Feb 11, 2019 at 11:53 AM Matt Cheah  wrote:
>
> I will reiterate some feedback I left on the PR. It's not immediately clear
> whether we should be opinionated about supporting GPUs in the Docker image
> in a first-class way.
>
>
>
> First, there's the question of how we arbitrate the kinds of customizations
> we support moving forward. For example, if we say we support GPUs now,
> what's to say that we should not also support FPGAs?
>
>
>
> Also what kind of testing can we add to CI to ensure what we’ve provided in 
> this Dockerfile works?
>
>
>
> Instead we can make the Spark images have bare minimum support for basic 
> Spark applications, and then provide detailed instructions for how to build 
> custom Docker images (mostly just needing to make sure the custom image has 
> the right entry point).
>
>
>
> -Matt Cheah
>
>
>
> From: Rong Ou 
> Date: Friday, February 8, 2019 at 2:28 PM
> To: "dev@spark.apache.org" 
> Subject: building docker images for GPU
>
>
>
> Hi spark dev,
>
>
>
> I created a JIRA issue a while ago 
> (https://issues.apache.org/jira/browse/SPARK-26398) to 
> add GPU support to Spark docker images, and sent a PR 
> (https://github.com/apache/spark/pull/23347) that went through 
> several iterations. It was suggested that it should be discussed on the dev 
> mailing list, so here we are. Please chime in if you have any questions or 
> concerns.
>
>
>
> A little more background. I mainly looked at running XGBoost on Spark using 
> GPUs. Preliminary results have shown that there is potential for significant 
> speedup in training time. This seems like a popular use case for Spark. In 
> any event, it'd be nice for Spark to have better support for GPUs. Building 
> gpu-enabled docker images seems like a useful first step.
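>
> Purely as a sketch of the direction (the base image tag, package names, and
> paths are assumptions for illustration, not necessarily what the PR ends up
> with):
>
> FROM nvidia/cuda:10.0-runtime-ubuntu18.04
> RUN apt-get update && \
>     apt-get install -y --no-install-recommends openjdk-8-jre-headless && \
>     rm -rf /var/lib/apt/lists/*
> # unpacked Spark distribution copied into the image
> COPY spark /opt/spark
> ENV SPARK_HOME=/opt/spark
> # keep the same entrypoint contract as the stock Kubernetes image
> COPY entrypoint.sh /opt/entrypoint.sh
> ENTRYPOINT [ "/opt/entrypoint.sh" ]
>
> The CUDA runtime libraries then come from the base image, and everything
> else looks like the standard Spark image.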
>
>
>
> Thanks,
>
>
>
> Rong
>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread Nan Zhu
Just filed a JIRA: https://issues.apache.org/jira/browse/SPARK-26862

This issue only happens in 2.4.0, not in 2.3.2.

Would anyone help look into it?



On Tue, Feb 12, 2019 at 10:41 AM DB Tsai  wrote:

> Great. I'll prepare the release for voting. Thanks!
>
> DB Tsai  |  Siri Open Source Technologies [not a contribution]  |  
> Apple, Inc
>
> > On Feb 12, 2019, at 4:11 AM, Wenchen Fan  wrote:
> >
> > +1 for 2.4.1
> >
> > On Tue, Feb 12, 2019 at 7:55 PM Hyukjin Kwon 
> wrote:
> > +1 for 2.4.1
> >
> > On Tue, Feb 12, 2019 at 4:56 PM, Dongjin Lee wrote:
> > > SPARK-23539 is a non-trivial improvement, so probably would not be
> back-ported to 2.4.x.
> >
> > Got it. It seems reasonable.
> >
> > Committers:
> >
> > Please don't omit SPARK-23539 from 2.5.0. Kafka community needs this
> feature.
> >
> > Thanks,
> > Dongjin
> >
> > On Tue, Feb 12, 2019 at 1:50 PM Takeshi Yamamuro 
> wrote:
> > +1, too.
> > branch-2.4 accumulates too many commits..:
> >
> https://github.com/apache/spark/compare/0a4c03f7d084f1d2aa48673b99f3b9496893ce8d...af3c7111efd22907976fc8bbd7810fe3cfd92092
> >
> > On Tue, Feb 12, 2019 at 12:36 PM Dongjoon Hyun 
> wrote:
> > Thank you, DB.
> >
> > +1, Yes. It's time for preparing 2.4.1 release.
> >
> > Bests,
> > Dongjoon.
> >
> > On 2019/02/12 03:16:05, Sean Owen  wrote:
> > > I support a 2.4.1 release now, yes.
> > >
> > > SPARK-23539 is a non-trivial improvement, so probably would not be
> > > back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
> > > be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
> > > it, but it could go in if otherwise ready.
> > >
> > >
> > > On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee 
> wrote:
> > > >
> > > > Hi DB,
> > > >
> > > > Could you add SPARK-23539[^1] into 2.4.1? I opened the PR[^2] a while
> > > > ago, but it was not included in 2.3.0 and has not gotten enough review.
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > [^1]: https://issues.apache.org/jira/browse/SPARK-23539
> > > > [^2]: https://github.com/apache/spark/pull/22282
> > > >
> > > > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim 
> wrote:
> > > >>
> > > >> Given SPARK-26154 [1] is a correctness issue and PR [2] is
> submitted, I hope it can be reviewed and included within Spark 2.4.1 -
> otherwise it will be a long-lived correctness issue.
> > > >>
> > > >> Thanks,
> > > >> Jungtaek Lim (HeartSaVioR)
> > > >>
> > > >> 1. https://issues.apache.org/jira/browse/SPARK-26154
> > > >> 2. https://github.com/apache/spark/pull/23634
> > > >>
> > > >>
> > > >>> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
> > > >>>
> > > >>> Hello all,
> > > >>>
> > > >>> I am preparing to cut a new Apache 2.4.1 release as there are many
> bugs and correctness issues fixed in branch-2.4.
> > > >>>
> > > >>> The list of addressed issues is
> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
> > > >>>
> > > >>> Let me know if you have any concern or any PR you would like to
> get in.
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>>
> -
> > > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > > >>>
> > > >
> > > >
> > > > --
> > > > Dongjin Lee
> > > >
> > > > A hitchhiker in the mathematical world.
> > > >
> > > > github: github.com/dongjinleekr
> > > > linkedin: kr.linkedin.com/in/dongjinleekr
> > > > speakerdeck: speakerdeck.com/dongjin
> > >
> > > -
> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >
> > >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
> >
> > --
> > ---
> > Takeshi Yamamuro
> >
> >
> > --
> > Dongjin Lee
> >
> > A hitchhiker in the mathematical world.
> >
> > github: github.com/dongjinleekr
> > linkedin: kr.linkedin.com/in/dongjinleekr
> > speakerdeck: speakerdeck.com/dongjin
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread DB Tsai
Great. I'll prepare the release for voting. Thanks!

DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, Inc

> On Feb 12, 2019, at 4:11 AM, Wenchen Fan  wrote:
> 
> +1 for 2.4.1
> 
> On Tue, Feb 12, 2019 at 7:55 PM Hyukjin Kwon  wrote:
> +1 for 2.4.1
> 
> On Tue, Feb 12, 2019 at 4:56 PM, Dongjin Lee wrote:
> > SPARK-23539 is a non-trivial improvement, so probably would not be 
> > back-ported to 2.4.x.
> 
> Got it. It seems reasonable.
> 
> Committers:
> 
> Please don't omit SPARK-23539 from 2.5.0. Kafka community needs this feature.
> 
> Thanks,
> Dongjin
> 
> On Tue, Feb 12, 2019 at 1:50 PM Takeshi Yamamuro  
> wrote:
> +1, too.
> branch-2.4 accumulates too many commits..:
> https://github.com/apache/spark/compare/0a4c03f7d084f1d2aa48673b99f3b9496893ce8d...af3c7111efd22907976fc8bbd7810fe3cfd92092
> 
> On Tue, Feb 12, 2019 at 12:36 PM Dongjoon Hyun  wrote:
> Thank you, DB.
> 
> +1, Yes. It's time for preparing 2.4.1 release.
> 
> Bests,
> Dongjoon.
> 
> On 2019/02/12 03:16:05, Sean Owen  wrote: 
> > I support a 2.4.1 release now, yes.
> > 
> > SPARK-23539 is a non-trivial improvement, so probably would not be
> > back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
> > be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
> > it, but it could go in if otherwise ready.
> > 
> > 
> > On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee  wrote:
> > >
> > > Hi DB,
> > >
> > > Could you add SPARK-23539[^1] into 2.4.1? I opened the PR[^2] a while
> > > ago, but it was not included in 2.3.0 and has not gotten enough review.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > [^1]: https://issues.apache.org/jira/browse/SPARK-23539
> > > [^2]: https://github.com/apache/spark/pull/22282
> > >
> > > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim  wrote:
> > >>
> > >> Given SPARK-26154 [1] is a correctness issue and PR [2] is submitted, I 
> > >> hope it can be reviewed and included within Spark 2.4.1 - otherwise it 
> > >> will be a long-lived correctness issue.
> > >>
> > >> Thanks,
> > >> Jungtaek Lim (HeartSaVioR)
> > >>
> > >> 1. https://issues.apache.org/jira/browse/SPARK-26154
> > >> 2. https://github.com/apache/spark/pull/23634
> > >>
> > >>
> > >>> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
> > >>>
> > >>> Hello all,
> > >>>
> > >>> I am preparing to cut a new Apache 2.4.1 release as there are many bugs 
> > >>> and correctness issues fixed in branch-2.4.
> > >>>
> > >>> The list of addressed issues is
> > >>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
> > >>>
> > >>> Let me know if you have any concern or any PR you would like to get in.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> -
> > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >>>
> > >
> > >
> > > --
> > > Dongjin Lee
> > >
> > > A hitchhiker in the mathematical world.
> > >
> > > github: github.com/dongjinleekr
> > > linkedin: kr.linkedin.com/in/dongjinleekr
> > > speakerdeck: speakerdeck.com/dongjin
> > 
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > 
> > 
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
> 
> 
> -- 
> ---
> Takeshi Yamamuro
> 
> 
> -- 
> Dongjin Lee
> 
> A hitchhiker in the mathematical world.
> 
> github: github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> speakerdeck: speakerdeck.com/dongjin


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Contribution

2019-02-12 Thread Valeria Vasylieva
Hi Gabor,

Ok, sure I will!

Best regards,

Valeria

On Tue, Feb 12, 2019 at 5:00 PM, Gabor Somogyi wrote:

> Hi Valeria,
>
> Welcome, ping me if you need review.
>
> BR,
> G
>
>
> On Tue, Feb 12, 2019 at 2:51 PM Valeria Vasylieva <
> valeria.vasyli...@gmail.com> wrote:
>
>> Hi Gabor,
>>
>> Thank you for clarification! Will do it!
>> I am happy to join the community!
>>
>> Best Regards,
>> Valeria
>>
>> On Tue, Feb 12, 2019 at 4:32 PM, Gabor Somogyi wrote:
>>
>>> Hi Valeria,
>>>
>>> Glad to hear you would like to contribute! It will be assigned to you
>>> when you create a PR.
>>> Before you create it, please read the following guide, which describes the
>>> details: https://spark.apache.org/contributing.html
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Tue, Feb 12, 2019 at 2:28 PM Valeria Vasylieva <
>>> valeria.vasyli...@gmail.com> wrote:
>>>
 Hi!

 My name is Valeria Vasylieva and I would like to help with the task:
 https://issues.apache.org/jira/browse/SPARK-20597

 Please assign it to me, my JIRA account is:
 nimfadora (
 https://issues.apache.org/jira/secure/ViewProfile.jspa?name=nimfadora)

 Thank you!

>>>


Re: Contribution

2019-02-12 Thread Gabor Somogyi
Hi Valeria,

Welcome, ping me if you need review.

BR,
G


On Tue, Feb 12, 2019 at 2:51 PM Valeria Vasylieva <
valeria.vasyli...@gmail.com> wrote:

> Hi Gabor,
>
> Thank you for clarification! Will do it!
> I am happy to join the community!
>
> Best Regards,
> Valeria
>
> On Tue, Feb 12, 2019 at 4:32 PM, Gabor Somogyi wrote:
>
>> Hi Valeria,
>>
>> Glad to hear you would like to contribute! It will be assigned to you
>> when you create a PR.
>> Before you create it, please read the following guide, which describes the
>> details: https://spark.apache.org/contributing.html
>>
>> BR,
>> G
>>
>>
>> On Tue, Feb 12, 2019 at 2:28 PM Valeria Vasylieva <
>> valeria.vasyli...@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> My name is Valeria Vasylieva and I would like to help with the task:
>>> https://issues.apache.org/jira/browse/SPARK-20597
>>>
>>> Please assign it to me, my JIRA account is:
>>> nimfadora (
>>> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=nimfadora)
>>>
>>> Thank you!
>>>
>>


Re: Contribution

2019-02-12 Thread Valeria Vasylieva
Hi Gabor,

Thank you for clarification! Will do it!
I am happy to join the community!

Best Regards,
Valeria

On Tue, Feb 12, 2019 at 4:32 PM, Gabor Somogyi wrote:

> Hi Valeria,
>
> Glad to hear you would like to contribute! It will be assigned to you when
> you create a PR.
> Before you create it, please read the following guide, which describes the
> details: https://spark.apache.org/contributing.html
>
> BR,
> G
>
>
> On Tue, Feb 12, 2019 at 2:28 PM Valeria Vasylieva <
> valeria.vasyli...@gmail.com> wrote:
>
>> Hi!
>>
>> My name is Valeria Vasylieva and I would like to help with the task:
>> https://issues.apache.org/jira/browse/SPARK-20597
>>
>> Please assign it to me, my JIRA account is:
>> nimfadora (
>> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=nimfadora)
>>
>> Thank you!
>>
>


Re: Contribution

2019-02-12 Thread Gabor Somogyi
Hi Valeria,

Glad to hear you would like to contribute! It will be assigned to you when
you create a PR.
Before you create it, please read the following guide, which describes the
details: https://spark.apache.org/contributing.html

BR,
G


On Tue, Feb 12, 2019 at 2:28 PM Valeria Vasylieva <
valeria.vasyli...@gmail.com> wrote:

> Hi!
>
> My name is Valeria Vasylieva and I would like to help with the task:
> https://issues.apache.org/jira/browse/SPARK-20597
>
> Please assign it to me, my JIRA account is:
> nimfadora (
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=nimfadora)
>
> Thank you!
>


Contribution

2019-02-12 Thread Valeria Vasylieva
Hi!

My name is Valeria Vasylieva and I would like to help with the task:
https://issues.apache.org/jira/browse/SPARK-20597

Please assign it to me, my JIRA account is:
nimfadora (
https://issues.apache.org/jira/secure/ViewProfile.jspa?name=nimfadora)

Thank you!


Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread Wenchen Fan
+1 for 2.4.1

On Tue, Feb 12, 2019 at 7:55 PM Hyukjin Kwon  wrote:

> +1 for 2.4.1
>
> On Tue, Feb 12, 2019 at 4:56 PM, Dongjin Lee wrote:
>
>> > SPARK-23539 is a non-trivial improvement, so probably would not be
>> back-ported to 2.4.x.
>>
>> Got it. It seems reasonable.
>>
>> Committers:
>>
>> Please don't omit SPARK-23539 from 2.5.0. Kafka community needs this
>> feature.
>>
>> Thanks,
>> Dongjin
>>
>> On Tue, Feb 12, 2019 at 1:50 PM Takeshi Yamamuro 
>> wrote:
>>
>>> +1, too.
>>> branch-2.4 accumulates too many commits..:
>>>
>>> https://github.com/apache/spark/compare/0a4c03f7d084f1d2aa48673b99f3b9496893ce8d...af3c7111efd22907976fc8bbd7810fe3cfd92092
>>>
>>> On Tue, Feb 12, 2019 at 12:36 PM Dongjoon Hyun 
>>> wrote:
>>>
 Thank you, DB.

 +1, Yes. It's time for preparing 2.4.1 release.

 Bests,
 Dongjoon.

 On 2019/02/12 03:16:05, Sean Owen  wrote:
 > I support a 2.4.1 release now, yes.
 >
 > SPARK-23539 is a non-trivial improvement, so probably would not be
 > back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
 > be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
 > it, but it could go in if otherwise ready.
 >
 >
 > On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee 
 wrote:
 > >
 > > Hi DB,
 > >
 > > Could you add SPARK-23539[^1] into 2.4.1? I opened the PR[^2] a while
 > > ago, but it was not included in 2.3.0 and has not gotten enough review.
 > >
 > > Thanks,
 > > Dongjin
 > >
 > > [^1]: https://issues.apache.org/jira/browse/SPARK-23539
 > > [^2]: https://github.com/apache/spark/pull/22282
 > >
 > > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim 
 wrote:
 > >>
 > >> Given SPARK-26154 [1] is a correctness issue and PR [2] is
 submitted, I hope it can be reviewed and included within Spark 2.4.1 -
 otherwise it will be a long-lived correctness issue.
 > >>
 > >> Thanks,
 > >> Jungtaek Lim (HeartSaVioR)
 > >>
 > >> 1. https://issues.apache.org/jira/browse/SPARK-26154
 > >> 2. https://github.com/apache/spark/pull/23634
 > >>
 > >>
 > >> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
 > >>>
 > >>> Hello all,
 > >>>
 > >>> I am preparing to cut a new Apache 2.4.1 release as there are
 many bugs and correctness issues fixed in branch-2.4.
 > >>>
 > >>> The list of addressed issues is
 https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
 > >>>
 > >>> Let me know if you have any concern or any PR you would like to
 get in.
 > >>>
 > >>> Thanks!
 > >>>
 > >>>
 -
 > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
 > >>>
 > >
 > >
 > > --
 > > Dongjin Lee
 > >
 > > A hitchhiker in the mathematical world.
 > >
 > > github: github.com/dongjinleekr
 > > linkedin: kr.linkedin.com/in/dongjinleekr
 > > speakerdeck: speakerdeck.com/dongjin
 >
 > -
 > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
 >
 >

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>> --
>> Dongjin Lee
>>
>> A hitchhiker in the mathematical world.
>> github: github.com/dongjinleekr
>> linkedin: kr.linkedin.com/in/dongjinleekr
>> speakerdeck: speakerdeck.com/dongjin
>>
>


Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread Hyukjin Kwon
+1 for 2.4.1

On Tue, Feb 12, 2019 at 4:56 PM, Dongjin Lee wrote:

> > SPARK-23539 is a non-trivial improvement, so probably would not be
> back-ported to 2.4.x.
>
> Got it. It seems reasonable.
>
> Committers:
>
> Please don't omit SPARK-23539 from 2.5.0. Kafka community needs this
> feature.
>
> Thanks,
> Dongjin
>
> On Tue, Feb 12, 2019 at 1:50 PM Takeshi Yamamuro 
> wrote:
>
>> +1, too.
>> branch-2.4 accumulates too many commits..:
>>
>> https://github.com/apache/spark/compare/0a4c03f7d084f1d2aa48673b99f3b9496893ce8d...af3c7111efd22907976fc8bbd7810fe3cfd92092
>>
>> On Tue, Feb 12, 2019 at 12:36 PM Dongjoon Hyun 
>> wrote:
>>
>>> Thank you, DB.
>>>
>>> +1, Yes. It's time for preparing 2.4.1 release.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On 2019/02/12 03:16:05, Sean Owen  wrote:
>>> > I support a 2.4.1 release now, yes.
>>> >
>>> > SPARK-23539 is a non-trivial improvement, so probably would not be
>>> > back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
>>> > be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
>>> > it, but it could go in if otherwise ready.
>>> >
>>> >
>>> > On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee 
>>> wrote:
>>> > >
>>> > > Hi DB,
>>> > >
>>> > > Could you add SPARK-23539[^1] into 2.4.1? I opened the PR[^2] a while
>>> > > ago, but it was not included in 2.3.0 and has not gotten enough review.
>>> > >
>>> > > Thanks,
>>> > > Dongjin
>>> > >
>>> > > [^1]: https://issues.apache.org/jira/browse/SPARK-23539
>>> > > [^2]: https://github.com/apache/spark/pull/22282
>>> > >
>>> > > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim 
>>> wrote:
>>> > >>
>>> > >> Given SPARK-26154 [1] is a correctness issue and PR [2] is
>>> submitted, I hope it can be reviewed and included within Spark 2.4.1 -
>>> otherwise it will be a long-lived correctness issue.
>>> > >>
>>> > >> Thanks,
>>> > >> Jungtaek Lim (HeartSaVioR)
>>> > >>
>>> > >> 1. https://issues.apache.org/jira/browse/SPARK-26154
>>> > >> 2. https://github.com/apache/spark/pull/23634
>>> > >>
>>> > >>
>>> > >> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
>>> > >>>
>>> > >>> Hello all,
>>> > >>>
>>> > >>> I am preparing to cut a new Apache 2.4.1 release as there are many
>>> bugs and correctness issues fixed in branch-2.4.
>>> > >>>
>>> > >>> The list of addressed issues is
>>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
>>> > >>>
>>> > >>> Let me know if you have any concern or any PR you would like to
>>> get in.
>>> > >>>
>>> > >>> Thanks!
>>> > >>>
>>> > >>>
>>> -
>>> > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> > >>>
>>> > >
>>> > >
>>> > > --
>>> > > Dongjin Lee
>>> > >
>>> > > A hitchhiker in the mathematical world.
>>> > >
>>> > > github: github.com/dongjinleekr
>>> > > linkedin: kr.linkedin.com/in/dongjinleekr
>>> > > speakerdeck: speakerdeck.com/dongjin
>>> >
>>> > -
>>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>
> --
> Dongjin Lee
>
> A hitchhiker in the mathematical world.
> github: github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> speakerdeck: speakerdeck.com/dongjin
>


Re: Tungsten Memory Consumer

2019-02-12 Thread Jack Kolokasis

Hello,

    I am sorry that my first explanation was not concrete. I will explain 
further about TaskMemoryManager. TaskMemoryManager manages the execution 
memory of each task as follows:


    1. MemoryConsumer is the entry point for a Spark task's memory use: it 
requests execution memory from the TaskMemoryManager.


    2. TaskMemoryManager requests memory from the ExecutionMemoryPool. If 
execution memory is insufficient, it will borrow storage memory; if that is 
still not enough, it will force the cached data in storage to be flushed to 
disk to free memory.


    3. If the memory returned by the ExecutionMemoryPool is insufficient, 
the MemoryConsumer.spill method is called to flush the data held by the 
MemoryConsumer to disk and free memory.


    4. Then, based on the MemoryMode, HeapMemoryAllocator is used to 
allocate on-heap memory for the MemoryConsumer, or UnsafeMemoryAllocator is 
used to allocate off-heap memory.


    5. The allocated memory is wrapped in a MemoryBlock, and each 
MemoryBlock corresponds to a page.


    6. The TaskMemoryManager maintains the task's page table, and the task 
can look up the corresponding MemoryBlock by page number.


So by "Tungsten Consumer" I mean a MemoryConsumer that uses off-heap 
execution memory. Running some tests to see when HeapMemoryAllocator is 
called, I see that it is called for some applications but not for others. 
Could you please explain why this happens? Shouldn't HeapMemoryAllocator 
always be called by a MemoryConsumer?
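
In case it is relevant: my understanding is that the allocator choice simply
follows the MemoryMode, which (if I am not mistaken) is controlled by settings
along these lines:

    spark.memory.offHeap.enabled=true   # Tungsten allocates execution memory off-heap (UnsafeMemoryAllocator)
    spark.memory.offHeap.size=2g        # must be a positive value when off-heap is enabled

and with spark.memory.offHeap.enabled=false (the default) I would expect
HeapMemoryAllocator to serve every MemoryConsumer.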


--Iacovos

On 11/02/2019 11:06 πμ, Wenchen Fan wrote:

what do you mean by "Tungsten Consumer"?

On Fri, Feb 8, 2019 at 6:11 PM Jack Kolokasis wrote:


Hello all,
     I am studying the Tungsten project and I am wondering when Spark
creates a Tungsten consumer. While I am running some applications, I see
that Spark creates a Tungsten consumer, while in other applications it does
not (using the same configuration). When does this happen?

I am looking forward to your reply.

--Jack Kolokasis

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




--
Iacovos Kolokasis
Email: koloka...@ics.forth.gr
Postgraduate Student CSD, University of Crete
Researcher in CARV Lab ICS FORTH



Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread Dongjin Lee
> SPARK-23539 is a non-trivial improvement, so probably would not be
back-ported to 2.4.x.

Got it. It seems reasonable.

Committers:

Please don't omit SPARK-23539 from 2.5.0. Kafka community needs this
feature.

Thanks,
Dongjin

On Tue, Feb 12, 2019 at 1:50 PM Takeshi Yamamuro 
wrote:

> +1, too.
> branch-2.4 accumulates too many commits..:
>
> https://github.com/apache/spark/compare/0a4c03f7d084f1d2aa48673b99f3b9496893ce8d...af3c7111efd22907976fc8bbd7810fe3cfd92092
>
> On Tue, Feb 12, 2019 at 12:36 PM Dongjoon Hyun 
> wrote:
>
>> Thank you, DB.
>>
>> +1, Yes. It's time for preparing 2.4.1 release.
>>
>> Bests,
>> Dongjoon.
>>
>> On 2019/02/12 03:16:05, Sean Owen  wrote:
>> > I support a 2.4.1 release now, yes.
>> >
>> > SPARK-23539 is a non-trivial improvement, so probably would not be
>> > back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
>> > be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
>> > it, but it could go in if otherwise ready.
>> >
>> >
>> > On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee  wrote:
>> > >
>> > > Hi DB,
>> > >
>> > > Could you add SPARK-23539[^1] into 2.4.1? I opened the PR[^2] a while
>> > > ago, but it was not included in 2.3.0 and has not gotten enough review.
>> > >
>> > > Thanks,
>> > > Dongjin
>> > >
>> > > [^1]: https://issues.apache.org/jira/browse/SPARK-23539
>> > > [^2]: https://github.com/apache/spark/pull/22282
>> > >
>> > > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim 
>> wrote:
>> > >>
>> > >> Given SPARK-26154 [1] is a correctness issue and PR [2] is
>> submitted, I hope it can be reviewed and included within Spark 2.4.1 -
>> otherwise it will be a long-lived correctness issue.
>> > >>
>> > >> Thanks,
>> > >> Jungtaek Lim (HeartSaVioR)
>> > >>
>> > >> 1. https://issues.apache.org/jira/browse/SPARK-26154
>> > >> 2. https://github.com/apache/spark/pull/23634
>> > >>
>> > >>
>> > >> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
>> > >>>
>> > >>> Hello all,
>> > >>>
>> > >>> I am preparing to cut a new Apache 2.4.1 release as there are many
>> bugs and correctness issues fixed in branch-2.4.
>> > >>>
>> > >>> The list of addressed issues is
>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
>> > >>>
>> > >>> Let me know if you have any concern or any PR you would like to get
>> in.
>> > >>>
>> > >>> Thanks!
>> > >>>
>> > >>>
>> -
>> > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> > >>>
>> > >
>> > >
>> > > --
>> > > Dongjin Lee
>> > >
>> > > A hitchhiker in the mathematical world.
>> > >
>> > > github: github.com/dongjinleekr
>> > > linkedin: kr.linkedin.com/in/dongjinleekr
>> > > speakerdeck: speakerdeck.com/dongjin
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>
> --
> ---
> Takeshi Yamamuro
>


-- 
Dongjin Lee

A hitchhiker in the mathematical world.
github: github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
speakerdeck: speakerdeck.com/dongjin