Re: [VOTE] [SPARK-25994] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-02-11 Thread Joseph Bradley
+1  This will be a great long-term investment for Spark.

On Wed, Feb 6, 2019 at 8:44 AM Marco Gaido  wrote:

> +1 from me as well.
>
> On Wed, Feb 6, 2019 at 4:58 PM Yanbo Liang wrote:
>
>> +1 for the proposal
>>
>>
>>
>> On Thu, Jan 31, 2019 at 12:46 PM Mingjie Tang  wrote:
>>
>>> +1, this is a very very important feature.
>>>
>>> Mingjie
>>>
>>> On Thu, Jan 31, 2019 at 12:42 AM Xiao Li  wrote:
>>>
 Change my vote from +1 to ++1

 On Wed, Jan 30, 2019 at 6:20 AM Xiangrui Meng wrote:

> Correction: +0 vote doesn't mean "Don't really care". Thanks Ryan for
> the offline reminder! Below is the official Apache interpretation of
> fractional votes:
>
> The in-between values are indicative of how strongly the voting
> individual feels. Here are some examples of fractional votes and ways in
> which they might be intended and interpreted:
> +0: 'I don't feel strongly about it, but I'm okay with this.'
> -0: 'I won't get in the way, but I'd rather we didn't do this.'
> -0.5: 'I don't like this idea, but I can't find any rational
> justification for my feelings.'
> ++1: 'Wow! I like this! Let's do it!'
> -0.9: 'I really don't like this, but I'm not going to stand in the way
> if everyone else wants to go ahead with it.'
> +0.9: 'This is a cool idea and i like it, but I don't have time/the
> skills necessary to help out.'
>
>
> On Wed, Jan 30, 2019 at 12:31 AM Martin Junghanns
>  wrote:
>
>> Hi Dongjoon,
>>
>> Thanks for the hint! I updated the SPIP accordingly.
>>
>> I also changed the access permissions for the SPIP and design sketch
>> docs so that anyone can comment.
>>
>> Best,
>>
>> Martin
>> On 29.01.19 18:59, Dongjoon Hyun wrote:
>>
>> Hi, Xiangrui Meng.
>>
>> +1 for the proposal.
>>
>> However, please update the following section for this vote. As we
>> see, it seems to be inaccurate because today is Jan. 29th. (Almost
>> February).
>> (Since I cannot comment on the SPIP, I replied here.)
>>
>> Q7. How long will it take?
>>
>> - If accepted by the community by the end of December 2018, we
>>   predict to be feature complete by mid-end March, allowing for QA
>>   during April 2019, making the SPIP part of the next major Spark
>>   release (3.0, ETA May, 2019).
>>
>> Bests,
>> Dongjoon.
>>
>> On Tue, Jan 29, 2019 at 8:52 AM Xiao Li  wrote:
>>
>>> +1
>>>
>>> On Tue, Jan 29, 2019 at 8:14 AM Jules Damji wrote:
>>>
 +1 (non-binding)
 (Heard their proposed tech-talk at Spark + A.I summit in London.
 Well attended & well received.)

 —
 Sent from my iPhone
 Pardon the dumb thumb typos :)

 On Jan 29, 2019, at 7:30 AM, Denny Lee 
 wrote:

 +1

 yay - let's do it!

 On Tue, Jan 29, 2019 at 6:28 AM Xiangrui Meng 
 wrote:

> Hi all,
>
> I want to call for a vote on SPARK-25994. It introduces a new
> DataFrame-based component to Spark, which supports property graph
> construction, Cypher queries, and graph algorithms. The proposal was
> made available on user@ and dev@ to collect input. You can also find a
> sketch design doc attached to SPARK-26028.
>
> The vote will be up for the next 72 hours. Please reply with your
> vote:
>
> +1: Yeah, let's go forward and implement the SPIP.
> +0: Don't really care.
> -1: I don't think this is a good idea because of the following
> technical reasons.
>
> Best,
> Xiangrui
>


-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.



Re: Time to cut an Apache 2.4.1 release?

2019-02-11 Thread Takeshi Yamamuro
+1, too.
branch-2.4 has accumulated a lot of commits:
https://github.com/apache/spark/compare/0a4c03f7d084f1d2aa48673b99f3b9496893ce8d...af3c7111efd22907976fc8bbd7810fe3cfd92092

On Tue, Feb 12, 2019 at 12:36 PM Dongjoon Hyun  wrote:

> Thank you, DB.
>
> +1, Yes. It's time to prepare the 2.4.1 release.
>
> Bests,
> Dongjoon.
>
> On 2019/02/12 03:16:05, Sean Owen  wrote:
> > I support a 2.4.1 release now, yes.
> >
> > SPARK-23539 is a non-trivial improvement, so it probably would not be
> > back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
> > be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
> > it, but it could go in if otherwise ready.
> >
> >
> > On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee  wrote:
> > >
> > > Hi DB,
> > >
> > > Could you add SPARK-23539[^1] to 2.4.1? I opened the PR[^2] a little
> > > while ago, but it was not included in 2.3.0 and has not received
> > > enough review.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > [^1]: https://issues.apache.org/jira/browse/SPARK-23539
> > > [^2]: https://github.com/apache/spark/pull/22282
> > >
> > > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim wrote:
> > >>
> > >> Given that SPARK-26154 [1] is a correctness issue and a PR [2] has
> > >> been submitted, I hope it can be reviewed and included in Spark
> > >> 2.4.1; otherwise it will be a long-lived correctness issue.
> > >>
> > >> Thanks,
> > >> Jungtaek Lim (HeartSaVioR)
> > >>
> > >> 1. https://issues.apache.org/jira/browse/SPARK-26154
> > >> 2. https://github.com/apache/spark/pull/23634
> > >>
> > >>
> > >> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
> > >>>
> > >>> Hello all,
> > >>>
> > >>> I am preparing to cut a new Apache Spark 2.4.1 release as there are
> > >>> many bugs and correctness issues fixed in branch-2.4.
> > >>>
> > >>> The list of addressed issues is
> > >>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
> > >>>
> > >>> Let me know if you have any concerns or any PRs you would like to
> > >>> get in.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> -
> > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >>>
> > >
> > >
> > > --
> > > Dongjin Lee
> > >
> > > A hitchhiker in the mathematical world.
> > >
> > > github: github.com/dongjinleekr
> > > linkedin: kr.linkedin.com/in/dongjinleekr
> > > speakerdeck: speakerdeck.com/dongjin
> >
> >
> >
>
>
>

-- 
---
Takeshi Yamamuro


Re: Time to cut an Apache 2.4.1 release?

2019-02-11 Thread Dongjoon Hyun
Thank you, DB.

+1, Yes. It's time to prepare the 2.4.1 release.

Bests,
Dongjoon.

On 2019/02/12 03:16:05, Sean Owen  wrote: 
> I support a 2.4.1 release now, yes.
> 
> SPARK-23539 is a non-trivial improvement, so it probably would not be
> back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
> be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
> it, but it could go in if otherwise ready.
> 
> 
> On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee  wrote:
> >
> > Hi DB,
> >
> > Could you add SPARK-23539[^1] to 2.4.1? I opened the PR[^2] a little
> > while ago, but it was not included in 2.3.0 and has not received enough
> > review.
> >
> > Thanks,
> > Dongjin
> >
> > [^1]: https://issues.apache.org/jira/browse/SPARK-23539
> > [^2]: https://github.com/apache/spark/pull/22282
> >
> > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim  wrote:
> >>
> >> Given that SPARK-26154 [1] is a correctness issue and a PR [2] has been
> >> submitted, I hope it can be reviewed and included in Spark 2.4.1;
> >> otherwise it will be a long-lived correctness issue.
> >>
> >> Thanks,
> >> Jungtaek Lim (HeartSaVioR)
> >>
> >> 1. https://issues.apache.org/jira/browse/SPARK-26154
> >> 2. https://github.com/apache/spark/pull/23634
> >>
> >>
> >> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I am preparing to cut a new Apache Spark 2.4.1 release as there are many
> >>> bugs and correctness issues fixed in branch-2.4.
> >>>
> >>> The list of addressed issues is
> >>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
> >>>
> >>> Let me know if you have any concerns or any PRs you would like to get in.
> >>>
> >>> Thanks!
> >>>
> >>>
> >
> >
> > --
> > Dongjin Lee
> >
> > A hitchhiker in the mathematical world.
> >
> > github: github.com/dongjinleekr
> > linkedin: kr.linkedin.com/in/dongjinleekr
> > speakerdeck: speakerdeck.com/dongjin
> 
> 
> 




Re: Time to cut an Apache 2.4.1 release?

2019-02-11 Thread Sean Owen
I support a 2.4.1 release now, yes.

SPARK-23539 is a non-trivial improvement, so it probably would not be
back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
it, but it could go in if otherwise ready.


On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee  wrote:
>
> Hi DB,
>
> Could you add SPARK-23539[^1] to 2.4.1? I opened the PR[^2] a little
> while ago, but it was not included in 2.3.0 and has not received enough
> review.
>
> Thanks,
> Dongjin
>
> [^1]: https://issues.apache.org/jira/browse/SPARK-23539
> [^2]: https://github.com/apache/spark/pull/22282
>
> On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim  wrote:
>>
>> Given that SPARK-26154 [1] is a correctness issue and a PR [2] has been
>> submitted, I hope it can be reviewed and included in Spark 2.4.1;
>> otherwise it will be a long-lived correctness issue.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 1. https://issues.apache.org/jira/browse/SPARK-26154
>> 2. https://github.com/apache/spark/pull/23634
>>
>>
>> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
>>>
>>> Hello all,
>>>
>>> I am preparing to cut a new Apache Spark 2.4.1 release as there are many
>>> bugs and correctness issues fixed in branch-2.4.
>>>
>>> The list of addressed issues is
>>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
>>>
>>> Let me know if you have any concerns or any PRs you would like to get in.
>>>
>>> Thanks!
>>>
>>>
>
>
> --
> Dongjin Lee
>
> A hitchhiker in the mathematical world.
>
> github: github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> speakerdeck: speakerdeck.com/dongjin




Re: Time to cut an Apache 2.4.1 release?

2019-02-11 Thread Dongjin Lee
Hi DB,

Could you add SPARK-23539[^1] to 2.4.1? I opened the PR[^2] a little while
ago, but it was not included in 2.3.0 and has not received enough review.

Thanks,
Dongjin

[^1]: https://issues.apache.org/jira/browse/SPARK-23539
[^2]: https://github.com/apache/spark/pull/22282

On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim  wrote:

> Given that SPARK-26154 [1] is a correctness issue and a PR [2] has been
> submitted, I hope it can be reviewed and included in Spark 2.4.1;
> otherwise it will be a long-lived correctness issue.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1. https://issues.apache.org/jira/browse/SPARK-26154
> 2. https://github.com/apache/spark/pull/23634
>
>
> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
>
>> Hello all,
>>
>> I am preparing to cut a new Apache Spark 2.4.1 release as there are many
>> bugs and correctness issues fixed in branch-2.4.
>>
>> The list of addressed issues is
>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
>>
>> Let me know if you have any concerns or any PRs you would like to get in.
>>
>> Thanks!
>>
>>
>>

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
speakerdeck: speakerdeck.com/dongjin
*


Re: Time to cut an Apache 2.4.1 release?

2019-02-11 Thread Jungtaek Lim
Given that SPARK-26154 [1] is a correctness issue and a PR [2] has been
submitted, I hope it can be reviewed and included in Spark 2.4.1; otherwise
it will be a long-lived correctness issue.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-26154
2. https://github.com/apache/spark/pull/23634


On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:

> Hello all,
>
> I am preparing to cut a new Apache Spark 2.4.1 release as there are many
> bugs and correctness issues fixed in branch-2.4.
>
> The list of addressed issues is
> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
>
> Let me know if you have any concerns or any PRs you would like to get in.
>
> Thanks!
>
>
>


Time to cut an Apache 2.4.1 release?

2019-02-11 Thread DB Tsai
Hello all,

I am preparing to cut a new Apache Spark 2.4.1 release as there are many
bugs and correctness issues fixed in branch-2.4.

The list of addressed issues is
https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC

Let me know if you have any concerns or any PRs you would like to get in.

Thanks!


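Editor's note: the JIRA link in the message above carries its filter as a URL-encoded `jql` query parameter. As a minimal sketch (the helper name `extract_jql` is made up for illustration and is not part of any JIRA tooling), the filter can be decoded with Python's standard library to see which issues it selects:

```python
from urllib.parse import urlsplit, parse_qs

# URL copied verbatim from the message above.
url = (
    "https://issues.apache.org/jira/browse/SPARK-26583"
    "?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1"
    "%20order%20by%20updated%20DESC"
)


def extract_jql(link: str) -> str:
    """Return the percent-decoded value of the `jql` query parameter.

    `extract_jql` is a hypothetical helper name used only in this sketch.
    """
    query = urlsplit(link).query
    # parse_qs decodes %20 and %3D and maps each key to a list of values.
    return parse_qs(query)["jql"][0]


if __name__ == "__main__":
    print(extract_jql(url))
```

The decoded filter reads `project = SPARK AND fixVersion = 2.4.1 order by updated DESC`, i.e. every SPARK issue whose fix version is 2.4.1, most recently updated first.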


Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-11 Thread Marcelo Vanzin
+1. Ran our regression tests for YARN and Hive, all look good.

On Tue, Feb 5, 2019 at 5:07 PM Takeshi Yamamuro  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.3.
>
> The vote is open until February 8 6:00PM (PST) and passes if a majority of
> the PMC votes cast are +1, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.3.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.3-rc2 (commit 
> 66fd9c34bf406a4b5f86605d06c9607752bd637a):
> https://github.com/apache/spark/tree/v2.3.3-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1298/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-docs/
>
> The list of bug fixes going into 2.3.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12343759
>
> FAQ
>
> =================================
> How can I help test this release?
> =================================
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
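> =========================================================
> What should happen to JIRA tickets still targeting 2.3.3?
> =========================================================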
>
> The current list of open tickets targeted at 2.3.3 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.3.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> =======================
> But my bug isn't fixed?
> =======================
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> P.S.
> I checked that all the tests passed on the Amazon Linux 2 AMI:
> $ java -version
> openjdk version "1.8.0_191"
> OpenJDK Runtime Environment (build 1.8.0_191-b12)
> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> $ ./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Psparkr test
>
> --
> ---
> Takeshi Yamamuro



-- 
Marcelo


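Editor's note: the passing rule quoted in the vote email (a majority of +1 PMC votes, with at least three +1 votes) can be sketched as a small tally function. This is one reading of that rule, assuming only PMC (binding) votes count toward the result; the `Vote` type, the function name, and the voter labels are all hypothetical and do not describe any actual ballot.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical label, not a real voter
    value: int     # +1, 0, or -1
    binding: bool  # True when cast by a PMC member


def release_vote_passes(votes: Iterable[Vote], min_plus_ones: int = 3) -> bool:
    """Assumed rule: binding +1 votes must outnumber binding -1 votes,
    and there must be at least `min_plus_ones` binding +1 votes."""
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= min_plus_ones and plus > minus


if __name__ == "__main__":
    example = [
        Vote("pmc-a", +1, True),
        Vote("pmc-b", +1, True),
        Vote("pmc-c", +1, True),
        Vote("contributor-x", -1, False),  # non-binding, not counted
    ]
    print(release_vote_passes(example))
```

Non-binding votes are still useful signal on the list, but in this sketch they do not change the outcome.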


Re: building docker images for GPU

2019-02-11 Thread Matt Cheah
I will reiterate some feedback I left on the PR. First, it’s not immediately
clear whether we should be opinionated about supporting GPUs in the Docker
image in a first-class way.

Second, there’s the question of how we arbitrate the kinds of customizations
we support moving forward. For example, if we say we support GPUs now, what’s
to say that we should not also support FPGAs?

Also, what kind of testing can we add to CI to ensure that what we’ve provided
in this Dockerfile works?

Instead, we can make the Spark images have bare-minimum support for basic
Spark applications, and then provide detailed instructions for how to build
custom Docker images (mostly just needing to make sure the custom image has
the right entry point).

-Matt Cheah

 

From: Rong Ou 
Date: Friday, February 8, 2019 at 2:28 PM
To: "dev@spark.apache.org" 
Subject: building docker images for GPU

 

Hi spark dev,

I created a JIRA issue a while ago
(https://issues.apache.org/jira/browse/SPARK-26398) to add GPU support to
Spark docker images, and sent a PR
(https://github.com/apache/spark/pull/23347) that went through several
iterations. It was suggested that it should be discussed on the dev mailing
list, so here we are. Please chime in if you have any questions or concerns.

A little more background. I mainly looked at running XGBoost on Spark using
GPUs. Preliminary results have shown that there is potential for significant
speedup in training time. This seems like a popular use case for Spark. In any
event, it'd be nice for Spark to have better support for GPUs. Building
gpu-enabled docker images seems like a useful first step.

Thanks,

Rong

 





Re: Static functions

2019-02-11 Thread Jacek Laskowski
Hi Jean,

I thought the functions have already been tagged?

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Sun, Feb 10, 2019 at 11:48 PM Jean Georges Perrin  wrote:

> Hey guys,
>
> We have 381 static functions now (including the deprecated ones). I am
> trying to sort them out by group and tag them.
>
> So far, I have:
>
>    - Array
>    - Conversion
>    - Date
>    - Math
>       - Trigonometry (subgroup of Math)
>    - Security
>    - Streaming
>    - String
>    - Technical
>
> Do you see more categories? Tags?
>
> Thanks!
>
> jg
>
> —
> Jean Georges Perrin / @jgperrin
>
>

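Editor's note: one way to picture the grouping discussed in this thread is a plain mapping from function name to tags, inverted to list the functions under each tag. The sketch below is illustrative only. The function names exist in Spark's `functions` API, but the tag assignments are guesses made for this example, not the list Jean Georges is compiling, and `group_by_tag` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed tag assignments for a handful of functions; a function may carry
# several tags, and "Trigonometry" stands for the Math subgroup in the list.
FUNCTION_TAGS: Dict[str, List[str]] = {
    "array_contains": ["Array"],
    "to_date": ["Conversion", "Date"],
    "sin": ["Math", "Trigonometry"],
    "sha2": ["Security"],
    "window": ["Streaming"],
    "concat_ws": ["String"],
    "spark_partition_id": ["Technical"],
}


def group_by_tag(function_tags: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a function -> tags mapping into tag -> sorted function names."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, tags in function_tags.items():
        for tag in tags:
            groups[tag].append(name)
    return {tag: sorted(names) for tag, names in sorted(groups.items())}


if __name__ == "__main__":
    for tag, names in group_by_tag(FUNCTION_TAGS).items():
        print(f"{tag}: {', '.join(names)}")
```

Keeping tags as a list per function, rather than a single group, lets a function such as `to_date` appear under both Conversion and Date.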

Re: Tungsten Memory Consumer

2019-02-11 Thread Wenchen Fan
What do you mean by "Tungsten Consumer"?

On Fri, Feb 8, 2019 at 6:11 PM Jack Kolokasis 
wrote:

> Hello all,
> I am studying the Tungsten project and I am wondering when Spark
> creates a Tungsten consumer. When I run some applications, I see that
> Spark creates a Tungsten consumer, while for other applications it does
> not (using the same configuration). When does this happen?
>
> I am looking forward to your reply.
>
> --Jack Kolokasis
>
>
>