Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Denny Lee
+1 (non-binding)


On Thu, Apr 25, 2024 at 19:26 Xinrong Meng  wrote:

> +1
>
> On Thu, Apr 25, 2024 at 2:08 PM Holden Karau 
> wrote:
>
>> +1
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Thu, Apr 25, 2024 at 11:18 AM Maciej  wrote:
>>
>>> +1
>>>
>>> Best regards,
>>> Maciej Szymkiewicz
>>>
>>> Web: https://zero323.net
>>> PGP: A30CEF0C31A501EC
>>>
>>> On 4/25/24 6:21 PM, Reynold Xin wrote:
>>>
>>> +1
>>>
>>> On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale
>>>  
>>> wrote:
>>>
 +1

 On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun 
 wrote:

> FYI, there is a proposal to drop Python 3.8 because its EOL is October
> 2024.
>
> https://github.com/apache/spark/pull/46228
> [SPARK-47993][PYTHON] Drop Python 3.8
>
> Since it's still alive and there will be an overlap between the
> lifecycle of Python 3.8 and Apache Spark 4.0.0, please give us your
> feedback on the PR, if you have any concerns.
>
> From my side, I agree with this decision.
>
> Thanks,
> Dongjoon.
>
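For context, dropping a Python version is mostly a matter of raising the
declared floor in the packaging metadata and CI matrices. A minimal sketch of
the packaging side, assuming a standard setuptools layout (illustrative only,
not the actual diff in the PR above):

    # Illustrative sketch -- not the actual change in apache/spark#46228.
    # Raising python_requires makes pip fail fast on Python 3.8 installs.
    from setuptools import setup

    setup(
        name="pyspark",
        python_requires=">=3.9",  # previously ">=3.8"
    )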



Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Denny Lee
+1 (non-binding)

On Sat, Apr 13, 2024 at 7:49 PM huaxin gao  wrote:

> +1
>
> On Sat, Apr 13, 2024 at 4:36 PM L. C. Hsieh  wrote:
>
>> +1
>>
>> On Sat, Apr 13, 2024 at 4:12 PM Hyukjin Kwon 
>> wrote:
>> >
>> > +1
>> >
>> > On Sun, Apr 14, 2024 at 7:46 AM Chao Sun  wrote:
>> >>
>> >> +1.
>> >>
>> >> This feature is very helpful for guarding against correctness issues,
>> such as null results due to invalid input or math overflows. It’s been
>> there for a while now and it’s a good time to enable it by default as Spark
>> enters the next major release.
>> >>
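A minimal PySpark sketch of the behavior change under vote, assuming only that
spark.sql.ansi.enabled is the flag being flipped (the linked PR describes it
as a one-line default change):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Legacy (non-ANSI) mode: invalid arithmetic silently returns NULL.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT 1 / 0 AS q").show()  # q is NULL

    # ANSI mode: the same query fails instead of masking the error.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    try:
        spark.sql("SELECT 1 / 0 AS q").show()
    except Exception as e:
        print(type(e).__name__)  # a divide-by-zero error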
>> >> On Sat, Apr 13, 2024 at 3:27 PM Dongjoon Hyun 
>> wrote:
>> >>>
>> >>> I'll start from my +1.
>> >>>
>> >>> Dongjoon.
>> >>>
>> >>> On 2024/04/13 22:22:05 Dongjoon Hyun wrote:
>> >>> > Please vote on SPARK-44444 to use ANSI SQL mode by default.
>> >>> > The technical scope is defined in the following PR which is
>> >>> > one line of code change and one line of migration guide.
>> >>> >
>> >>> > - DISCUSSION:
>> >>> > https://lists.apache.org/thread/ztlwoz1v1sn81ssks12tb19x37zozxlz
>> >>> > - JIRA: https://issues.apache.org/jira/browse/SPARK-44444
>> >>> > - PR: https://github.com/apache/spark/pull/46013
>> >>> >
>> >>> > The vote is open until April 17th 1AM (PST) and passes
>> >>> > if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >>> >
>> >>> > [ ] +1 Use ANSI SQL mode by default
>> >>> > [ ] -1 Do not use ANSI SQL mode by default because ...
>> >>> >
>> >>> > Thank you in advance.
>> >>> >
>> >>> > Dongjoon
>> >>> >
>> >>>
>> >>>
>>
>>
>>


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Denny Lee
+1 (non-binding)


On Mon, Apr 1, 2024 at 9:24 AM Hussein Awala  wrote:

> +1 (non-binding). Adding to the difference it will make: it will also
> simplify package maintenance and make it easy to release a bug fix or new
> feature without needing to wait for a PySpark release.
>
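In practice, a client-only distribution would let users talk to a remote Spark
Connect server without a local JVM. A hedged sketch: the package name below is
hypothetical, while the remote(...) builder is the existing Spark Connect
entry point:

    # pip install pyspark-connect   # hypothetical client-only distribution
    from pyspark.sql import SparkSession

    # Connect to a remote Spark Connect endpoint; no local JVM needed.
    spark = SparkSession.builder.remote("sc://spark-host:15002").getOrCreate()
    spark.range(5).show()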
> On Mon, Apr 1, 2024 at 4:56 PM Chao Sun  wrote:
>
>> +1
>>
>> On Sun, Mar 31, 2024 at 10:31 PM Hyukjin Kwon 
>> wrote:
>>
>>> Oh I didn't send the discussion thread out as it's pretty simple,
>>> non-invasive and the discussion was sort of done as part of the Spark
>>> Connect initial discussion ..
>>>
>>> On Mon, Apr 1, 2024 at 1:59 PM Mridul Muralidharan 
>>> wrote:
>>>

 Can you point me to the SPIP’s discussion thread please ?
 I was not able to find it, but I was on vacation, and so might have
 missed this …


 Regards,
 Mridul

>>>
 On Sun, Mar 31, 2024 at 9:08 PM Haejoon Lee
  wrote:

> +1
>
> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon 
> wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Pure Python Package in PyPI
>> (Spark Connect)
>>
>> JIRA 
>> Prototype 
>> SPIP doc
>> 
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks.
>>
>


Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Denny Lee
+1 (non-binding)

On Sun, Mar 10, 2024 at 23:36 Gengliang Wang  wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: Structured Logging Framework for
> Apache Spark
>
> References:
>
>- JIRA ticket 
>- SPIP doc
>
> 
>- Discussion thread
>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thanks!
>
> Gengliang Wang
>
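The gist of the SPIP is emitting log records as JSON with stable fields, so
Spark logs can be loaded and queried with Spark itself. A hedged sketch; the
config key and field names below follow the SPIP's direction but should be
read as assumptions, not settled names:

    # spark-defaults.conf (assumed key):
    #   spark.log.structuredLogging.enabled  true
    #
    # An illustrative structured record:
    #   {"ts": "...", "level": "ERROR", "msg": "Lost executor 1",
    #    "context": {"executor_id": "1"}, "logger": "TaskSchedulerImpl"}

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    logs = spark.read.json("/var/log/spark/app.jsonl")  # hypothetical path
    logs.filter(logs.level == "ERROR").select("ts", "msg").show()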


Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread Denny Lee
+1 (non-binding)

On Fri, Mar 1, 2024 at 18:58 kazuyuki tanimura 
wrote:

> +1 (non-binding)
>
> Kazu
>
> > On Mar 1, 2024, at 5:44 PM, L. C. Hsieh  wrote:
> >
> > +1 (binding)
> >
> > On Fri, Mar 1, 2024 at 1:25 PM Joris Van den Bossche
> >  wrote:
> >>
> >> +1 (binding)
> >>
> >> On Fri, 1 Mar 2024 at 22:18, Sutou Kouhei  wrote:
> >>>
> >>> +1
> >>>
> >>> In  >
> >>>  "[VOTE] Move Arrow DataFusion Subproject to new Top Level Apache
> Project" on Fri, 1 Mar 2024 06:33:08 -0500,
> >>>  Andrew Lamb  wrote:
> >>>
>  Hello,
> 
>  As we have discussed[1][2] I would like to vote on the proposal to
>  create a new Apache Top Level Project for DataFusion. The text of the
>  proposed resolution and background document is copy/pasted below
> 
>  If the community is in favor of this, we plan to submit the resolution
>  to the ASF board for approval with the next Arrow report (for the
>  April 2024 board meeting).
> 
>  The vote will be open for at least 7 days.
> 
>  [ ] +1 Accept this Proposal
>  [ ] +0
>  [ ] -1 Do not accept this proposal because...
> 
>  Andrew
> 
>  [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>  [2] https://github.com/apache/arrow-datafusion/discussions/6475
> 
>  -- Proposed Resolution -
> 
>  Resolution to Create the Apache DataFusion Project from the Apache
>  Arrow DataFusion Sub Project
> 
>  =
> 
>  X. Establish the Apache DataFusion Project
> 
>  WHEREAS, the Board of Directors deems it to be in the best
>  interests of the Foundation and consistent with the
>  Foundation's purpose to establish a Project Management
>  Committee charged with the creation and maintenance of
>  open-source software related to an extensible query engine
>  for distribution at no charge to the public.
> 
>  NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>  Committee (PMC), to be known as the "Apache DataFusion Project",
>  be and hereby is established pursuant to Bylaws of the
>  Foundation; and be it further
> 
>  RESOLVED, that the Apache DataFusion Project be and hereby is
>  responsible for the creation and maintenance of software
>  related to an extensible query engine; and be it further
> 
>  RESOLVED, that the office of "Vice President, Apache DataFusion" be
>  and hereby is created, the person holding such office to
>  serve at the direction of the Board of Directors as the chair
>  of the Apache DataFusion Project, and to have primary responsibility
>  for management of the projects within the scope of
>  responsibility of the Apache DataFusion Project; and be it further
> 
>  RESOLVED, that the persons listed immediately below be and
>  hereby are appointed to serve as the initial members of the
>  Apache DataFusion Project:
> 
>  * Andy Grove (agr...@apache.org)
>  * Andrew Lamb (al...@apache.org)
>  * Daniël Heres (dhe...@apache.org)
>  * Jie Wen (jake...@apache.org)
>  * Kun Liu (liu...@apache.org)
>  * Liang-Chi Hsieh (vii...@apache.org)
>  * Qingping Hou: (ho...@apache.org)
>  * Wes McKinney(w...@apache.org)
>  * Will Jones (wjones...@apache.org)
> 
>  RESOLVED, that the Apache DataFusion Project be and hereby
>  is tasked with the migration and rationalization of the Apache
>  Arrow DataFusion sub-project; and be it further
> 
>  RESOLVED, that all responsibilities pertaining to the Apache
>  Arrow DataFusion sub-project encumbered upon the
>  Apache Arrow Project are hereafter discharged.
> 
>  NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
>  be appointed to the office of Vice President, Apache DataFusion, to
>  serve in accordance with and subject to the direction of the
>  Board of Directors and the Bylaws of the Foundation until
>  death, resignation, retirement, removal or disqualification,
>  or until a successor is appointed.
>  =
> 
> 
>  ---
> 
> 
>  Summary:
> 
>  We propose creating a new top level project, Apache DataFusion, from
>  an existing sub project of Apache Arrow to facilitate additional
>  community and project growth.
> 
>  Abstract
> 
>  Apache Arrow DataFusion[1]  is a very fast, extensible query engine
>  for building high-quality data-centric systems in Rust, using the
>  Apache Arrow in-memory format. DataFusion offers SQL and Dataframe
>  APIs, excellent performance, built-in support for CSV, Parquet, JSON,
>  and Avro, extensive customization, and a great community.
> 
>  [1] https://arrow.apache.org/datafusion/
> 
> 
>  Proposal
> 
>  We propose creating a 

Re: [VOTE] Accept donation of Comet Spark native engine

2024-01-28 Thread Denny Lee
+1 (non-binding) RAD stack for fast queries, what's not to love!

On Sat, Jan 27, 2024 at 9:24 PM Jacob Wujciak-Jens
 wrote:

> +1 (non-binding)
>
> Jorge Cardoso Leitão  wrote on Sun, 28 Jan
> 2024, 05:17:
>
> > +1
> >
> > On Sun, 28 Jan 2024, 00:00 Wes McKinney,  wrote:
> >
> > > +1 (binding)
> > >
> > > On Sat, Jan 27, 2024 at 12:26 PM Micah Kornfield <
> emkornfi...@gmail.com>
> > > wrote:
> > >
> > > > +1 Binding
> > > >
> > > > On Sat, Jan 27, 2024 at 10:21 AM David Li 
> wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Sat, Jan 27, 2024, at 13:03, L. C. Hsieh wrote:
> > > > > > +1 (binding)
> > > > > >
> > > > > > On Sat, Jan 27, 2024 at 8:10 AM Andrew Lamb <
> al...@influxdata.com>
> > > > > wrote:
> > > > > >>
> > > > > >> +1 (binding)
> > > > > >>
> > > > > >> This is super exciting
> > > > > >>
> > > > > >> On Sat, Jan 27, 2024 at 11:00 AM Daniël Heres <
> > > danielhe...@gmail.com>
> > > > > wrote:
> > > > > >>
> > > > > >> > +1 (binding). Awesome addition to the DataFusion ecosystem!!!
> > > > > >> >
> > > > > >> > Daniël
> > > > > >> >
> > > > > >> >
> > > > > >> > On Sat, Jan 27, 2024, 16:57 vin jake 
> > > wrote:
> > > > > >> >
> > > > > >> > > +1 (binding)
> > > > > >> > >
> > > > > >> > > > Andy Grove  wrote on Sat, Jan 27, 2024
> at 11:43 PM:
> > > > > >> > >
> > > > > >> > > > Hello,
> > > > > >> > > >
> > > > > >> > > > This vote is to determine if the Arrow PMC is in favor of
> > > > > accepting the
> > > > > >> > > > donation of Comet (a Spark native engine that is powered
> by
> > > > > DataFusion
> > > > > >> > > and
> > > > > >> > > > the Rust implementation of Arrow).
> > > > > >> > > >
> > > > > >> > > > The donation was previously discussed on the mailing list
> > [1].
> > > > > >> > > >
> > > > > >> > > > The proposed donation is at [2].
> > > > > >> > > >
> > > > > >> > > > The Arrow PMC will start the IP clearance process if the
> > vote
> > > > > passes.
> > > > > >> > > There
> > > > > >> > > > is a Google document [3] where the community is working on
> > the
> > > > > draft
> > > > > >> > > > contents for the IP clearance form.
> > > > > >> > > >
> > > > > >> > > > The vote will be open for at least 72 hours.
> > > > > >> > > >
> > > > > >> > > > [ ] +1 : Accept the donation
> > > > > >> > > > [ ] 0 : No opinion
> > > > > >> > > > [ ] -1 : Reject donation because...
> > > > > >> > > >
> > > > > >> > > > My vote: +1
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > >
> > > > > >> > > > Andy.
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > [1]
> > > > > https://lists.apache.org/thread/0q1rb11jtpopc7vt1ffdzro0omblsh0s
> > > > > >> > > > [2]
> https://github.com/apache/arrow-datafusion-comet/pull/1
> > > > > >> > > > [3]
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1azmxE1LERNUdnpzqDO5ortKTsPKrhNgQC4oZSmXa8x4/edit?usp=sharing
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow PMC chair: Andy Grove

2023-11-27 Thread Denny Lee
Congratulations Andy and thanks Andrew!

On Mon, Nov 27, 2023 at 5:45 PM Ian Joiner  wrote:

> Congrats Andy and thanks Andrew!
>
>
> On Monday, November 27, 2023, Yibo Cai  wrote:
>
> > Congrats Andy!
> >
> > On 11/28/23 06:03, L. C. Hsieh wrote:
> >
> >> Congrats Andy!
> >>
> >> Thanks Andrew for the efforts to lead the Arrow project in the past
> year!
> >>
> >> On Tue, Nov 28, 2023 at 3:51 AM Krisztián Szűcs
> >>  wrote:
> >>
> >>>
> >>> Congrats Andy & Thanks Andrew!
> >>>
> >>> On Mon, Nov 27, 2023 at 6:55 PM Chao Sun  wrote:
> >>>
> 
>  Congratulations Andy! And thanks Andrew for the awesome work in the
>  past year!
> 
>  Chao
> 
>  On Mon, Nov 27, 2023 at 9:51 AM Jeremy Dyer  wrote:
> 
> >
> > Thanks for your leadership this past year Andrew and I know we are in
> > good hands with Andy going forward. Congrats Andy!
> >
> > - Jeremy Dyer
> >
> > Get Outlook for iOS
> > 
> > From: Nic Crane 
> > Sent: Monday, November 27, 2023 11:18:33 AM
> > To: dev@arrow.apache.org 
> > Subject: Re: [ANNOUNCE] New Arrow PMC chair: Andy Grove
> >
> > Congrats Andy!
> >
> > On Mon, 27 Nov 2023 at 15:17, Gang Wu  wrote:
> >
> > Congrats Andy!
> >>
> >> Thanks Andrew for the past year as well.
> >>
> >> Best,
> >> Gang
> >>
> >> On Mon, Nov 27, 2023 at 10:59 PM Matt Topol
> >> 
> >> wrote:
> >>
> >> Congrats Andy!
> >>>
> >>> On Mon, Nov 27, 2023 at 9:44 AM Gavin Ray 
> >>> wrote:
> >>>
> >>> Yay, congrats Andy! Well-deserved!
> 
 On Mon, Nov 27, 2023 at 9:13 AM Kevin Gurney
 wrote:
> 
>  Congratulations, Andy!
> > 
> > From: Raúl Cumplido 
> > Sent: Monday, November 27, 2023 8:58 AM
> > To: dev@arrow.apache.org 
> > Subject: Re: [ANNOUNCE] New Arrow PMC chair: Andy Grove
> >
> > Congratulations Andy and thanks for the effort during last year
> >
>  Andrew!
> >>
> >>>
> On Mon, 27 Nov 2023 at 14:54, David Li ( >)
> wrote:
> >
> >>
> >> Congrats Andy!
> >>
> >> On Mon, Nov 27, 2023, at 08:02, Mehmet Ozan Kabak wrote:
> >>
>>> Congratulations Andy. I am sure we will keep building great
>>> tech this year, just like last year, under your watch.
> >>>
> >>> Mehmet Ozan Kabak
> >>>
> >>>
>>> On Nov 27, 2023, at 3:47 PM, Daniël Heres <
>>> danielhe...@gmail.com> wrote:
> >
> >>
>  Congrats Andy!
> 
 On Mon, 27 Nov 2023 at 13:47, Andrew Lamb <
>>> al...@influxdata.com> wrote:
> >>
> >>>
> I am pleased to announce that the Arrow Project has a new PMC
> chair and VP, as per our tradition of rotating the chair once a year. I have
> resigned and Andy Grove was duly elected by the PMC and approved unanimously
> by the board.
> >
> > Please join me in congratulating Andy Grove!
> >
> > Thanks,
> > Andrew
> >
> >
> 
>  --
>  Daniël Heres
> 
> >>>
> >
> >
> 
> >>>
> >>
>


Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Denny Lee
+1

On Tue, Sep 26, 2023 at 10:52 Maciej  wrote:

> +1
>
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>
> On 9/26/23 17:12, Michel Miotto Barbosa wrote:
>
> +1
>
> A disposição | At your disposal
>
> Michel Miotto Barbosa
> https://www.linkedin.com/in/michelmiottobarbosa/
> mmiottobarb...@gmail.com
> +55 11 984 342 347
>
>
>
>
> On Tue, Sep 26, 2023 at 11:44 AM Herman van Hovell
>   wrote:
>
>> +1
>>
>> On Tue, Sep 26, 2023 at 10:39 AM yangjie01 
>>  wrote:
>>
>>> +1
>>>
>>>
>>>
>>> *From:* Yikun Jiang 
>>> *Date:* Tuesday, September 26, 2023, 18:06
>>> *To:* dev 
>>> *Cc:* Hyukjin Kwon , Ruifeng Zheng <
>>> ruife...@apache.org>
>>> *Subject:* Re: [VOTE] Updating documentation hosted for EOL and
>>> maintenance releases
>>>
>>>
>>>
>>> +1, I believe it is a wise choice to update the EOL policy of the
>>> document based on the real demands of community users.
>>>
>>>
>>> Regards,
>>>
>>> Yikun
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Sep 26, 2023 at 1:06 PM Ruifeng Zheng 
>>> wrote:
>>>
>>> +1
>>>
>>>
>>>
>>> On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon 
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to start the vote for updating documentation hosted for EOL
>>> and maintenance releases to improve the usability here, and in order for
>>> end users to read the proper and correct documentation.
>>>
>>>
>>> For discussion thread, please refer to
>>> https://lists.apache.org/thread/1675rzxx5x4j2x03t9x0kfph8tlys0cx
>>> .
>>>
>>>
>>>
>>>
>>> Here is one example:
>>> - https://github.com/apache/spark/pull/42989
>>> 
>>>
>>> - https://github.com/apache/spark-website/pull/480
>>> 
>>>
>>>
>>>
>>> Starting with my own +1.
>>>
>>>


Re: First Time contribution.

2023-09-17 Thread Denny Lee
Hi Ram,

We have some good guidance at
https://spark.apache.org/contributing.html

HTH!
Denny


On Sun, Sep 17, 2023 at 17:18 ram manickam  wrote:

>
>
>
> Hello All,
> Recently, I joined this community and would like to contribute. Is there a
> guideline or recommendation on tasks that can be picked up by a first-timer,
> or a starter task?
>
> I tried looking at the Stack Overflow tag: apache-spark
> , but couldn't find
> any information for first-time contributors.
>
> Looking forward to learning and contributing.
>
> Thanks
> Ram
>


Re: [VOTE][SPIP] Python Data Source API

2023-07-06 Thread Denny Lee
+1 (non-binding)

On Fri, Jul 7, 2023 at 00:50 Maciej  wrote:

> +0
>
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>
> On 7/6/23 17:41, Xiao Li wrote:
>
> +1
>
> Xiao
>
>> Hyukjin Kwon  wrote on Wed, Jul 5, 2023 at 17:28:
>
>> +1.
>>
>> See https://youtu.be/yj7XlTB1Jvc?t=604 :-).
>>
>> On Thu, 6 Jul 2023 at 09:15, Allison Wang
>> 
>>  wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Python Data Source API.
>>>
>>> The high-level summary for the SPIP is that it aims to introduce a
>>> simple API in Python for Data Sources. The idea is to enable Python
>>> developers to create data sources without learning Scala or dealing with
>>> the complexities of the current data source APIs. This would make Spark
>>> more accessible to the wider Python developer community.
>>>
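A minimal sketch of what such a Python data source could look like. The module
path, base classes, and registration call below match the API that later
landed in PySpark, but at the time of this vote they should be read as
assumptions about the proposal's shape:

    from pyspark.sql.datasource import DataSource, DataSourceReader

    class FibonacciSource(DataSource):
        @classmethod
        def name(cls):
            return "fibonacci"

        def schema(self):
            return "i INT, fib INT"  # DDL-formatted schema string

        def reader(self, schema):
            return FibonacciReader()

    class FibonacciReader(DataSourceReader):
        def read(self, partition):
            # Yield rows as plain tuples matching the declared schema.
            a, b = 0, 1
            for i in range(10):
                yield (i, a)
                a, b = b, a + b

    # spark.dataSource.register(FibonacciSource)
    # spark.read.format("fibonacci").load().show()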
>>> References:
>>>
>>>- SPIP doc
>>>
>>> 
>>>- JIRA ticket 
>>>- Discussion thread
>>>
>>>
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because __.
>>>
>>> Thanks,
>>> Allison
>>>
>>


Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Denny Lee
Slightly biased, but per my conversations - this would be awesome to have!

On Mon, Jun 19, 2023 at 09:43 Abdeali Kothari 
wrote:

> I would definitely use it - if it's available :)
>
> On Mon, 19 Jun 2023, 21:56 Jacek Laskowski,  wrote:
>
>> Hi Allison and devs,
>>
>> Although I was against this idea at first sight (probably because I'm a
>> Scala dev), I think it could work as long as there are people who'd be
>> interested in such an API. Were there any? I'm just curious. I've seen no
>> emails requesting it.
>>
>> I also doubt that Python devs would like to work on new data sources but
>> support their wishes wholeheartedly :)
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> "The Internals Of" Online Books 
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> 
>>
>>
>> On Fri, Jun 16, 2023 at 6:14 AM Allison Wang
>>  wrote:
>>
>>> Hi everyone,
>>>
>>> I would like to start a discussion on “Python Data Source API”.
>>>
>>> This proposal aims to introduce a simple API in Python for Data Sources.
>>> The idea is to enable Python developers to create data sources without
>>> having to learn Scala or deal with the complexities of the current data
>>> source APIs. The goal is to make a Python-based API that is simple and easy
>>> to use, thus making Spark more accessible to the wider Python developer
>>> community. This proposed approach is based on the recently introduced
>>> Python user-defined table functions with extensions to support data sources.
>>>
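Since the proposal builds on Python user-defined table functions, the UDTF API
that shipped in Spark 3.5 gives a feel for the intended shape: a plain Python
class whose method yields rows as tuples (the data source API itself is
sketched under the vote thread above):

    from pyspark.sql.functions import lit, udtf

    @udtf(returnType="num: int, squared: int")
    class SquareNumbers:
        def eval(self, start: int, end: int):
            # One yielded tuple becomes one output row.
            for num in range(start, end):
                yield (num, num * num)

    SquareNumbers(lit(1), lit(4)).show()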
>>> *SPIP Doc*:
>>> https://docs.google.com/document/d/1oYrCKEKHzznljYfJO4kx5K_Npcgt1Slyfph3NEk7JRU/edit?usp=sharing
>>>
>>> *SPIP JIRA*: https://issues.apache.org/jira/browse/SPARK-44076
>>>
>>> Looking forward to your feedback.
>>>
>>> Thanks,
>>> Allison
>>>
>>


Re: JDK version support policy?

2023-06-06 Thread Denny Lee
+1 on dropping Java 8 in Spark 4.0, saying this as a fan of the fast-paced
(positive) updates to Arrow, eh?!

On Tue, Jun 6, 2023 at 4:02 PM Sean Owen  wrote:

> I haven't followed this discussion closely, but I think we could/should
> drop Java 8 in Spark 4.0, which is up next after 3.5?
>
> On Tue, Jun 6, 2023 at 2:44 PM David Li  wrote:
>
>> Hello Spark developers,
>>
>> I'm from the Apache Arrow project. We've discussed Java version support
>> [1], and crucially, whether to continue supporting Java 8 or not. As Spark
>> is a big user of Arrow in Java, I was curious what Spark's policy here was.
>>
>> If Spark intends to stay on Java 8, for instance, we may also want to
>> stay on Java 8 or otherwise provide some supported version of Arrow for
>> Java 8.
>>
>> We've seen dependencies dropping or planning to drop support. gRPC may
>> drop Java 8 at any time [2], possibly this September [3], which may affect
>> Spark (due to Spark Connect). And today we saw that Arrow had issues
>> running tests with Mockito on Java 20, but we couldn't update Mockito since
>> it had dropped Java 8 support. (We pinned the JDK version in that CI
>> pipeline for now.)
>>
>> So at least, I am curious if Arrow could start the long process of
>> migrating Java versions without impacting Spark, or if we should continue
>> to cooperate. Arrow Java doesn't see quite so much activity these days, so
>> it's not quite critical, but it's possible that these dependency issues
>> will start to affect us more soon. And looking forward, Java is working on
>> APIs that should also allow us to ditch the --add-opens flag requirement
>> too.
>>
>> [1]: https://lists.apache.org/thread/phpgpydtt3yrgnncdyv4qdq1gf02s0yj
>> [2]:
>> https://github.com/grpc/proposal/blob/master/P5-jdk-version-support.md
>> [3]: https://github.com/grpc/grpc-java/issues/9386
>>
>
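For context on the --add-opens mention: Arrow's Java memory layer reaches into
JDK internals, so on JDK 9+ the relevant module must be opened explicitly. A
typical, illustrative invocation carrying the flag through Spark's JVM
options:

    spark-submit \
      --conf "spark.driver.extraJavaOptions=--add-opens=java.base/java.nio=ALL-UNNAMED" \
      --conf "spark.executor.extraJavaOptions=--add-opens=java.base/java.nio=ALL-UNNAMED" \
      app.py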


Re: [CONNECT] New Clients for Go and Rust

2023-05-24 Thread Denny Lee
+1 on separate repo allowing different APIs to run at different speeds and
ensuring they get community support.

On Wed, May 24, 2023 at 00:37 Hyukjin Kwon  wrote:

> I think we can just start this with a separate repo.
> I am fine with the second option too but in this case we would have to
> triage which language to add into the main repo.
>
> On Fri, 19 May 2023 at 22:28, Maciej  wrote:
>
>> Hi,
>>
>> Personally, I'm strongly against the second option and have some
>> preference towards the third one (or maybe a mix of the first one and the
>> third one).
>>
>> The project is already pretty large as-is and, with an extremely
>> conservative approach towards removal of APIs, it only tends to grow over
>> time. Making it even larger is not going to make things more maintainable
>> and is likely to create an entry barrier for new contributors (that's
>> similar to Jia's arguments).
>>
>> Moreover, we've seen quite a few different language clients over the
>> years and all but one or two survived while none is particularly active, as
>> far as I'm aware.  Taking responsibility for more clients, without being
>> sure that we have resources to maintain them and there is enough community
>> around them to make such effort worthwhile, doesn't seem like a good idea.
>>
>> --
>> Best regards,
>> Maciej Szymkiewicz
>>
>> Web: https://zero323.net
>> PGP: A30CEF0C31A501EC
>>
>>
>>
>> On 5/19/23 14:57, Jia Fan wrote:
>>
>> Hi,
>>
>> Thanks for contribution!
>> I prefer (1). There are some reason:
>>
>> 1. Different repository can maintain independent versions, different
>> release times, and faster bug fix releases.
>>
>> 2. Different languages have different build tools. Putting them in one
>> repository will make the main repository more and more complicated, and it
>> will become extremely difficult to perform a complete build in the main
>> repository.
>>
>> 3. Different repository will make CI configuration and execute easier,
>> and the PR and commit lists will be clearer.
>>
>> 4. Other repository also have different client to governed, like
>> clickhouse. It use different repository for jdbc, odbc, c++. Please refer:
>> https://github.com/ClickHouse/clickhouse-java
>> https://github.com/ClickHouse/clickhouse-odbc
>> https://github.com/ClickHouse/clickhouse-cpp
>>
>> PS: I'm looking forward to the javascript connect client!
>>
>> Thanks Regards
>> Jia Fan
>>
>> Martin Grund  于2023年5月19日周五 20:03写道:
>>
>>> Hi folks,
>>>
>>> When Bo (thanks for the time and contribution) started the work on
>>> https://github.com/apache/spark/pull/41036 he started the Go client
>>> directly in the Spark repository. In the meantime, I was approached by
>>> other engineers who are willing to contribute to working on a Rust client
>>> for Spark Connect.
>>>
>>> Now one of the key questions is where should these connectors live and
>>> how we manage expectations most effectively.
>>>
>>> At the high level, there are two approaches:
>>>
>>> (1) "3rd party" (non-JVM / Python) clients should live in separate
>>> repositories owned and governed by the Apache Spark community.
>>>
>>> (2) All clients should live in the main Apache Spark repository in the
>>> `connector/connect/client` directory.
>>>
>>> (3) Non-native (Python, JVM) Spark Connect clients should not be part of
>>> the Apache Spark repository and governance rules.
>>>
>>> Before we iron out how exactly, we mark these clients as experimental
>>> and how we align their release process etc with Spark, my suggestion would
>>> be to get a consensus on this first question.
>>>
>>> Personally, I'm fine with (1) and (2) with a preference for (2).
>>>
>>> Would love to get feedback from other members of the community!
>>>
>>> Thanks
>>> Martin
>>>
>>>
>>>
>>>
>>


Re: Slack for Spark Community: Merging various threads

2023-04-06 Thread Denny Lee
Thanks Dongjoon, but I don't think this is misleading, insofar as this is
not a *self-service process* but an invite process, which admittedly I did
not state explicitly in my previous thread.  And thanks for the invite to
the-ASF Slack - I just joined :)

Saying this, I do completely agree with your two assertions:

   - *Shall we narrow-down our focus on comparing the ASF Slack vs another
   3rd-party Slack because all of us agree that this is important? *
   - Yes, I do agree that is an important aspect, all else being equal.


   - *I'm wondering what ASF misses here if Apache Spark PMC invites all
   remaining subscribers of `user@spark` and `dev@spark` mailing lists.*
   - The key question here is: do PMC members have the bandwidth to
   invite everyone on user@ and dev@? There is a lot of overhead in
   maintaining this, so my key concern is whether we have enough
   volunteers to manage it. Note, I'm willing to help with this process
   as well; it was just more a matter that there are a lot of folks
   to approve.
   - A reason why we may want to consider Spark's own Slack is that
   we can potentially create different channels within Slack to more easily
   group messages (e.g. different threads for troubleshooting, RDDs,
   streaming, etc.). Again, we'd need someone to manage this so that we
   don't have an out-of-control number of channels.

WDYT?



On Wed, Apr 5, 2023 at 10:50 PM Dongjoon Hyun 
wrote:

> Thank you so much, Denny.
> Yes, let me comment on a few things.
>
> >  - While there is an ASF Slack <https://infra.apache.org/slack.html>, it
> >requires an @apache.org email address
>
> 1. This sounds a little misleading because we can see `guest` accounts in
> the same link. People can be invited by "Invite people to ASF" link. I
> invited you, Denny, and attached the screenshots.
>
> >   using linen.dev as its Slack archive (so we can surpass the 90 days
> limit)
>
> 2. The official Foundation-supported Slack workspace preserves all
> messages.
> (the-asf.slack.com)
>
> > Why: Allows for the community to have the option to communicate with each
> > other using Slack; a pretty popular async communication.
>
> 3. ASF foundation not only allows but also provides the option to
> communicate with each other using Slack as of today.
>
> Given the above (1) and (3), I don't think we asked the right questions
> during most of the parts.
>
> 1. Shall we narrow-down our focus on comparing the ASF Slack vs another
> 3rd-party Slack because all of us agree that this is important?
> 2. I'm wondering what ASF misses here if Apache Spark PMC invites all
> remaining subscribers of `user@spark` and `dev@spark` mailing lists.
>
> Thanks,
> Dongjoon.
>
> [image: invitation.png]
> [image: invited.png]
>
> On Wed, Apr 5, 2023 at 7:23 PM Denny Lee  wrote:
>
>> There have been a number of threads discussing creating a Slack for the
>> Spark community that I'd like to try to help reconcile.
>>
>> Topic: Slack for Spark
>>
>> Why: Allows for the community to have the option to communicate with each
>> other using Slack; a pretty popular async communication.
>>
>> Discussion points:
>>
>>- There are other ASF projects that use Slack including Druid
>><https://druid.apache.org/community/>, Parquet
>><https://parquet.apache.org/community/>, Iceberg
>><https://iceberg.apache.org/community/>, and Hudi
>><https://hudi.apache.org/community/get-involved/>
>>- Flink <https://flink.apache.org/community/> is also using Slack and
>>using linen.dev as its Slack archive (so we can surpass the 90 days
>>limit) which is also Google searchable (Delta Lake
>><https://www.linen.dev/s/delta-lake/> is also using this service as
>>well)
>>- While there is an ASF Slack <https://infra.apache.org/slack.html>,
>>it requires an @apache.org email address to use which is quite
>>limiting which is why these (and many other) OSS projects are using the
>>free-tier Slack
>>- It does require managing Slack properly as Slack free edition
>>limits you to approx 100 invites.  One of the ways to resolve this is to
>>create a bit.ly link so we can manage the invites without regularly
>>updating the website with the new invite link.
>>
>> Are there any other points of discussion that we should add here?  I'm
>> glad to work with whomever to help manage the various aspects of Slack
>> (code of conduct, linen.dev and search/archive process, invite
>> management, etc.).
>>
>> HTH!
>> Denny
>>
>>
>>


Slack for Spark Community: Merging various threads

2023-04-05 Thread Denny Lee
There have been a number of threads discussing creating a Slack for the
Spark community that I'd like to try to help reconcile.

Topic: Slack for Spark

Why: Allows for the community to have the option to communicate with each
other using Slack; a pretty popular async communication.

Discussion points:

   - There are other ASF projects that use Slack including Druid
   , Parquet
   , Iceberg
   , and Hudi
   
   - Flink  is also using Slack and
   using linen.dev as its Slack archive (so we can surpass the 90 days
   limit) which is also Google searchable (Delta Lake
    is also using this service as well)
   - While there is an ASF Slack , it
   requires an @apache.org email address to use which is quite limiting
   which is why these (and many other) OSS projects are using the free-tier
   Slack
   - It does require managing Slack properly as Slack free edition limits
   you to approx 100 invites.  One of the ways to resolve this is to create a
   bit.ly link so we can manage the invites without regularly updating the
   website with the new invite link.

Are there any other points of discussion that we should add here?  I'm glad
to work with whomever to help manage the various aspects of Slack (code of
conduct, linen.dev and search/archive process, invite management, etc.).

HTH!
Denny


Re: Slack for PySpark users

2023-04-03 Thread Denny Lee
>>>>> On Thu, Mar 30, 2023 at 9:10 PM Jungtaek Lim <
>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>
>>>>>> I'm reading through the page "Briefing: The Apache Way", and in the
>>>>>> section of "Open Communications", restriction of communication inside ASF
>>>>>> INFRA (mailing list) is more about code and decision-making.
>>>>>>
>>>>>> https://www.apache.org/theapacheway/#what-makes-the-apache-way-so-hard-to-define
>>>>>>
>>>>>> It's unavoidable if "users" prefer to use an alternative
>>>>>> communication mechanism rather than the user mailing list. Before Stack
>>>>>> Overflow days, there had been a meaningful number of questions around 
>>>>>> user@.
>>>>>> It's just impossible to let them go back and post to the user mailing 
>>>>>> list.
>>>>>>
>>>>>> We just need to make sure it is not the purpose of employing Slack to
>>>>>> move all discussions about developments, direction of the project, etc
>>>>>> which must happen in dev@/private@. The purpose of Slack thread here
>>>>>> does not seem to aim to serve the purpose.
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 31, 2023 at 7:00 AM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Good discussions and proposals all around.
>>>>>>>
>>>>>>> I have used slack in anger on a customer site before. For small and
>>>>>>> medium size groups it is good and affordable. Alternatives have been
>>>>>>> suggested as well so those who like investigative search can agree and 
>>>>>>> come
>>>>>>> up with a freebie one.
>>>>>>> I am inclined to agree with Bjorn that this slack has more social
>>>>>>> dimensions than the mailing list. It is akin to a sports club using
>>>>>>> WhatsApp groups for communication. Remember we were originally looking 
>>>>>>> for
>>>>>>> space for webinars, including Spark on LinkedIn that Denny Lee 
>>>>>>> suggested.
>>>>>>> I think Slack and mailing groups can coexist happily. On a more serious
>>>>>>> note, when I joined the user group back in 2015-2016, there was a lot of
>>>>>>> traffic. Currently we hardly get many mails daily - less than 5. So
>>>>>>> having
>>>>>>> a Slack-type medium may improve members' participation.
>>>>>>>
>>>>>>> so +1 for me as well.
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Lead Solutions Architect/Engineering Lead
>>>>>>> Palantir Technologies Limited
>>>>>>>
>>>>>>>
>>>>>>>view my Linkedin profile
>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>
>>>>>>>
>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>> for any loss, damage or destruction of data or any other property which 
>>>>>>> may
>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>> damages
>>>>>>> arising from such loss, damage or destruction.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 30 Mar 2023 at 22:19, Denny Lee 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1.
>>>>>>>>
>>>>>>>> To Shani’s point, there are multiple OSS projects that use the free
>>>>>>>> Slack version - top of mind include Delta, Presto, Flink, Trino, 
>>>>>>>> Datahub,
>>>>>>>> MLflow, etc.
>>>>>>>>
>>>>>>>> On Thu, Mar 30, 2023 at 14:15  wrote:
>>>>>>


Re: Slack for PySpark users

2023-03-30 Thread Denny Lee
>>>> list because we didn't set up any rule here yet.
>>>>
>>>> To Xiao. I understand what you mean. That's the reason why I added
>>>> Matei from your side.
>>>> > I did not see an objection from the ASF board.
>>>>
>>>> There is on-going discussion about the communication channels outside
>>>> ASF email which is specifically concerning Slack.
>>>> Please hold on any official action for this topic. We will know how to
>>>> support it seamlessly.
>>>>
>>>> Dongjoon.
>>>>
>>>>
>>>> On Thu, Mar 30, 2023 at 9:21 AM Xiao Li  wrote:
>>>>
>>>>> Hi, Dongjoon,
>>>>>
>>>>> The other communities (e.g., Pinot, Druid, Flink) created their own
>>>>> Slack workspaces last year. I did not see an objection from the ASF board.
>>>>> At the same time, Slack workspaces are very popular and useful in most
>>>>> non-ASF open source communities. TBH, we are kind of late. I think we can
>>>>> do the same in our community?
>>>>>
>>>>> We can follow the guide when the ASF has an official process for ASF
>>>>> archiving. Since our PMC are the owner of the slack workspace, we can make
>>>>> a change based on the policy. WDYT?
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>> Dongjoon Hyun  wrote on Thu, Mar 30, 2023 at 09:03:
>>>>>
>>>>>> Hi, Xiao and all.
>>>>>>
>>>>>> (cc Matei)
>>>>>>
>>>>>> Please hold on the vote.
>>>>>>
>>>>>> There is a concern expressed by ASF board because recent Slack
>>>>>> activities created an isolated silo outside of ASF mailing list archive.
>>>>>>
>>>>>> We need to establish a way to embrace it back to ASF archive before
>>>>>> starting anything official.
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 29, 2023 at 11:32 PM Xiao Li 
>>>>>> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> + @d...@spark.apache.org 
>>>>>>>
>>>>>>> This is a good idea. The other Apache projects (e.g., Pinot, Druid,
>>>>>>> Flink) have created their own dedicated Slack workspaces for faster
>>>>>>> communication. We can do the same in Apache Spark. The Slack workspace 
>>>>>>> will
>>>>>>> be maintained by the Apache Spark PMC. I propose to initiate a vote for 
>>>>>>> the
>>>>>>> creation of a new Apache Spark Slack workspace. Does that sound good?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Xiao
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Mich Talebzadeh  wrote on Tue, Mar 28, 2023 at 07:07:
>>>>>>>
>>>>>>>> I created one at slack called pyspark
>>>>>>>>
>>>>>>>>
>>>>>>>> Mich Talebzadeh,
>>>>>>>> Lead Solutions Architect/Engineering Lead
>>>>>>>> Palantir Technologies Limited
>>>>>>>>
>>>>>>>>
>>>>>>>>view my Linkedin profile
>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>>
>>>>>>>>
>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>>> for any loss, damage or destruction of data or any other property 
>>>>>>>> which may
>>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>>> damages
>>>>>>>> arising from such loss, damage or destruction.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 28 Mar 2023 at 03:52, asma zgolli 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1 good idea, I d like to join as well.
>>>>>>>>>
>>>>>>>>> On Tue, Mar 28, 2023 at 04:09, Winston Lai 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Please let us know when the channel is created. I'd like to join
>>>>>>>>>> :)
>>>>>>>>>>
>>>>>>>>>> Thank You & Best Regards
>>>>>>>>>> Winston Lai
>>>>>>>>>> --
>>>>>>>>>> *From:* Denny Lee 
>>>>>>>>>> *Sent:* Tuesday, March 28, 2023 9:43:08 AM
>>>>>>>>>> *To:* Hyukjin Kwon 
>>>>>>>>>> *Cc:* keen ; user@spark.apache.org <
>>>>>>>>>> user@spark.apache.org>
>>>>>>>>>> *Subject:* Re: Slack for PySpark users
>>>>>>>>>>
>>>>>>>>>> +1 I think this is a great idea!
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Yeah, actually I think we should better have a slack channel so
>>>>>>>>>> we can easily discuss with users and developers.
>>>>>>>>>>
>>>>>>>>>> On Tue, 28 Mar 2023 at 03:08, keen  wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> I really like *Slack *as communication channel for a tech
>>>>>>>>>> community.
>>>>>>>>>> There is a Slack workspace for *delta lake users* (
>>>>>>>>>> https://go.delta.io/slack) that I enjoy a lot.
>>>>>>>>>> I was wondering if there is something similar for PySpark users.
>>>>>>>>>>
>>>>>>>>>> If not, would there be anything wrong with creating a new
>>>>>>>>>> Slack workspace for PySpark users? (when explicitly mentioning that 
>>>>>>>>>> this is
>>>>>>>>>> *not* officially part of Apache Spark)?
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Asma ZGOLLI
>>>>>>>>>
>>>>>>>>> Ph.D. in Big Data - Applied Machine Learning
>>>>>>>>>
>>>>>>>>>
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4,
>> 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
>>
>


Re: Slack for PySpark users

2023-03-27 Thread Denny Lee
+1 I think this is a great idea!

On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon  wrote:

> Yeah, actually I think we should better have a slack channel so we can
> easily discuss with users and developers.
>
> On Tue, 28 Mar 2023 at 03:08, keen  wrote:
>
>> Hi all,
>> I really like *Slack *as communication channel for a tech community.
>> There is a Slack workspace for *delta lake users* (
>> https://go.delta.io/slack) that I enjoy a lot.
>> I was wondering if there is something similar for PySpark users.
>>
>> If not, would there be anything wrong with creating a new Slack workspace
>> for PySpark users? (when explicitly mentioning that this is *not*
>> officially part of Apache Spark)?
>>
>> Cheers
>> Martin
>>
>


Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
What we can do is get into the habit of compiling the list on LinkedIn but
making sure this list is shared and broadcast here, eh?!

As well, when we broadcast the videos, we can do this using zoom/jitsi/
riverside.fm as well as simulcasting this on LinkedIn. This way you can
view directly on the former without ever logging in with a user ID.

HTH!!

On Wed, Mar 15, 2023 at 4:30 PM Mich Talebzadeh 
wrote:

> Understood, Nitin. It would be wrong to act against one's conviction. I am
> sure we can find a way around providing the contents.
>
> Regards
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Wed, 15 Mar 2023 at 22:34, Nitin Bhansali 
> wrote:
>
>> Hi Mich,
>>
>> Thanks for your prompt response ... much appreciated. I know how to and
>> can create login IDs on such sites but I had taken conscious decision some
>> 20 years ago ( and i will be going against my principles) not to be on such
>> sites. Hence I had asked for is there any other way I can join/view
>> recording of webinar.
>>
>> Anyways not to worry.
>>
>> Thanks & Regards
>>
>> Nitin.
>>
>>
>> On Wednesday, 15 March 2023 at 20:37:55 GMT, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>
>> Hi Nitin,
>>
>> LinkedIn is more of a professional medium. FYI, I am only a member of
>> LinkedIn, no Facebook, etc. There is no reason for you NOT to create a
>> profile for yourself on LinkedIn :)
>>
>>
>> https://www.linkedin.com/help/linkedin/answer/a1338223/sign-up-to-join-linkedin?lang=en
>>
>> see you there as well.
>>
>> Best of luck.
>>
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 15 Mar 2023 at 18:31, Nitin Bhansali 
>> wrote:
>>
>> Hello Mich,
>>
>> My apologies ... but I am not on any such social/professional sites.
>> Is there any other way to access such webinars/classes?
>>
>> Thanks & Regards
>> Nitin.
>>
>> On Wednesday, 15 March 2023 at 18:26:51 GMT, Denny Lee <
>> denny.g@gmail.com> wrote:
>>
>>
>> Thanks Mich for tackling this!  I encourage everyone to add to the list
>> so we can have a comprehensive list of topics, eh?!
>>
>> On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
>> wrote:
>>
>> Hi all,
>>
>> Thanks to @Denny Lee   to give access to
>>
>> https://www.linkedin.com/company/apachespark/
>>
>> and contribution from @asma zgolli 
>>
>> You will see my post at the bottom. Please add anything else on topics to
>> the list as a comment.
>>
>> We will then put them together in an article perhaps. Comments and
>> contributions are welcome.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
What we can do is get into the habit of compiling the list on LinkedIn but
making sure this list is shared and broadcast here, eh?!

As well, when we broadcast the videos, we can do this using zoom/jitsi/
riverside.fm as well as simulcasting this on LinkedIn. This way you can
view directly on the former without ever logging in with a user ID.

HTH!!

On Wed, Mar 15, 2023 at 4:30 PM Mich Talebzadeh 
wrote:

> Understood Nitin It would be wrong to act against one's conviction. I am
> sure we can find a way around providing the contents
>
> Regards
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead
> Palantir Technologies Limited
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Wed, 15 Mar 2023 at 22:34, Nitin Bhansali 
> wrote:
>
>> Hi Mich,
>>
>> Thanks for your prompt response ... much appreciated. I know how to and
>> can create login IDs on such sites but I had taken conscious decision some
>> 20 years ago ( and i will be going against my principles) not to be on such
>> sites. Hence I had asked for is there any other way I can join/view
>> recording of webinar.
>>
>> Anyways not to worry.
>>
>> Thanks & Regards
>>
>> Nitin.
>>
>>
>> On Wednesday, 15 March 2023 at 20:37:55 GMT, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>
>> Hi Nitin,
>>
>> LinkedIn is more of a professional medium.  FYI, I am only a member of
>> LinkedIn, no Facebook, etc. There is no reason for you NOT to create a
>> profile for yourself on LinkedIn :)
>>
>>
>> https://www.linkedin.com/help/linkedin/answer/a1338223/sign-up-to-join-linkedin?lang=en
>>
>> see you there as well.
>>
>> Best of luck.
>>
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 15 Mar 2023 at 18:31, Nitin Bhansali 
>> wrote:
>>
>> Hello Mich,
>>
>> My apologies  ...  but I am not on any such social/professional sites.
>> Is there any other way to access such webinars/classes?
>>
>> Thanks & Regards
>> Nitin.
>>
>> On Wednesday, 15 March 2023 at 18:26:51 GMT, Denny Lee <
>> denny.g@gmail.com> wrote:
>>
>>
>> Thanks Mich for tackling this!  I encourage everyone to add to the list
>> so we can have a comprehensive list of topics, eh?!
>>
>> On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
>> wrote:
>>
>> Hi all,
>>
>> Thanks to @Denny Lee   for giving access to
>>
>> https://www.linkedin.com/company/apachespark/
>>
>> and contribution from @asma zgolli 
>>
>> You will see my post at the bottom. Please add anything else on topics to
>> the list as a comment.
>>
>> We will then put them together in an article perhaps. Comments and
>> contributions are welcome.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
Thanks Mich for tackling this!  I encourage everyone to add to the list so
we can have a comprehensive list of topics, eh?!

On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh 
wrote:

> Hi all,
>
> Thanks to @Denny Lee   for giving access to
>
> https://www.linkedin.com/company/apachespark/
>
> and contribution from @asma zgolli 
>
> You will see my post at the bottom. Please add anything else on topics to
> the list as a comment.
>
> We will then put them together in an article perhaps. Comments and
> contributions are welcome.
>
> HTH
>
> Mich Talebzadeh,
> Lead Solutions Architect/Engineering Lead,
> Palantir Technologies Limited
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 14 Mar 2023 at 15:09, Mich Talebzadeh 
> wrote:
>
>> Hi Denny,
>>
>> That Apache Spark Linkedin page
>> https://www.linkedin.com/company/apachespark/ looks fine. It also allows
>> a wider audience to benefit from it.
>>
>> +1 for me
>>
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 14 Mar 2023 at 14:23, Denny Lee  wrote:
>>
>>> In the past, we've been using the Apache Spark LinkedIn page
>>> <https://www.linkedin.com/company/apachespark/> and group to broadcast
>>> these types of events - if you're cool with this?  Or we could go through
>>> the process of submitting and updating the current
>>> https://spark.apache.org or request to leverage the original Spark
>>> confluence page <https://cwiki.apache.org/confluence/display/SPARK>.
>>>  WDYT?
>>>
>>> On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Well that needs to be created first for this purpose. The appropriate
>>>> name etc. to be decided. Maybe @Denny Lee   can
>>>> facilitate this as he offered his help.
>>>>
>>>>
>>>> cheers
>>>>
>>>>
>>>>
>>>>view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, 13 Mar 2023 at 16:29, asma zgolli  wrote:
>>>>
>>>>> Hello Mich,
>>>>>
>>>>> Can you please provide the link for the confluence page?
>>>>>
>>>>> Many thanks
>>>>> Asma
>>>>> Ph.D. in Big Data - Applied Machine Learning
>>>>>
>>>>> On Mon, 13 Mar 2023 at 17:21, Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Apologies I missed the list.
>>>>>>
>>>>>> To move forward I selected these topics from the thread "Online
>>>>>> classes for spark topics".
>>>>>>
>>>>>> To take this further I propose a confluence page to be set up.
>>>>>>
>>>>>>
>>>>>>1. Spark UI
>>>>>>2. Dynam

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Denny Lee
In the past, we've been using the Apache Spark LinkedIn page
<https://www.linkedin.com/company/apachespark/> and group to broadcast
these types of events - if you're cool with this?  Or we could go through
the process of submitting and updating the current https://spark.apache.org
or request to leverage the original Spark confluence page
<https://cwiki.apache.org/confluence/display/SPARK>. WDYT?

On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh 
wrote:

> Well that needs to be created first for this purpose. The appropriate name
> etc. to be decided. Maybe @Denny Lee   can
> facilitate this as he offered his help.
>
>
> cheers
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Mon, 13 Mar 2023 at 16:29, asma zgolli  wrote:
>
>> Hello Mich,
>>
>> Can you please provide the link for the confluence page?
>>
>> Many thanks
>> Asma
>> Ph.D. in Big Data - Applied Machine Learning
>>
>> On Mon, 13 Mar 2023 at 17:21, Mich Talebzadeh 
>> wrote:
>>
>>> Apologies I missed the list.
>>>
>>> To move forward I selected these topics from the thread "Online classes
>>> for spark topics".
>>>
>>> To take this further I propose a confluence page to be set up.
>>>
>>>
>>>1. Spark UI
>>>2. Dynamic allocation
>>>3. Tuning of jobs
>>>4. Collecting spark metrics for monitoring and alerting
>>>    5. For those who prefer to use the Pandas API on Spark since the
>>>    release of Spark 3.2, what are some important notes for those users? For
>>>    example, what are the additional factors affecting Spark performance when
>>>    using the Pandas API on Spark? How to tune them in addition to the
>>>    conventional Spark tuning methods applied to Spark SQL users.
>>>6. Spark internals and/or comparing spark 3 and 2
>>>7. Spark Streaming & Spark Structured Streaming
>>>8. Spark on notebooks
>>>9. Spark on serverless (for example Spark on Google Cloud)
>>>10. Spark on k8s
>>>
>>> Opinions and how-to suggestions are welcome
>>>
>>>
>>>view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Mon, 13 Mar 2023 at 16:16, Mich Talebzadeh 
>>> wrote:
>>>
>>>> Hi guys
>>>>
>>>> To move forward I selected these topics from the thread "Online classes
>>>> for spark topics".
>>>>
>>>> To take this further I propose a confluence page to be set up.
>>>>
>>>> Opinions and how-to suggestions are welcome
>>>>
>>>> Cheers
>>>>
>>>>
>>>>
>>>>view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>>
>>>>
>>>
>>
>>
>>


Re: Online classes for spark topics

2023-03-12 Thread Denny Lee
Looks like we have some good topics here - I'm glad to help with setting up
the infrastructure to broadcast if it helps?

On Thu, Mar 9, 2023 at 6:19 AM neeraj bhadani 
wrote:

> I am happy to be a part of this discussion as well.
>
> Regards,
> Neeraj
>
> On Wed, 8 Mar 2023 at 22:41, Winston Lai  wrote:
>
>> +1, any webinar on a Spark-related topic is appreciated!
>>
>> Thank You & Best Regards
>> Winston Lai
>> --
>> *From:* asma zgolli 
>> *Sent:* Thursday, March 9, 2023 5:43:06 AM
>> *To:* karan alang 
>> *Cc:* Mich Talebzadeh ; ashok34...@yahoo.com <
>> ashok34...@yahoo.com>; User 
>> *Subject:* Re: Online classes for spark topics
>>
>> +1
>>
>> On Wed, 8 Mar 2023 at 21:32, karan alang  wrote:
>>
>> +1 .. I'm happy to be part of these discussions as well !
>>
>>
>>
>>
>> On Wed, Mar 8, 2023 at 12:27 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Hi,
>>
>> I guess I can schedule this work over a course of time. I for myself can
>> contribute plus learn from others.
>>
>> So +1 for me.
>>
>> Let us see if anyone else is interested.
>>
>> HTH
>>
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 8 Mar 2023 at 17:48, ashok34...@yahoo.com 
>> wrote:
>>
>>
>> Hello Mich.
>>
>> Greetings. Would you be able to arrange a Spark Structured Streaming
>> learning webinar?
>>
>> This is something I have been struggling with recently. It will be very
>> helpful.
>>
>> Thanks and Regard
>>
>> AK
>> On Tuesday, 7 March 2023 at 20:24:36 GMT, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> This might  be a worthwhile exercise on the assumption that the
>> contributors will find the time and bandwidth to chip in so to speak.
>>
>> I am sure there are many but on top of my head I can think of Holden
>> Karau for k8s, and Sean Owen for data science stuff. They are both very
>> experienced.
>>
>> Anyone else?
>>
>> HTH
>>
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 7 Mar 2023 at 19:17, ashok34...@yahoo.com.INVALID
>>  wrote:
>>
>> Hello gurus,
>>
>> Does Spark arrange online webinars for special topics like Spark on K8s,
>> data science and Spark Structured Streaming?
>>
>> I would be most grateful if experts could share their experience with
>> learners of intermediate knowledge like myself. Hopefully we will find
>> the practical experiences shared valuable.
>>
>> Respectively,
>>
>> AK
>>
>>
>>
>>
>


Re: Online classes for spark topics

2023-03-08 Thread Denny Lee
We used to run Spark webinars on the Apache Spark LinkedIn group
 but
honestly the turnout was pretty low.  We dove into various features.
If there are particular topics that you would like to discuss during a
live session, please let me know and we can try to restart them.  HTH!

On Wed, Mar 8, 2023 at 9:45 PM Sofia’s World  wrote:

> +1
>
> On Wed, Mar 8, 2023 at 10:40 PM Winston Lai  wrote:
>
>> +1, any webinar on a Spark-related topic is appreciated!
>>
>> Thank You & Best Regards
>> Winston Lai
>> --
>> *From:* asma zgolli 
>> *Sent:* Thursday, March 9, 2023 5:43:06 AM
>> *To:* karan alang 
>> *Cc:* Mich Talebzadeh ; ashok34...@yahoo.com <
>> ashok34...@yahoo.com>; User 
>> *Subject:* Re: Online classes for spark topics
>>
>> +1
>>
>> On Wed, 8 Mar 2023 at 21:32, karan alang  wrote:
>>
>> +1 .. I'm happy to be part of these discussions as well !
>>
>>
>>
>>
>> On Wed, Mar 8, 2023 at 12:27 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Hi,
>>
>> I guess I can schedule this work over a course of time. I for myself can
>> contribute plus learn from others.
>>
>> So +1 for me.
>>
>> Let us see if anyone else is interested.
>>
>> HTH
>>
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 8 Mar 2023 at 17:48, ashok34...@yahoo.com 
>> wrote:
>>
>>
>> Hello Mich.
>>
>> Greetings. Would you be able to arrange a Spark Structured Streaming
>> learning webinar?
>>
>> This is something I have been struggling with recently. It will be very
>> helpful.
>>
>> Thanks and Regard
>>
>> AK
>> On Tuesday, 7 March 2023 at 20:24:36 GMT, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> This might  be a worthwhile exercise on the assumption that the
>> contributors will find the time and bandwidth to chip in so to speak.
>>
>> I am sure there are many but on top of my head I can think of Holden
>> Karau for k8s, and Sean Owen for data science stuff. They are both very
>> experienced.
>>
>> Anyone else?
>>
>> HTH
>>
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 7 Mar 2023 at 19:17, ashok34...@yahoo.com.INVALID
>>  wrote:
>>
>> Hello gurus,
>>
>> Does Spark arrange online webinars for special topics like Spark on K8s,
>> data science and Spark Structured Streaming?
>>
>> I would be most grateful if experts could share their experience with
>> learners of intermediate knowledge like myself. Hopefully we will find
>> the practical experiences shared valuable.
>>
>> Respectively,
>>
>> AK
>>
>>
>>
>>
>


Re: Spark on Kube (virtual) coffee/tea/pop times

2023-02-07 Thread Denny Lee
Woohoo, Holden!  I’m in (hopefully my schedule allows me) - PST.

On Tue, Feb 7, 2023 at 15:34 Andrew Melo  wrote:

> I'm Central US time (AKA UTC -6:00)
>
> On Tue, Feb 7, 2023 at 5:32 PM Holden Karau  wrote:
> >
> > Awesome, I guess I should have asked folks for timezones that they’re in.
> >
> > On Tue, Feb 7, 2023 at 3:30 PM Andrew Melo 
> wrote:
> >>
> >> Hello Holden,
> >>
> >> We are interested in Spark on k8s and would like the opportunity to
> >> speak with devs about what we're looking for slash better ways to use
> >> spark.
> >>
> >> Thanks!
> >> Andrew
> >>
> >> On Tue, Feb 7, 2023 at 5:24 PM Holden Karau 
> wrote:
> >> >
> >> > Hi Folks,
> >> >
> >> > It seems like we could maybe use some additional shared context
> around Spark on Kube so I’d like to try and schedule a virtual coffee
> session.
> >> >
> >> > Who all would be interested in virtual adventures around Spark on
> Kube development?
> >> >
> >> > No pressure if the idea of hanging out in a virtual chat with coffee
> and Spark devs does not sound like your thing, just trying to make
> something informal so we can have a better understanding of everyone’s
> goals here.
> >> >
> >> > Cheers,
> >> >
> >> > Holden :)
> >> > --
> >> > Twitter: https://twitter.com/holdenkarau
> >> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Prometheus with spark

2022-10-27 Thread Denny Lee
Hi Raja,

A little atypical way to respond to your question - please check out the
most recent Spark AMA where we discuss this:
https://www.linkedin.com/posts/apachespark_apachespark-ama-committers-activity-6989052811397279744-jpWH?utm_source=share_medium=member_ios

HTH!
Denny
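
A minimal sketch of one common approach, since Spark has no built-in
"prometheus" data source: pull the metrics from the Prometheus HTTP API and
build a DataFrame from the JSON result. The server URL and the PromQL query
below are placeholder assumptions, not a tested connector:

    # Sketch: query the Prometheus HTTP API and load the result into Spark.
    # "http://prometheus:9090" and the query "up" are hypothetical examples.
    import json
    import urllib.request

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("prometheus-sketch").getOrCreate()

    url = "http://prometheus:9090/api/v1/query?query=up"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)

    # Each entry looks like {"metric": {...labels...}, "value": [ts, "1"]}.
    rows = [(r["metric"].get("instance", ""), float(r["value"][1]))
            for r in payload["data"]["result"]]

    df = spark.createDataFrame(rows, ["instance", "value"])
    df.show()

For historical ranges you would query /api/v1/query_range instead and
parallelize the fetches; the parsing shape stays the same.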



On Tue, Oct 25, 2022 at 09:16 Raja bhupati 
wrote:

> We have a use case where we would like to process Prometheus metrics data
> with Spark.
>
> On Tue, Oct 25, 2022, 19:49 Jacek Laskowski  wrote:
>
>> Hi Raj,
>>
>> Do you want to do the following?
>>
>> spark.read.format("prometheus").load...
>>
>> I haven't heard of such a data source / format before.
>>
>> What would you like it for?
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books 
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> 
>>
>>
>> On Fri, Oct 21, 2022 at 6:12 PM Raj ks  wrote:
>>
>>> Hi Team,
>>>
>>>
>>> We wanted to query Prometheus data with Spark. Any suggestions will
>>> be appreciated.
>>>
>>> I searched for documents but did not find a ready answer.
>>>
>>


Re: [VOTE] SPIP: Support Docker Official Image for Spark

2022-09-22 Thread Denny Lee
+1 (non-binding)

On Wed, Sep 21, 2022 at 10:33 PM Ankit Gupta  wrote:

> +1
>
> Regards,
>
> Ankit Prakash Gupta
>
> On Thu, Sep 22, 2022 at 10:38 AM Yang,Jie(INF) 
> wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> Regards,
>>
>> Yang Jie
>>
>>
>>
>> *From:* Gengliang Wang 
>> *Date:* Thursday, September 22, 2022, 12:22
>> *To:* Xiangrui Meng 
>> *Cc:* Kent Yao , Hyukjin Kwon ,
>> dev 
>> *Subject:* Re: [VOTE] SPIP: Support Docker Official Image for Spark
>>
>>
>>
>> +1
>>
>>
>>
>> On Wed, Sep 21, 2022 at 7:26 PM Xiangrui Meng  wrote:
>>
>> +1
>>
>>
>>
>> On Wed, Sep 21, 2022 at 6:53 PM Kent Yao  wrote:
>>
>> +1
>>
>>
>>
>> *Kent Yao*
>>
>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>>
>> *a spark enthusiast*
>>
>> *kyuubi* is a unified multi-tenant JDBC interface for large-scale data
>> processing and analytics, built on top of Apache Spark.
>> *spark-authorizer* A Spark SQL extension which provides SQL Standard
>> Authorization for Apache Spark.
>> *spark-postgres* A library for reading data from and transferring data to
>> Postgres / Greenplum with Spark SQL and DataFrames, 10~100x faster.
>> *itatchi* A library that brings useful functions from various modern
>> database management systems to Apache Spark.
>>
>>
>>
>>
>>
>>  Replied Message 
>>
>> From
>>
>> Hyukjin Kwon 
>>
>> Date
>>
>> 09/22/2022 09:43
>>
>> To
>>
>> dev 
>>
>> Subject
>>
>> Re: [VOTE] SPIP: Support Docker Official Image for Spark
>>
>> Starting with my +1.
>>
>>
>>
>> On Thu, 22 Sept 2022 at 10:41, Hyukjin Kwon  wrote:
>>
>> Hi all,
>>
>> I would like to start a vote for SPIP: "Support Docker Official Image for
>> Spark"
>>
>> The goal of the SPIP is to add a Docker Official Image (DOI)
>> 
>> to ensure the Spark Docker images
>> meet the quality standards for Docker images, to provide these Docker
>> images for users
>> who want to use Apache Spark via Docker image.
>>
>> Please also refer to:
>>
>> - Previous discussion in dev mailing list: [DISCUSS] SPIP: Support
>> Docker Official Image for Spark
>> 
>>
>> - SPIP doc: SPIP: Support Docker Official Image for Spark
>> 
>>
>> - JIRA: SPARK-40513
>> 
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>>
>>
>> - To
>> unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-18 Thread Denny Lee
+1 (non-binding).

This is a great idea and we should definitely do this.  Count me in to help
as well, eh?! :)

On Sun, Sep 18, 2022 at 7:24 PM bo zhaobo 
wrote:

> +1 (non-binding)
>
> This will bring a good experience to customers. So excited about this.
> ;-)
>
> On Mon, Sep 19, 2022 at 10:18, Yuming Wang  wrote:
>
>> +1.
>>
>> On Mon, Sep 19, 2022 at 9:44 AM Kent Yao  wrote:
>>
>>> +1
>>>
>>> On Mon, Sep 19, 2022 at 09:23, Gengliang Wang  wrote:
>>> >
>>> > +1, thanks for the work!
>>> >
>>> > On Sun, Sep 18, 2022 at 6:20 PM Hyukjin Kwon 
>>> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> On Mon, 19 Sept 2022 at 09:15, Yikun Jiang 
>>> wrote:
>>> >>>
>>> >>> Hi, all
>>> >>>
>>> >>>
>>> >>> I would like to start the discussion for supporting Docker Official
>>> Image for Spark.
>>> >>>
>>> >>>
>>> >>> This SPIP is proposed to add a Docker Official Image (DOI) to ensure
>>> the Spark Docker images meet the quality standards for Docker images, to
>>> provide these Docker images for users who want to use Apache Spark via
>>> Docker image.
>>> >>>
>>> >>>
>>> >>> There are also several Apache projects that release the Docker
>>> Official Images, such as: flink, storm, solr, zookeeper, httpd (with 50M+
>>> to 1B+ downloads for each). From the huge download statistics, we can see
>>> the real demands of users, and from the support of other apache projects,
>>> we should also be able to do it.
>>> >>>
>>> >>>
>>> >>> After support:
>>> >>>
>>> >>> The Dockerfile will still be maintained by the Apache Spark
>>> community and reviewed by Docker.
>>> >>>
>>> >>> The images will be maintained by the Docker community to ensure the
>>> quality standards for Docker images of the Docker community.
>>> >>>
>>> >>>
>>> >>> It will also reduce the extra Docker image maintenance effort (such
>>> as frequent rebuilding and image security updates) for the Apache Spark
>>> community.
>>> >>>
>>> >>>
>>> >>> See more in SPIP DOC:
>>> https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o
>>> >>>
>>> >>>
>>> >>> cc: Ruifeng (co-author) and Hyukjin (shepherd)
>>> >>>
>>> >>>
>>> >>> Regards,
>>> >>> Yikun
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: Databricks notebook - cluster taking a long time to get created, often timing out

2021-08-17 Thread Denny Lee
Hi Karan,

You may want to ping Databricks Help  or
Forums  as this is a Databricks
specific question.  I'm a little surprised that a Databricks cluster would
take a long time to create so it may be best to utilize these forums to
grok the cause.

HTH!
Denny


Sent via Superhuman 


On Mon, Aug 16, 2021 at 11:10 PM, karan alang  wrote:

> Hello - I've been using the Databricks notebook (for PySpark or Scala/Spark
> development), and recently have had issues wherein the cluster
> takes a long time to get created, often timing out.
>
> Any ideas on how to resolve this ?
> Any other alternatives to databricks notebook ?
>


Re: Append to an existing Delta Lake using structured streaming

2021-07-21 Thread Denny Lee
Including the Delta Lake Users and Developers DL to help out.

Saying this, could you clarify how data is not being added?  By any chance
do you have any code samples to recreate this?

Sent via Superhuman 


On Wed, Jul 21, 2021 at 2:49 AM,  wrote:

> Hi all,
>   I stumbled upon an interesting problem. I have an existing Delta Lake
> table with data recovered from a backup and would like to append to this
> table using Spark structured streaming. This does not work. Although
> the streaming job is running, no data is appended.
> If I created the original table with structured streaming, then appending to
> it with a streaming job (at least with the same job) works
> flawlessly.  Did I misunderstand something here?
>
> best regards
>Eugen Wintersberger
>
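
For reference, a minimal sketch of the streaming-append pattern to an
existing Delta table. The paths and the rate source are placeholder
assumptions; the stream's schema must match the existing table's schema, and
each query needs its own checkpointLocation (reusing a checkpoint whose
offsets are already past the existing data is one common reason an apparently
healthy query appends nothing):

    # Sketch: append a stream to an existing Delta table.
    # Table path and checkpoint directory are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-append-sketch").getOrCreate()

    stream = (spark.readStream
              .format("rate")              # stand-in source for illustration
              .option("rowsPerSecond", 10)
              .load())                     # columns: timestamp, value

    query = (stream.writeStream
             .format("delta")
             .outputMode("append")
             .option("checkpointLocation", "/tmp/checkpoints/events")
             .start("/tmp/delta/events"))  # path of the existing Delta table

    query.awaitTermination()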


Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-28 Thread Denny Lee
+1 (non-binding)

On Sun, Mar 28, 2021 at 9:06 PM 郑瑞峰  wrote:

> +1 (non-binding)
>
>
> -- Original Message --
> *From:* "Maxim Gekk" ;
> *Sent:* Monday, March 29, 2021, 2:08 AM
> *To:* "Matei Zaharia";
> *Cc:* "Gengliang Wang";"Mridul Muralidharan"<
> mri...@gmail.com>;"Xiao Li";"Spark dev list"<
> dev@spark.apache.org>;"Takeshi Yamamuro";
> *Subject:* Re: [VOTE] SPIP: Support pandas API layer on PySpark
>
> +1 (non-binding)
>
> On Sun, Mar 28, 2021 at 8:53 PM Matei Zaharia 
> wrote:
>
>> +1
>>
>> Matei
>>
>> On Mar 28, 2021, at 1:45 AM, Gengliang Wang  wrote:
>>
>> +1 (non-binding)
>>
>> On Sun, Mar 28, 2021 at 11:12 AM Mridul Muralidharan 
>> wrote:
>>
>>> +1
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Sat, Mar 27, 2021 at 6:09 PM Xiao Li  wrote:
>>>
 +1

 Xiao

 On Fri, Mar 26, 2021 at 4:14 PM, Takeshi Yamamuro  wrote:

> +1 (non-binding)
>
> On Sat, Mar 27, 2021 at 4:53 AM Liang-Chi Hsieh 
> wrote:
>
>> +1 (non-binding)
>>
>>
>> rxin wrote
>> > +1. Would open up a huge persona for Spark.
>> >
>> > On Fri, Mar 26 2021 at 11:30 AM, Bryan Cutler <
>>
>> > cutlerb@
>>
>> >  > wrote:
>> >
>> >>
>> >> +1 (non-binding)
>> >>
>> >>
>> >> On Fri, Mar 26, 2021 at 9:49 AM Maciej <
>>
>> > mszymkiewicz@
>>
>> >  > wrote:
>> >>
>> >>
>> >>> +1 (nonbinding)
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>
> --
> ---
> Takeshi Yamamuro
>

>>


Re: I'm going to be out starting Nov 5th

2020-10-31 Thread Denny Lee
Best wishes Holden! :)

On Sat, Oct 31, 2020 at 11:00 Dongjoon Hyun  wrote:

> Take care, Holden! I believe everything goes well.
>
> Bests,
> Dongjoon.
>
> On Sat, Oct 31, 2020 at 10:24 AM Reynold Xin  wrote:
>
>> Take care Holden and best of luck with everything!
>>
>>
>> On Sat, Oct 31 2020 at 10:21 AM, Holden Karau 
>> wrote:
>>
>>> Hi Folks,
>>>
>>> Just a heads up so folks working on decommissioning or other areas I've
>>> been active in don't block on me, I'm going to be out for at least a week
>>> and possibly more starting on November 5th. If there is anything that folks
>>> want me to review before then please let me know and I'll make the time for
>>> it. If you are curious I've got more details at
>>> http://blog.holdenkarau.com/2020/10/taking-break-surgery.html
>>>
>>> Happy Sparking Everyone,
>>>
>>> Holden :)
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>


Re: [vote] Apache Spark 3.0 RC3

2020-06-07 Thread Denny Lee
+1 (non-binding)

On Sun, Jun 7, 2020 at 3:21 PM Jungtaek Lim 
wrote:

> I'm seeing the effort of including the correctness issue SPARK-28067 [1]
> in 3.0.0 via SPARK-31894 [2]. That doesn't seem to be a regression, so
> technically it doesn't block the release. While it'd be good to weigh its
> worth (it requires some SS users to discard their state, which might be less
> frightening if required as part of a major version upgrade), it looks to be
> optional to include SPARK-28067 in 3.0.0.
>
> Besides, I see all blockers look to be resolved, thanks all for the
> amazing efforts!
>
> +1 (non-binding) if the decision of SPARK-28067 is "later".
>
> 1. https://issues.apache.org/jira/browse/SPARK-28067
> 2. https://issues.apache.org/jira/browse/SPARK-31894
>
> On Mon, Jun 8, 2020 at 5:23 AM Matei Zaharia 
> wrote:
>
>> +1
>>
>> Matei
>>
>> On Jun 7, 2020, at 6:53 AM, Maxim Gekk  wrote:
>>
>> +1 (non-binding)
>>
>> On Sun, Jun 7, 2020 at 2:34 PM Takeshi Yamamuro 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> I don't see any ongoing PR to fix critical bugs in my area.
>>> Bests,
>>> Takeshi
>>>
>>> On Sun, Jun 7, 2020 at 7:24 PM Mridul Muralidharan 
>>> wrote:
>>>
 +1

 Regards,
 Mridul

 On Sat, Jun 6, 2020 at 1:20 PM Reynold Xin  wrote:

> Apologies for the mistake. The vote is open till 11:59pm Pacific time
> on Mon June 9th.
>
> On Sat, Jun 6, 2020 at 1:08 PM Reynold Xin 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark
>> version 3.0.0.
>>
>> The vote is open until [DUE DAY] and passes if a majority +1 PMC
>> votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.0.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.0.0-rc3 (commit
>> 3fdfce3120f307147244e5eaf46d61419a723d50):
>> https://github.com/apache/spark/tree/v3.0.0-rc3
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1350/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc3-docs/
>>
>> The list of bug fixes going into 3.0.0 can be found at the following
>> URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>>
>> This release is using the release script of the tag v3.0.0-rc3.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.0.0?
>> ===
>>
>> The current list of open tickets targeted at 3.0.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.0.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>
>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>


Re: How to unsubscribe

2020-05-06 Thread Denny Lee
Hi Fred,

To unsubscribe, could you please email: user-unsubscr...@spark.apache.org
(for more information, please refer to
https://spark.apache.org/community.html).

Thanks!
Denny


On Wed, May 6, 2020 at 10:12 AM Fred Liu  wrote:

> Hi guys
>
>
>
> -
>
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
>
>
>
> *From:* Fred Liu 
> *Sent:* Wednesday, May 6, 2020 10:10 AM
> *To:* user@spark.apache.org
> *Subject:* Unsubscribe
>
>
>
> *[External E-mail]*
>
> *CAUTION: This email originated from outside the organization. Do not
> click links or open attachments unless you recognize the sender and know
> the content is safe.*
>
>
>
>
>


Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Denny Lee
There are a number of really good datasets already available including (but
not limited to):
- South Korea COVID-19 Dataset

- 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns
Hopkins CSSE 
- COVID-19 Open Research Dataset Challenge (CORD-19)


BTW, I recently co-presented a tech talk on Analyzing COVID-19: Can
the Data Community Help? 

In the US, there is a good resource Coronavirus in the United States:
Mapping the COVID-19 outbreak
 and
there are various global starter projects on Reddit's r/CovidProjects
.

There are a lot of good projects that we can all help with, individually or
together.  I would suggest seeing what analysis hospitals and academic
institutions are doing in your local region.  Even if you're analyzing public
worldwide data, how the virus acts in your local region will often be different.
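
If you want a quick starting point on the worldwide data, here is a minimal
sketch that pulls one of the public time-series CSVs into Spark. The URL
follows the JHU CSSE repository layout and is an assumption to verify before
relying on it:

    # Sketch: load a public COVID-19 time-series CSV and aggregate by country.
    # The URL is an assumed path into the JHU CSSE GitHub repository; this
    # pattern works in local mode (on a cluster, stage the file in shared
    # storage first).
    from pyspark import SparkFiles
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("covid-sketch").getOrCreate()

    csv_url = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/"
               "master/csse_covid_19_data/csse_covid_19_time_series/"
               "time_series_covid19_confirmed_global.csv")

    spark.sparkContext.addFile(csv_url)  # fetch the remote file once
    df = spark.read.csv(
        "file://" + SparkFiles.get("time_series_covid19_confirmed_global.csv"),
        header=True, inferSchema=True)

    # One row per province/region; count rows per country as a smoke test.
    df.groupBy("Country/Region").count().show(5)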







On Thu, Mar 26, 2020 at 12:30 PM Rajev Agarwal 
wrote:

> Actually, I thought these sites already exist; look at Johns Hopkins and
> worldometers.
>
> On Thu, Mar 26, 2020, 2:27 PM Zahid Rahman  wrote:
>
>>
>> "We can then donate this to WHO or others and we can make it very modular
>> though microservices etc."
>>
>> I have no interest because there are 8 million muslims locked up in their
>> home for 8 months by the Hindutwa (Indians)
>> You didn't take any notice of them.
>> Now you are locked up in your home and you want to contribute to the WHO.
>> The same WHO and you who didn't take any notice of the 8 million Kashmiri
>> Muslims.
>> The daily rapes of women and the imprisonment and torture of  men.
>>
>> India is the most dangerous country for women.
>>
>>
>>
>> Backbutton.co.uk
>> ¯\_(ツ)_/¯
>> ♡۶Java♡۶RMI ♡۶
>> Make Use Method {MUM}
>> makeuse.org
>> 
>>
>>
>> On Thu, 26 Mar 2020 at 14:53, Mich Talebzadeh 
>> wrote:
>>
>>> Thanks but nobody claimed we can fix it. However, we can all contribute
>>> to it. When it utilizes the cloud, then it becomes a global digitization
>>> issue.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Thu, 26 Mar 2020 at 14:43, Laurent Bastien Corbeil <
>>> bastiencorb...@gmail.com> wrote:
>>>
 People in tech should be more humble and admit this is not something
 they can fix. There are already plenty of visualizations, dashboards, etc.,
 showing the spread of the virus. This is not even a big data problem, so
 Spark would have limited use.

 On Thu, Mar 26, 2020 at 10:37 AM Sol Rodriguez 
 wrote:

> IMO it's not about technology, it's about data... if we don't have
> access to the data there's no point throwing "microservices" and "kafka" 
> at
> the problem. You might find that the most effective analysis might be
> delivered through an excel sheet ;)
> So before technology I'd suggest to get access to sources and then
> figure out how to best exploit them and deliver the information to the
> right people
>
> On Thu, Mar 26, 2020 at 2:29 PM Chenguang He 
> wrote:
>
>> Have you taken a look at this (
>> https://coronavirus.1point3acres.com/en/test  )?
>>
>> They have a visualizer with a very basic analysis of the outbreak.
>>
>> On Thu, Mar 26, 2020 at 8:54 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thanks.
>>>
>>> Agreed, computers are not the end but means to an end. We all have
>>> to start from somewhere. It all helps.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>> for any loss, damage or destruction of data or any other property which 
>>> may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author 

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Denny Lee
+1 (non-binding)

On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon  wrote:

> The proposal itself seems good as a set of factors to consider. Thanks, Michael.
>
> Several concerns mentioned look good points, in particular:
>
> > ... assuming that this is for public stable APIs, not APIs that are
> marked as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as
> Experimental, Unstable, etc. and the implication of each is still
> effective. If it's for stable APIs, it makes sense to me as well.
>
> > ... can we expand on 'when' an API change can occur ?  Since we are
> proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to divert from semver,
> the delta compared to semver will have to be clarified to avoid different
> personal interpretations of the somewhat general principles.
>
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
> Apache Spark 3.0+? ...
>
> Assuming these concerns will be addressed, +1 (binding).
>
>
> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro wrote:
>
>> +1 (non-binding)
>>
>> Bests,
>> Takeshi
>>
>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <
>> gengliang.w...@databricks.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Gengliang
>>>
>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia 
>>> wrote:
>>>
 +1 as well.

 Matei

 On Mar 9, 2020, at 12:05 AM, Wenchen Fan  wrote:

 +1 (binding), assuming that this is for public stable APIs, not APIs
 that are marked as unstable, evolving, etc.

 On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía  wrote:

> +1 (non-binding)
>
> Michael's section on the trade-offs of maintaining / removing an API
> is one of
> the best reads I have seen in this mailing list. Enthusiastic +1
>
> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun 
> wrote:
> >
> > This new policy has a good intention, but can we narrow down on the
> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
> >
> > I saw that there already exists a reverting PR to bring back Spark
> 1.4 and 1.5 APIs based on this AS-IS suggestion.
> >
> > The AS-IS policy clearly mentions the JVM/Scala-level
> difficulty, and that's nice.
> >
> > However, for the other cases, it sounds like `recommending older
> APIs as much as possible` due to the following.
> >
> >  > How long has the API been in Spark?
> >
> > We had better be more careful when we add a new policy and should
> aim not to mislead the users and 3rd party library developers to say 
> "older
> is better".
> >
> > Technically, I'm wondering who will use new APIs in their examples
> (of books and StackOverflow) if they need to write an additional warning
> like `this only works at 2.4.0+` always .
> >
> > Bests,
> > Dongjoon.
> >
> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan 
> wrote:
> >>
> >> I am in broad agreement with the prposal, as any developer, I prefer
> >> stable well designed API's :-)
> >>
> >> Can we tie the proposal to stability guarantees given by spark and
> >> reasonable expectation from users ?
> >> In my opinion, an unstable or evolving could change - while an
> >> experimental api which has been around for ages should be more
> >> conservatively handled.
> >> Which brings in question what are the stability guarantees as
> >> specified by annotations interacting with the proposal.
> >>
> >> Also, can we expand on 'when' an API change can occur ?  Since we
> are
> >> proposing to diverge from semver.
> >> Patch release ? Minor release ? Only major release ? Based on
> 'impact'
> >> of API ? Stability guarantees ?
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <
> mich...@databricks.com> wrote:
> >> >
> >> > I'll start off the vote with a strong +1 (binding).
> >> >
> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <
> mich...@databricks.com> wrote:
> >> >>
> >> >> I propose to add the following text to Spark's Semantic
> Versioning policy and adopt it as the rubric that should be used when
> deciding to break APIs (even at major versions such as 3.0).
> >> >>
> >> >>
> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
> this is a procedural vote, the measure will pass if there are more
> favourable votes than unfavourable ones. PMC votes are binding, but the
> community is encouraged to add their voice to the discussion.
> >> >>
> >> >>
> >> >> [ ] +1 - Spark should adopt this policy.
> >> >>
> >> >> [ ] -1  - Spark should not adopt this policy.
> >> >>
> >> >>
> >> >> 
> >> >>
> >> >>
> >> >> Considerations When Breaking APIs
> >> >>
> >> >> The Spark 

Re: Block a user from spark-website who repeatedly open the invalid same PR

2020-01-26 Thread Denny Lee
+1

On Sun, Jan 26, 2020 at 09:59 Nicholas Chammas 
wrote:

> +1
>
> I think y'all have shown this person more patience than is merited by
> their behavior.
>
> On Sun, Jan 26, 2020 at 5:16 AM Takeshi Yamamuro 
> wrote:
>
>> +1
>>
>> Bests,
>> Takeshi
>>
>> On Sun, Jan 26, 2020 at 3:05 PM Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> I am thinking about opening an infra ticket to block the @DataWanderer
>>>  user from the spark-website
>>> repository, who repeatedly opens invalid PRs.
>>>
>>> The PR is about fixing documentation in the released version 2.4.4, and
>>> it should be fixed in the spark
>>> repository. It was explained multiple times by me and Sean, but this user
>>> opens the same PR
>>> repeatedly, which brings overhead to the devs.
>>>
>>> See the PRs below:
>>>
>>> https://github.com/apache/spark-website/pull/257
>>> https://github.com/apache/spark-website/pull/256
>>> https://github.com/apache/spark-website/pull/255
>>> https://github.com/apache/spark-website/pull/254
>>> https://github.com/apache/spark-website/pull/250
>>> https://github.com/apache/spark-website/pull/249
>>>
>>> If there is no objection, and this guy opens the PR again, I am going to
>>> open an infra ticket to block
>>> this guy from the spark-website repo to prevent such behaviours.
>>>
>>> Please let me know if you guys have any concerns.
>>>
>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>


Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Denny Lee
+1

On Fri, May 31, 2019 at 17:58 Holden Karau  wrote:

> +1
>
> On Fri, May 31, 2019 at 5:41 PM Bryan Cutler  wrote:
>
>> +1 and the draft sounds good
>>
>> On Thu, May 30, 2019, 11:32 AM Xiangrui Meng  wrote:
>>
>>> Here is the draft announcement:
>>>
>>> ===
>>> Plan for dropping Python 2 support
>>>
>>> As many of you already know, the Python core development team and many
>>> widely used Python packages like Pandas and NumPy will drop Python 2 support
>>> in or before 2020/01/01. Apache Spark has supported both Python 2 and 3
>>> since Spark 1.4 release in 2015. However, maintaining Python 2/3
>>> compatibility is an increasing burden and it essentially limits the use of
>>> Python 3 features in Spark. Given the end of life (EOL) of Python 2 is
>>> coming, we plan to eventually drop Python 2 support as well. The current
>>> plan is as follows:
>>>
>>> * In the next major release in 2019, we will deprecate Python 2 support.
>>> PySpark users will see a deprecation warning if Python 2 is used. We will
>>> publish a migration guide for PySpark users to migrate to Python 3.
>>> * We will drop Python 2 support in a future release in 2020, after
>>> Python 2 EOL on 2020/01/01. PySpark users will see an error if Python 2 is
>>> used.
>>> * For releases that support Python 2, e.g., Spark 2.4, their patch
>>> releases will continue supporting Python 2. However, after Python 2 EOL, we
>>> might not take patches that are specific to Python 2.
>>> ===
>>>
>>> Sean helped make a pass. If it looks good, I'm going to upload it to
>>> Spark website and announce it here. Let me know if you think we should do a
>>> VOTE instead.
>>>
>>> On Thu, May 30, 2019 at 9:21 AM Xiangrui Meng  wrote:
>>>
 I created https://issues.apache.org/jira/browse/SPARK-27884 to track
 the work.

 On Thu, May 30, 2019 at 2:18 AM Felix Cheung 
 wrote:

> We don’t usually reference a future release on website
>
> > Spark website and state that Python 2 is deprecated in Spark 3.0
>
> I suspect people will then ask when Spark 3.0 is coming out.
> Might need to provide some clarity on that.
>

 We can say the "next major release in 2019" instead of Spark 3.0. Spark
 3.0 timeline certainly requires a new thread to discuss.


>
>
> --
> *From:* Reynold Xin 
> *Sent:* Thursday, May 30, 2019 12:59:14 AM
> *To:* shane knapp
> *Cc:* Erik Erlandson; Mark Hamstra; Matei Zaharia; Sean Owen; Wenchen
> Fen; Xiangrui Meng; dev; user
> *Subject:* Re: Should python-2 be supported in Spark 3.0?
>
> +1 on Xiangrui’s plan.
>
> On Thu, May 30, 2019 at 7:55 AM shane knapp 
> wrote:
>
>> I don't have a good sense of the overhead of continuing to support
>>> Python 2; is it large enough to consider dropping it in Spark 3.0?
>>>
>>> from the build/test side, it will actually be pretty easy to
>> continue support for python2.7 for spark 2.x as the feature sets won't be
>> expanding.
>>
>
>> that being said, i will be cracking a bottle of champagne when i can
>> delete all of the ansible and anaconda configs for python2.x.  :)
>>
>
 On the development side, in a future release that drops Python 2
 support we can remove code that maintains python 2/3 compatibility and
 start using python 3 only features, which is also quite exciting.


>
>> shane
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Denny Lee
+1

On Fri, May 31, 2019 at 17:58 Holden Karau  wrote:

> +1
>
> On Fri, May 31, 2019 at 5:41 PM Bryan Cutler  wrote:
>
>> +1 and the draft sounds good
>>
>> On Thu, May 30, 2019, 11:32 AM Xiangrui Meng  wrote:
>>
>>> Here is the draft announcement:
>>>
>>> ===
>>> Plan for dropping Python 2 support
>>>
>>> As many of you already knew, Python core development team and many
>>> utilized Python packages like Pandas and NumPy will drop Python 2 support
>>> in or before 2020/01/01. Apache Spark has supported both Python 2 and 3
>>> since Spark 1.4 release in 2015. However, maintaining Python 2/3
>>> compatibility is an increasing burden and it essentially limits the use of
>>> Python 3 features in Spark. Given the end of life (EOL) of Python 2 is
>>> coming, we plan to eventually drop Python 2 support as well. The current
>>> plan is as follows:
>>>
>>> * In the next major release in 2019, we will deprecate Python 2 support.
>>> PySpark users will see a deprecation warning if Python 2 is used. We will
>>> publish a migration guide for PySpark users to migrate to Python 3.
>>> * We will drop Python 2 support in a future release in 2020, after
>>> Python 2 EOL on 2020/01/01. PySpark users will see an error if Python 2 is
>>> used.
>>> * For releases that support Python 2, e.g., Spark 2.4, their patch
>>> releases will continue supporting Python 2. However, after Python 2 EOL, we
>>> might not take patches that are specific to Python 2.
>>> ===
>>>
>>> Sean helped make a pass. If it looks good, I'm going to upload it to the
>>> Spark website and announce it here. Let me know if you think we should do a
>>> VOTE instead.
>>>
>>> On Thu, May 30, 2019 at 9:21 AM Xiangrui Meng  wrote:
>>>
 I created https://issues.apache.org/jira/browse/SPARK-27884 to track
 the work.

 On Thu, May 30, 2019 at 2:18 AM Felix Cheung 
 wrote:

> We don’t usually reference a future release on the website
>
> > Spark website and state that Python 2 is deprecated in Spark 3.0
>
> I suspect people will then ask when Spark 3.0 is coming out.
> Might need to provide some clarity on that.
>

 We can say the "next major release in 2019" instead of Spark 3.0. The Spark
 3.0 timeline certainly requires a new thread to discuss.


>
>
> --
> *From:* Reynold Xin 
> *Sent:* Thursday, May 30, 2019 12:59:14 AM
> *To:* shane knapp
> *Cc:* Erik Erlandson; Mark Hamstra; Matei Zaharia; Sean Owen; Wenchen
> Fen; Xiangrui Meng; dev; user
> *Subject:* Re: Should python-2 be supported in Spark 3.0?
>
> +1 on Xiangrui’s plan.
>
> On Thu, May 30, 2019 at 7:55 AM shane knapp 
> wrote:
>
>> I don't have a good sense of the overhead of continuing to support
>>> Python 2; is it large enough to consider dropping it in Spark 3.0?
>>>
>>> from the build/test side, it will actually be pretty easy to
>> continue support for python2.7 for spark 2.x as the feature sets won't be
>> expanding.
>>
>
>> that being said, i will be cracking a bottle of champagne when i can
>> delete all of the ansible and anaconda configs for python2.x.  :)
>>
>
 On the development side, in a future release that drops Python 2
 support we can remove code that maintains python 2/3 compatibility and
 start using python 3 only features, which is also quite exciting.


>
>> shane
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: [VOTE] [SPARK-25994] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-01-29 Thread Denny Lee
+1

yay - let's do it!

On Tue, Jan 29, 2019 at 6:28 AM Xiangrui Meng  wrote:

> Hi all,
>
> I want to call for a vote of SPARK-25994
> . It introduces a new
> DataFrame-based component to Spark, which supports property graph
> construction, Cypher queries, and graph algorithms. The proposal
> 
> was made available on user@
> 
> and dev@
> 
>  to
> collect input. You can also find a sketch design doc attached to
> SPARK-26028 .
>
> The vote will be up for the next 72 hours. Please reply with your vote:
>
> +1: Yeah, let's go forward and implement the SPIP.
> +0: Don't really care.
> -1: I don't think this is a good idea because of the following technical
> reasons.
>
> Best,
> Xiangrui
>


Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-09 Thread Denny Lee
+1


On Wed, Jan 9, 2019 at 4:30 AM Dongjoon Hyun 
wrote:

> +1
>
> Bests,
> Dongjoon.
>
> On Tue, Jan 8, 2019 at 6:30 PM Wenchen Fan  wrote:
>
>> +1
>>
>> On Wed, Jan 9, 2019 at 3:37 AM DB Tsai  wrote:
>>
>>> +1
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> --
>>> Web: https://www.dbtsai.com
>>> PGP Key ID: 0x5CED8B896A6BDFA0
>>>
>>> On Tue, Jan 8, 2019 at 11:14 AM Dongjoon Hyun 
>>> wrote:
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.3.
>>> >
>>> > The vote is open until January 11 11:30AM (PST) and passes if a
>>> majority +1 PMC votes are cast, with
>>> > a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.2.3
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.2.3-rc1 (commit
>>> 4acb6ba37b94b90aac445e6546426145a5f9eba2):
>>> > https://github.com/apache/spark/tree/v2.2.3-rc1
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1295
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-docs/
>>> >
>>> > The list of bug fixes going into 2.2.3 can be found at the following
>>> URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12343560
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks, in the Java/Scala
>>> > you can add the staging repository to your projects resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out of date RC going forward).
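
For instance, a minimal sketch of the PySpark check (the artifact URL and file
name are assumptions and must match the RC under vote; shown here for 2.2.3
RC1):

python3 -m venv rc-test && source rc-test/bin/activate
pip install "https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/pyspark-2.2.3.tar.gz"
python -c "import pyspark; print(pyspark.__version__)"   # expect 2.2.3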
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 2.2.3?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 2.2.3 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.2.3
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Denny Lee
+1

On Wed, Oct 31, 2018 at 12:54 PM Chitral Verma 
wrote:

> +1
>
> On Wed, 31 Oct 2018 at 11:56, Reynold Xin  wrote:
>
>> +1
>>
>> Look forward to the release!
>>
>>
>>
>> On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.4.0.
>>>
>>> The vote is open until November 1 PST and passes if a majority +1 PMC
>>> votes are cast, with
>>> a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.4.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.4.0-rc5 (commit
>>> 0a4c03f7d084f1d2aa48673b99f3b9496893ce8d):
>>> https://github.com/apache/spark/tree/v2.4.0-rc5
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1291
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs/
>>>
>>> The list of bug fixes going into 2.4.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.4.0?
>>> ===
>>>
>>> The current list of open tickets targeted at 2.4.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.4.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>


Re: welcome a new batch of committers

2018-10-03 Thread Denny Lee
Congratulations!

On Wed, Oct 3, 2018 at 05:26 Dongjin Lee  wrote:

> Congratulations to ALL!!
>
> - Dongjin
>
> On Wed, Oct 3, 2018 at 7:48 PM Jack Kolokasis 
> wrote:
>
>> Congratulations to all !!
>>
>> -Iacovos
>>
>> On 03/10/2018 12:54 μμ, Ted Yu wrote:
>>
>> Congratulations to all !
>>
>>  Original message 
>> From: Jungtaek Lim  
>> Date: 10/3/18 2:41 AM (GMT-08:00)
>> To: Marco Gaido  
>> Cc: dev  
>> Subject: Re: welcome a new batch of committers
>>
>> Congrats all! You all deserved it.
>> On Wed, 3 Oct 2018 at 6:35 PM Marco Gaido  wrote:
>>
>>> Congrats you all!
>>>
>>> On Wed, Oct 3, 2018 at 11:29 AM Liang-Chi Hsieh <
>>> vii...@gmail.com> wrote:
>>>

 Congratulations to all new committers!


 rxin wrote
 > Hi all,
 >
 > The Apache Spark PMC has recently voted to add several new committers
 to
 > the project, for their contributions:
 >
 > - Shane Knapp (contributor to infra)
 > - Dongjoon Hyun (contributor to ORC support and other parts of Spark)
 > - Kazuaki Ishizaki (contributor to Spark SQL)
 > - Xingbo Jiang (contributor to Spark Core and SQL)
 > - Yinan Li (contributor to Spark on Kubernetes)
 > - Takeshi Yamamuro (contributor to Spark SQL)
 >
 > Please join me in welcoming them!





 --
 Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>> --
>> Iacovos Kolokasis
>> Email: koloka...@ics.forth.gr
>> Postgraduate Student CSD, University of Crete
>> Researcher in CARV Lab ICS FORTH
>>
>> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> slideshare: 
> www.slideshare.net/dongjinleekr
> *
>


Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-30 Thread Denny Lee
+1 (non-binding)


On Sat, Sep 29, 2018 at 10:24 AM Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:

> +1
>
> Stavros
>
> On Sat, Sep 29, 2018 at 5:59 AM, Sean Owen  wrote:
>
>> +1, with comments:
>>
>> There are 5 critical issues for 2.4, and no blockers:
>> SPARK-25378 ArrayData.toArray(StringType) assume UTF8String in 2.4
>> SPARK-25325 ML, Graph 2.4 QA: Update user guide for new features & APIs
>> SPARK-25319 Spark MLlib, GraphX 2.4 QA umbrella
>> SPARK-25326 ML, Graph 2.4 QA: Programming guide update and migration guide
>> SPARK-25323 ML 2.4 QA: API: Python API coverage
>>
>> Xiangrui, is SPARK-25378 important enough we need to get it into 2.4?
>>
>> I found two issues resolved for 2.4.1 that got into this RC, so marked
>> them as resolved in 2.4.0.
>>
>> I checked the licenses and notice and they look correct now in source
>> and binary builds.
>>
>> The 2.12 artifacts are as I'd expect.
>>
>> I ran all tests for 2.11 and 2.12 and they pass with -Pyarn
>> -Pkubernetes -Pmesos -Phive -Phadoop-2.7 -Pscala-2.12.
>>
>>
>>
>>
>> On Thu, Sep 27, 2018 at 10:00 PM Wenchen Fan  wrote:
>> >
>> > Please vote on releasing the following candidate as Apache Spark
>> version 2.4.0.
>> >
>> > The vote is open until October 1 PST and passes if a majority +1 PMC
>> votes are cast, with
>> > a minimum of 3 +1 votes.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.4.0
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.4.0-rc2 (commit
>> 42f25f309e91c8cde1814e3720099ac1e64783da):
>> > https://github.com/apache/spark/tree/v2.4.0-rc2
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc2-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1287
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc2-docs/
>> >
>> > The list of bug fixes going into 2.4.0 can be found at the following
>> URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.0
>> >
>> > FAQ
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC and see if anything important breaks, in the Java/Scala
>> > you can add the staging repository to your projects resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with an out of date RC going forward).
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 2.4.0?
>> > ===
>> >
>> > The current list of open tickets targeted at 2.4.0 can be found at:
>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 2.4.0
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should
>> > be worked on immediately. Everything else please retarget to an
>> > appropriate release.
>> >
>> > ==
>> > But my bug isn't fixed?
>> > ==
>> >
>> > In order to make timely releases, we will typically not hold the
>> > release unless the bug in question is a regression from the previous
>> > release. That being said, if there is something which is a regression
>> > that has not been correctly targeted please ping me or a committer to
>> > help target the issue.
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>


Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-20 Thread Denny Lee
+1

On Thu, Sep 20, 2018 at 9:55 AM Xiao Li  wrote:

> +1
>
>
> John Zhuge  wrote on Wed, Sep 19, 2018 at 1:17 PM:
>
>> +1 (non-binding)
>>
>> Built on Ubuntu 16.04 with Maven flags: -Phadoop-2.7 -Pmesos -Pyarn
>> -Phive-thriftserver -Psparkr -Pkinesis-asl -Phadoop-provided
>>
>> java version "1.8.0_181"
>> Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
>>
>>
>> On Wed, Sep 19, 2018 at 2:31 AM Takeshi Yamamuro 
>> wrote:
>>
>>> +1
>>>
>>> I also checked `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive
>>> -Phive-thriftserver` on the OpenJDK below/macOS v10.12.6
>>>
>>> $ java -version
>>> java version "1.8.0_181"
>>> Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
>>>
>>>
>>> On Wed, Sep 19, 2018 at 10:45 AM Dongjoon Hyun 
>>> wrote:
>>>
 +1.

 I tested with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive
 -Phive-thriftserver` on OpenJDK(1.8.0_181)/CentOS 7.5.

 I hit the following test case failure once during testing, but it's not
 persistent.

 KafkaContinuousSourceSuite
 ...
 subscribing topic by name from earliest offsets (failOnDataLoss:
 false) *** FAILED ***

 Thank you, Saisai.

 Bests,
 Dongjoon.

 On Mon, Sep 17, 2018 at 6:48 PM Saisai Shao 
 wrote:

> +1 from my own side.
>
> Thanks
> Saisai
>
> Wenchen Fan  wrote on Tue, Sep 18, 2018 at 9:34 AM:
>
>> +1. All the blocker issues are resolved in 2.3.2 AFAIK.
>>
>> On Tue, Sep 18, 2018 at 9:23 AM Sean Owen  wrote:
>>
>>> +1 . Licenses and sigs check out as in previous 2.3.x releases. A
>>> build from source with most profiles passed for me.
>>> On Mon, Sep 17, 2018 at 8:17 AM Saisai Shao 
>>> wrote:
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version 2.3.2.
>>> >
>>> > The vote is open until September 21 PST and passes if a majority
>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.3.2
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see
>>> http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.3.2-rc6 (commit
>>> 02b510728c31b70e6035ad541bfcdc2b59dcd79a):
>>> > https://github.com/apache/spark/tree/v2.3.2-rc6
>>> >
>>> > The release files, including signatures, digests, etc. can be
>>> found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc6-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> >
>>> https://repository.apache.org/content/repositories/orgapachespark-1286/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc6-docs/
>>> >
>>> > The list of bug fixes going into 2.3.2 can be found at the
>>> following URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>> >
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by
>>> taking
>>> > an existing Spark workload and running on this release candidate,
>>> then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and
>>> install
>>> > the current RC and see if anything important breaks, in the
>>> Java/Scala
>>> > you can add the staging repository to your projects resolvers and
>>> test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out of date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 2.3.2?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 2.3.2 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for
>>> "Target Version/s" = 2.3.2
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility
>>> should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release 

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-18 Thread Denny Lee
+1 (non-binding)
On Tue, Jul 17, 2018 at 23:04 John Zhuge  wrote:

> +1 (non-binding)
>
> On Mon, Jul 16, 2018 at 8:04 PM Saisai Shao 
> wrote:
>
>> I will put my +1 on this RC.
>>
>> For the test failure fix, I will include it if there's another RC.
>>
>> Sean Owen  wrote on Mon, Jul 16, 2018 at 10:47 PM:
>>
> OK, hm, will try to get to the bottom of it. But if others can build this
>>> module successfully, I give a +1 . The test failure is inevitable here and
>>> should not block release.
>>>
>>> On Sun, Jul 15, 2018 at 9:39 PM Saisai Shao 
>>> wrote:
>>>
>> Hi Sean,

 I just did a clean build with mvn/sbt on 2.3.2, and I didn't hit the
 errors you pasted here. I'm not sure how that happens.

 Sean Owen  wrote on Mon, Jul 16, 2018 at 6:30 AM:

>>> Looks good to me, with the following caveats.
>
> First see the discussion on
> https://issues.apache.org/jira/browse/SPARK-24813 ; the
> flaky HiveExternalCatalogVersionsSuite will probably fail all the time
> right now. That's not a regression and is a test-only issue, so don't 
> think
> it must block the release. However if this fix holds up, and we need
> another RC, worth pulling in for sure.
>
> Also is anyone seeing this while building and testing the Spark SQL +
> Kafka module? I see this error even after a clean rebuild. I sort of get
> what the error is saying but can't figure out why it would only happen at
> test/runtime. Haven't seen it before.
>
> [error] missing or invalid dependency detected while loading class
> file 'MetricsSystem.class'.
>
> [error] Could not access term eclipse in package org,
>
> [error] because it (or its dependencies) are missing. Check your build
> definition for
>
> [error] missing or conflicting dependencies. (Re-run with
> `-Ylog-classpath` to see the problematic classpath.)
>
> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
> against an incompatible version of org.
>
> [error] missing or invalid dependency detected while loading class
> file 'MetricsSystem.class'.
>
> [error] Could not access term jetty in value org.eclipse,
>
> [error] because it (or its dependencies) are missing. Check your build
> definition for
>
> [error] missing or conflicting dependencies. (Re-run with
> `-Ylog-classpath` to see the problematic classpath.)
>
> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
> against an incompatible version of org.eclipse
>
> On Sun, Jul 15, 2018 at 3:09 AM Saisai Shao 
> wrote:
>
 Please vote on releasing the following candidate as Apache Spark
>> version 2.3.2.
>>
>> The vote is open until July 20 PST and passes if a majority +1 PMC
>> votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.3.2
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.3.2-rc3
>> (commit b3726dadcf2997f20231873ec6e057dba433ae64):
>> https://github.com/apache/spark/tree/v2.3.2-rc3
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1278/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-docs/
>>
>> The list of bug fixes going into 2.3.2 can be found at the following
>> URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>
>> Note. RC2 was cancelled because of one blocking issue SPARK-24781
>> during release preparation.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.2?
>> ===
>>

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Denny Lee
+1

On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas 
wrote:

> I'll give that a try, but I'll still have to figure out what to do if none
> of the release builds work with hadoop-aws, since Flintrock deploys Spark
> release builds to set up a cluster. Building Spark is slow, so we only do
> it if the user specifically requests a Spark version by git hash. (This is
> basically how spark-ec2 did things, too.)
>
>
> On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin  wrote:
>
>> If you're building your own Spark, definitely try the hadoop-cloud
>> profile. Then you don't even need to pull anything at runtime,
>> everything is already packaged with Spark.
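
As a concrete sketch of the version-matching advice above (the bucket and
prefix are hypothetical; match the hadoop-aws version to the Hadoop version
of the Spark binary, e.g. 2.7.3 for the -hadoop2.7 build):

pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3

# then, inside the shell (credentials via the usual AWS env vars or instance profile):
df = spark.read.text("s3a://my-bucket/some-logs/")   # hypothetical bucket
df.show(5)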
>>
>> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
>>  wrote:
>> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
>> > either (even building with -Phadoop-2.7). I guess I’ve been relying on
>> an
>> > unsupported pattern and will need to figure something else out going
>> forward
>> > in order to use s3a://.
>> >
>> >
>> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin 
>> wrote:
>> >>
>> >> I have personally never tried to include hadoop-aws that way. But at
>> >> the very least, I'd try to use the same version of Hadoop as the Spark
>> >> build (2.7.3 IIRC). I don't really expect a different version to work,
>> >> and if it did in the past it definitely was not by design.
>> >>
>> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>> >>  wrote:
>> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0
>> release,
>> >> > so
>> >> > it appears something has changed since then.
>> >> >
>> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>> >> >
>> >> > My goal here is simply to confirm that this release of Spark works
>> with
>> >> > hadoop-aws like past releases did, particularly for Flintrock users
>> who
>> >> > use
>> >> > Spark with S3A.
>> >> >
>> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop
>> builds
>> >> > with
>> >> > every Spark release. If the -hadoop2.7 release build won’t work with
>> >> > hadoop-aws anymore, are there plans to provide a new build type that
>> >> > will?
>> >> >
>> >> > Apologies if the question is poorly formed. I’m batting a bit
>> outside my
>> >> > league here. Again, my goal is simply to confirm that I/my users
>> still
>> >> > have
>> >> > a way to use s3a://. In the past, that way was simply to call pyspark
>> >> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very
>> similar.
>> >> > If
>> >> > that will no longer work, I’m trying to confirm that the change of
>> >> > behavior
>> >> > is intentional or acceptable (as a review for the Spark project) and
>> >> > figure
>> >> > out what I need to change (as due diligence for Flintrock’s users).
>> >> >
>> >> > Nick
>> >> >
>> >> >
>> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin 
>> >> > wrote:
>> >> >>
>> >> >> Using the hadoop-aws package is probably going to be a little more
>> >> >> complicated than that. The best bet is to use a custom build of
>> Spark
>> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>> >> >> looking at some nasty dependency issues, especially if you end up
>> >> >> mixing different versions of Hadoop.
>> >> >>
>> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>> >> >>  wrote:
>> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1
>> RC4
>> >> >> > using
>> >> >> > Flintrock. However, trying to load the hadoop-aws package gave me
>> >> >> > some
>> >> >> > errors.
>> >> >> >
>> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>> >> >> >
>> >> >> > 
>> >> >> >
>> >> >> > :: problems summary ::
>> >> >> >  WARNINGS
>> >> >> > [NOT FOUND  ]
>> >> >> > com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>> >> >> >  local-m2-cache: tried
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>> >> >> > [NOT FOUND  ]
>> >> >> > com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>> >> >> >  local-m2-cache: tried
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>> >> >> > [NOT FOUND  ]
>> >> >> > org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>> >> >> >  local-m2-cache: tried
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>> >> >> > [NOT FOUND  ]
>> >> >> > com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>> >> >> >  local-m2-cache: tried
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>> >> >> >
>> >> >> > I’d guess I’m 

Re: Revisiting Online serving of Spark models?

2018-05-30 Thread Denny Lee
I most likely will not be able to join SF next week but definitely up for a
session after Summit in Seattle to dive further into this, eh?!

On Wed, May 30, 2018 at 9:32 AM Felix Cheung 
wrote:

> Hi!
>
> Thank you! Let’s meet then
>
> June 6 4pm
>
> Moscone West Convention Center
> 800 Howard Street, San Francisco, CA 94103
> 
>
> Ground floor (outside of conference area - should be available for all) -
> we will meet and decide where to go
>
> (Would not send invite because that would be too much noise for dev@)
>
> To paraphrase Joseph, we will use this to kick off the discussion and
> post notes after and follow up online. As for Seattle, I would be very
> interested to meet in person later and discuss ;)
>
>
> _
> From: Saikat Kanjilal 
> Sent: Tuesday, May 29, 2018 11:46 AM
>
> Subject: Re: Revisiting Online serving of Spark models?
> To: Maximiliano Felice 
> Cc: Felix Cheung , Holden Karau <
> hol...@pigscanfly.ca>, Joseph Bradley , Leif Walsh
> , dev 
>
>
>
> Would love to join but am in Seattle, thoughts on how to make this work?
>
> Regards
>
> Sent from my iPhone
>
> On May 29, 2018, at 10:35 AM, Maximiliano Felice <
> maximilianofel...@gmail.com> wrote:
>
> Big +1 to a meeting with fresh air.
>
> Could anyone send the invites? I don't really know which is the place
> Holden is talking about.
>
> 2018-05-29 14:27 GMT-03:00 Felix Cheung :
>
>> You had me at blue bottle!
>>
>> _
>> From: Holden Karau 
>> Sent: Tuesday, May 29, 2018 9:47 AM
>> Subject: Re: Revisiting Online serving of Spark models?
>> To: Felix Cheung 
>> Cc: Saikat Kanjilal , Maximiliano Felice <
>> maximilianofel...@gmail.com>, Joseph Bradley ,
>> Leif Walsh , dev 
>>
>>
>>
>> I'm down for that, we could all go for a walk maybe to the Mint Plaza
>> Blue Bottle and grab coffee (if the weather holds, have our design meeting
>> outside :p)?
>>
>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung 
>> wrote:
>>
>>> Bump.
>>>
>>> --
>>> *From:* Felix Cheung 
>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>
>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>
>>> Hi! How about we meet the community and discuss on June 6 4pm at (near)
>>> the Summit?
>>>
>>> (I propose we meet at the venue entrance so we could accommodate people
>>> might not be in the conference)
>>>
>>> --
>>> *From:* Saikat Kanjilal 
>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
>>> *To:* Maximiliano Felice
>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>
>>> I’m in the same exact boat as Maximiliano and have use cases as well for
>>> model serving and would love to join this discussion.
>>>
>>> Sent from my iPhone
>>>
>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <
>>> maximilianofel...@gmail.com> wrote:
>>>
>>> Hi!
>>>
>>> I don't usually write a lot on this list but I keep up to date with
>>> the discussions and I'm a heavy user of Spark. This topic caught my
>>> attention, as we're currently facing this issue at work. I'm attending
>>> the summit and was wondering if it would be possible for me to join that
>>> meeting. I might be able to share some helpful use cases and ideas.
>>>
>>> Thanks,
>>> Maximiliano Felice
>>>
>>> On Tue, May 22, 2018 at 9:14 AM, Leif Walsh 
>>> wrote:
>>>
 I’m with you on json being more readable than parquet, but we’ve had
 success using pyarrow’s parquet reader and have been quite happy with it so
 far. If your target is python (and probably if not now, then soon, R), you
 should look into it.

 On Mon, May 21, 2018 at 16:52 Joseph Bradley 
 wrote:

> Regarding model reading and writing, I'll give quick thoughts here:
> * Our approach was to use the same format but write JSON instead of
> Parquet.  It's easier to parse JSON without Spark, and using the same
> format simplifies architecture.  Plus, some people want to check files 
> into
> version control, and JSON is nice for that.
> * The reader/writer APIs could be extended to take format parameters
> (just like DataFrame reader/writers) to handle JSON (and maybe, 
> eventually,
> handle Parquet in the online serving setting).
>
> This would be a big project, so proposing a SPIP might be best.  If
> people are around at the Spark Summit, that could be a good time to meet 
> up
> & then post notes back to the dev list.
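
As a sketch of the API shape being discussed (hypothetical; MLlib's
reader/writer APIs do not take a format parameter today, so this just mirrors
the DataFrameReader/Writer analogy above):

# hypothetical extension, not an existing MLlib API:
model.write.format("json").save("/models/pipeline-v1")
reloaded = PipelineModel.read.format("json").load("/models/pipeline-v1")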
>
> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <
> felixcheun...@hotmail.com> wrote:
>
>> Specifically I’d like bring part of the discussion to Model and
>> PipelineModel, and various ModelReader and SharedReadWrite 
>> implementations
>> 

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-01 Thread Denny Lee
Awesome - congrats Zhenhua!

On Sun, Apr 1, 2018 at 10:33 PM 叶先进  wrote:

> Big congrats.
>
> > On Apr 2, 2018, at 1:28 PM, Wenchen Fan  wrote:
> >
> > Hi all,
> >
> > The Spark PMC recently added Zhenhua Wang as a committer on the project.
> Zhenhua is the major contributor to the CBO project, and has been
> contributing across several areas of Spark for a while, focusing especially
> on the analyzer and optimizer in Spark SQL. Please join me in welcoming Zhenhua!
> >
> > Wenchen
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-23 Thread Denny Lee
+1 (non-binding)

On Fri, Feb 23, 2018 at 07:08 Josh Goldsborough <
joshgoldsboroughs...@gmail.com> wrote:

> New to testing out Spark RCs for the community, but I was able to run some
> of the basic unit tests without error, so for what it's worth, I'm a +1.
>
> On Thu, Feb 22, 2018 at 4:23 PM, Sameer Agarwal 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.3.0. The vote is open until Tuesday February 27, 2018 at 8:00:00 am UTC
>> and passes if a majority of at least 3 PMC +1 votes are cast.
>>
>>
>> [ ] +1 Release this package as Apache Spark 2.3.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.3.0-rc5:
>> https://github.com/apache/spark/tree/v2.3.0-rc5
>> (992447fb30ee9ebb3cf794f2d06f4d63a2d792db)
>>
>> List of JIRA tickets resolved in this release can be found here:
>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-bin/
>>
>> Release artifacts are signed with the following key:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1266/
>>
>> The documentation corresponding to this release can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/index.html
>>
>>
>> FAQ
>>
>> ===
>> What are the unresolved issues targeted for 2.3.0?
>> ===
>>
>> Please see https://s.apache.org/oXKi. At the time of writing, there are
>> currently no known release blockers.
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install the
>> current RC and see if anything important breaks, in the Java/Scala you can
>> add the staging repository to your projects resolvers and test with the RC
>> (make sure to clean up the artifact cache before/after so you don't end up
>> building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.0?
>> ===
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
>> appropriate.
>>
>> ===
>> Why is my bug not fixed?
>> ===
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.2.0. That being said, if
>> there is something which is a regression from 2.2.0 and has not been
>> correctly targeted please ping me or a committer to help target the issue
>> (you can see the open issues listed as impacting Spark 2.3.0 at
>> https://s.apache.org/WmoI).
>>
>
>


Re: Does Pyspark Support Graphx?

2018-02-18 Thread Denny Lee
Note the --packages option works for both PySpark and Spark (Scala).  For
the SparkLauncher class, you should be able to include packages à la:

// spark here is an org.apache.spark.launcher.SparkLauncher instance
spark.addSparkArg("--packages", "graphframes:graphframes:0.5.0-spark2.0-s_2.11")


On Sun, Feb 18, 2018 at 3:30 PM xiaobo <guxiaobo1...@qq.com> wrote:

> Hi Denny,
> The pyspark script uses the --packages option to load graphframe library,
> what about the SparkLauncher class?
>
>
>
> ------ Original --
> *From:* Denny Lee <denny.g@gmail.com>
> *Date:* Sun,Feb 18,2018 11:07 AM
> *To:* 94035420 <guxiaobo1...@qq.com>
> *Cc:* user@spark.apache.org <user@spark.apache.org>
> *Subject:* Re: Does Pyspark Support Graphx?
> That’s correct - you can use GraphFrames though as it does support
> PySpark.
> On Sat, Feb 17, 2018 at 17:36 94035420 <guxiaobo1...@qq.com> wrote:
>
>> I can not find anything for graphx module in the python API document,
>> does it mean it is not supported yet?
>>
>


Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-18 Thread Denny Lee
+1 (non-binding)

Built and tested on macOS and Ubuntu.


On Sun, Feb 18, 2018 at 3:19 PM Ricardo Almeida <
ricardo.alme...@actnowib.com> wrote:

> +1 (non-binding)
>
> Built and tested on macOS 10.12.6 Java 8 (build 1.8.0_111). No
> regressions detected so far.
>
>
> On 18 February 2018 at 16:12, Sean Owen  wrote:
>
>> +1 from me as last time, same outcome.
>>
>> I saw one test fail, but passed on a second run, so just seems flaky.
>>
>> - subscribing topic by name from latest offsets (failOnDataLoss: true)
>> *** FAILED ***
>>   Error while stopping stream:
>>   query.exception() is not empty after clean stop:
>> org.apache.spark.sql.streaming.StreamingQueryException: Writing job failed.
>>   === Streaming Query ===
>>   Identifier: [id = cdd647ec-d7f0-437b-9950-ce9d79d691d1, runId =
>> 3a7cf7ec-670a-48b6-8185-8b6cd7e27f96]
>>   Current Committed Offsets: {KafkaSource[Subscribe[topic-4]]:
>> {"topic-4":{"2":1,"4":1,"1":0,"3":0,"0":2}}}
>>   Current Available Offsets: {}
>>
>>   Current State: TERMINATED
>>   Thread State: RUNNABLE
>>
>> On Sat, Feb 17, 2018 at 3:41 PM Sameer Agarwal 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.3.0. The vote is open until Thursday February 22, 2018 at 8:00:00 am UTC
>>> and passes if a majority of at least 3 PMC +1 votes are cast.
>>>
>>>
>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>
>>> The tag to be voted on is v2.3.0-rc4:
>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>>> (44095cb65500739695b0324c177c19dfa1471472)
>>>
>>> List of JIRA tickets resolved in this release can be found here:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>>>
>>> The documentation corresponding to this release can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>>
>>>
>>> FAQ
>>>
>>> ===
>>> What are the unresolved issues targeted for 2.3.0?
>>> ===
>>>
>>> Please see https://s.apache.org/oXKi. At the time of writing, there are
>>> currently no known release blockers.
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala you
>>> can add the staging repository to your projects resolvers and test with the
>>> RC (make sure to clean up the artifact cache before/after so you don't end
>>> up building with an out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.3.0?
>>> ===
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
>>> appropriate.
>>>
>>> ===
>>> Why is my bug not fixed?
>>> ===
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.2.0. That being said, if
>>> there is something which is a regression from 2.2.0 and has not been
>>> correctly targeted please ping me or a committer to help target the issue
>>> (you can see the open issues listed as impacting Spark 2.3.0 at
>>> https://s.apache.org/WmoI).
>>>
>>
>


Re: Does Pyspark Support Graphx?

2018-02-17 Thread Denny Lee
Most likely not, as most of the effort is currently on GraphFrames - a
great blog post on what GraphFrames offers can be found at:
https://databricks.com/blog/2016/03/03/introducing-graphframes.html.   Is
there a particular scenario or situation that you're addressing that
requires GraphX vs. GraphFrames?
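
As a minimal sketch of GraphFrames from PySpark (assuming the shell was
launched with --packages graphframes:graphframes:0.5.0-spark2.0-s_2.11; the
data is illustrative):

from graphframes import GraphFrame

v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
e = spark.createDataFrame([("a", "b", "friend")], ["src", "dst", "relationship"])
g = GraphFrame(v, e)   # vertices need an "id" column; edges need "src"/"dst"
g.inDegrees.show()     # degree metrics, much like the GraphX equivalents
g.bfs("id = 'a'", "id = 'b'").show()   # breadth-first search between vertices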

On Sat, Feb 17, 2018 at 8:26 PM xiaobo <guxiaobo1...@qq.com> wrote:

> Thanks Denny, will it be supported in the near future?
>
>
>
> -- Original ------
> *From:* Denny Lee <denny.g@gmail.com>
> *Date:* Sun,Feb 18,2018 11:05 AM
> *To:* 94035420 <guxiaobo1...@qq.com>
> *Cc:* user@spark.apache.org <user@spark.apache.org>
> *Subject:* Re: Does Pyspark Support Graphx?
>
> That’s correct - you can use GraphFrames though as it does support
> PySpark.
> On Sat, Feb 17, 2018 at 17:36 94035420 <guxiaobo1...@qq.com> wrote:
>
>> I cannot find anything for the graphx module in the Python API document;
>> does it mean it is not supported yet?
>>
>


Re: Does Pyspark Support Graphx?

2018-02-17 Thread Denny Lee
That’s correct - you can use GraphFrames though as it does support PySpark.
On Sat, Feb 17, 2018 at 17:36 94035420  wrote:

> I cannot find anything for the graphx module in the Python API document; does
> it mean it is not supported yet?
>


[jira] [Updated] (SPARK-21866) SPIP: Image support in Spark

2018-01-27 Thread Denny Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denny Lee updated SPARK-21866:
--
Description: 
h2. Background and motivation

As Apache Spark is being used more and more in the industry, some new use cases 
are emerging for different data formats beyond the traditional SQL types or the 
numerical types (vectors and matrices). Deep Learning applications commonly 
deal with image processing. A number of projects add some Deep Learning 
capabilities to Spark (see list below), but they struggle to communicate with 
each other or with MLlib pipelines because there is no standard way to 
represent an image in Spark DataFrames. We propose to federate efforts for 
representing images in Spark by defining a representation that caters to the 
most common needs of users and library developers.

This SPIP proposes a specification to represent images in Spark DataFrames and 
Datasets (based on existing industrial standards), and an interface for loading 
sources of images. It is not meant to be a full-fledged image processing 
library, but rather the core description that other libraries and users can 
rely on. Several packages already offer various processing facilities for 
transforming images or doing more complex operations, and each has various 
design tradeoffs that make them better as standalone solutions.

This project is a joint collaboration between Microsoft and Databricks, which 
have been testing this design in two open source packages: MMLSpark and Deep 
Learning Pipelines.

The proposed image format is an in-memory, decompressed representation that 
targets low-level applications. It is significantly more liberal in memory 
usage than compressed image representations such as JPEG, PNG, etc., but it 
allows easy communication with popular image processing libraries and has no 
decoding overhead.
h2. Targets users and personas:

Data scientists, data engineers, library developers.
The following libraries define primitives for loading and representing images, 
and will gain from a common interchange format (in alphabetical order):
 * BigDL
 * DeepLearning4J
 * Deep Learning Pipelines
 * MMLSpark
 * TensorFlow (Spark connector)
 * TensorFlowOnSpark
 * TensorFrames
 * Thunder

h2. Goals:
 * Simple representation of images in Spark DataFrames, based on pre-existing 
industrial standards (OpenCV)
 * This format should eventually allow the development of high-performance 
integration points with image processing libraries such as libOpenCV, Google 
TensorFlow, CNTK, and other C libraries.
 * The reader should be able to read popular formats of images from distributed 
sources.

h2. Non-Goals:

Images are a versatile medium and encompass a very wide range of formats and 
representations. This SPIP explicitly aims at the most common use case in the 
industry currently: multi-channel matrices of binary, int32, int64, float or 
double data that can fit comfortably in the heap of the JVM:
 * the total size of an image should be restricted to less than 2GB (roughly)
 * the meaning of color channels is application-specific and is not mandated by 
the standard (in line with the OpenCV standard)
 * specialized formats used in meteorology, the medical field, etc. are not 
supported
 * this format is specialized to images and does not attempt to solve the more 
general problem of representing n-dimensional tensors in Spark

h2. Proposed API changes

We propose to add a new package in the package structure, under the MLlib 
project:
 {{org.apache.spark.image}}
h3. Data format

We propose to add the following structure:

imageSchema = StructType([
 * StructField("mode", StringType(), False),
 ** The exact representation of the data.
 ** The values are described in the following OpenCV convention. Basically, the 
type has both "depth" and "number of channels" info: in particular, type 
"CV_8UC3" means "3 channel unsigned bytes". BGRA format would be CV_8UC4 (value 
32 in the table) with the channel order specified by convention.
 ** The exact channel ordering and meaning of each channel is dictated by 
convention. By default, the order is RGB (3 channels) and BGRA (4 channels).
 ** If the image failed to load, the value is the empty string "".

 * StructField("origin", StringType(), True),
 ** Some information about the origin of the image. The content of this is 
application-specific.
 ** When the image is loaded from files, users should expect to find the file 
name in this field.

 * StructField("height", IntegerType(), False),
 ** the height of the image, pixels
 ** If the image fails to load, the value is -1.

 * StructField("width", IntegerType(), False),
 ** the width of the image, pixels
 ** If the image fails to load, the value is -1.

 * StructField("nChannels", IntegerType(), False),
 ** The number of ch
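
A sketch of the proposed schema in PySpark, following the field list above
(the final binary pixel-data field is cut off in the excerpt above and is
included here as an assumption):

from pyspark.sql.types import (StructType, StructField,
                               StringType, IntegerType, BinaryType)

imageSchema = StructType([
    StructField("mode", StringType(), False),        # OpenCV type string, e.g. "CV_8UC3"
    StructField("origin", StringType(), True),       # e.g. the source file name
    StructField("height", IntegerType(), False),     # pixels; -1 if the load failed
    StructField("width", IntegerType(), False),      # pixels; -1 if the load failed
    StructField("nChannels", IntegerType(), False),  # number of color channels
    StructField("data", BinaryType(), False),        # assumed: decompressed pixel bytes
])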

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-05 Thread Denny Lee
+1 (non-binding)


On Wed, Oct 4, 2017 at 11:08 PM Holden Karau  wrote:

> Awesome, thanks for digging into the packaging on the R side in more
> detail. I'll look into how to update the keys file as well.
>
> On Wed, Oct 4, 2017 at 10:46 PM Felix Cheung 
> wrote:
>
>> +1
>>
>> Tested SparkR package manually on multiple platforms and checked
>> different Hadoop release jar.
>>
>> And previously tested the last RC on different R releases (see the last
>> RC vote thread)
>>
>> I found some differences in bin release jars created by the different
>> options when running the make-release script, created this JIRA to track
>> https://issues.apache.org/jira/browse/SPARK-22202
>>
>> I've checked to confirm these exist in 2.1.1 release so this isn't a
>> regression, and hence my +1.
>>
>> btw, I think we need to update this file for the new keys used in signing
>> this release https://www.apache.org/dist/spark/KEYS
>>
>>
>> _
>> From: Liwei Lin 
>> Sent: Wednesday, October 4, 2017 6:51 PM
>>
>> Subject: Re: [VOTE] Spark 2.1.2 (RC4)
>> To: Spark dev list 
>>
>>
>> +1 (non-binding)
>>
>>
>> Cheers,
>> Liwei
>>
>> On Wed, Oct 4, 2017 at 4:03 PM, Nick Pentreath 
>> wrote:
>>
>>> Ah right! Was using a new cloud instance and didn't realize I was logged
>>> in as root! thanks
>>>
>>> On Tue, 3 Oct 2017 at 21:13 Marcelo Vanzin  wrote:
>>>
 Maybe you're running as root (or the admin account on your OS)?

 On Tue, Oct 3, 2017 at 12:12 PM, Nick Pentreath
  wrote:
 > Hmm I'm consistently getting this error in core tests:
 >
 > - SPARK-3697: ignore directories that cannot be read. *** FAILED ***
 >   2 was not equal to 1 (FsHistoryProviderSuite.scala:146)
 >
 >
 > Anyone else? Any insight? Perhaps it's my set up.
 >
 >>>
 >>>
 >>> On Tue, Oct 3, 2017 at 7:24 AM Holden Karau 
 wrote:
 
  Please vote on releasing the following candidate as Apache Spark
 version
  2.1.2. The vote is open until Saturday October 7th at 9:00 PST and
 passes if
  a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 2.1.2
  [ ] -1 Do not release this package because ...
 
 
  To learn more about Apache Spark, please see
 https://spark.apache.org/
 
  The tag to be voted on is v2.1.2-rc4
  (2abaea9e40fce81cd4626498e0f5c28a70917499)
 
  List of JIRA tickets resolved in this release can be found with
 this
  filter.
 
  The release files, including signatures, digests, etc. can be
 found at:
  https://home.apache.org/~holden/spark-2.1.2-rc4-bin/
 
  Release artifacts are signed with a key from:
  https://people.apache.org/~holden/holdens_keys.asc
 
  The staging repository for this release can be found at:
 
 https://repository.apache.org/content/repositories/orgapachespark-1252
 
  The documentation corresponding to this release can be found at:
  https://people.apache.org/~holden/spark-2.1.2-rc4-docs/
 
 
  FAQ
 
  How can I help test this release?
 
  If you are a Spark user, you can help us test this release by
 taking an
  existing Spark workload and running on this release candidate, then
  reporting any regressions.
 
  If you're working in PySpark you can set up a virtual env and
 install
  the current RC and see if anything important breaks, in the
 Java/Scala you
  can add the staging repository to your projects resolvers and test
 with the
  RC (make sure to clean up the artifact cache before/after so you
 don't end
   up building with an out of date RC going forward).
 
  What should happen to JIRA tickets still targeting 2.1.2?
 
  Committers should look at those and triage. Extremely important bug
  fixes, documentation, and API tweaks that impact compatibility
 should be
  worked on immediately. Everything else please retarget to 2.1.3.
 
  But my bug isn't fixed!??!
 
  In order to make timely releases, we will typically not hold the
 release
  unless the bug in question is a regression from 2.1.1. That being
 said if
   there is something which is a regression from 2.1.1 that has not
 been
  correctly targeted please ping a committer to help target the
 issue (you can
  see the open issues listed as impacting Spark 2.1.1 & 2.1.2)
 
  What are the unresolved issues targeted for 2.1.2?
 
 

Re: [VOTE] Spark 2.1.2 (RC2)

2017-09-27 Thread Denny Lee
+1 (non-binding)


On Wed, Sep 27, 2017 at 6:54 AM Sean Owen  wrote:

> +1
>
> I tested the source release.
> Hashes and signature (your signature) check out, project builds and tests
> pass with -Phadoop-2.7 -Pyarn -Phive -Pmesos on Debian 9.
> List of issues look good and there are no open issues at all for 2.1.2.
>
> Great work on improving the build process and docs.
>
>
> On Wed, Sep 27, 2017 at 5:47 AM Holden Karau  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.2. The vote is open until Wednesday October 4th at 23:59 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.2
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.1.2-rc2
>> (fabbb7f59e47590114366d14e15fbbff8c88593c)
>>
>> List of JIRA tickets resolved in this release can be found with this
>> filter.
>> 
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://home.apache.org/~holden/spark-2.1.2-rc2-bin/
>>
>> Release artifacts are signed with a key from:
>> https://people.apache.org/~holden/holdens_keys.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1251
>>
>> The documentation corresponding to this release can be found at:
>> https://people.apache.org/~holden/spark-2.1.2-rc2-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install the
>> current RC and see if anything important breaks, in the Java/Scala you
>> can add the staging repository to your projects resolvers and test with the
>> RC (make sure to clean up the artifact cache before/after so you don't
>> end up building with an out of date RC going forward).
>>
>> *What should happen to JIRA tickets still targeting 2.1.2?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.1.3.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1. That being said
>> if there is something which is a regression from 2.1.1 that has not been
>> correctly targeted please ping a committer to help target the issue (you
>> can see the open issues listed as impacting Spark 2.1.1 & 2.1.2
>> 
>> )
>>
>> *What are the unresolved* issues targeted for 2.1.2
>> 
>> ?
>>
>> At this time there are no open unresolved issues.
>>
>> *Is there anything different about this release?*
>>
>> This is the first release in a while not built on the AMPLAB Jenkins. This
>> is good because it means future releases can more easily be built and
>> signed securely (and I've been updating the documentation in
>> https://github.com/apache/spark-website/pull/66 as I progress); however,
>> the chances of a mistake are higher with any change like this. If there is
>> something you normally take for granted as correct when checking a release,
>> please double-check this time :)
>>
>> *Should I be committing code to branch-2.1?*
>>
>> Thanks for asking! Please treat this stage in the RC process as "code
>> freeze" so bug fixes only. If you're uncertain if something should be back
>> ported please reach out. If you do commit to branch-2.1 please tag your
>> JIRA issue fix version for 2.1.3 and if we cut another RC I'll move the
>> 2.1.3 fixed into 2.1.2 as appropriate.
>>
>> *Why the longer voting window?*
>>
>> Since there is a large industry big data conference this week I figured
>> I'd add a little bit of extra buffer time just to make sure everyone has a
>> chance to take a look.
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>


Re: [VOTE][SPIP] SPARK-21866 Image support in Apache Spark

2017-09-21 Thread Denny Lee
+1

On Thu, Sep 21, 2017 at 11:15 Sean Owen  wrote:

> Am I right that this doesn't mean other packages would use this
> representation, but that they could?
>
> The representation looked fine to me w.r.t. what DL frameworks need.
>
> My previous comment was that this is actually quite lightweight. It's kind
> of like how I/O support is provided for CSV and JSON, so it makes enough sense
> to add to Spark. It doesn't really preclude other solutions.
>
> For those reasons I think it's fine. +1
>
> On Thu, Sep 21, 2017 at 6:32 PM Tim Hunter 
> wrote:
>
>> Hello community,
>>
>> I would like to call for a vote on SPARK-21866. It is a short proposal
>> that has important applications for image processing and deep learning.
>> Joseph Bradley has offered to be the shepherd.
>>
>> JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21866
>> PDF version: https://issues.apache.org/jira/secure/attachment/12884792/SPIP%20-%20Image%20support%20for%20Apache%20Spark%20V1.1.pdf
>>
>> Background and motivation
>>
>> As Apache Spark is being used more and more in the industry, some new use
>> cases are emerging for different data formats beyond the traditional SQL
>> types or the numerical types (vectors and matrices). Deep Learning
>> applications commonly deal with image processing. A number of projects add
>> some Deep Learning capabilities to Spark (see list below), but they
>> struggle to communicate with each other or with MLlib pipelines because
>> there is no standard way to represent an image in Spark DataFrames. We
>> propose to federate efforts for representing images in Spark by defining a
>> representation that caters to the most common needs of users and library
>> developers.
>>
>> This SPIP proposes a specification to represent images in Spark
>> DataFrames and Datasets (based on existing industrial standards), and an
>> interface for loading sources of images. It is not meant to be a
>> full-fledged image processing library, but rather the core description that
>> other libraries and users can rely on. Several packages already offer
>> various processing facilities for transforming images or doing more complex
>> operations, and each has various design tradeoffs that make them better as
>> standalone solutions.
>>
>> This project is a joint collaboration between Microsoft and Databricks,
>> which have been testing this design in two open source packages: MMLSpark
>> and Deep Learning Pipelines.
>>
>> The proposed image format is an in-memory, decompressed representation
>> that targets low-level applications. It is significantly more liberal in
>> memory usage than compressed image representations such as JPEG, PNG, etc.,
>> but it allows easy communication with popular image processing libraries
>> and has no decoding overhead.
>> Target users and personas:
>>
>> Data scientists, data engineers, library developers.
>> The following libraries define primitives for loading and representing
>> images, and will gain from a common interchange format (in alphabetical
>> order):
>>
>>- BigDL
>>- DeepLearning4J
>>- Deep Learning Pipelines
>>- MMLSpark
>>- TensorFlow (Spark connector)
>>- TensorFlowOnSpark
>>- TensorFrames
>>- Thunder
>>
>> Goals:
>>
>>- Simple representation of images in Spark DataFrames, based on
>>pre-existing industrial standards (OpenCV)
>>- This format should eventually allow the development of
>>high-performance integration points with image processing libraries such
>>as libOpenCV, Google TensorFlow, CNTK, and other C libraries.
>>- The reader should be able to read popular formats of images from
>>distributed sources.
>>
>> Non-Goals:
>>
>> Images are a versatile medium and encompass a very wide range of formats
>> and representations. This SPIP explicitly aims at the most common use
>> case in the industry currently: multi-channel matrices of binary, int32,
>> int64, float or double data that can fit comfortably in the heap of the JVM:
>>
>>- the total size of an image should be restricted to less than 2GB
>>(roughly)
>>- the meaning of color channels is application-specific and is not
>>mandated by the standard (in line with the OpenCV standard)
>>- specialized formats used in meteorology, the medical field, etc.
>>are not supported
>>- this format is specialized to images and does not attempt to solve
>>the more general problem of representing n-dimensional tensors in Spark
>>
>> Proposed API changes
>>
>> We propose to add a new package in the package structure, under the MLlib
>> project:
>> org.apache.spark.image
>> Data format
>>
>> We propose to add the following structure:
>>
>> imageSchema = StructType([
>>
>>- StructField("mode", StringType(), False),
>>   - The exact representation of the data.
>>   - The values are described in the following OpenCV convention.
>>   Basically, the type has both "depth" and "number of channels".
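>>
>> For reference, here is a minimal, runnable PySpark sketch of a schema
>> along these lines. The field list follows the SPIP text above; the exact
>> fields and types in the final implementation may differ, so treat this
>> as illustrative only:
>>
>> # Hypothetical sketch of the proposed image schema, based on the
>> # SPIP description above; not the final shipped implementation.
>> from pyspark.sql.types import (
>>     StructType, StructField, StringType, IntegerType, BinaryType
>> )
>>
>> imageSchema = StructType([
>>     # OpenCV-style type descriptor combining depth and channel count
>>     StructField("mode", StringType(), False),
>>     # Source of the image, e.g. a file path or URL
>>     StructField("origin", StringType(), True),
>>     StructField("height", IntegerType(), False),
>>     StructField("width", IntegerType(), False),
>>     StructField("nChannels", IntegerType(), False),
>>     # Decompressed pixel data in row-major order
>>     StructField("data", BinaryType(), False),
>> ])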

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-15 Thread Denny Lee
+1 (non-binding)

On Thu, Sep 14, 2017 at 10:57 PM Felix Cheung 
wrote:

> +1 tested SparkR package on Windows, r-hub, Ubuntu.
>
> _
> From: Sean Owen 
> Sent: Thursday, September 14, 2017 3:12 PM
> Subject: Re: [VOTE] Spark 2.1.2 (RC1)
> To: Holden Karau , 
>
>
>
> +1
> Very nice. The sigs and hashes look fine, it builds fine for me on Debian
> Stretch with Java 8, yarn/hive/hadoop-2.7 profiles, and passes tests.
>
> Yes as you say, no outstanding issues except for this which doesn't look
> critical, as it's not a regression.
>
> SPARK-21985 PySpark PairDeserializer is broken for double-zipped RDDs
>
>
> On Thu, Sep 14, 2017 at 7:47 PM Holden Karau  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.2. The vote is open until Friday September 22nd at 18:00 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.2
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.1.2-rc1 (6f470323a0363656999dd36cb33f528afe627c12)
>>
>> List of JIRA tickets resolved in this release can be found with this
>> filter.
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://home.apache.org/~pwendell/spark-releases/spark-2.1.2-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1248/
>>
>> The documentation corresponding to this release can be found at:
>> https://people.apache.org/~pwendell/spark-releases/spark-2.1.2-rc1-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install the
>> current RC and see if anything important breaks, in the Java/Scala you can
>> add the staging repository to your projects resolvers and test with the RC
>> (make sure to clean up the artifact cache before/after so you don't end up
>> building with an out-of-date RC going forward).
>>
>> *What should happen to JIRA tickets still targeting 2.1.2?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.1.3.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1. That being said if
>> there is something which is a regression from 2.1.1 that has not been
>> correctly targeted please ping a committer to help target the issue (you
>> can see the open issues listed as impacting Spark 2.1.1 & 2.1.2)
>>
>> *What are the unresolved issues targeted for 2.1.2?*
>>
>> At the time of writing, there is one in-progress major issue,
>> SPARK-21985; I believe Andrew Ray & Hyukjin Kwon are looking into this one.
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
>


[TYPO3-german] Re: Powermail does not fully save Flexform, RTE error

2017-08-16 Thread Denny lee

Database: utf8_general_ci
Table tt_content: InnoDB  utf8_general_ci
___
TYPO3-german mailing list
TYPO3-german@lists.typo3.org
http://lists.typo3.org/cgi-bin/mailman/listinfo/typo3-german


[TYPO3-german] Powermail does not fully save Flexform, RTE error

2017-08-16 Thread Denny lee

Hello,

First of all, many thanks for this great forum, which has already helped me
out quite a few times.

Unfortunately, I am currently hitting my limits with an error I cannot
explain.


Problem description:

When editing the plugin, the error message shown below appears ("The HTML
document is not well-formed."). If you dismiss it, you can edit content.
If you fill in, for example, the recipient and save, the data is not
persisted and the fields are shown empty again afterwards.
Furthermore, icons and info from the RTE are displayed twice, but only in
Powermail.
In my opinion, the RTE error has nothing to do with the saving of the data.
In a normal text element this error does not appear, and no duplicates are
shown either.

The following versions are in use:

Powermail 3.20.0
TYPO3 7.6.9
PHP 5.6 and 7 tested
No other special extensions

The following error appears:

[img]index.php/fa/17171/0/[/img]

[img]index.php/fa/17170/0/[/img]


The following things have been tried:

   Installed various Powermail versions (including 3.21.1)
   Installed various TYPO3 versions
   Removed the extension including its DB tables and reinstalled it
   Installed the plugin on different pages
   Tested the project in different browsers
   Deactivated the RTE (the error message then disappears, but saving
still does not work)
   Uninstalled further extensions



I would be very glad if you could help me with tips and information.

Thank you in advance.

Best regards, Denny


Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Denny Lee
Congrats!!

On Mon, Aug 7, 2017 at 17:39 Yanbo Liang  wrote:

> Great.
> Congratulations, Hyukjin and Sameer!
>
> On Tue, Aug 8, 2017 at 7:53 AM, Holden Karau  wrote:
>
>> Congrats!
>>
>> On Mon, Aug 7, 2017 at 3:54 PM Bryan Cutler  wrote:
>>
>>> Great work Hyukjin and Sameer!
>>>
>>> On Mon, Aug 7, 2017 at 10:22 AM, Mridul Muralidharan 
>>> wrote:
>>>
 Congratulations Hyukjin, Sameer !

 Regards,
 Mridul

 On Mon, Aug 7, 2017 at 8:53 AM, Matei Zaharia 
 wrote:
 > Hi everyone,
 >
 > The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal
 as committers. Join me in congratulating both of them and thanking them for
 their contributions to the project!
 >
 > Matei
 > -
 > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
 >

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>
>


Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Denny Lee
This is amazingly awesome! :)
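
For anyone who wants to kick the tires, here's a minimal local smoke test.
This is just a sketch: it assumes `pip install pyspark` succeeded, and the
toy DataFrame is purely illustrative.

    # Quick sanity check of a pip-installed PySpark (illustrative only).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .master("local[*]") \
        .appName("pip-install-smoke-test") \
        .getOrCreate()

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
    df.show()

    spark.stop()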

On Wed, Jul 12, 2017 at 13:23 lucas.g...@gmail.com 
wrote:

> That's great!
>
>
>
> On 12 July 2017 at 12:41, Felix Cheung  wrote:
>
>> Awesome! Congrats!!
>>
>> --
>> *From:* holden.ka...@gmail.com  on behalf of
>> Holden Karau 
>> *Sent:* Wednesday, July 12, 2017 12:26:00 PM
>> *To:* user@spark.apache.org
>> *Subject:* With 2.2.0 PySpark is now available for pip install from PyPI
>> :)
>>
>> Hi wonderful Python + Spark folks,
>>
>> I'm excited to announce that with Spark 2.2.0 we finally have PySpark
>> published on PyPI (see https://pypi.python.org/pypi/pyspark /
>> https://twitter.com/holdenkarau/status/885207416173756417). This has
>> been a long time coming (previous releases included pip installable
>> artifacts that for a variety of reasons couldn't be published to PyPI). So
>> if you (or your friends) want to be able to work with PySpark locally on
>> your laptop you've got an easier path getting started (pip install pyspark).
>>
>> If you are setting up a standalone cluster your cluster will still need
>> the "full" Spark packaging, but the pip installed PySpark should be able to
>> work with YARN or an existing standalone cluster installation (of the same
>> version).
>>
>> Happy Sparking y'all!
>>
>> Holden :)
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>
>


Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-03 Thread Denny Lee
+1 (non-binding)

On Mon, Jul 3, 2017 at 6:45 PM Liang-Chi Hsieh  wrote:

> +1
>
>
> Sameer Agarwal wrote
> > +1
> >
> > On Mon, Jul 3, 2017 at 6:08 AM, Wenchen Fan <cloud0fan@...> wrote:
> >
> >> +1
> >>
> >> On 3 Jul 2017, at 8:22 PM, Nick Pentreath <nick.pentreath@...> wrote:
> >>
> >> +1 (binding)
> >>
> >> On Mon, 3 Jul 2017 at 11:53 Yanbo Liang <ybliang8@...> wrote:
> >>
> >>> +1
> >>>
> >>> On Mon, Jul 3, 2017 at 5:35 AM, Herman van Hövell tot Westerflier <hvanhovell@...> wrote:
> >>>
>  +1
> 
>  On Sun, Jul 2, 2017 at 11:32 PM, Ricardo Almeida <ricardo.almeida@...> wrote:
> 
> > +1 (non-binding)
> >
> > Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn
> > -Phive -Phive-thriftserver -Pscala-2.11 on
> >
> >- macOS 10.12.5 Java 8 (build 1.8.0_131)
> >- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
> >
> >
> >
> >
> >
> > On 1 Jul 2017 02:45, "Michael Armbrust" <michael@...> wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark
> > version 2.2.0. The vote is open until Friday, July 7th, 2017 at 18:00
> > PST and passes if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 2.2.0
> > [ ] -1 Do not release this package because ...
> >
> >
> > To learn more about Apache Spark, please see
> https://spark.apache.org/
> >
> > The tag to be voted on is v2.2.0-rc6
> > https://github.com/apache/spark/tree/v2.2.0-rc6
> > (a2c7b2133cfee7fa9abfaa2bfbfb637155466783)
> >
> > List of JIRA tickets resolved can be found with this filter:
> > https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0
> >
> > The release files, including signatures, digests, etc. can be found
> > at:
> >
> https://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc6-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> >
> https://repository.apache.org/content/repositories/orgapachespark-1245/
> >
> > The documentation corresponding to this release can be found at:
> > https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc6-docs/
> >
> >
> > *FAQ*
> >
> > *How can I help test this release?*
> >
> > If you are a Spark user, you can help us test this release by taking
> > an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > *What should happen to JIRA tickets still targeting 2.2.0?*
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be
> > worked on immediately. Everything else please retarget to 2.3.0 or
> > 2.2.1.
> >
> > *But my bug isn't fixed!??!*
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from 2.1.1.
> >
> >
> >
> 
> 
> >>>
> >>
> >
> >
> > --
> > Sameer Agarwal
> > Software Engineer | Databricks Inc.
> > http://cs.berkeley.edu/~sameerag
>
>
>
>
>
> -
> Liang-Chi Hsieh | @viirya
> Spark Technology Center
> http://www.spark.tc/
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Apache-Spark-2-2-0-RC6-tp21902p21914.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-08 Thread Denny Lee
+1 non-binding

Tested on macOS Sierra, Ubuntu 16.04
test suite includes various test cases including Spark SQL, ML,
GraphFrames, Structured Streaming


On Wed, Jun 7, 2017 at 9:40 PM vaquar khan  wrote:

> +1 non-binding
>
> Regards,
> vaquar khan
>
> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" 
> wrote:
>
> +1 (non-binding)
>
> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive
> -Phive-thriftserver -Pscala-2.11 on
>
>- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>- macOS 10.12.5 Java 8 (build 1.8.0_131)
>
>
> On 5 June 2017 at 21:14, Michael Armbrust  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.2.0. The vote is open until Thurs, June 8th, 2017 at 12:00 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.2.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.2.0-rc4 (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>
>> List of JIRA tickets resolved can be found with this filter.
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.1.
>>
>
>
>


Re: Spark Shell issue on HDInsight

2017-05-14 Thread Denny Lee
Sorry for the delay. You just did, as I'm with the Azure CosmosDB (formerly
DocumentDB) team.  If you'd like to make it official, why not add an issue
to the GitHub repo at https://github.com/Azure/azure-documentdb-spark/issues.
HTH!

On Thu, May 11, 2017 at 9:08 PM ayan guha <guha.a...@gmail.com> wrote:

> Works for me tooyou are a life-saver :)
>
> But the question: should/how we report this to Azure team?
>
> On Fri, May 12, 2017 at 10:32 AM, Denny Lee <denny.g@gmail.com> wrote:
>
>> I was able to repro your issue when I had downloaded the jars via blob
>> but when I downloaded them as raw, I was able to get everything up and
>> running.  For example:
>>
>> wget https://github.com/Azure/azure-documentdb-spark/*blob*
>> /master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-1.10.0.jar
>> wget https://github.com/Azure/azure-documentdb-spark/*blob*
>> /master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>> spark-shell --master yarn --jars
>> azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>
>> resulted in the error:
>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>> Setting default log level to "WARN".
>> To adjust logging level use sc.setLogLevel(newLevel).
>> [init] error: error while loading , Error accessing
>> /home/sshuser/jars/test/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>
>> Failed to initialize compiler: object java.lang.Object in compiler mirror
>> not found.
>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>> ** object programmatically, settings.usejavacp.value = true.
>>
>> But when running:
>> wget
>> https://github.com/Azure/azure-documentdb-spark/raw/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-1.10.0.jar
>> wget
>> https://github.com/Azure/azure-documentdb-spark/raw/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>> spark-shell --master yarn --jars
>> azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>
>> it was up and running:
>> spark-shell --master yarn --jars
>> azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>> Setting default log level to "WARN".
>> To adjust logging level use sc.setLogLevel(newLevel).
>> 17/05/11 22:54:06 WARN SparkContext: Use an existing SparkContext, some
>> configuration may not take effect.
>> Spark context Web UI available at http://10.0.0.22:4040
>> Spark context available as 'sc' (master = yarn, app id =
>> application_1494248502247_0013).
>> Spark session available as 'spark'.
>> Welcome to
>>     __
>>  / __/__  ___ _/ /__
>> _\ \/ _ \/ _ `/ __/  '_/
>>/___/ .__/\_,_/_/ /_/\_\   version 2.0.2.2.5.4.0-121
>>   /_/
>>
>> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_121)
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>>
>> scala>
>>
>> HTH!
>>
>>
>> On Wed, May 10, 2017 at 11:49 PM ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> Thanks for reply, but unfortunately did not work. I am getting same
>>> error.
>>>
>>> sshuser@ed0-svochd:~/azure-spark-docdb-test$ spark-shell --jars
>>> azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>>> Setting default log level to "WARN".
>>> To adjust logging level use sc.setLogLevel(newLevel).
>>> [init] error: error while loading , Error accessing
>>> /home/sshuser/azure-spark-docdb-test/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>>
>>> Failed to initialize compiler: object java.lang.Object in compiler
>>> mirror not found.
>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>> ** object programmatically, settings.usejavacp.value = true.
>>>
>>> Failed to initialize compiler: object java.lang.Object in compiler
>>> mirror not found.
>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>> ** object programmatically, settings.usejavacp.value = true.

Re: Spark Shell issue on HDInsight

2017-05-11 Thread Denny Lee
scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:997)
> at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:579)
> at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:567)
> at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
> at
> scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
> at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
> at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
> at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
> at
> org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
> at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
> at
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
> at
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
> at
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
> at
> scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
> at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
> at org.apache.spark.repl.Main$.doMain(Main.scala:68)
> at org.apache.spark.repl.Main$.main(Main.scala:51)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
> at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
> at
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> sshuser@ed0-svochd:~/azure-spark-docdb-test$
>
>
> On Mon, May 8, 2017 at 11:50 PM, Denny Lee <denny.g@gmail.com> wrote:
>
>> This appears to be an issue with the Spark to DocumentDB connector,
>> specifically version 0.0.1. Could you run the 0.0.3 version of the jar and
>> see if you're still getting the same error?  i.e.
>>
>> spark-shell --master yarn --jars
>> azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>
>>
>> On Mon, May 8, 2017 at 5:01 AM ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I am facing an issue while trying to use azure-document-db connector
>>> from Microsoft. Instructions/Github
>>> <https://github.com/Azure/azure-documentdb-spark/wiki/Azure-DocumentDB-Spark-Connector-User-Guide>
>>> .
>>>
>>> Error while trying to add jar in spark-shell:
>>>
>>> spark-shell --jars
>>> azure-documentdb-spark-0.0.1.jar,azure-documentdb-1.9.6.jar
>>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>>> Setting default log level to "WARN".
>>> To adjust logging level use sc.setLogLevel(newLevel).
>>> [init] error: error while loading , Error accessing
>>> /home/sshuser/azure-spark-docdb-test/v1/azure-documentdb-spark-0.0.1.jar
>>>
>>> Failed to initialize compiler: object java.lang.Object in compiler
>>> mirror not found.
>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>> ** object programmatically, settings.usejavacp.value = true.
>>>
>>> Failed to initialize compiler: object java.lang.Object in compiler
>>> mirror not found.
>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>> ** object programmatically, settings.usejavacp.value = true.
>>> Exception in thread "main" java.lang.NullPointerException
>>> at
>>> scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
>>>

Re: Spark Shell issue on HDInsight

2017-05-08 Thread Denny Lee
This appears to be an issue with the Spark to DocumentDB connector,
specifically version 0.0.1. Could you run the 0.0.3 version of the jar and
see if you're still getting the same error?  i.e.

spark-shell --master yarn --jars
azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar


On Mon, May 8, 2017 at 5:01 AM ayan guha  wrote:

> Hi
>
> I am facing an issue while trying to use azure-document-db connector from
> Microsoft. Instructions/Github
> 
> .
>
> Error while trying to add jar in spark-shell:
>
> spark-shell --jars
> azure-documentdb-spark-0.0.1.jar,azure-documentdb-1.9.6.jar
> SPARK_MAJOR_VERSION is set to 2, using Spark2
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> [init] error: error while loading , Error accessing
> /home/sshuser/azure-spark-docdb-test/v1/azure-documentdb-spark-0.0.1.jar
>
> Failed to initialize compiler: object java.lang.Object in compiler mirror
> not found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programmatically, settings.usejavacp.value = true.
>
> Failed to initialize compiler: object java.lang.Object in compiler mirror
> not found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programmatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.NullPointerException
> at
> scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
> at
> scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:896)
> at scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:895)
> at
> scala.tools.nsc.interpreter.IMain$Request.headerPreamble$lzycompute(IMain.scala:895)
> at
> scala.tools.nsc.interpreter.IMain$Request.headerPreamble(IMain.scala:895)
> at
> scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:918)
> at
> scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1337)
> at
> scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1336)
> at scala.tools.nsc.util.package$.stringFromWriter(package.scala:64)
> at
> scala.tools.nsc.interpreter.IMain$CodeAssembler$class.apply(IMain.scala:1336)
> at
> scala.tools.nsc.interpreter.IMain$Request$Wrapper.apply(IMain.scala:908)
> at
> scala.tools.nsc.interpreter.IMain$Request.compile$lzycompute(IMain.scala:1002)
> at
> scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:997)
> at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:579)
> at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:567)
> at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
> at
> scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
> at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
> at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
> at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
> at
> org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
> at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
> at
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
> at
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
> at
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
> at
> scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
> at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
> at org.apache.spark.repl.Main$.doMain(Main.scala:68)
> at org.apache.spark.repl.Main$.main(Main.scala:51)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
> at
> 

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-28 Thread Denny Lee
+1

On Fri, Apr 28, 2017 at 9:17 AM Kazuaki Ishizaki 
wrote:

> +1 (non-binding)
>
> I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for
> core have passed.
>
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build
> 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
> $ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7
> package install
> $ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
> ...
> Total number of tests run: 1788
> Suites: completed 198, aborted 0
> Tests: succeeded 1788, failed 0, canceled 4, ignored 8, pending 0
> All tests passed.
> [INFO]
> 
> [INFO] BUILD SUCCESS
> [INFO]
> 
> [INFO] Total time: 16:30 min
> [INFO] Finished at: 2017-04-29T01:02:29+09:00
> [INFO] Final Memory: 54M/576M
> [INFO]
> 
>
> Regards,
> Kazuaki Ishizaki,
>
>
>
> From:Michael Armbrust 
> To:"dev@spark.apache.org" 
> Date:2017/04/27 09:30
> Subject:[VOTE] Apache Spark 2.1.1 (RC4)
> --
>
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.1.1. The vote is open until Sat, April 29th, 2017 at 18:00 PST and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.1
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see *http://spark.apache.org/*
>
> The tag to be voted on is *v2.1.1-rc4* (267aca5bd5042303a718d10635bc0d1a1596853f)
>
> List of JIRA tickets resolved can be found *with this filter*.
>
> The release files, including signatures, digests, etc. can be found at:
> *http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-bin/*
>
> Release artifacts are signed with the following key:
> *https://people.apache.org/keys/committer/pwendell.asc*
>
> The staging repository for this release can be found at:
> *https://repository.apache.org/content/repositories/orgapachespark-1232/*
>
> The documentation corresponding to this release can be found at:
> *http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc4-docs/*
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.1?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.2 or 2.2.0.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.0.
>
> *What happened to RC1?*
>
> There were issues with the release packaging and as a result it was skipped.
>
>


Re: Azure Event Hub with Pyspark

2017-04-20 Thread Denny Lee
As well, perhaps another option could be to use the Spark Connector to
DocumentDB (https://github.com/Azure/azure-documentdb-spark) if sticking
with Scala?
On Thu, Apr 20, 2017 at 21:46 Nan Zhu  wrote:

> DocDB does have a Java client? Anything preventing you from using that?
>
> Get Outlook for iOS 
> --
> *From:* ayan guha 
> *Sent:* Thursday, April 20, 2017 9:24:03 PM
> *To:* Ashish Singh
> *Cc:* user
> *Subject:* Re: Azure Event Hub with Pyspark
>
> Hi
>
> yes, it's Scala only. I am looking for a pyspark version, as I want to
> write to DocumentDB, which has good Python integration.
>
> Thanks in advance
>
> best
> Ayan
>
> On Fri, Apr 21, 2017 at 2:02 PM, Ashish Singh 
> wrote:
>
>> Hi ,
>>
>> You can try https://github.com/hdinsight/spark-eventhubs, which is an
>> Event Hubs receiver for Spark Streaming.
>> We are using it, but I guess it is available in a Scala version only.
>>
>>
>> Thanks,
>> Ashish Singh
>>
>> On Fri, Apr 21, 2017 at 9:19 AM, ayan guha  wrote:
>>
>>>
>>> Hi
>>>
>>> I am not able to find any connector to be used to connect Spark Streaming
>>> with Azure Event Hub, using pyspark.
>>>
>>> Does anyone know if such a library/package exists?
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>


Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Denny Lee
+1 (non-binding)


On Wed, Apr 19, 2017 at 9:23 PM Dong Joon Hyun 
wrote:

> +1
>
> I tested RC3 on CentOS 7.3.1611/OpenJDK 1.8.0_121/R 3.3.3
> with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Psparkr`
>
> At the end of the R tests, I saw `Had CRAN check errors; see logs.`,
> but the tests passed and the log file looks good.
>
> Bests,
> Dongjoon.
>
> From: Reynold Xin 
> Date: Wednesday, April 19, 2017 at 3:41 PM
> To: Marcelo Vanzin 
> Cc: Michael Armbrust , "dev@spark.apache.org" <
> dev@spark.apache.org>
> Subject: Re: [VOTE] Apache Spark 2.1.1 (RC3)
>
> +1
>
> On Wed, Apr 19, 2017 at 3:31 PM, Marcelo Vanzin 
> wrote:
>
>> +1 (non-binding).
>>
>> Ran the hadoop-2.6 binary against our internal tests and things look good.
>>
>> On Tue, Apr 18, 2017 at 11:59 AM, Michael Armbrust
>>  wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00 PST and
>> passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.1.1
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.1.1-rc3
>> > (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)
>> >
>> > List of JIRA tickets resolved can be found with this filter.
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1230/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/
>> >
>> >
>> > FAQ
>> >
>> > How can I help test this release?
>> >
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > What should happen to JIRA tickets still targeting 2.1.1?
>> >
>> > Committers should look at those and triage. Extremely important bug
>> fixes,
>> > documentation, and API tweaks that impact compatibility should be
>> worked on
>> > immediately. Everything else please retarget to 2.1.2 or 2.2.0.
>> >
>> > But my bug isn't fixed!??!
>> >
>> > In order to make timely releases, we will typically not hold the release
>> > unless the bug in question is a regression from 2.1.0.
>> >
>> > What happened to RC1?
>> >
>> > There were issues with the release packaging and as a result it was
>> > skipped.
>>
>>
>>
>> --
>> Marcelo
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>


Support Stored By Clause

2017-03-27 Thread Denny Lee
Per SPARK-19630, wondering if there are plans to support "STORED BY" clause
for Spark 2.x?

Thanks!


Re: welcoming Burak and Holden as committers

2017-01-24 Thread Denny Lee
Awesome! Congrats Burak & Holden!!

On Tue, Jan 24, 2017 at 10:39 Joseph Bradley  wrote:

> Congratulations Burak & Holden!
>
> On Tue, Jan 24, 2017 at 10:33 AM, Dongjoon Hyun 
> wrote:
>
> Great! Congratulations, Burak and Holden.
>
> Bests,
> Dongjoon.
>
> On 2017-01-24 10:29 (-0800), Nicholas Chammas 
> wrote:
> >
> > Congratulations, Burak and Holden.
> >
> > On Tue, Jan 24, 2017 at 1:27 PM Russell Spitzer <
> russell.spit...@gmail.com>
> > wrote:
> >
> > > Great news! Congratulations!
> > >
> > > On Tue, Jan 24, 2017 at 10:25 AM Dean Wampler 
> > > wrote:
> > >
> > > Congratulations to both of you!
> > >
> > > dean
> > >
> > > *Dean Wampler, Ph.D.*
> > > Author: Programming Scala, 2nd Edition
> > > , Fast Data
> > > Architectures for Streaming Applications
> > > <
> http://www.oreilly.com/data/free/fast-data-architectures-for-streaming-applications.csp
> >,
> > > Functional Programming for Java Developers
> > > , and Programming
> Hive
> > >  (O'Reilly)
> > > Lightbend 
> > > @deanwampler 
> > > http://polyglotprogramming.com
> > > https://github.com/deanwampler
> > >
> > > On Tue, Jan 24, 2017 at 6:14 PM, Xiao Li  wrote:
> > >
> > > Congratulations! Burak and Holden!
> > >
> > > 2017-01-24 10:13 GMT-08:00 Reynold Xin :
> > >
> > > Hi all,
> > >
> > > Burak and Holden have recently been elected as Apache Spark committers.
> > >
> > > Burak has been very active in a large number of areas in Spark,
> including
> > > linear algebra, stats/maths functions in DataFrames, Python/R APIs for
> > > DataFrames, dstream, and most recently Structured Streaming.
> > >
> > > Holden has been a long time Spark contributor and evangelist. She has
> > > written a few books on Spark, as well as frequent contributions to the
> > > Python API to improve its usability and performance.
> > >
> > > Please join me in welcoming the two!
> > >
> > >
> > >
> > >
> > >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
> [image: http://databricks.com] 
>


Re: unsubscribe

2017-01-09 Thread Denny Lee
Please unsubscribe by sending an email to user-unsubscr...@spark.apache.org
HTH!
 





On Mon, Jan 9, 2017 4:40 PM, william tellme williamtellme...@gmail.com
wrote:

Re: UNSUBSCRIBE

2017-01-09 Thread Denny Lee
Please unsubscribe by sending an email to user-unsubscr...@spark.apache.org
HTH!
 





On Mon, Jan 9, 2017 4:41 PM, Chris Murphy - ChrisSMurphy.com 
cont...@chrissmurphy.com
wrote:
PLEASE!!

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-18 Thread Denny Lee
+1 (non-binding)


On Sat, Dec 17, 2016 at 11:45 PM Liwei Lin  wrote:

> +1
>
> Cheers,
> Liwei
>
> On Sat, Dec 17, 2016 at 10:29 AM, Yuming Wang  wrote:
>
> I hope https://github.com/apache/spark/pull/16252 can be merged before
> release 2.1.0. It's a fix for broadcasts that cannot fit in memory.
>
> On Sat, Dec 17, 2016 at 10:23 AM, Joseph Bradley 
> wrote:
>
> +1
>
> On Fri, Dec 16, 2016 at 3:21 PM, Herman van Hövell tot Westerflier <
> hvanhov...@databricks.com> wrote:
>
> +1
>
> On Sat, Dec 17, 2016 at 12:14 AM, Xiao Li  wrote:
>
> +1
>
> Xiao Li
>
> 2016-12-16 12:19 GMT-08:00 Felix Cheung :
>
> For R we have a license field in the DESCRIPTION, and this is standard
> practice (and requirement) for R packages.
>
> https://cran.r-project.org/doc/manuals/R-exts.html#Licensing
>
> --
> *From:* Sean Owen 
> *Sent:* Friday, December 16, 2016 9:57:15 AM
> *To:* Reynold Xin; dev@spark.apache.org
> *Subject:* Re: [VOTE] Apache Spark 2.1.0 (RC5)
>
> (If you have a template for these emails, maybe update it to use https
> links. They work for apache.org domains. After all we are asking people
> to verify the integrity of release artifacts, so it might as well be
> secure.)
>
> (Also the new archives use .tar.gz instead of .tgz like the others. No big
> deal, my OCD eye just noticed it.)
>
> I don't see an Apache license / notice for the Pyspark or SparkR
> artifacts. It would be good practice to include this in a convenience
> binary. I'm not sure if it's strictly mandatory, but something to adjust in
> any event. I think that's all there is to do for SparkR. For Pyspark, which
> packages a bunch of dependencies, it does include the licenses (good) but I
> think it should include the NOTICE file.
>
> This is the first time I recall getting 0 test failures off the bat!
> I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.
>
> I think I'd +1 this therefore unless someone knows that the license issue
> above is real and a blocker.
>
> On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.0-rc5
> (cd0a08361e2526519e7c131c42116bf56fa62c76)
>
> List of JIRA tickets resolved are:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1223/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.0?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>
> *What happened to RC3/RC4?*
>
> They had issues with the release packaging and as a result were skipped.
>
>
>
>
>
> --
>
> Herman van Hövell
>
> Software Engineer
>
> Databricks Inc.
>
> hvanhov...@databricks.com
>
> +31 6 420 590 27
>
> databricks.com
>
> [image: http://databricks.com] 
>
>
>
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
> [image: http://databricks.com] 
>
>
>
>


Re: Spark app write too many small parquet files

2016-11-27 Thread Denny Lee
Generally, yes - you should try to have larger file sizes due to the
overhead of opening up files.  Typical guidance is between 64MB-1GB;
personally I usually stick with 128MB-512MB with the default snappy
codec compression for parquet.  A good reference is Vida Ha's
presentation "Data Storage Tips for Optimal Spark Performance".
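
One common way to get there is to control the number of output partitions
before writing. A minimal sketch (the paths and the partition count of 8
are placeholders; size the count so each output file lands in that
128MB-512MB range):

    # Coalesce to fewer partitions so each output parquet file is larger.
    # The partition count (8) is illustrative, not a recommendation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-file-sizing").getOrCreate()

    df = spark.read.parquet("/path/to/input")  # stand-in for your dataset

    # coalesce avoids a full shuffle; use repartition(8) instead if the
    # data also needs rebalancing across partitions
    df.coalesce(8).write.mode("overwrite").parquet("/path/to/output")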


On Sun, Nov 27, 2016 at 9:44 PM Kevin Tran  wrote:

> Hi Everyone,
> Does anyone know what is the best practice for writing parquet files from
> Spark?
>
> As the Spark app writes data to parquet, under that directory
> there are heaps of very small parquet files (such as
> e73f47ef-4421-4bcc-a4db-a56b110c3089.parquet). Each parquet file is only
> 15KB.
>
> Should it write each chunk with a bigger data size (such as 128 MB) and a
> proper number of files?
>
> Has anyone found any performance changes when changing the data size of
> each parquet file?
>
> Thanks,
> Kevin.
>


[jira] [Created] (SPARK-18426) Python Documentation Fix for Structured Streaming Programming Guide

2016-11-13 Thread Denny Lee (JIRA)
Denny Lee created SPARK-18426:
-

 Summary: Python Documentation Fix for Structured Streaming 
Programming Guide
 Key: SPARK-18426
 URL: https://issues.apache.org/jira/browse/SPARK-18426
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 2.0.1
Reporter: Denny Lee
Priority: Minor
 Fix For: 2.0.2


When running the Python example in the Structured Streaming Programming
Guide, the following error occurs:
spark = SparkSession\
TypeError: 'Builder' object is not callable

This is fixed by changing .builder() to .builder
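
A minimal before/after illustration (the app name is a placeholder):

    from pyspark.sql import SparkSession

    # Broken: 'builder' is a class attribute holding a Builder instance,
    # not a method, so calling it raises TypeError.
    # spark = SparkSession.builder().appName("example").getOrCreate()

    # Fixed: no parentheses after 'builder'.
    spark = SparkSession.builder \
        .appName("example") \
        .getOrCreate()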



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Handling questions in the mailing lists

2016-11-12 Thread Denny Lee
Hey Reynold,

Looks like we've rolled all of the proposed changes into Proposed Community Mailing
Lists / StackOverflow Changes
<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>.
Anything else we can do to update the Spark Community page / welcome email?


Meanwhile, let's all start answering questions on SO, eh?! :)
Denny

On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <hol...@pigscanfly.ca> wrote:

> That's a good question, looking at
> http://stackoverflow.com/tags/apache-spark/topusers shows a few
> contributors who have already been active on SO including some committers
> and  PMC members with very high overall SO reputations for any
> administrative needs (as well as a number of other contributors besides
> just PMC/committers).
>
> On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <assaf.mendel...@rsa.com>
> wrote:
>
> I was just wondering, before we move on to SO.
>
> Do we have enough contributors with enough reputation do manage things in
> SO?
>
> We would need contributors with enough reputation to have relevant
> privilages.
>
> For example: creating tags (requires 1500 reputation), edit questions and
> answers (2000), create tag synonums (2500), approve tag wiki edits (5000),
> access to moderator tools (1, this is required to delete questions
> etc.), protect questions (15000).
>
> All of these are important if we plan to have SO as a main resource.
>
> I know I originally suggested SO, however, if we do not have contributors
> with the required privileges and the willingness to help manage everything
> then I am not sure this is a good fit.
>
> Assaf.
>
>
>
> *From:* Denny Lee [via Apache Spark Developers List]
> *Sent:* Wednesday, November 09, 2016 9:54 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> Agreed that simply moving the questions to SO will not solve
> anything, but I think the call-out about the meta-tags is that we need to
> abide by SO rules and if we were to just jump in and start creating
> meta-tags, we would be violating at minimum the spirit and at maximum the
> actual conventions around SO.
>
>
>
> Saying this, perhaps we could suggest tags that we place in the header of
> the question whether it be SO or the mailing lists that will help us sort
> through all of these questions faster just as you suggested.  The Proposed
> Community Mailing Lists / StackOverflow Changes
> <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>
>  has
> been updated to include suggested tags.  WDYT?
>
>
>
> On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <[hidden email]> wrote:
>
> I like the document and I think it is good but I still feel like we are
> missing an important part here.
>
>
>
> Look at SO today. There are:
>
> -   4658 unanswered questions under apache-spark tag.
>
> -  394 unanswered questions under spark-dataframe tag.
>
> -  639 unanswered questions under apache-spark-sql
>
> -  859 unanswered questions under pyspark
>
>
>
> Just moving people to ask there will not help. The whole issue is having
> people answer the questions.
>
>
>
> The problem is that many of these questions do not fit SO (but are already
> there so they are noise), are bad (i.e. unclear or hard to answer),
> orphaned etc. while some are simply harder than what people with some
> experience in spark can handle and require more expertise.
>
> The problem is that people with the relevant expertise are drowning in
> noise. This is true for the mailing list and this is true for SO.
>
>
>
> For this reason I believe that just moving people to SO will not solve
> anything.
>
>
>
> My original thought was that if we had different tags then different
> people could watch open questions on these tags and therefore have a much
> lower noise. I thought that we would have a low tier (current one) of
> people just not following the documentation (which would remain as noise),
> then a beginner tier where we could have people downvoting bad questions
> but in most cases the community can answer the questions because they are
> common, then a “medium” tier which would mean harder questions but that can
> still be answered by advanced users and lastly an “advanced” tier to which
> committers can actually subscribed to (and adding sub tags for subsystems
> would improve this even more).
>
>
>
> I was not aware of SO policy for meta tags.

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-09 Thread Denny Lee
+1 (non binding)



On Tue, Nov 8, 2016 at 10:14 PM vaquar khan  wrote:

> *+1 (non binding)*
>
> On Tue, Nov 8, 2016 at 10:21 PM, Weiqing Yang 
> wrote:
>
>  +1 (non binding)
>
>
> Environment: CentOS Linux release 7.0.1406 (Core) / openjdk version
> "1.8.0_111"
>
>
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr -DskipTests clean package
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr test
>
>
>
> On Tue, Nov 8, 2016 at 7:38 PM, Liwei Lin  wrote:
>
> +1 (non-binding)
>
> Cheers,
> Liwei
>
> On Tue, Nov 8, 2016 at 9:50 PM, Ricardo Almeida <
> ricardo.alme...@actnowib.com> wrote:
>
> +1 (non-binding)
>
> over Ubuntu 16.10, Java 8 (OpenJDK 1.8.0_111) built with Hadoop 2.7.3,
> YARN, Hive
>
>
> On 8 November 2016 at 12:38, Herman van Hövell tot Westerflier <
> hvanhov...@databricks.com> wrote:
>
> +1
>
> On Tue, Nov 8, 2016 at 7:09 AM, Reynold Xin  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.2. The vote is open until Thu, Nov 10, 2016 at 22:00 PDT and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.2
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.2-rc3
> (584354eaac02531c9584188b143367ba694b0c34)
>
> This release candidate resolves 84 issues:
> https://s.apache.org/spark-2.0.2-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1214/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc3-docs/
>
>
> Q: How can I help test this release?
> A: If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 2.0.1.
>
> Q: What justifies a -1 vote for this release?
> A: This is a maintenance release in the 2.0.x series. Bugs already present
> in 2.0.1, missing features, or bugs related to new features will not
> necessarily block this release.
>
> Q: What fix version should I use for patches merging into branch-2.0 from
> now on?
> A: Please mark the fix version as 2.0.3, rather than 2.0.2. If a new RC
> (i.e. RC4) is cut, I will change the fix version of those patches to 2.0.2.
>
>
>
>
>
>
>
>
> --
> Regards,
> Vaquar Khan
> +1 224-436-0783
>
> IT Architect / Lead Consultant
> Greater Chicago
>


Re: Handling questions in the mailing lists

2016-11-09 Thread Denny Lee
Hear, hear! :)  Completely agree with you - here are the latest updates
to Proposed
Community Mailing Lists / StackOverflow Changes
.
Keep them coming though at this point, I'd like to limit new verbiage to
prevent it from being too long hence not being read.  Modifications and
suggestions are absolutely welcome - just asking that we don't make it too
much longer.  Thanks!


On Wed, Nov 9, 2016 at 5:36 AM Gerard Maas  wrote:

> Great discussion. Glad to see it happening, and lucky to have spotted it on
> the mailing list given its high volume.
>
> I had this same conversation with Patrick Wendell a few Spark Summits ago.
> At the time, SO was not even listed as a resource and the idea was to make
> it the primary "go-to" place for questions.
>
> Having contributed to both the list (in its early days) and SO, the
> biggest hurdle IMO is how to deal with lazy people. These days, at SO, I
> spend more time leaving comments than answering in an attempt to moderate
> the requirement of "show some effort" and clarify unclear questions.
>
> It's my impression that the mailing list is much more friendly with "plz
> send me da code" folk and indeed would answer questions that would
> otherwise get down-voted or closed at SO. That also shows in the high email
> volume, which at the same time lowers its value for many of us who get
> overwhelmed. It's hard to separate authentic efforts at getting started,
> which deserve help and encouragement, from "work dumpers" who abuse
> resources to get their thing done. Also, beginner questions always repeat,
> and a mailing list has no features to help with that.
>
> The model I had in imagined roughly follows the "Odersky scale":
>  - Users new to the technology and basic "how to" questions belong on
> Stack Overflow. => The search and de-duplication features should help in
> getting an answer if one is already present, reducing the load.
>  - Advanced discussions and troubleshooting belong in users@
>  - Library bugs, new features and improvements belong in dev@
>
> Of course, there's no hard line between these levels, and it would require
> contributor discretion aided by some routing procedure:
>
> - Spark documentation should establish Stack Overflow as the main go-to
> resource.
> - Contributors on the list should friendly redirect "intro level
> questions" to Stack Overflow.
> - SO contributors should redirect potential bugs and questions deserving a
> deeper discussion to @users or @dev as needed
> - @users -> @dev as today
> - Cross-posting SO + @users should be discouraged. The idea is to create
> efficient channels.
>
> A good resource on how and where to ask questions would be a great routing
> channel between the levels above.
> I'm willing to help with moderation efforts on "Spark Overflow" :-) to get
> this going.
>
> The Spark community has always been very welcoming and that spirit should
> be preserved. We just need to channel the efforts in a more efficient way.
>
> my 2c,
>
> Gerard.
>
>
> On Mon, Nov 7, 2016 at 11:24 PM, Maciej Szymkiewicz <
> mszymkiew...@gmail.com> wrote:
>
> Just a couple of random thoughts regarding Stack Overflow...
>
>- If we are thinking about shifting focus towards SO, all attempts at
>micromanaging should be discarded right at the beginning. Especially things
>like meta tags, which are discouraged and "burninated" (
>https://meta.stackoverflow.com/tags/burninate-request/info), or
>thread bumping. Depending on the context these won't be manageable, will go
>against community guidelines, or will simply be obsolete.
>- Lack of expertise is unlikely to be an issue. Even now there are a number
>of advanced Spark users on SO. Of course the more the merrier.
>
> Things that can be easily improved:
>
>- Identifying, improving and promoting canonical questions and
>answers. It means closing duplicates, suggesting edits to improve existing
>answers, and providing alternative solutions. This can also be used to identify
>gaps in the documentation.
>- Providing a set of clear posting guidelines to reduce effort
>required to identify the problem (think about
>http://stackoverflow.com/q/5963269 a.k.a How to make a great R
>reproducible example?)
>- Helping users decide if question is a good fit for SO (see below).
>API questions are great fit, debugging problems like "my cluster is slow"
>are not.
>- Actively cleaning (closing, deleting) off-topic and low-quality
>questions. The less junk to sieve through, the better the chance of good
>questions being answered.
>- Repurposing and actively moderating SO docs (
>https://stackoverflow.com/documentation/apache-spark/topics). Right
>now most of the stuff that goes there is useless, duplicated or
>plagiarized, or borderline spam.
>- Encouraging community to monitor featured (
>

Re: Handling questions in the mailing lists

2016-11-08 Thread Denny Lee
Agreed that simply moving the questions to SO will not solve
anything, but I think the call-out about the meta-tags is that we need to
abide by SO rules and if we were to just jump in and start creating
meta-tags, we would be violating at minimum the spirit and at maximum the
actual conventions around SO.

That said, perhaps we could suggest tags to place in the header of
the question - whether on SO or the mailing lists - that would help us sort
through all of these questions faster, just as you suggested.  The Proposed
Community Mailing Lists / StackOverflow Changes
<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>
has
been updated to include suggested tags.  WDYT?

On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <assaf.mendel...@rsa.com>
wrote:

> I like the document and I think it is good but I still feel like we are
> missing an important part here.
>
>
>
> Look at SO today. There are:
>
> -   4658 unanswered questions under apache-spark tag.
>
> -  394 unanswered questions under spark-dataframe tag.
>
> -  639 unanswered questions under apache-spark-sql
>
> -  859 unanswered questions under pyspark
>
>
>
> Just moving people to ask there will not help. The whole issue is having
> people answer the questions.
>
>
>
> The problem is that many of these questions do not fit SO (but are already
> there, so they are noise), are bad (i.e. unclear or hard to answer), are
> orphaned, etc., while some are simply harder than what people with some
> experience in Spark can handle and require more expertise.
>
> The problem is that people with the relevant expertise are drowning in
> noise. This is true for the mailing list, and it is true for SO.
>
>
>
> For this reason I believe that just moving people to SO will not solve
> anything.
>
>
>
> My original thought was that if we had different tags then different
> people could watch open questions on these tags and therefore have a much
> lower noise. I thought that we would have a low tier (current one) of
> people just not following the documentation (which would remain as noise),
> then a beginner tier where we could have people downvoting bad questions
> but in most cases the community can answer the questions because they are
> common, then a “medium” tier which would mean harder questions but that can
> still be answered by advanced users and lastly an “advanced” tier to which
> committers can actually subscribe to (and adding sub-tags for subsystems
> would improve this even more).
>
>
>
> I was not aware of SO policy for meta tags (the burnination link is about
> removing tags completely so I am not sure how it applies, I believe this
> link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more
> relevant).
>
> There was actually a discussion along these lines on SO (
> http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level
> ).
>
>
>
> The fact that SO did not solve this issue does not mean we shouldn't
> try.
>
>
>
> The way I see it, some tags can easily be used even with the meta tags
> limitation. For example, using spark-internal-development tag can be used
> to ask questions for development of spark. There are already tags for some
> spark subsystems (there is a apachae-spark-sql tag, a pyspark tag, a
> spark-streaming tag etc.). The main issue I see and the one we can’t seem
> to get around is dividing between simple questions that the community
> should answer and hard questions which only advanced users can answer.
>
>
>
> Maybe SO isn't the correct platform for that, but even within it we can try
> to find a non-meta name for Spark beginner questions vs. Spark advanced
> questions.
>
> Assaf.
>
>
>
>
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden
> email]]
> *Sent:* Tuesday, November 08, 2016 7:53 AM
> *To:* Mendelson, Assaf
>
>
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> To help track and get the verbiage for the Spark community page and
> welcome email jump started, here's a working document for us to work with:
> https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#
>
>
>
> Hope this will help us collaborate on this stuff a little faster.
>
> On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]> wrote:
>
> Just a couple of random thoughts regarding Stack Overflo

Re: Handling questions in the mailing lists

2016-11-07 Thread Denny Lee
To help track and get the verbiage for the Spark community page and welcome
email jump started, here's a working document for us to work with:
https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#

Hope this will help us collaborate on this stuff a little faster.

On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz 
wrote:

> Just a couple of random thoughts regarding Stack Overflow...
>
>- If we are thinking about shifting focus towards SO, all attempts at
>micromanaging should be discarded right at the beginning. Especially things
>like meta tags, which are discouraged and "burninated" (
>https://meta.stackoverflow.com/tags/burninate-request/info), or
>thread bumping. Depending on the context these won't be manageable, will go
>against community guidelines, or will simply be obsolete.
>- Lack of expertise is unlikely to be an issue. Even now there are a number
>of advanced Spark users on SO. Of course the more the merrier.
>
> Things that can be easily improved:
>
>- Identifying, improving and promoting canonical questions and
>answers. It means closing duplicates, suggesting edits to improve existing
>answers, and providing alternative solutions. This can also be used to identify
>gaps in the documentation.
>- Providing a set of clear posting guidelines to reduce effort
>required to identify the problem (think about
>http://stackoverflow.com/q/5963269 a.k.a How to make a great R
>reproducible example?)
>- Helping users decide if question is a good fit for SO (see below).
>API questions are great fit, debugging problems like "my cluster is slow"
>are not.
>- Actively cleaning (closing, deleting) off-topic and low-quality
>questions. The less junk to sieve through, the better the chance of good
>questions being answered.
>- Repurposing and actively moderating SO docs (
>https://stackoverflow.com/documentation/apache-spark/topics). Right
>now most of the stuff that goes there is useless, duplicated or
>plagiarized, or borderline spam.
>- Encouraging community to monitor featured (
>https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
>and active & upvoted & unanswered (
>https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
>- Implementing some procedure to identify questions which are likely
>to be bugs or material for feature requests. Personally I am quite often
>tempted to simply send a link to the dev list, but I don't think it is really
>acceptable.
>- Animating a Spark-related chat room. I tried this a couple of times
>but to no avail. Without a certain critical mass of users it just won't
>work.
>
>
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO as a way for
> users to ask questions more prominently, as someone who knows SO very well,
> would you be willing to help write a short guideline (ideally the shorter
> the better, which makes it hard) to direct what goes to user@ and what
> goes to SO?
>
>
> Sure, I'll be happy to help if I can.
>
>
>
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz  > wrote:
>
> Damn, I always thought that the mailing list is only for nice and welcoming
> people and there is nothing to do for me here >:)
>
> To be serious though, there are many questions on the users list which
> would fit just fine on SO but it is not true in general. There are dozens
> of questions which are to broad, opinion based, ask for external resources
> and so on. If you want to direct users to SO you have to help them to
> decide if it is the right channel. Otherwise it will just create a really
> bad experience both for those seeking help and for active answerers. The former
> will be downvoted and bashed, while the latter will have to deal with all
> the junk, and the number of active Spark users with moderation privileges is
> really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz 
> wrote:
>
> You have to remember that the Stack Overflow crowd (like me) is highly
> opinionated, so many questions, which could be just fine on the mailing
> list, will be quickly downvoted and / or closed as off-topic. Just
> saying...
>
> --
> Best,
> Maciej
>
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no
> public archive).
>
> It is not against any ASF rule to recommend StackOverflow as a place for
> users to ask questions. I don't think we can or should delete the existing
> user@spark list either, but we can certainly make SO more visible than it
> is.
>
>
>
> On Wed, Nov 2, 2016 

Re: hope someone can recommend some books for me,a spark beginner

2016-11-06 Thread Denny Lee
There are a number of great resources to learn Apache Spark - a good
starting point is the Apache Spark Documentation at:
http://spark.apache.org/documentation.html


The two books that immediately come to mind are

- Learning Spark: http://shop.oreilly.com/product/mobile/0636920028512.do
(there's also a Chinese language version of this book)

- Advanced Analytics with Apache Spark:
http://shop.oreilly.com/product/mobile/0636920035091.do

You can also find a pretty decent listing of Apache Spark resources at:
https://sparkhub.databricks.com/resources/

HTH!


On Sun, Nov 6, 2016 at 19:00 litg <1933443...@qq.com> wrote:

>I'm a postgraduate from Shanghai Jiao Tong University, China.
> Recently, I have been carrying out a project on implementing AI algorithms
> on Spark in Python. However, I am not familiar with this field; furthermore,
> there are few Chinese books about Spark.
>  Actually, I strongly want to study this field further. I hope someone
> can kindly recommend some books about the mechanics of Spark, or just give
> me suggestions about how to program with Spark.
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/hope-someone-can-recommend-some-books-for-me-a-spark-beginner-tp28033.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Denny Lee
The one you're looking for is the Data Sciences and Engineering with Apache
Spark at
https://www.edx.org/xseries/data-science-engineering-apacher-sparktm.

Note, a great quick start is the Getting Started with Apache Spark on
Databricks at https://databricks.com/product/getting-started-guide

HTH!

On Sun, Nov 6, 2016 at 22:20 Raghav  wrote:

> Can you please point out the right courses from edX/Berkeley?
>
> Many thanks.
>
> On Sun, Nov 6, 2016 at 6:08 PM, ayan guha  wrote:
>
> I would start with the Spark documentation, really. Then you would probably
> move on to some older videos from YouTube, especially the Spark Summit
> 2014, 2015 and 2016 videos. Regarding practice, I would strongly suggest
> Databricks cloud (or download a prebuilt version from the Spark site). You can
> also take courses from edX/Berkeley, which are very good starter courses.
>
> On Mon, Nov 7, 2016 at 11:57 AM, raghav  wrote:
>
> I am a newbie in the world of big data analytics, and I want to teach myself
> Apache Spark, and want to be able to write scripts to tinker with data.
>
> I have some understanding of MapReduce but have not had a chance to get my
> hands dirty. There are tons of resources for Spark, but I am looking for
> some guidance on starter material or videos.
>
> Thanks.
>
> Raghav
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Newbie-question-Best-way-to-bootstrap-with-Spark-tp28032.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
>
>


Re: How do I convert a data frame to broadcast variable?

2016-11-03 Thread Denny Lee
If you're able to read the data in as a DataFrame, perhaps you can use a
BroadcastHashJoin so you can join to that table, presuming it's
small enough to be distributed.  Here's a handy guide on a BroadcastHashJoin:
https://docs.cloud.databricks.com/docs/latest/databricks_guide/index.html#04%20SQL,%20DataFrames%20%26%20Datasets/05%20BroadcastHashJoin%20-%20scala.html

HTH!
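
For what it's worth, here's a minimal sketch of the idea in Scala - the
table contents and column names below are made up for illustration, and
only the broadcast() hint is the actual technique:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder().appName("broadcast-join-sketch").getOrCreate()
    import spark.implicits._

    // Stand-ins for the real tables: `facts` is the large side,
    // `lookup` plays the role of the small HANA lookup table
    val facts  = Seq((1, 100.0), (2, 250.0), (1, 75.0)).toDF("key", "amount")
    val lookup = Seq((1, "retail"), (2, "wholesale")).toDF("key", "channel")

    // broadcast() hints the planner to ship `lookup` to every executor,
    // yielding a BroadcastHashJoin instead of a shuffle-based join
    val joined = facts.join(broadcast(lookup), Seq("key"))
    joined.show()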


On Thu, Nov 3, 2016 at 8:53 AM Jain, Nishit  wrote:

> I have a lookup table in HANA database. I want to create a spark broadcast
> variable for it.
> What would be the suggested approach? Should I read it as a data frame
> and convert the data frame into a broadcast variable?
>
> Thanks,
> Nishit
>


Re: GraphFrame BFS

2016-11-01 Thread Denny Lee
You should be able to use GraphX's or GraphFrames' subgraph support to build
up your subgraph.  A good example for GraphFrames can be found at:
http://graphframes.github.io/user-guide.html#subgraphs.  HTH!
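
As a rough sketch of the GraphFrames route, using the toy data from your
question (this only shows the edge-filtering idea, not a full bfs-based
traversal from the source node):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col
    import org.graphframes.GraphFrame

    val spark = SparkSession.builder().appName("p2p-subgraph-sketch").getOrCreate()
    import spark.implicits._

    // Toy data mirroring the example in the question
    val vertices = Seq(1, 2, 3).toDF("id")
    val edges = Seq((1, 2, "p2p"), (2, 3, "p2p"), (2, 3, "c2p"))
      .toDF("src", "dst", "relationship")
    val g = GraphFrame(vertices, edges)

    // Keep only the p2p edges, then restrict vertices to those that
    // still touch at least one p2p edge
    val p2pEdges = g.edges.filter(col("relationship") === "p2p")
    val p2pIds = p2pEdges.select(col("src").as("id"))
      .union(p2pEdges.select(col("dst").as("id")))
      .distinct()
    val p2pGraph = GraphFrame(g.vertices.join(p2pIds, "id"), p2pEdges)

    // connectedComponents() or a bfs from the source vertex then gives
    // the portion reachable over p2p edges only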

On Mon, Oct 10, 2016 at 9:32 PM cashinpj  wrote:

> Hello,
>
> I have a set of data representing various network connections.  Each vertex
> is represented by a single id, while the edges have a source id, a
> destination id, and a relationship (peer to peer, customer to provider, or
> provider to customer).  I am trying to create a subgraph built around a
> single source node, following one type of edge as far as possible.
>
> For example:
> 1 2 p2p
> 2 3 p2p
> 2 3 c2p
>
> Following the p2p edges would give:
>
> 1 2 p2p
> 2 3 p2p
>
> I am pretty new to GraphX and GraphFrames, but was wondering if it is
> possible to get this behavior using the GraphFrames bfs() function, or whether
> it would be better to modify the already existing Pregel implementation of bfs?
>
> Thank you for your time.
>
> Padraic
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/GraphFrame-BFS-tp27876.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


[jira] [Created] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-11-01 Thread Denny Lee (JIRA)
Denny Lee created SPARK-18200:
-

 Summary: GraphX Invalid initial capacity when running triangleCount
 Key: SPARK-18200
 URL: https://issues.apache.org/jira/browse/SPARK-18200
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Affects Versions: 2.0.1, 2.0.0, 2.0.2
 Environment: Databricks, Ubuntu 16.04, macOS Sierra
Reporter: Denny Lee


Running GraphX triangle count on a large-ish file results in the "Invalid initial
capacity" error when running on Spark 2.0 (tested on Spark 2.0, 2.0.1, and
2.0.2).  You can see the results at: http://bit.ly/2eQKWDN

Running the same code on Spark 1.6, the query completes without any
problems: http://bit.ly/2fATO1M

The GraphFrames version of this code also runs fine (Spark 2.0,
GraphFrames 0.2): http://bit.ly/2fAS8W8

Reference Stackoverflow question:
Spark GraphX: requirement failed: Invalid initial capacity 
(http://stackoverflow.com/questions/40337366/spark-graphx-requirement-failed-invalid-initial-capacity)
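
For reference, a minimal sketch of the kind of code that hits this - the
input path is a placeholder; any sufficiently large edge list should do:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

    val spark = SparkSession.builder().appName("tricount-repro-sketch").getOrCreate()

    // triangleCount expects a canonically oriented, deduplicated edge set
    val graph = GraphLoader
      .edgeListFile(spark.sparkContext, "/path/to/edges.txt", canonicalOrientation = true)
      .partitionBy(PartitionStrategy.RandomVertexCut)

    // On Spark 2.0.x this fails with
    // "requirement failed: Invalid initial capacity"; on 1.6 it completes
    val triCounts = graph.triangleCount().vertices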



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-31 Thread Denny Lee
Oh, I forgot to note that when downloading and running against the Spark
2.0.2 without-Hadoop binaries, I got a JNI error due to an exception with
org/slf4j/Logger (i.e. the org.slf4j.Logger class is not found).


On Mon, Oct 31, 2016 at 4:35 PM Reynold Xin  wrote:

> OK I will cut a new RC tomorrow. Any other issues people have seen?
>
>
> On Fri, Oct 28, 2016 at 2:58 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
> -1.
>
> The history server is broken because of some refactoring work in
> Structured Streaming: https://issues.apache.org/jira/browse/SPARK-18143
>
> On Fri, Oct 28, 2016 at 12:58 PM, Weiqing Yang 
> wrote:
>
> +1 (non binding)
>
>
>
> Environment: CentOS Linux release 7.0.1406 / openjdk version "1.8.0_111"/
> R version 3.3.1
>
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr -DskipTests clean package
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> -Dpyspark -Dsparkr test
>
>
> Best,
>
> Weiqing
>
> On Fri, Oct 28, 2016 at 10:06 AM, Ryan Blue 
> wrote:
>
> +1 (non-binding)
>
> Checksums and build are fine. The tarball matches the release tag except
> that .gitignore is missing. It would be nice if the tarball were created
> using git archive so that the commit ref is present, but otherwise
> everything looks fine.
> ​
>
> On Thu, Oct 27, 2016 at 12:18 AM, Reynold Xin  wrote:
>
> Greetings from Spark Summit Europe at Brussels.
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.2. The vote is open until Sun, Oct 30, 2016 at 00:30 PDT and passes if
> a majority of at least 3+1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.2
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.2-rc1
> (1c2908eeb8890fdc91413a3f5bad2bb3d114db6c)
>
> This release candidate resolves 75 issues:
> https://s.apache.org/spark-2.0.2-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1208/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc1-docs/
>
>
> Q: How can I help test this release?
> A: If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 2.0.1.
>
> Q: What justifies a -1 vote for this release?
> A: This is a maintenance release in the 2.0.x series. Bugs already present
> in 2.0.1, missing features, or bugs related to new features will not
> necessarily block this release.
>
> Q: What fix version should I use for patches merging into branch-2.0 from
> now on?
> A: Please mark the fix version as 2.0.3, rather than 2.0.2. If a new RC
> (i.e. RC2) is cut, I will change the fix version of those patches to 2.0.2.
>
>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>
>
>


Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Denny Lee
+1 (non-binding)



On Thu, Oct 27, 2016 at 3:36 PM Ricardo Almeida <
ricardo.alme...@actnowib.com> wrote:

> +1 (non-binding)
>
> built and tested without regressions from 2.0.1.
>
>
>
> On 27 October 2016 at 19:07, vaquar khan  wrote:
>
> +1
>
>
>
> On Thu, Oct 27, 2016 at 11:56 AM, Davies Liu 
> wrote:
>
> +1
>
> On Thu, Oct 27, 2016 at 12:18 AM, Reynold Xin  wrote:
> > Greetings from Spark Summit Europe at Brussels.
> >
> > Please vote on releasing the following candidate as Apache Spark version
> > 2.0.2. The vote is open until Sun, Oct 30, 2016 at 00:30 PDT and passes
> if a
> > majority of at least 3+1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 2.0.2
> > [ ] -1 Do not release this package because ...
> >
> >
> > The tag to be voted on is v2.0.2-rc1
> > (1c2908eeb8890fdc91413a3f5bad2bb3d114db6c)
> >
> > This release candidate resolves 75 issues:
> > https://s.apache.org/spark-2.0.2-jira
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc1-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1208/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc1-docs/
> >
> >
> > Q: How can I help test this release?
> > A: If you are a Spark user, you can help us test this release by taking
> an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions from 2.0.1.
> >
> > Q: What justifies a -1 vote for this release?
> > A: This is a maintenance release in the 2.0.x series. Bugs already
> present
> > in 2.0.1, missing features, or bugs related to new features will not
> > necessarily block this release.
> >
> > Q: What fix version should I use for patches merging into branch-2.0 from
> > now on?
> > A: Please mark the fix version as 2.0.3, rather than 2.0.2. If a new RC
> > (i.e. RC2) is cut, I will change the fix version of those patches to
> 2.0.2.
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>
>
> --
> Regards,
> Vaquar Khan
> +1 -224-436-0783
>
> IT Architect / Lead Consultant
> Greater Chicago
>
>
>


Re: welcoming Xiao Li as a committer

2016-10-04 Thread Denny Lee
Congrats, Xiao!
On Tue, Oct 4, 2016 at 00:00 Takeshi Yamamuro  wrote:

> congrats, xiao!
>
> On Tue, Oct 4, 2016 at 3:59 PM, Hyukjin Kwon  wrote:
>
> Congratulations!
>
> 2016-10-04 15:51 GMT+09:00 Dilip Biswal :
>
> Hi Xiao,
>
> Congratulations Xiao !!  This is indeed very well deserved !!
>
> Regards,
> Dilip Biswal
> Tel: 408-463-4980
> dbis...@us.ibm.com
>
>
>
> From:Reynold Xin 
> To:"dev@spark.apache.org" , Xiao Li <
> gatorsm...@gmail.com>
> Date:10/03/2016 10:47 PM
> Subject:welcoming Xiao Li as a committer
> --
>
>
>
> Hi all,
>
> Xiao Li, aka gatorsmile, has recently been elected as an Apache Spark
> committer. Xiao has been a super active contributor to Spark SQL. Congrats
> and welcome, Xiao!
>
> - Reynold
>
>
>
>
>
>
> --
> ---
> Takeshi Yamamuro
>


Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Denny Lee
+1 (non-binding)

On Thu, Sep 29, 2016 at 9:43 PM Jeff Zhang  wrote:

> +1
>
> On Fri, Sep 30, 2016 at 9:27 AM, Burak Yavuz  wrote:
>
>> +1
>>
>> On Sep 29, 2016 4:33 PM, "Kyle Kelley"  wrote:
>>
>>> +1
>>>
>>> On Thu, Sep 29, 2016 at 4:27 PM, Yin Huai  wrote:
>>>
 +1

 On Thu, Sep 29, 2016 at 4:07 PM, Luciano Resende 
 wrote:

> +1 (non-binding)
>
> On Wed, Sep 28, 2016 at 7:14 PM, Reynold Xin 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark
>> version 2.0.1. The vote is open until Sat, Oct 1, 2016 at 20:00 PDT and
>> passes if a majority of at least 3+1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.0.1
>> [ ] -1 Do not release this package because ...
>>
>>
>> The tag to be voted on is v2.0.1-rc4
>> (933d2c1ea4e5f5c4ec8d375b5ccaa4577ba4be38)
>>
>> This release candidate resolves 301 issues:
>> https://s.apache.org/spark-2.0.1-jira
>>
>> The release files, including signatures, digests, etc. can be found
>> at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1203/
>>
>> The documentation corresponding to this release can be found at:
>>
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc4-docs/
>>
>>
>> Q: How can I help test this release?
>> A: If you are a Spark user, you can help us test this release by
>> taking an existing Spark workload and running on this release candidate,
>> then reporting any regressions from 2.0.0.
>>
>> Q: What justifies a -1 vote for this release?
>> A: This is a maintenance release in the 2.0.x series.  Bugs already
>> present in 2.0.0, missing features, or bugs related to new features will
>> not necessarily block this release.
>>
>> Q: What fix version should I use for patches merging into branch-2.0
>> from now on?
>> A: Please mark the fix version as 2.0.2, rather than 2.0.1. If a new
>> RC (i.e. RC5) is cut, I will change the fix version of those patches to
>> 2.0.1.
>>
>>
>>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


>>>
>>>
>>> --
>>> Kyle Kelley (@rgbkrk ; lambdaops.com)
>>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Denny Lee
+1 on testing with Python2.

On Mon, Sep 26, 2016 at 3:13 PM Krishna Sankar <ksanka...@gmail.com> wrote:

> I do run both Python and Scala, but via iPython/Python2 with my own test
> code, not the tests from the distribution.
> Cheers
> 
>
> On Mon, Sep 26, 2016 at 11:59 AM, Holden Karau <hol...@pigscanfly.ca>
> wrote:
>
>> I'm seeing some test failures with Python 3 that could definitely be
>> environmental (going to rebuild my virtual env and double-check); I'm just
>> wondering if other people are also running the Python tests on this release
>> or if everyone is focused on the Scala tests?
>>
>> On Mon, Sep 26, 2016 at 11:48 AM, Maciej Bryński <mac...@brynski.pl>
>> wrote:
>>
>>> +1
>>> At last :)
>>>
>>> 2016-09-26 19:56 GMT+02:00 Sameer Agarwal <sam...@databricks.com>:
>>>
>>>> +1 (non-binding)
>>>>
>>>> On Mon, Sep 26, 2016 at 9:54 AM, Davies Liu <dav...@databricks.com>
>>>> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> On Mon, Sep 26, 2016 at 9:36 AM, Joseph Bradley <jos...@databricks.com>
>>>>> wrote:
>>>>> > +1
>>>>> >
>>>>> > On Mon, Sep 26, 2016 at 7:47 AM, Denny Lee <denny.g@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> +1 (non-binding)
>>>>> >> On Sun, Sep 25, 2016 at 23:20 Jeff Zhang <zjf...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> +1
>>>>> >>>
>>>>> >>> On Mon, Sep 26, 2016 at 2:03 PM, Shixiong(Ryan) Zhu
>>>>> >>> <shixi...@databricks.com> wrote:
>>>>> >>>>
>>>>> >>>> +1
>>>>> >>>>
>>>>> >>>> On Sun, Sep 25, 2016 at 10:43 PM, Pete Lee <petermax...@gmail.com
>>>>> >
>>>>> >>>> wrote:
>>>>> >>>>>
>>>>> >>>>> +1
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On Sun, Sep 25, 2016 at 3:26 PM, Herman van Hövell tot
>>>>> Westerflier
>>>>> >>>>> <hvanhov...@databricks.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> +1 (non-binding)
>>>>> >>>>>>
>>>>> >>>>>> On Sun, Sep 25, 2016 at 2:05 PM, Ricardo Almeida
>>>>> >>>>>> <ricardo.alme...@actnowib.com> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>> +1 (non-binding)
>>>>> >>>>>>>
>>>>> >>>>>>> Built and tested on
>>>>> >>>>>>> - Ubuntu 16.04 / OpenJDK 1.8.0_91
>>>>> >>>>>>> - CentOS / Oracle Java 1.7.0_55
>>>>> >>>>>>> (-Phadoop-2.7 -Dhadoop.version=2.7.3 -Phive -Phive-thriftserver
>>>>> >>>>>>> -Pyarn)
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> On 25 September 2016 at 22:35, Matei Zaharia
>>>>> >>>>>>> <matei.zaha...@gmail.com> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> +1
>>>>> >>>>>>>>
>>>>> >>>>>>>> Matei
>>>>> >>>>>>>>
>>>>> >>>>>>>> On Sep 25, 2016, at 1:25 PM, Josh Rosen <
>>>>> joshro...@databricks.com>
>>>>> >>>>>>>> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> +1
>>>>> >>>>>>>>
>>>>> >>>>>>>> On Sun, Sep 25, 2016 at 1:16 PM Yin Huai <
>>>>> yh...@databricks.com>
>>>>> >>>>>>>> wrote:
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> +1
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> On Sun, Sep 25, 2016 at 11:40 AM, Dongjoon Hyun
>>>>> >>>>&g

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Denny Lee
+1 (non-binding)
On Sun, Sep 25, 2016 at 23:20 Jeff Zhang  wrote:

> +1
>
> On Mon, Sep 26, 2016 at 2:03 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> +1
>>
>> On Sun, Sep 25, 2016 at 10:43 PM, Pete Lee  wrote:
>>
>>> +1
>>>
>>>
>>> On Sun, Sep 25, 2016 at 3:26 PM, Herman van Hövell tot Westerflier <
>>> hvanhov...@databricks.com> wrote:
>>>
 +1 (non-binding)

 On Sun, Sep 25, 2016 at 2:05 PM, Ricardo Almeida <
 ricardo.alme...@actnowib.com> wrote:

> +1 (non-binding)
>
> Built and tested on
> - Ubuntu 16.04 / OpenJDK 1.8.0_91
> - CentOS / Oracle Java 1.7.0_55
> (-Phadoop-2.7 -Dhadoop.version=2.7.3 -Phive -Phive-thriftserver -Pyarn)
>
>
> On 25 September 2016 at 22:35, Matei Zaharia 
> wrote:
>
>> +1
>>
>> Matei
>>
>> On Sep 25, 2016, at 1:25 PM, Josh Rosen 
>> wrote:
>>
>> +1
>>
>> On Sun, Sep 25, 2016 at 1:16 PM Yin Huai 
>> wrote:
>>
>>> +1
>>>
>>> On Sun, Sep 25, 2016 at 11:40 AM, Dongjoon Hyun >> > wrote:
>>>
 +1 (non binding)

 RC3 is compiled and tested on the following two systems, too. All
 tests passed.

 * CentOS 7.2 / Oracle JDK 1.8.0_77 / R 3.3.1
with -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive
 -Phive-thriftserver -Dsparkr
 * CentOS 7.2 / Open JDK 1.8.0_102
with -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver

 Cheers,
 Dongjoon



 On Saturday, September 24, 2016, Reynold Xin 
 wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 2.0.1. The vote is open until Tue, Sep 27, 2016 at 15:30 PDT 
> and
> passes if a majority of at least 3+1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.1
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.1-rc3
> (9d28cc10357a8afcfb2fa2e6eecb5c2cc2730d17)
>
> This release candidate resolves 290 issues:
> https://s.apache.org/spark-2.0.1-jira
>
> The release files, including signatures, digests, etc. can be
> found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1201/
>
> The documentation corresponding to this release can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc3-docs/
>
>
> Q: How can I help test this release?
> A: If you are a Spark user, you can help us test this release by
> taking an existing Spark workload and running on this release 
> candidate,
> then reporting any regressions from 2.0.0.
>
> Q: What justifies a -1 vote for this release?
> A: This is a maintenance release in the 2.0.x series.  Bugs
> already present in 2.0.0, missing features, or bugs related to new 
> features
> will not necessarily block this release.
>
> Q: What fix version should I use for patches merging into
> branch-2.0 from now on?
> A: Please mark the fix version as 2.0.2, rather than 2.0.1. If a
> new RC (i.e. RC4) is cut, I will change the fix version of those 
> patches to
> 2.0.1.
>
>
>
>>>
>>
>

>>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Welcoming Felix Cheung as a committer

2016-08-08 Thread Denny Lee
Awesome - congrats Felix!

On Mon, Aug 8, 2016 at 9:44 PM Felix Cheung 
wrote:

> Thank you!
> Looking forward to work with you all!
>
>
>
>
>
> On Mon, Aug 8, 2016 at 7:41 PM -0700, "Yanbo Liang" 
> wrote:
>
> Congrats Felix!
>
> 2016-08-08 18:21 GMT-07:00 Kai Jiang :
>
>> Congrats Felix!
>>
>> On Mon, Aug 8, 2016, 18:14 Jeff Zhang  wrote:
>>
>>> Congrats Felix!
>>>
>>> On Tue, Aug 9, 2016 at 8:49 AM, Hyukjin Kwon 
>>> wrote:
>>>
 Congratulations!

 2016-08-09 7:47 GMT+09:00 Xiao Li :

> Congrats Felix!
>
> 2016-08-08 15:04 GMT-07:00 Herman van Hövell tot Westerflier
> :
> > Congrats Felix!
> >
> > On Mon, Aug 8, 2016 at 11:57 PM, dhruve ashar 
> wrote:
> >>
> >> Congrats Felix!
> >>
> >> On Mon, Aug 8, 2016 at 2:28 PM, Tarun Kumar 
> wrote:
> >>>
> >>> Congrats Felix!
> >>>
> >>> Tarun
> >>>
> >>> On Tue, Aug 9, 2016 at 12:57 AM, Timothy Chen 
> wrote:
> 
>  Congrats Felix!
> 
>  Tim
> 
>  On Mon, Aug 8, 2016 at 11:15 AM, Matei Zaharia <
> matei.zaha...@gmail.com>
>  wrote:
>  > Hi all,
>  >
>  > The PMC recently voted to add Felix Cheung as a committer.
> Felix has
>  > been a major contributor to SparkR and we're excited to have
> him join
>  > officially. Congrats and welcome, Felix!
>  >
>  > Matei
>  >
> -
>  > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>  >
> 
> 
> -
>  To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
> >>
> >>
> >>
> >> --
> >> -Dhruve Ashar
> >>
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>

