Re: Podling report for Sedona

2020-08-06 Thread Felix Cheung
The report is already on the wiki page? I’m still trying to see if something
is missing.


On Thu, Aug 6, 2020 at 12:37 AM Justin Mclean  wrote:

> Hi,
>
> This report is now late, please submit it.
>
> Thanks,
>
> Justin
>
>


Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Felix Cheung
So IMO maintaining it outside in a separate repo is going to be harder. That was
why I asked.




From: Maciej Szymkiewicz 
Sent: Tuesday, August 4, 2020 12:59 PM
To: Sean Owen
Cc: Felix Cheung; Hyukjin Kwon; Driesprong, Fokko; Holden Karau; Spark Dev List
Subject: Re: [PySpark] Revisiting PySpark type annotations


On 8/4/20 9:35 PM, Sean Owen wrote:
> Yes, but the general argument you make here is: if you tie this
> project to the main project, it will _have_ to be maintained by
> everyone. That's good, but also exactly the downside I think we want
> to avoid at this stage (I thought?). I understand that for some
> undertakings it's just not feasible to start outside the main
> project, but is there no proof of concept even possible before taking
> this step -- which more or less implies it's going to be owned and
> merged and have to be maintained in the main project?


I think we have a slightly different understanding here ‒ I believe we have
reached a conclusion that maintaining annotations within the project is
OK; we only differ on the specific form it should take.

As for the POC ‒ we have stubs, which have been maintained for over three years
now and cover versions from 2.3 (though these are fairly limited) to,
with some lag, the current master.  There is some evidence they are used in
the wild
(https://github.com/zero323/pyspark-stubs/network/dependents?package_id=UGFja2FnZS02MzU1MTc4Mg%3D%3D),
there are a few contributors
(https://github.com/zero323/pyspark-stubs/graphs/contributors) and at
least some use cases (https://stackoverflow.com/q/40163106/). So,
subjectively speaking, it seems we're already beyond a POC.

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: A30CEF0C31A501EC




Re: [VOTE] Release Apache Superset (incubating) version 0.37.0

2020-08-04 Thread Felix Cheung
(Carry my vote, as stated already)
+1


On Tue, Aug 4, 2020 at 10:20 AM Ville Brofeldt 
wrote:

> Hello IPMC,
>
> The Apache Superset (incubating) community has voted on and approved a
> proposal to
> release Apache Superset (incubating) version 0.37.0.
> The voting thread can be found here:
> https://lists.apache.org/thread.html/r76d4a5f850546aed4f5ba94c5c8f7b3cb901d8842ed32832512e715b%40%3Cdev.superset.apache.org%3E
>
>
> Here are the binding +1 votes from mentors, carrying over from the podling
> vote:
> - Felix
>
> We now kindly request the Incubator PMC members review and vote on this
> incubator release.
>
> Apache Superset (incubating) is a modern, enterprise-ready business
> intelligence web application
>
> The release candidate:
> https://dist.apache.org/repos/dist/dev/incubator/superset/0.37.0rc4/
>
> Git tag for the release:
> https://github.com/apache/incubator-superset/tree/0.37.0rc4
>
> The Change Log for the release:
> https://github.com/apache/incubator-superset/blob/0.37.0rc4/CHANGELOG.md
>
> public keys are available at:
>
> https://www.apache.org/dist/incubator/superset/KEYS
>
> The vote will be open for at least 72 hours or until the necessary number
> of votes are reached.
>
> Please vote accordingly:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Thanks,
> The Apache Superset (Incubating) Team


Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Felix Cheung
What would be the reason for separate git repo?


From: Hyukjin Kwon 
Sent: Monday, August 3, 2020 1:58:55 AM
To: Maciej Szymkiewicz 
Cc: Driesprong, Fokko ; Holden Karau 
; Spark Dev List 
Subject: Re: [PySpark] Revisiting PySpark type annotations

Okay, it seems like we can create a separate repo under apache, e.g.
https://issues.apache.org/jira/browse/INFRA-20470
We can also think about porting the files as they are.
I will try to have a short sync with the author Maciej, and share what we 
discussed offline.


On Wed, Jul 22, 2020 at 10:43 PM, Maciej Szymkiewicz
<mszymkiew...@gmail.com> wrote:


On Wednesday, July 22, 2020, Driesprong, Fokko wrote:
That's probably a one-time overhead, so it is not a big issue.  In my opinion, a
bigger one is the possible complexity. Annotations tend to introduce a lot of
cyclic dependencies in the Spark codebase. This can be addressed, but doesn't
look great.

This is not true (anymore). With Python 3.6 you can use string annotations ->
'DenseVector', and in the future with Python 3.7 this is fixed by
postponed evaluation: https://www.python.org/dev/peps/pep-0563/

As far as I recall, the linked PEP addresses forward references, not cyclic
dependencies, which weren't a big issue in the first place.

What I mean is actually cyclic stuff ‒ for example pyspark.context depends on
pyspark.rdd and the other way around. These dependencies are not explicit at the
moment.
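A common way to express such a cycle for the type checker only ‒ sketched here with illustrative class names, not actual PySpark code ‒ is to guard the import behind typing.TYPE_CHECKING and use a string annotation:

```python
# Hypothetical sketch: pretend this is pyspark/context.py, which needs the
# RDD type from pyspark/rdd.py while rdd.py also imports context.py.
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers, never at runtime, so the runtime
    # import cycle between the two modules is avoided.
    from pyspark.rdd import RDD

class SparkContext:
    def range(self, end: int) -> "RDD":
        # string annotation: resolved lazily by the checker
        raise NotImplementedError
```

At runtime the guarded import never executes, so this file loads even without the other module available; the checker still sees the full type.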


Merging stubs into the project structure, on the other hand, has almost no overhead.

This feels awkward to me; it is like having the docstring in a separate file.
In my opinion you want to have the signatures and the functions together for
transparency and maintainability.


I guess that's a matter of preference. From a maintainability perspective it is
actually much easier to have separate objects.

For example, there are different types of objects that are required for
meaningful checking, which don't really exist in the real code (protocols,
aliases, code-generated signatures for complex overloads), as well as some
monkey-patched entities.
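For illustration ‒ with made-up names, not actual pyspark-stubs content ‒ these typing-only constructs look roughly like this: a Protocol and @overload signatures exist purely for the checker, with a single runtime implementation behind them:

```python
from typing import List, Protocol, Tuple, overload

class SupportsToPandas(Protocol):
    # structural type: anything with a toPandas() method matches;
    # no such class needs to exist in the runtime code
    def toPandas(self) -> object: ...

@overload
def first_column(rows: List[Tuple[str, int]]) -> List[str]: ...
@overload
def first_column(rows: Tuple[Tuple[str, int], ...]) -> Tuple[str, ...]: ...

def first_column(rows):
    # the single runtime implementation the two overloads describe
    return type(rows)(r[0] for r in rows)
```

In a stub file only the overloaded signatures would appear (with `...` bodies), which is part of why keeping them separate can be easier to read.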

Additionally, it is often easier to see inconsistencies when typing is separate.

However, I am not implying that this should be a persistent state.

In general, I see two non-breaking paths here:

- Merge pyspark-stubs as a separate subproject within the main Spark repo, keep
it in sync there with a common CI pipeline, and transfer ownership of the PyPI
package to the ASF.
- Move the stubs directly into python/pyspark and then apply individual stubs to
modules of choice.

Of course, the first proposal could be an initial step for the latter one.


I think DBT is a very nice project where they use annotations very well: 
https://github.com/fishtown-analytics/dbt/blob/dev/marian-anderson/core/dbt/graph/queue.py

Also, they left out the types in the docstrings, since they are available in the
annotations themselves.


In practice, the biggest advantage is actually support for completion, not type 
checking (which works in simple cases).

Agreed.

Would you be interested in writing up the Outreachy proposal for work on this?

I would be, and also happy to mentor. But I think we first need to agree as a
Spark community whether we want to add the annotations to the code, and to what
extent.




At some point (in general when things are heavy in generics, which is the case 
here), annotations become somewhat painful to write.

That's true, but that might also be a pointer that it is time to refactor the 
function/code :)

That might be the case, but it is more often a matter of capturing useful
properties combined with the requirement to keep things in sync with the Scala
counterparts.


For now, I tend to think adding type hints to the code makes it difficult to
backport or revert, and more difficult to discuss typing on its own, especially
considering typing is arguably still premature.

This feels a bit weird to me, since you want to keep this in sync, right? Do you
provide different stubs for different versions of Python? I had to look up the
literals: https://www.python.org/dev/peps/pep-0586/
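As a minimal sketch of what Literal (PEP 586) enables ‒ the parameter name and values here are illustrative, not PySpark's actual API:

```python
from typing import Literal

# the checker restricts callers to exactly these strings
JoinType = Literal["inner", "left", "right", "outer"]

def describe_join(how: JoinType = "inner") -> str:
    # a type checker flags describe_join("iner") before the code ever runs
    return "join type: " + how
```

This is the kind of feature that landed in the typing ecosystem between Spark releases, which is part of the argument about iteration speed below.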

I think it is more about portability between Spark versions.


Cheers, Fokko

Op wo 22 jul. 2020 om 09:40 schreef Maciej Szymkiewicz 
mailto:mszymkiew...@gmail.com>>:

On 7/22/20 3:45 AM, Hyukjin Kwon wrote:
> For now, I tend to think adding type hints to the code makes it
> difficult to backport or revert and
> more difficult to discuss typing on its own, especially considering
> typing is arguably still premature.

About being premature ‒ since the typing ecosystem evolves much faster than
Spark, it might be preferable to keep annotations as a separate project
(preferably under the ASF / Spark umbrella). It allows for faster iterations
and supporting new features (for example, Literals proved to be very
useful), without waiting for the next Spark release.

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: 

Re: [VOTE] Release Apache Superset (incubating) 0.37.0 based on Superset 0.37.0rc4

2020-08-02 Thread Felix Cheung
+1

- incubating in name
- signature and hash fine
- DISCLAIMER is fine
- LICENSE and NOTICE are fine
- No unexpected binary files
- All source files have ASF headers
- compile from source
Flask
Frontend
(While it seemed to work, I was getting 404s on assets; I didn't look further
into it)



From: Ville Brofeldt 
Sent: Sunday, August 2, 2020 12:59:59 PM
To: dev@superset.apache.org 
Subject: Re: [VOTE] Release Apache Superset (incubating) 0.37.0 based on 
Superset 0.37.0rc4

+1 (binding)

Ville

On Sun, Aug 2, 2020, 22:58 Daniel Gaspar  wrote:

> +1 (binding)
>
> On 2020/08/01 16:53:35, Ville Brofeldt 
> wrote:
> > Hello Superset Community,
> >
> > With no new regressions or critical bugs having been brought to our
> attention in the 0.37
> > branch during the last week, this is a call for the vote to release
> Apache Superset
> > (incubating) version 0.37.0 based on 0.37.0rc4.
> >
> > New cherries since 0.37.0rc3:
> > - fix: excel sheet upload is not working (#10450) (@pphszx)
> > - feat: support non-numeric columns in pivot table (#10389) (@villebro)
> > - fix(dashboard): chart rerender when switching tabs (#10432) (@ktmud)
> > - fix: incorrect filter operator emitted by Filter Box (#10421)
> (@villebro)
> > - fix: bump pivot-table and rose (#10400) (@villebro)
> > - fix: treemap template literal (#10382) (@villebro)
> > - fix: group by with timestamp granularity (#10344) (@dpgaspar)
> > - fix: modified by column on charts and dashboards (#10340) (@dpgaspar)
> >
> > The release candidate:
> > https://dist.apache.org/repos/dist/dev/incubator/superset/0.37.0rc4/
> >
> > Git tag for the release:
> > https://github.com/apache/incubator-superset/tree/0.37.0rc4
> >
> > The Change Log for the release:
> > https://github.com/apache/incubator-superset/blob/0.37.0rc4/CHANGELOG.md
> >
> > The Updating instructions for the release:
> > https://github.com/apache/incubator-superset/blob/0.37.0rc4/UPDATING.md
> >
> > public keys are available at:
> >
> > https://www.apache.org/dist/incubator/superset/KEYS
> >
> > The vote will be open for at least 72 hours or until the necessary number
> > of votes are reached.
> >
> > Please vote accordingly:
> >
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove with the reason
> >
> > Thanks,
> > The Apache Superset (Incubating) Team
>


Re: Podling Sedona Report Reminder - August 2020

2020-08-02 Thread Felix Cheung
And as a reminder the report is up 
https://cwiki.apache.org/confluence/display/INCUBATOR/August2020

Please review.


From: jmcl...@apache.org 
Sent: Saturday, August 1, 2020 5:54:53 PM
To: d...@sedona.incubator.apache.org 
Subject: Podling Sedona Report Reminder - August 2020

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 19 August 2020.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, August 05).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/August2020

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


Re: YuniKorn web-site V2 now is up

2020-08-01 Thread Felix Cheung
Thanks. I think you should at least have “incubating” once, in the title/name,
as in http://pinot.apache.org/

And the disclaimer in footer.

Website address yunikorn.apache.org is ok.


From: Weiwei Yang 
Sent: Saturday, August 1, 2020 1:30:19 PM
To: dev@yunikorn.apache.org 
Cc: lamber...@apache.org 
Subject: Re: YuniKorn web-site V2 now is up

Hi Felix

Thanks for the reminder, appreciate it.
We were reviewing the things that need to be done in order to fix these
issues. I think we need to do the following things:

1. Fix the website URL in
https://incubator.apache.org/projects/yunikorn.html, that should point to
http://yunikorn.incubator.apache.org/
2. Add disclaimer on the website:

Apache *YuniKorn* is an effort undergoing incubation at The Apache Software
Foundation (ASF), sponsored by the name of Apache TLP sponsor. Incubation
is required of all newly accepted projects until a further review indicates
that the infrastructure, communications, and decision making process have
stabilized in a manner consistent with other successful ASF projects. While
incubation status is not necessarily a reflection of the completeness or
stability of the code, it does indicate that the project has yet to be
fully endorsed by the ASF.

But I am not quite sure what else needs to be done. Do we have to say
"Apache YuniKorn (incubating)" everywhere on the website or in the documents? I
checked some other incubator project websites, such as

   - http://ratis.apache.org/
   - https://livy.apache.org/
   - http://flagon.apache.org/

This does not seem to be a requirement.
Please share your thoughts, thanks!

On Sat, Aug 1, 2020 at 12:58 PM Felix Cheung 
wrote:

> Reminder - branding is very important, and since the website is public it
> will be great to get this addressed as soon as possible.
>
> Please see the link for a few things that should be updated on the
> website. Thanks
>
>
>
> 
> From: Weiwei Yang 
> Sent: Thursday, July 23, 2020 11:17:45 PM
> To: dev@yunikorn.apache.org 
> Cc: lamber...@apache.org 
> Subject: Re: YuniKorn web-site V2 now is up
>
> Thanks Felix!
> We will be working on fixing the references in the next couple of days.
>
> On Thu, Jul 23, 2020 at 4:16 PM Felix Cheung 
> wrote:
>
> > Looks great - please include the podling reference
> > https://incubator.apache.org/guides/branding.html#naming
> >
> > 
> > From: Weiwei Yang 
> > Sent: Thursday, July 23, 2020 12:32:22 AM
> > To: dev@yunikorn.apache.org 
> > Cc: lamber...@apache.org 
> > Subject: YuniKorn web-site V2 now is up
> >
> > Hi all
> >
> > I am happy to announce that the new YuniKorn web-site is now up and
> > running. The website is http://yunikorn.apache.org/. We have migrated all
> > our documents to the website and, moving forward, we will maintain our
> > docs there. It supports versioned docs.
> >
> > If you find any issue about content, format, style, or anything, please
> let
> > us know. You can create a JIRA, or post a message on the slack channel.
> >
> > Many thanks to Lamber Ken for initiating this and driving this effort.
> > Thanks to Wilfred and Sunil for helping with the migration and review.
> > Great to see this is done before the 0.9 release.
> >
> >
> > Thanks
> > Weiwei
> >
>


Re: YuniKorn web-site V2 now is up

2020-08-01 Thread Felix Cheung
Reminder - branding is very important, and since the website is public it will
be great to get this addressed as soon as possible.

Please see the link for a few things that should be updated on the website. 
Thanks




From: Weiwei Yang 
Sent: Thursday, July 23, 2020 11:17:45 PM
To: dev@yunikorn.apache.org 
Cc: lamber...@apache.org 
Subject: Re: YuniKorn web-site V2 now is up

Thanks Felix!
We will be working on fixing the references in the next couple of days.

On Thu, Jul 23, 2020 at 4:16 PM Felix Cheung 
wrote:

> Looks great - please include the podling reference
> https://incubator.apache.org/guides/branding.html#naming
>
> 
> From: Weiwei Yang 
> Sent: Thursday, July 23, 2020 12:32:22 AM
> To: dev@yunikorn.apache.org 
> Cc: lamber...@apache.org 
> Subject: YuniKorn web-site V2 now is up
>
> Hi all
>
> I am happy to announce that the new YuniKorn web-site is now up and
> running. The website is http://yunikorn.apache.org/. We have migrated all
> our documents to the website and, moving forward, we will maintain our
> docs there. It supports versioned docs.
>
> If you find any issue about content, format, style, or anything, please let
> us know. You can create a JIRA, or post a message on the slack channel.
>
> Many thanks to Lamber Ken for initiating this and driving this effort.
> Thanks to Wilfred and Sunil for helping with the migration and review.
> Great to see this is done before the 0.9 release.
>
> Thanks
> Weiwei
>


Re: Podling Pinot Report Reminder - August 2020

2020-07-31 Thread Felix Cheung
Another reminder


From: Felix Cheung 
Sent: Sunday, July 26, 2020 5:09:32 PM
To: dev@pinot.apache.org ; 
d...@pinot.incubator.apache.org 
Subject: Re: Podling Pinot Report Reminder - August 2020

Reminder on the report.

https://cwiki.apache.org/confluence/display/INCUBATOR/August2020


From: jmcl...@apache.org 
Sent: Thursday, July 23, 2020 1:21:50 AM
To: d...@pinot.incubator.apache.org 
Subject: Podling Pinot Report Reminder - August 2020

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 19 August 2020.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, August 05).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/August2020

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Re: Podling Age Report Reminder - August 2020

2020-07-31 Thread Felix Cheung
Could anyone take a shot at the report?


From: jmcl...@apache.org 
Sent: Thursday, July 23, 2020 1:22:06 AM
To: d...@age.incubator.apache.org 
Subject: Podling Age Report Reminder - August 2020

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 19 August 2020.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, August 05).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/August2020

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


Re: Sedona project website template

2020-07-31 Thread Felix Cheung
Sounds good. You can request on

https://selfserve.apache.org/

to create a repo for incubator-sedona-website.git


From: Jia Yu 
Sent: Friday, July 31, 2020 4:37:15 PM
To: dev@sedona.apache.org 
Subject: Sedona project website template

Hi folks,

We are creating a website for Apache Sedona. The new website will inherit
most of the old tutorials and API docs from the GeoSpark website:
https://datasystemslab.github.io/GeoSpark/
which was written in pure Markdown + MkDocs Material.

But apparently, that template is too simple for an incubator project.
We may have multiple different sections, including tutorials and blog posts,
and each release should have its own API web pages as well. Also, it should
require minimal effort to migrate our markdown-based tutorials to it.

After looking at several Apache project websites,
http://spark.apache.org/
http://zeppelin.apache.org/
http://mesos.apache.org/
https://mxnet.apache.org/
I feel that Jekyll may be a better choice. Any suggestions?

Jia


Podling report for Sedona

2020-07-31 Thread Felix Cheung
Hi,

We are due for the first report. I have taken a shot at the updates, under
Sedona in

https://cwiki.apache.org/confluence/display/INCUBATOR/August2020

Please review and edit.

Felix



[jira] [Commented] (SPARK-20684) expose createOrReplaceGlobalTempView/createGlobalTempView and dropGlobalTempView in SparkR

2020-07-28 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166801#comment-17166801
 ] 

Felix Cheung commented on SPARK-20684:
--

[https://github.com/apache/spark/pull/17941#issuecomment-301669567]

[https://github.com/apache/spark/pull/19176#issuecomment-328292002]

[https://github.com/apache/spark/pull/19176#issuecomment-328292789]

 

> expose createOrReplaceGlobalTempView/createGlobalTempView and 
> dropGlobalTempView in SparkR
> --
>
> Key: SPARK-20684
> URL: https://issues.apache.org/jira/browse/SPARK-20684
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hossein Falaki
>Priority: Major
>
> This is a useful API that is not exposed in SparkR. It will help with moving 
> data between languages in a single Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12172) Consider removing SparkR internal RDD APIs

2020-07-28 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166581#comment-17166581
 ] 

Felix Cheung edited comment on SPARK-12172 at 7/29/20, 1:20 AM:


These are methods (map etc.) that were never public and not supported.

They were not callable unless you directly referenced the internal namespace
SparkR:::


was (Author: felixcheung):
These are methods (map etc) that were never public and not supported.

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>    Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12172) Consider removing SparkR internal RDD APIs

2020-07-28 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166581#comment-17166581
 ] 

Felix Cheung edited comment on SPARK-12172 at 7/29/20, 1:19 AM:


These are methods (map etc) that were never public and not supported.


was (Author: felixcheung):
These are methods (map etc) that were never public and not supported.

On Tue, Jul 28, 2020 at 10:18 AM S Daniel Zafar (Jira) 



> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>    Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs

2020-07-28 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166581#comment-17166581
 ] 

Felix Cheung commented on SPARK-12172:
--

These are methods (map etc) that were never public and not supported.

On Tue, Jul 28, 2020 at 10:18 AM S Daniel Zafar (Jira) 



> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>    Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Zeppelin 0.9.0-preview2 (RC2)

2020-07-26 Thread Felix Cheung
+1


From: Jeff Zhang 
Sent: Sunday, July 26, 2020 7:29:22 AM
To: users 
Cc: dev 
Subject: Re: [VOTE] Release Apache Zeppelin 0.9.0-preview2 (RC2)

+1

Tested spark interpreter tutorial, flink interpreter tutorial.

On Sat, Jul 25, 2020 at 12:26 PM, Prabhjyot Singh <prabhjyotsi...@gmail.com>
wrote:
+1

On Fri, 24 Jul 2020 at 21:23, moon soo Lee
<m...@apache.org> wrote:
+1

I tested

 - build from source
 - src package license file
 - bin package license file
 - new ui access

On Fri, Jul 24, 2020 at 12:36 AM Alex Ott
<alex...@gmail.com> wrote:
+1 from me. What's done:

- checked the checksum
- run spark samples
- tested cassandra interpreter
- installed some plugins from helium registry


On Thu, Jul 23, 2020 at 5:04 PM Jeff Zhang
<zjf...@gmail.com> wrote:


Hi folks,

I propose the following RC to be released for the Apache Zeppelin 
0.9.0-preview2 release.


The commit id is 31b9ed51f946fed934885d8fbb29e9c183552e70 :
https://gitbox.apache.org/repos/asf?p=zeppelin.git;a=commit;h=31b9ed51f946fed934885d8fbb29e9c183552e70

This corresponds to the tag: v0.9.0-preview2-rc2 :
https://gitbox.apache.org/repos/asf?p=zeppelin.git;a=shortlog;h=refs/tags/v0.9.0-preview2-rc2

The release archives (tgz), signature, and checksums are here
https://dist.apache.org/repos/dist/dev/zeppelin/zeppelin-0.9.0-preview2-rc2/

The release candidate consists of the following source distribution archive
zeppelin-v0.9.0-preview2.tgz

In addition, the following supplementary binary distributions are provided
for user convenience at the same location
zeppelin-0.9.0-preview2-bin-all.tgz


The maven artifacts are here
https://repository.apache.org/content/repositories/orgapachezeppelin-1283/org/apache/zeppelin/

You can find the KEYS file here:
https://dist.apache.org/repos/dist/release/zeppelin/KEYS

Release notes available at
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342692==12316221

Vote will be open for next 72 hours (close at 8am 26/July PDT).

[ ] +1 approve
[ ] 0 no opinion
[ ] -1 disapprove (and reason why)


--
Best Regards

Jeff Zhang


--
With best wishes,
Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)


--
Regards,
Prabhjyot Singh


--
Best Regards

Jeff Zhang


Re: [VOTE] Release Apache Zeppelin 0.9.0-preview2 (RC2)

2020-07-26 Thread Felix Cheung
+1


From: Jeff Zhang 
Sent: Sunday, July 26, 2020 7:29:22 AM
To: users 
Cc: dev 
Subject: Re: [VOTE] Release Apache Zeppelin 0.9.0-preview2 (RC2)

+1

Tested spark interpreter tutorial, flink interpreter tutorial.

Prabhjyot Singh mailto:prabhjyotsi...@gmail.com>> 
于2020年7月25日周六 下午12:26写道:
+1

On Fri, 24 Jul 2020 at 21:23, moon soo Lee 
mailto:m...@apache.org>> wrote:
+1

I tested

 - build from source
 - src package license file
 - bin package license file
 - new ui access

On Fri, Jul 24, 2020 at 12:36 AM Alex Ott 
mailto:alex...@gmail.com>> wrote:
+1 from me. What's done:

- checked the checksum
- run spark samples
- tested cassandra interpreter
- installed some plugins from helium registry


On Thu, Jul 23, 2020 at 5:04 PM Jeff Zhang 
mailto:zjf...@gmail.com>> wrote:


Hi folks,

I propose the following RC to be released for the Apache Zeppelin 
0.9.0-preview2 release.


The commit id is 31b9ed51f946fed934885d8fbb29e9c183552e70 :
https://gitbox.apache.org/repos/asf?p=zeppelin.git;a=commit;h=31b9ed51f946fed934885d8fbb29e9c183552e70

This corresponds to the tag: v0.9.0-preview2-rc2 :
https://gitbox.apache.org/repos/asf?p=zeppelin.git;a=shortlog;h=refs/tags/v0.9.0-preview2-rc2

The release archives (tgz), signature, and checksums are here
https://dist.apache.org/repos/dist/dev/zeppelin/zeppelin-0.9.0-preview2-rc2/

The release candidate consists of the following source distribution archive
zeppelin-v0.9.0-preview2.tgz

In addition, the following supplementary binary distributions are provided
for user convenience at the same location
zeppelin-0.9.0-preview2-bin-all.tgz


The maven artifacts are here
https://repository.apache.org/content/repositories/orgapachezeppelin-1283/org/apache/zeppelin/

You can find the KEYS file here:
https://dist.apache.org/repos/dist/release/zeppelin/KEYS

Release notes available at
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342692==12316221

Vote will be open for next 72 hours (close at 8am 26/July PDT).

[ ] +1 approve
[ ] 0 no opinion
[ ] -1 disapprove (and reason why)


--
Best Regards

Jeff Zhang


--
With best wishes,
Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)


--
Regards,
Prabhjyot Singh


--
Best Regards

Jeff Zhang


[jira] [Closed] (WHIMSY-337) roster - LDAP sync issue

2020-07-26 Thread Felix Cheung (Jira)


 [ 
https://issues.apache.org/jira/browse/WHIMSY-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung closed WHIMSY-337.
---
Resolution: Fixed

> roster - LDAP sync issue
> 
>
> Key: WHIMSY-337
> URL: https://issues.apache.org/jira/browse/WHIMSY-337
> Project: Whimsy
>  Issue Type: Bug
>  Reporter: Felix Cheung
>  Priority: Major
>
> I made a roster change; I saw the email days ago, but LDAP has not changed
>  
> [https://lists.apache.org/thread.html/rb452a0d3d67eb04071e637ee3a9cc98050aaea9af427ab716b4e2c9e%40%3Cprivate.incubator.apache.org%3E]
>  
> [https://whimsy.apache.org/roster/ppmc/superset]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Podling Pinot Report Reminder - August 2020

2020-07-26 Thread Felix Cheung
Reminder on the report.

https://cwiki.apache.org/confluence/display/INCUBATOR/August2020


From: jmcl...@apache.org 
Sent: Thursday, July 23, 2020 1:21:50 AM
To: d...@pinot.incubator.apache.org 
Subject: Podling Pinot Report Reminder - August 2020

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 19 August 2020.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, August 05).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/August2020

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



[jira] [Created] (WHIMSY-337) roster - LDAP sync issue

2020-07-24 Thread Felix Cheung (Jira)
Felix Cheung created WHIMSY-337:
---

 Summary: roster - LDAP sync issue
 Key: WHIMSY-337
 URL: https://issues.apache.org/jira/browse/WHIMSY-337
 Project: Whimsy
  Issue Type: Bug
Reporter: Felix Cheung


I made a roster change; I saw the email days ago, but LDAP has not changed

 

[https://lists.apache.org/thread.html/rb452a0d3d67eb04071e637ee3a9cc98050aaea9af427ab716b4e2c9e%40%3Cprivate.incubator.apache.org%3E]

 

[https://whimsy.apache.org/roster/ppmc/superset]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Exposing Spark parallelized directory listing & non-locality listing in core

2020-07-22 Thread Felix Cheung
+1


From: Holden Karau 
Sent: Wednesday, July 22, 2020 10:49:49 AM
To: Steve Loughran 
Cc: dev 
Subject: Re: Exposing Spark parallelized directory listing & non-locality 
listing in core

Wonderful. To be clear the patch is more to start the discussion about how we 
want to do it and less what I think is the right way.

On Wed, Jul 22, 2020 at 10:47 AM Steve Loughran 
mailto:ste...@cloudera.com>> wrote:


On Wed, 22 Jul 2020 at 00:51, Holden Karau 
mailto:hol...@pigscanfly.ca>> wrote:
Hi Folks,

In Spark SQL there is the ability to have Spark do its partition 
discovery/file listing in parallel on the worker nodes and also avoid locality 
lookups. I'd like to expose this in core, but given the Hadoop APIs it's a bit 
more complicated to do right. I

That's ultimately fixable, if we can sort out what's good from the app side and 
reconcile that with "what is not pathologically bad across both HDFS and object 
stores".

Bad: globStatus, and anything else which returns an array rather than a remote 
iterator or encourages a treewalk.
Good: deep recursive listings; remote-iterator results, with incremental/async 
fetch of the next page of the listing; and soon an option for the iterator, if 
cast to IOStatisticsSource, to actually serve up stats on IO performance during 
the listing (e.g. number of list calls, mean time to get a list response back, 
store throttle events).

Also look at LocatedFileStatus to see how it parallelises its work. It's not 
perfect because wildcards are supported, which means globStatus gets used.

happy to talk about this some more, and I'll review the patch

-steve

made a quick POC and two potential different paths we could do for 
implementation and wanted to see if anyone had thoughts - 
https://github.com/apache/spark/pull/29179.

Cheers,

Holden

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

YouTube Live Streams: https://www.youtube.com/user/holdenkarau


--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

YouTube Live Streams: https://www.youtube.com/user/holdenkarau


[RESULT] [VOTE] Accept Sedona into the Apache Incubator

2020-07-19 Thread Felix Cheung
Hi,

With 6 binding +1 votes, and no -1, to vote to accept Sedona into the
Apache Incubator CLOSED and the vote has PASSED.

[6] +1 Binding votes:
- Jean-Baptiste Onofre
- Dave Fisher
- Kevin Ratnasekera
- Furkan KAMACI
- Willem Jiang
- Felix Cheung

[ 0 ] +0  Binding votes
[ 0 ] -1  Binding votes

Vote thread:
https://lists.apache.org/thread.html/rebedb029694dc71c4e0e82f7d27735d98724ea507e6b05f0fae4f267%40%3Cgeneral.incubator.apache.org%3E

On behalf of the Sedona community, thank you all.
Felix


Re: [VOTE] Accept Sedona into the Apache Incubator

2020-07-19 Thread Felix Cheung
+1 (binding)


On Sat, Jul 18, 2020 at 6:19 AM Willem Jiang  wrote:

> +1 (binding)
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Thu, Jul 16, 2020 at 11:49 AM Felix Cheung 
> wrote:
> >
> > Hi,
> >
> > As we discussed the Sedona proposal [1], I would like to call for a vote
> to
> > accept Sedona into the Apache Incubator.
> >
> > Sedona is a big geospatial data processing engine. The system provides
> > easy-to-use Scala, SQL, and Python APIs for spatial data scientists to
> > manage, wrangle, and process geospatial data. The system extends and
> builds
> > upon a popular cluster computing framework (Apache Spark) to provide
> > scalability.
> >
> > The final proposal can be found at
> > https://cwiki.apache.org/confluence/display/INCUBATOR/Sedona+Proposal
> >
> > Please cast your vote:
> >
> >   [ ] +1, yes, bring Sedona into Incubator
> >   [ ] 0, I don't care either way
> >   [ ] -1, no, do not bring Sedona into Incubator, because...
> >
> > The vote will open for at least 72 hours and only votes from the IPMC
> > members are considered binding, but other votes are welcome!
> >
> > Thanks,
> > Felix
> >
> >
> > -
> > [1]
> >
> https://lists.apache.org/thread.html/r72f0b7ddd3143179045653050de9b7de8508d8f8b8194bd73a53e704%40%3Cgeneral.incubator.apache.org%3E
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


[VOTE] Accept Sedona into the Apache Incubator

2020-07-15 Thread Felix Cheung
Hi,

As we discussed the Sedona proposal [1], I would like to call for a vote to
accept Sedona into the Apache Incubator.

Sedona is a big geospatial data processing engine. The system provides
easy-to-use Scala, SQL, and Python APIs for spatial data scientists to
manage, wrangle, and process geospatial data. The system extends and builds
upon a popular cluster computing framework (Apache Spark) to provide
scalability.

The final proposal can be found at
https://cwiki.apache.org/confluence/display/INCUBATOR/Sedona+Proposal

Please cast your vote:

  [ ] +1, yes, bring Sedona into Incubator
  [ ] 0, I don't care either way
  [ ] -1, no, do not bring Sedona into Incubator, because...

The vote will open for at least 72 hours and only votes from the IPMC
members are considered binding, but other votes are welcome!

Thanks,
Felix


-
[1]
https://lists.apache.org/thread.html/r72f0b7ddd3143179045653050de9b7de8508d8f8b8194bd73a53e704%40%3Cgeneral.incubator.apache.org%3E


Re: Welcoming some new Apache Spark committers

2020-07-15 Thread Felix Cheung
Welcome!


From: Nick Pentreath 
Sent: Tuesday, July 14, 2020 10:21:17 PM
To: dev 
Cc: Dilip Biswal ; Jungtaek Lim 
; huaxin gao 
Subject: Re: Welcoming some new Apache Spark committers

Congratulations and welcome as Apache Spark committers!

On Wed, 15 Jul 2020 at 06:59, Prashant Sharma 
mailto:scrapco...@gmail.com>> wrote:
Congratulations all ! It's great to have such committed folks as committers. :)

On Wed, Jul 15, 2020 at 9:24 AM Yi Wu 
mailto:yi...@databricks.com>> wrote:
Congrats!!

On Wed, Jul 15, 2020 at 8:02 AM Hyukjin Kwon 
mailto:gurwls...@gmail.com>> wrote:
Congrats!

On Wed, Jul 15, 2020 at 7:56 AM, Takeshi Yamamuro 
mailto:linguin@gmail.com>> wrote:
Congrats, all!

On Wed, Jul 15, 2020 at 5:15 AM Takuya UESHIN 
mailto:ues...@happy-camper.st>> wrote:
Congrats and welcome!

On Tue, Jul 14, 2020 at 1:07 PM Bryan Cutler 
mailto:cutl...@gmail.com>> wrote:
Congratulations and welcome!

On Tue, Jul 14, 2020 at 12:36 PM Xingbo Jiang 
mailto:jiangxb1...@gmail.com>> wrote:
Welcome, Huaxin, Jungtaek, and Dilip!

Congratulations!

On Tue, Jul 14, 2020 at 10:37 AM Matei Zaharia 
mailto:matei.zaha...@gmail.com>> wrote:
Hi all,

The Spark PMC recently voted to add several new committers. Please join me in 
welcoming them to their new roles! The new committers are:

- Huaxin Gao
- Jungtaek Lim
- Dilip Biswal

All three of them contributed to Spark 3.0 and we’re excited to have them join 
the project.

Matei and the Spark PMC
-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org



--
Takuya UESHIN



--
---
Takeshi Yamamuro


Re: [DISCUSS] project proposal - Apache Sedona

2020-07-14 Thread Felix Cheung
Thanks. I have added Von (welcome onboard!) and some context from this
discussion thread to the proposal post on wiki.

If there are no further questions, I’d like to move this to a VOTE tomorrow.


On Tue, Jul 14, 2020 at 11:48 AM Jia Yu  wrote:

> Thank you very much, Von. Looking forward to it!
>
> 
>
> Jia Yu
>
> Ph.D. in Computer Science
>
> Arizona State University <http://www.asu.edu/>
>
> Reach me via: Homepage <http://jiayuasu.github.io/> | GitHub
> <https://github.com/jiayuasu>
>
>
> On Tue, Jul 14, 2020 at 10:42 AM Felix Cheung 
> wrote:
>
> > Definitely, it would be great to have you.
> >
> >
> > On Tue, Jul 14, 2020 at 12:16 AM Gosling Von 
> wrote:
> >
> > > Hi,
> > >
> > > This is a very interesting geospatial data processing project. You’ve
> got
> > > a good set of mentors!
> > >
> > > If you are looking for a fourth mentor then I volunteer if you are
> > > interested.
> > >
> > > Best Regards,
> > > Von Gosling
> > >
> > > > On Jul 9, 2020, at 9:12 PM, Felix Cheung 
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > We would like to propose Apache Sedona, currently known in its
> > community
> > > as
> > > > GeoSpark, as a new project under the Apache Incubator.
> > > >
> > > > Sedona is a big geospatial data processing engine. The system
> provides
> > an
> > > > easy to use Scala, SQL, and Python APIs for spatial data scientists
> to
> > > > manage, wrangle, and process geospatial data. The system extends and
> > > builds
> > > > upon a popular cluster computing framework (Apache Spark) to provide
> > > > scalability.
> > > >
> > > >
> > > > The proposal can be found at
> > > >
> https://cwiki.apache.org/confluence/display/INCUBATOR/Sedona+Proposal
> > > >
> > > > The project can benefit from and is seeking more experienced mentors.
> > > >
> > > > Any thought or feedback is appreciated!
> > > > Regards
> > > > Felix
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> > >
> >
>


Re: [DISCUSS] project proposal - Apache Sedona

2020-07-14 Thread Felix Cheung
Definitely, it would be great to have you.


On Tue, Jul 14, 2020 at 12:16 AM Gosling Von  wrote:

> Hi,
>
> This is a very interesting geospatial data processing project. You’ve got
> a good set of mentors!
>
> If you are looking for a fourth mentor then I volunteer if you are
> interested.
>
> Best Regards,
> Von Gosling
>
> > On Jul 9, 2020, at 9:12 PM, Felix Cheung  wrote:
> >
> > Hi,
> >
> > We would like to propose Apache Sedona, currently known in its community
> as
> > GeoSpark, as a new project under the Apache Incubator.
> >
> > Sedona is a big geospatial data processing engine. The system provides an
> > easy to use Scala, SQL, and Python APIs for spatial data scientists to
> > manage, wrangle, and process geospatial data. The system extends and
> builds
> > upon a popular cluster computing framework (Apache Spark) to provide
> > scalability.
> >
> >
> > The proposal can be found at
> > https://cwiki.apache.org/confluence/display/INCUBATOR/Sedona+Proposal
> >
> > The project can benefit from and is seeking more experienced mentors.
> >
> > Any thought or feedback is appreciated!
> > Regards
> > Felix
>
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [DISCUSS] project proposal - Apache Sedona

2020-07-13 Thread Felix Cheung
Noted below. I will leave others to add anything I’ve missed.


On Mon, Jul 13, 2020 at 8:15 PM Justin Mclean 
wrote:

> Hi,
>
> Currently PyPI is not an ASF-supported platform and you may have some
> difficulty in complying with ASF policies. Are you aware of the trademark,
> branding and other ASF policies that apply here? Also are you are aware
> that the ASF release process involve the PPMC and IPMC voting on release
> which can take some time?
>

Re: PyPI - I am aware of the ongoing discussion on the distribution
guideline, plus from working with other communities in the past, so I
believe I can provide some guidance there.


> At the ASF discussions need to take place on the mailing list, you seem to
> have a number of other communication channels, is it going to be an issue
> moving some of that discussion off them and onto the mailing list?


As for community, the Google group maps well to dev@a.o, Gitter can move to ASF
slack (with dev@ summary), and GitHub issues can continue - I would think they
would work well.


> It is rare for projects to exit incubation in under a year. Will there be
> any issues if this process takes longer?
>
> It may be better off to start with a single dev list and add the user list
> later as needed. Is there a reason you want both at the start?


And this project can start with dev@



> Thanks,
> Justin
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [DISCUSS] project proposal - Apache Sedona

2020-07-13 Thread Felix Cheung
Thank you Jia.
Any other question on this?


On Sun, Jul 12, 2020 at 8:06 PM Jia Yu  wrote:

> Hello Dave,
>
> Thank you for the great question!
>
> We actually have tried our best to reach out to the existing code
> contributors to ask for their support. To be specific, we did the following
> two things:
>
> 1. We created a GitHub issue on GeoSpark repo to explain this donation in
> October 2019.
>
> Please find the issue here:
> https://github.com/DataSystemsLab/GeoSpark/issues/391
>
> We just posted the latest update of this proposal and added that if anyone
> found that we forgot them, we are more than happy to add them.
>
> 2. We have contacted all code contributors individually through emails and
> personal contacts. They all support this donation. The initial committers
> are contributors who made significant contributions to Apache Sedona based
> on the GitHub statistics:
> https://github.com/DataSystemsLab/GeoSpark/graphs/contributors They are
> willing to continuously support this project.
>
> We believe that we have got strong support from the community. Hope this
> can answer your question!
>
> Regards,
> Jia
>
> 
>
> Jia Yu
>
> Ph.D. in Computer Science
>
> Arizona State University <http://www.asu.edu/>
>
> Reach me via: Homepage <http://jiayuasu.github.io/> | GitHub
> <https://github.com/jiayuasu>
>
> On Sun, Jul 12, 2020 at 12:46 PM Dave Fisher 
>> wrote:
>>
>>> Hi -
>>>
>>> You note over 30 contributors to GeoSpark, but there are only 8 initial
>>> committers. Has the move to the Apache Incubator been thoroughly discussed
>>> in the existing community? Is anyone being excluded?
>>>
>>> Best Regards,
>>> Dave
>>>
>>> Sent from my iPhone
>>>
>>> > On Jul 12, 2020, at 12:36 PM, Felix Cheung 
>>> wrote:
>>> >
>>> > Any questions or concerns? If not, I can kick off the VOTE thread
>>> shortly.
>>> >
>>> >
>>> >
>>> >> On Thu, Jul 9, 2020 at 6:12 AM Felix Cheung 
>>> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> We would like to propose Apache Sedona, currently known in its
>>> community
>>> >> as GeoSpark, as a new project under the Apache Incubator.
>>> >>
>>> >> Sedona is a big geospatial data processing engine. The system
>>> provides an
>>> >> easy to use Scala, SQL, and Python APIs for spatial data scientists to
>>> >> manage, wrangle, and process geospatial data. The system extends and
>>> builds
>>> >> upon a popular cluster computing framework (Apache Spark) to provide
>>> >> scalability.
>>> >>
>>> >>
>>> >> The proposal can be found at
>>> >> https://cwiki.apache.org/confluence/display/INCUBATOR/Sedona+Proposal
>>> >>
>>> >> The project can benefit from and is seeking more experienced mentors.
>>> >>
>>> >> Any thought or feedback is appreciated!
>>> >> Regards
>>> >> Felix
>>> >>
>>> >>
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>> For additional commands, e-mail: general-h...@incubator.apache.org
>>>
>>>


Re: [DISCUSS] project proposal - Apache Sedona

2020-07-12 Thread Felix Cheung
Any questions or concerns? If not, I can kick off the VOTE thread shortly.



On Thu, Jul 9, 2020 at 6:12 AM Felix Cheung  wrote:

> Hi,
>
> We would like to propose Apache Sedona, currently known in its community
> as GeoSpark, as a new project under the Apache Incubator.
>
> Sedona is a big geospatial data processing engine. The system provides an
> easy to use Scala, SQL, and Python APIs for spatial data scientists to
> manage, wrangle, and process geospatial data. The system extends and builds
> upon a popular cluster computing framework (Apache Spark) to provide
> scalability.
>
>
> The proposal can be found at
> https://cwiki.apache.org/confluence/display/INCUBATOR/Sedona+Proposal
>
> The project can benefit from and is seeking more experienced mentors.
>
> Any thought or feedback is appreciated!
> Regards
> Felix
>
>


[DISCUSS] project proposal - Apache Sedona

2020-07-09 Thread Felix Cheung
Hi,

We would like to propose Apache Sedona, currently known in its community as
GeoSpark, as a new project under the Apache Incubator.

Sedona is a big geospatial data processing engine. The system provides
easy-to-use Scala, SQL, and Python APIs for spatial data scientists to
manage, wrangle, and process geospatial data. The system extends and builds
upon a popular cluster computing framework (Apache Spark) to provide
scalability.


The proposal can be found at
https://cwiki.apache.org/confluence/display/INCUBATOR/Sedona+Proposal

The project can benefit from and is seeking more experienced mentors.

Any thought or feedback is appreciated!
Regards
Felix


Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-07-05 Thread Felix Cheung
I think pluggable storage in shuffle is essential for k8s GA


From: Holden Karau 
Sent: Monday, June 29, 2020 9:33 AM
To: Maxim Gekk
Cc: Dongjoon Hyun; dev
Subject: Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

Should we also consider the shuffle service refactoring to support pluggable 
storage engines as targeting the 3.1 release?

On Mon, Jun 29, 2020 at 9:31 AM Maxim Gekk 
mailto:maxim.g...@databricks.com>> wrote:
Hi Dongjoon,

I would add:
- Filters pushdown to JSON (https://github.com/apache/spark/pull/27366)
- Filters pushdown to other datasources like Avro
- Support nested attributes of filters pushed down to JSON
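The idea behind the filter-pushdown items above — evaluating predicates inside the datasource reader so non-matching records are dropped during the scan, instead of after full parsing — can be sketched with the stdlib json module (a conceptual sketch only; Spark's actual JSON pushdown in the linked PR is implemented in its Jackson-based parser):

```python
import json

rows = [
    '{"id": 1, "city": "Phoenix"}',
    '{"id": 2, "city": "Tempe"}',
    '{"id": 3, "city": "Phoenix"}',
]

def read_then_filter(lines, pred):
    # Without pushdown: fully parse every record, then filter afterwards.
    return [r for r in (json.loads(l) for l in lines) if pred(r)]

def filter_while_reading(lines, field, value):
    # With pushdown: the reader applies the predicate as it scans, so
    # non-matching records never reach the query engine at all.
    out = []
    for l in lines:
        r = json.loads(l)  # a real reader can abandon a record early here
        if r.get(field) == value:
            out.append(r)
    return out

a = read_then_filter(rows, lambda r: r["city"] == "Phoenix")
b = filter_while_reading(rows, "city", "Phoenix")
assert a == b and [r["id"] for r in b] == [1, 3]
```

Both paths return the same rows; the win is that the pushdown path avoids materializing records that the query will discard anyway.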

Maxim Gekk

Software Engineer

Databricks, Inc.


On Mon, Jun 29, 2020 at 7:07 PM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All.

After a short celebration of Apache Spark 3.0, I'd like to ask you the 
community opinion on Apache Spark 3.1 feature expectations.

First of all, Apache Spark 3.1 is scheduled for December 2020.
- https://spark.apache.org/versioning-policy.html

I'm expecting the following items:

1. Support Scala 2.13
2. Use Apache Hadoop 3.2 by default for better cloud support
3. Declaring Kubernetes Scheduler GA
In my perspective, the last main missing piece was dynamic allocation:
- Dynamic allocation with shuffle tracking is already shipped at 3.0.
- Dynamic allocation with worker decommission/data migration is targeting 
3.1. (Thanks, Holden)
4. DSv2 Stabilization

I'm aware of some more features which are on the way currently, but I love to 
hear the opinions from the main developers and more over the main users who 
need those features.

Thank you in advance. Welcome for any comments.

Bests,
Dongjoon.


--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Fwd: Announcing ApacheCon @Home 2020

2020-07-01 Thread Felix Cheung


-- Forwarded message -

We are pleased to announce that ApacheCon @Home will be held online,
September 29 through October 1.

More event details are available at https://apachecon.com/acah2020 but
there’s a few things that I want to highlight for you, the members.

Yes, the CFP has been reopened. It will be open until the morning of
July 13th. With no restrictions on space/time at the venue, we can
accept talks from a much wider pool of speakers, so we look forward to
hearing from those of you who may have been reluctant, or unwilling, to
travel to the US.
Yes, you can add your project to the event, whether that’s one talk, or
an entire track - we have the room now. Those of you who are PMC members
will be receiving information about how to get your projects represented
at the event.
Attendance is free, as has been the trend in these events in our
industry. We do, however, offer donation options for attendees who feel
that our content is worth paying for.
Sponsorship opportunities are available immediately at
https://www.apachecon.com/acna2020/sponsors.html

If you would like to volunteer to help, we ask that you join the
plann...@apachecon.com mailing list and discuss 
it there, rather than
here, so that we do not have a split discussion, while we’re trying to
coordinate all of the things we have to get done in this very short time
window.

Rich Bowen,
VP Conferences, The Apache Software Foundation




Podling report

2020-06-30 Thread Felix Cheung
Hi!

Any volunteer to draft the report?

https://cwiki.apache.org/confluence/display/INCUBATOR/July2020



Re: [Announce] New Zeppelin Committer: Philipp Dallig

2020-06-28 Thread Felix Cheung
Congrats and welcome!


From: Xun Liu 
Sent: Saturday, June 27, 2020 8:58:34 AM
To: dev 
Cc: users ; philipp.dal...@gmail.com 

Subject: Re: [Announce] New Zeppelin Committer: Philipp Dallig

That's great news!
Welcome aboard Philipp!
:-)

On Sat, Jun 27, 2020 at 8:43 AM Yadong Xie 
mailto:vthink...@gmail.com>> wrote:
welcome!

On Saturday, June 27, 2020, Alex Ott 
mailto:alex...@gmail.com>> wrote:

> That's great news! Welcome aboard Philipp!
>
> On Fri, Jun 26, 2020 at 8:23 AM Jeff Zhang 
> mailto:zjf...@gmail.com>> wrote:
>
> >
> > The Project Management Committee (PMC) for Apache Zeppelin
> > has invited Philipp Dallig to become a committer and we are very pleased
> > to announce that he has accepted.
> >
> > We greatly appreciate all of Philipp Dallig's hard work and generous
> > contributions to the project. We look forward to continued involvement in
> > the project.
> >
> > Congratulations & Welcome aboard Philipp Dallig !
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>
>
> --
> With best wishes,
> Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
>



Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Felix Cheung
Congrats


From: Jungtaek Lim 
Sent: Thursday, June 18, 2020 8:18:54 PM
To: Hyukjin Kwon 
Cc: Mridul Muralidharan ; Reynold Xin ; 
dev ; user 
Subject: Re: [ANNOUNCE] Apache Spark 3.0.0

Great, thanks all for your efforts on the huge step forward!

On Fri, Jun 19, 2020 at 12:13 PM Hyukjin Kwon 
mailto:gurwls...@gmail.com>> wrote:
Yay!

On Fri, Jun 19, 2020 at 4:46 AM, Mridul Muralidharan 
mailto:mri...@gmail.com>> wrote:
Great job everyone ! Congratulations :-)

Regards,
Mridul

On Thu, Jun 18, 2020 at 10:21 AM Reynold Xin 
mailto:r...@databricks.com>> wrote:

Hi all,

Apache Spark 3.0.0 is the first release of the 3.x line. It builds on many of 
the innovations from Spark 2.x, bringing new ideas as well as continuing 
long-term projects that have been in development. This release resolves more 
than 3400 tickets.

We'd like to thank our contributors and users for their contributions and early 
feedback to this release. This release would not have been possible without you.

To download Spark 3.0.0, head over to the download page: 
http://spark.apache.org/downloads.html

To view the release notes: 
https://spark.apache.org/releases/spark-release-3-0-0.html






Re: About the binary files for source code release

2020-06-10 Thread Felix Cheung
Can this be done in a script? I don’t know much about helm, but suppose
you can commit the content as loose files in the source repo and run a
simple step to make the tarballs/tgz.
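A minimal sketch of that suggestion, assuming the chart sources are committed as loose files (the paths and version below are taken from this thread and are purely illustrative; for real charts the `helm package` command would be the more idiomatic tool):

```python
# Illustrative only: rebuild kubernetes/helm/zookeeper-2.1.3.tgz from
# loose chart files at build time, so no binary tarball needs to live
# in the source release. The dummy chart created here stands in for
# the real chart sources committed to the repo.
import pathlib
import tarfile

chart = pathlib.Path("kubernetes/helm/zookeeper")
(chart / "templates").mkdir(parents=True, exist_ok=True)
(chart / "Chart.yaml").write_text("name: zookeeper\nversion: 2.1.3\n")

# The "simple step" run during release packaging:
with tarfile.open("kubernetes/helm/zookeeper-2.1.3.tgz", "w:gz") as tgz:
    tgz.add(chart, arcname="zookeeper")
```

Running such a step from the release script would keep the source tarball free of opaque binaries while still producing the same .tgz artifacts for convenience builds.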


On Wed, Jun 10, 2020 at 12:31 AM Seunghyun Lee  wrote:

> Hi all,
>
> One of the comments for 0.4.0-rc2 candidate was about binary files under
> "/kubernetes" directory.
>
> """
> ./kubernetes/helm/pinot-0.2.0.tgz
> ./kubernetes/helm/pinot/charts/zookeeper-2.1.3.tgz
> ./kubernetes/helm/presto-0.2.0.tgz
> """
>
> We can start to discuss how to handle those. I extracted the files and
> they don't include any executable files. I would guess that those are
> some configuration/script files needed to launch ZooKeeper on Kubernetes.
>
> ❯ tar xvf zookeeper-2.1.3.tgz
> x zookeeper/Chart.yaml
> x zookeeper/values.yaml
> x zookeeper/templates/NOTES.txt
> x zookeeper/templates/_helpers.tpl
> x zookeeper/templates/config-jmx-exporter.yaml
> x zookeeper/templates/config-script.yaml
> x zookeeper/templates/job-chroots.yaml
> x zookeeper/templates/poddisruptionbudget.yaml
> x zookeeper/templates/service-headless.yaml
> x zookeeper/templates/service.yaml
> x zookeeper/templates/servicemonitors.yaml
> x zookeeper/templates/statefulset.yaml
> x zookeeper/.helmignore
> x zookeeper/OWNERS
> x zookeeper/README.md
>
> @Felix Cheung  What is the recommended way to
> handle this? Remove it from the source code release? If we want to include
> those tgz files, what is the recommended way to handle them?
>
> Best,
> Seunghyun
>


Re: [VOTE] Release Apache Pinot (incubating) 0.4.0 RC2

2020-06-09 Thread Felix Cheung
+1

But a couple of things to follow up:

- mailing list link at the bottom of the web site seems to be broken

- binary files are in the source package
./kubernetes/helm/pinot-0.2.0.tgz
./kubernetes/helm/pinot/charts/zookeeper-2.1.3.tgz
./kubernetes/helm/presto-0.2.0.tgz

- ASF license header is not in one java file
/pinot-spi/src/test/resources/TestRecordReader.java

- build from source
could be helpful to have steps on getting the right version of mvn and
setting it up

Other checks done and are fine:
- incubating in name
- signature and hash fine
- DISCLAIMER is fine
- LICENSE and NOTICE are fine


On Mon, Jun 8, 2020 at 9:20 PM H  wrote:

> This is a call for vote to the release Apache Pinot (incubating) version
> 0.4.0
>
> Apache Pinot (incubating) is a distributed columnar storage engine that can
> ingest data in realtime and serve analytical queries at low latency.
>
> Pinot community has voted and approved this release.
>
> Vote thread:
>
> https://lists.apache.org/thread.html/r866256b82048845ed732cb49638294314a0d475c2ea908c2e11bc7a1%40%3Cdev.pinot.apache.org%3E
>
> Result thread:
>
> https://lists.apache.org/thread.html/rdff8e67f8680fd9d4ed61e0baacad3c7c3ad141567fa7325776c85b1%40%3Cdev.pinot.apache.org%3E
>
> The release candidate:
>
> https://dist.apache.org/repos/dist/dev/incubator/pinot/apache-pinot-incubating-0.4.0-rc2
>
> Git tag for this release:
> https://github.com/apache/incubator-pinot/tree/release-0.4.0-rc2
>
> Git hash for this release:
> 8355d2e0e489a8d127f2e32793671fba505628a8
>
> The artifacts have been signed with key: 6CC169A6FC19C470, which can be
> found in the following KEYS file.
> https://dist.apache.org/repos/dist/release/incubator/pinot/KEYS
>
> Release notes:
> https://github.com/apache/incubator-pinot/releases/tag/release-0.4.0-rc2
>
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachepinot-1013
>
> Documentation on verifying a release candidate:
>
> https://cwiki.apache.org/confluence/display/PINOT/Validating+a+release+candidate
>
>
> The vote will be open for at least 72 hours or until necessary number of
> votes are reached.
>
> Please vote accordingly,
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Thanks,
> Apache Pinot (incubating) team
>


Mentoring Superset

2020-06-06 Thread Felix Cheung
Hello,

I spoke to Alan and wanted to offer my help to Apache Superset in the 
Incubator, and hope that I can help in its journey to graduation.

I’m a committer and PMC member of Apache Spark and Zeppelin, and I am currently
mentoring Pinot and a few other projects.

Regards,
Felix


Re: Crail draft podling report June 2020

2020-06-02 Thread Felix Cheung
Thanks - The wiki is now using the same credentials as your Apache account - 
have you tried that?



From: bernard metzler 
Sent: Tuesday, June 2, 2020 2:04:00 AM
To: dev@crail.apache.org 
Subject: Crail draft podling report June 2020

Please find below a draft podling report for Crail,
subject to discussion, suggestions, changes, etc.

Initially I wanted to directly put that in at
https://cwiki.apache.org/confluence/display/INCUBATOR/June2020

to edit online with you. Unfortunately I lost my
Confluence password for login, and any attempt to reset
it ends up with "An internal error occurred when trying
to change your password." In the interest of time, and
assuming others on the list still have access, I'd ask
one of you to put it in later today if we agreed on the
content.

Thanks,
Bernard.


Here is what I have in mind as a report so far:

---
Crail

Crail is a storage platform for sharing performance critical data in 
distributed data processing jobs at very high speed.

Crail has been incubating since 2017-11-01.
- Three most important unfinished issues to address before graduating:

1. Grow developers community
2. Grow Crail use base
3. More steady release cycle

- Are there any issues that the IPMC or ASF Board need to be aware of?

no.

- How has the community developed since the last report?

We see increasing traffic on the dev mailing list from
Crail users. A recent proposal from Adrian to add elastic
scaling got positive feedback from other committers.

- How has the project developed since the last report?

Crail went through a phase of limited visible activity on
the code base. It is critical that the Crail developer
community becomes more active again to keep the project
making decent progress. Using Crail as an ephemeral data
store for serverless computing has emerged as a very suitable
use case. Pushing this forward, aiming at inclusion
in a next release, is of common interest among the
committers.

- How would you assess the podling's maturity?

  (Please feel free to add your own commentary.)

  [ ] Initial setup
  [ ] Working towards first release
  [x] Community building
  [ ] Nearing graduation
  [ ] Other:

- Date of last release:

2020-01-14

- When were the last committers or PPMC members elected?

December 4th, 2018

- Have your mentors been helpful and responsive?

yes.


Re: Podling Pinot Report June 2020

2020-05-31 Thread Felix Cheung
I saw the report was posted but with a question:

No, there isn’t a fixed email-thread criterion for graduation. Graduation is
about maturity of the community.



From: jmcl...@apache.org 
Sent: Friday, May 22, 2020 9:53:41 PM
To: d...@pinot.incubator.apache.org 
Subject: Podling Pinot Report Reminder - June 2020

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 17 June 2020.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, June 03).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/June2020

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Re: Podling Crail Report Reminder - June 2020

2020-05-31 Thread Felix Cheung
Reminder on the podling report


From: jmcl...@apache.org 
Sent: Friday, May 22, 2020 9:53 PM
To: d...@crail.incubator.apache.org
Subject: Podling Crail Report Reminder - June 2020

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 17 June 2020.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, June 03).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/June2020

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


Re: Teaclave Website

2020-05-16 Thread Felix Cheung
Great!


From: Mingshen Sun 
Sent: Friday, May 15, 2020 6:44:48 PM
To: dev@teaclave.apache.org 
Subject: Re: Teaclave Website

Thank you. I've also added community and contributors pages to the website.

On Fri, May 15, 2020 at 6:18 PM Furkan KAMACI  wrote:
>
> Hi,
>
> Thanks for your effort on the website! I see that the ASF Sponsorship URL
> is not right. So, I've created a PR for it.
>
> Kind Regards,
> Furkan KAMACI
>
>
>
> On Sat, May 16, 2020 at 1:35 AM Felix Cheung 
> wrote:
>
> > This is great - you might want to add links to mail archive (dev@) and a
> > page on the community (people)
> >
> > https://incubator.apache.org/guides/sites.html#creating_a_good_podling_site
> >
> >
> >
> >
> > 
> > From: Mingshen Sun 
> > Sent: Thursday, May 14, 2020 5:55:35 PM
> > To: dev@teaclave.apache.org 
> > Subject: Re: Teaclave Website
> >
> > Hi folks,
> >
> > I have set up a website for Teaclave: https://teaclave.apache.org/
> >
> > Currently, the site generator can automatically fetch docs from our
> > main repository. The source code can be found here:
> > https://github.com/apache/incubator-teaclave-website/.
> >
> > I'll include more information in this website. Feel free to comment
> > and help me to improve the website. Thanks.
> >
> >
> > On Wed, May 13, 2020 at 6:39 PM Furkan KAMACI 
> > wrote:
> > >
> > > Hi Mingshen,
> > >
> > > Great!
> > >
> > > Kind Regards,
> > > Furkan KAMACI
> > >
> > > On Thu, May 14, 2020 at 4:38 AM Mingshen Sun  wrote:
> > >
> > > > Great, I can access this repo now. It takes about 1h to propagate the
> > rule.
> > > >
> > > > On Wed, May 13, 2020 at 6:19 PM Mingshen Sun  wrote:
> > > > >
> > > > > I have created a new repo (apache/incubator-teaclave-website) for
> > > > > hosting sources and pages here [1].
> > > > >
> > > > > It appears in GitHub immediately
> > > > > (https://github.com/apache/incubator-teaclave-website). However, I
> > > > > don't have access to this repo now. I'm not sure whether it takes
> > some
> > > > > time to propagate the access control rule to GitHub or I need to
> > > > > submit a ticket to INFRA for help.
> > > > >
> > > > > [1] https://gitbox.apache.org/setup/newrepo.html.
> > > > >
> > > > > On Wed, May 13, 2020 at 3:45 PM Furkan KAMACI <
> > furkankam...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > Hi Mingshen,
> > > > > >
> > > > > > Sure! Feel free to ask if you have any questions.
> > > > > >
> > > > > > Kind Regards,
> > > > > > Furkan KAMACI
> > > > > >
> > > > > > On Thu, May 14, 2020 at 1:04 AM Mingshen Sun 
> > wrote:
> > > > > >
> > > > > > > Thanks for asking, Furkan.
> > > > > > >
> > > > > > > I did some research on how to deploy a website under the Apache
> > > > > > > infrastructure. There are several options
> > > > > > > (https://infra.apache.org/project-site.html). Since we are
> > mainly
> > > > > > > working on GitHub, I think the GitHub pages fit our needs. We
> > can use
> > > > > > > .asf.yml to configure the deployment
> > > > > > > (
> > > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories
> > > > > > > ).
> > > > > > >
> > > > > > > Let me try to start with a simple one with essential information
> > and
> > > > > > > later polishing details.
> > > > > > >
> > > > > > > On Wed, May 13, 2020 at 9:55 AM Furkan KAMACI <
> > > > furkankam...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > As far as I know, there is not a website created for Teaclave
> > yet.
> > > > Is
> > > > > > > there
> > > > > > > > any progress for it?
> > > > > > > >
> > > > > > > > Kind Regards,
> > > > > > > > Furkan KAMACI
> > > > > > >
> > > > > > >
> > -
> > > > > > > To unsubscribe, e-mail: dev-unsubscr...@teaclave.apache.org
> > > > > > > For additional commands, e-mail: dev-h...@teaclave.apache.org
> > > > > > >
> > > > > > >
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@teaclave.apache.org
> > > > For additional commands, e-mail: dev-h...@teaclave.apache.org
> > > >
> > > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@teaclave.apache.org
> > For additional commands, e-mail: dev-h...@teaclave.apache.org
> >
> >

-
To unsubscribe, e-mail: dev-unsubscr...@teaclave.apache.org
For additional commands, e-mail: dev-h...@teaclave.apache.org



Re: Teaclave Website

2020-05-15 Thread Felix Cheung
This is great - you might want to add links to mail archive (dev@) and a page 
on the community (people)

https://incubator.apache.org/guides/sites.html#creating_a_good_podling_site





From: Mingshen Sun 
Sent: Thursday, May 14, 2020 5:55:35 PM
To: dev@teaclave.apache.org 
Subject: Re: Teaclave Website

Hi folks,

I have set up a website for Teaclave: https://teaclave.apache.org/

Currently, the site generator can automatically fetch docs from our
main repository. The source code can be found here:
https://github.com/apache/incubator-teaclave-website/.

I'll include more information in this website. Feel free to comment
and help me to improve the website. Thanks.


On Wed, May 13, 2020 at 6:39 PM Furkan KAMACI  wrote:
>
> Hi Mingshen,
>
> Great!
>
> Kind Regards,
> Furkan KAMACI
>
> On Thu, May 14, 2020 at 4:38 AM Mingshen Sun  wrote:
>
> > Great, I can access this repo now. It takes about 1h to propagate the rule.
> >
> > On Wed, May 13, 2020 at 6:19 PM Mingshen Sun  wrote:
> > >
> > > I have created a new repo (apache/incubator-teaclave-website) for
> > > hosting sources and pages here [1].
> > >
> > > It appears in GitHub immediately
> > > (https://github.com/apache/incubator-teaclave-website). However, I
> > > don't have access to this repo now. I'm not sure whether it takes some
> > > time to propagate the access control rule to GitHub or I need to
> > > submit a ticket to INFRA for help.
> > >
> > > [1] https://gitbox.apache.org/setup/newrepo.html.
> > >
> > > On Wed, May 13, 2020 at 3:45 PM Furkan KAMACI 
> > wrote:
> > > >
> > > > Hi Mingshen,
> > > >
> > > > Sure! Feel free to ask if you have any questions.
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
> > > >
> > > > On Thu, May 14, 2020 at 1:04 AM Mingshen Sun  wrote:
> > > >
> > > > > Thanks for asking, Furkan.
> > > > >
> > > > > I did some research on how to deploy a website under the Apache
> > > > > infrastructure. There are several options
> > > > > (https://infra.apache.org/project-site.html). Since we are mainly
> > > > > working on GitHub, I think the GitHub pages fit our needs. We can use
> > > > > .asf.yml to configure the deployment
> > > > > (
> > > > >
> > https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories
> > > > > ).
> > > > >
> > > > > Let me try to start with a simple one with essential information and
> > > > > later polishing details.
> > > > >
> > > > > On Wed, May 13, 2020 at 9:55 AM Furkan KAMACI <
> > furkankam...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > As far as I know, there is not a website created for Teaclave yet.
> > Is
> > > > > there
> > > > > > any progress for it?
> > > > > >
> > > > > > Kind Regards,
> > > > > > Furkan KAMACI
> > > > >
> > > > > -
> > > > > To unsubscribe, e-mail: dev-unsubscr...@teaclave.apache.org
> > > > > For additional commands, e-mail: dev-h...@teaclave.apache.org
> > > > >
> > > > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@teaclave.apache.org
> > For additional commands, e-mail: dev-h...@teaclave.apache.org
> >
> >

-
To unsubscribe, e-mail: dev-unsubscr...@teaclave.apache.org
For additional commands, e-mail: dev-h...@teaclave.apache.org



Re: [VOTE] Release Apache YuniKorn (Incubating) 0.8.0

2020-04-29 Thread Felix Cheung
+1 (binding)
Carry over my vote


On Wed, Apr 29, 2020 at 9:05 PM Weiwei Yang  wrote:

> Hello IPMC
>
> The Apache YuniKorn community has voted and approved the release Apache
> YuniKorn (incubating) 0.8.0. We now kindly request the IPMC members review
> and vote in this release.
>
> YuniKorn is a standalone, universal resource scheduler that can support
> both long-running and batch workloads. The current release provides a fully
> functional resource scheduler for K8s.
>
> YuniKorn community vote thread
>
> https://lists.apache.org/thread.html/r3cee36fffb0b0fe72380f188ca290327168ded8ba59a1da02b4bfbd3%40%3Cdev.yunikorn.apache.org%3E
>
> Vote result thread
>
> https://lists.apache.org/thread.html/r7779887aaada163af09bf6c8b33febf97cc64eb3d0134cc87b2dfc01%40%3Cdev.yunikorn.apache.org%3E
>
> Issues included in this release:
> https://issues.apache.org/jira/projects/YUNIKORN/versions/12347742
>
> The release candidate:
>
> https://dist.apache.org/repos/dist/dev/incubator/yunikorn/apache-yunikorn-incubating-0.8.0-rc4/
>
> This release has been signed with PGP
> key 8D076B6491A66D7B94E94519F57176CE11856D1F, corresponding to
> w...@apache.org. You can find the KEYS file here:
> https://dist.apache.org/repos/dist/dev/incubator/yunikorn/KEYS
>
> Git tag for the release:
>
>- https://github.com/apache/incubator-yunikorn-core/tree/v0.8.0
>- https://github.com/apache/incubator-yunikorn-k8shim/tree/v0.8.0
>-
>
> https://github.com/apache/incubator-yunikorn-scheduler-interface/tree/v0.8.0
>- https://github.com/apache/incubator-yunikorn-web/tree/v0.8.0
>
> The vote will be open for at least 72 hours or until the necessary number
> of votes is reached.
>
> Please vote accordingly:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Thanks,
> The Apache YuniKorn (Incubating) Team
>


Re: [VOTE] Project proposal - Apache AgensGraph Extension

2020-04-15 Thread Felix Cheung
+1 (binding)

I can offer to mentor this project, if needed (have done graph and Postgres
work in the past)


On Wed, Apr 15, 2020 at 11:18 AM Kevin Ratnasekera 
wrote:

> +1 ( binding )
>
> Regards
> Kevin
>
> On Wed, Apr 15, 2020 at 11:26 PM Dave Fisher  wrote:
>
> > +1 (binding)
> >
> > > On Apr 15, 2020, at 8:12 AM, Jim Jagielski  wrote:
> > >
> > > We would like to propose AgensGraph Extension as an Apache incubator
> > project. As such, I am calling a VOTE.
> > >
> > > The AgensGraph Extension provides an extension for PostgreSQL to give
> > the users the ability to leverage graph database on top of the existing
> > relational database with minimal effort. The basic principle of the
> project
> > is to create single storage that can handle both relational and graph
> model
> > data so that the users can use the standard ANSI SQL along with
> openCypher (
> > http://www.opencypher.org), the Graph query language.
> > >
> > > The proposal can be found here:
> > >
> > >   o
> >
> https://cwiki.apache.org/confluence/display/INCUBATOR/AgensGraphExtension
> > >
> > > Please cast your vote. I will leave the polls open for at least 72
> hours.
> > >
> > > PS: Please note that before the podling enters the Incubator, assuming
> a
> > positive vote response, the actual project name will be changed to
> > something acceptable as determined by VP Brand.
> > >
> > > PPS: If you are interested in mentoring, please let us know. We are
> > looking for additional mentors...
> > >
> > > Cheers!
> > > Jim
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> >
> >
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>


Reminder March Incubator report

2020-02-29 Thread Felix Cheung




Re: Mentors wanted for projects

2020-02-19 Thread Felix Cheung
Can we call out projects that might be at risk in some way (community
growth, list traffic, low change count)?

It might help to bring attention to those needing help or should rather
retire.


On Wed, Feb 19, 2020 at 1:47 PM Justin Mclean 
wrote:

> Hi,
>
> It is generally seen that the right number of mentors for a podling is three.
> Currently we have a few podlings that have less than that.
>
> Podlings that currently only have one mentor:
> SAMOA
> Spot
> Taverna**
>
> Podlings that have two mentors:
> Annotator
> Daffodil
> Hivemall
> Marvin-AI
> Milagro
> Pony Mail*
> S2Graph
> SDAP
> Superset*
>
> Any existing mentor or IPMC members willing to help these projects out?
>
> Thanks,
> Justin
>
> * Nearing graduation
> ** Considering retiring
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [ANNOUNCE] Apache Submarine 0.3.0 release!

2020-02-10 Thread Felix Cheung
Cool stuff!


From: Wangda Tan 
Sent: Monday, February 10, 2020 9:07:07 PM
To: users@submarine.apache.org 
Cc: d...@submarine.apache.org 
Subject: Re: [ANNOUNCE] Apache Submarine 0.3.0 release!

Awesome! Thanks everybody for helping this release!

Best,
Wangda

On Thu, Feb 6, 2020 at 6:44 AM Xun Liu  wrote:

> Thanks to all contributors!
> Let's continue to develop better features for submarine!!
>
> On Thu, Feb 6, 2020 at 7:25 PM Sunil Govindan  wrote:
>
>> Awesome! Hearty Congratulations to all of you for the first release of
>> Submarine as Apache TLP!
>> Kudos to all who helped in this release!
>>
>> - Sunil
>>
>> On Thu, Feb 6, 2020 at 4:36 PM Zhankun Tang  wrote:
>>
>> > Hi folks,
>> >
>> > It's a great honor for me to announce that the Apache Submarine
>> community
>> > has released Apache Submarine 0.3.0!
>> >
>> > Apache Submarine 0.3.0 is the first release after the spin-off from
>> Apache
>> > Hadoop. It includes 196 patches since the prior version.
>> > The highlighted features are:
>> > - Mini-submarine (YARN)
>> > - Basic Tensorflow job submission to k8s through submarine-server
>> RESTful
>> > API
>> > - Job submission on YARN through submarine-server RPC protocol
>> >
>> > Tons of thanks to our contributors and community! Let's keep fighting!
>> >
>> > *Apache Submarine 0.3.0 released*:
>> > http://submarine.apache.org/releases/submarine-release-0.3.0.html
>> > *Changelog*: https://s.apache.org/4ezw1
>> >
>> > BR,
>> > Zhankun
>> >
>>
>




Re: [VOTE] Accept NLPCraft into Apache Incubator

2020-02-09 Thread Felix Cheung
+1

On Sun, Feb 9, 2020 at 1:58 PM Roman Shaposhnik 
wrote:

> On Sun, Feb 9, 2020 at 1:54 PM Konstantin Boudnik  wrote:
> >
> > Hello.
> >
> > As the discussion of NLPCraft proposal [1] has been wrapped up [2] I
> would like
> > to call a VOTE to accept this project into the Apache Incubator.
> >
> > Please cast your vote:
> >
> >   [ ] +1, bring NLPCraft into Incubator
> >   [ ] +0, I don't care either way
> >   [ ] -1, do not bring NLPCraft into Incubator, because...
>
> +1 (binding)
>
> Thanks,
> Roman.
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: More publicly documenting the options under spark.sql.*

2020-01-16 Thread Felix Cheung
I think it’s a good idea


From: Hyukjin Kwon 
Sent: Wednesday, January 15, 2020 5:49:12 AM
To: dev 
Cc: Sean Owen ; Nicholas Chammas 
Subject: Re: More publicly documenting the options under spark.sql.*

Resending to the dev list for archive purpose:

I think automatically creating a configuration page isn't a bad idea because I 
think we deprecate and remove configurations which are not created via 
.internal() in SQLConf anyway.

I already tried this automatic generation from the codes at SQL built-in 
functions and I'm pretty sure we can do the similar thing for configurations as 
well.

We could perhaps mimic what hadoop does 
https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/core-default.xml

On Wed, 15 Jan 2020, 22:46 Hyukjin Kwon, <gurwls...@gmail.com> wrote:
I think automatically creating a configuration page isn't a bad idea because I 
think we deprecate and remove configurations which are not created via 
.internal() in SQLConf anyway.

I already tried this automatic generation from the codes at SQL built-in 
functions and I'm pretty sure we can do the similar thing for configurations as 
well.

We could perhaps mimic what hadoop does 
https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/core-default.xml

On Wed, 15 Jan 2020, 10:46 Sean Owen, <sro...@gmail.com> wrote:
Some of it is intentionally undocumented, as far as I know, as an
experimental option that may change, or legacy, or safety valve flag.
Certainly anything that's marked an internal conf. (That does raise
the question of who it's for, if you have to read source to find it.)

I don't know if we need to overhaul the conf system, but there may
indeed be some confs that could legitimately be documented. I don't
know which.

On Tue, Jan 14, 2020 at 7:32 PM Nicholas Chammas
<nicholas.cham...@gmail.com> wrote:
>
> I filed SPARK-30510 thinking that we had forgotten to document an option, but 
> it turns out that there's a whole bunch of stuff under SQLConf.scala that has 
> no public documentation under http://spark.apache.org/docs.
>
> Would it be appropriate to somehow automatically generate a documentation 
> page from SQLConf.scala, as Hyukjin suggested on that ticket?
>
> Another thought that comes to mind is moving the config definitions out of 
> Scala and into a data format like YAML or JSON, and then sourcing that both 
> for SQLConf as well as for whatever documentation page we want to generate. 
> What do you think of that idea?
>
> Nick
>
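A tiny sketch of the auto-generation idea being discussed here. The config entries below are invented for illustration only; Spark's real definitions live in SQLConf.scala, and the names, defaults, and doc strings recorded there (or in a YAML/JSON source, as Nick suggests) could be fed through the same kind of renderer to produce a docs page:

```python
# Hypothetical sketch: render a docs table from config definitions
# kept as plain data. The entries are invented, not real Spark confs.
def render_config_table(entries):
    """Emit a markdown-style table of (name, default, doc) triples."""
    lines = ["Property Name | Default | Meaning", "--- | --- | ---"]
    for name, default, doc in entries:
        lines.append(f"`{name}` | {default} | {doc}")
    return "\n".join(lines)

entries = [
    ("spark.sql.example.enabled", "true", "Illustrative flag (not a real conf)."),
    ("spark.sql.example.threshold", "10", "Illustrative threshold (not a real conf)."),
]
print(render_config_table(entries))
```

Because the page is generated from the same source of truth as the code, docs can't drift out of sync with the actual defaults, which is the main appeal of the approach Hadoop takes with core-default.xml.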

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Accept YuniKorn into Apache Incubator

2020-01-10 Thread Felix Cheung
+1 (binding)

On Fri, Jan 10, 2020 at 9:54 AM David Lyle  wrote:

> +1 (non-binding)
>
> -David...
>
>
> On Fri, Jan 10, 2020 at 12:47 PM Vinod Kumar Vavilapalli <
> vino...@apache.org>
> wrote:
>
> > Hi,
> >
> > I'd like to call a vote on accepting YuniKorn into the Apache Incubator.
> >
> > Please see the discussion thread [1].
> >
> > Please see the full proposal:
> > https://cwiki.apache.org/confluence/display/INCUBATOR/YuniKornProposal
> >
> > Please cast your vote
> >
> > [ ] +1 Accept YuniKorn into the Incubator
> > [ ] +0 Indifferent to the acceptance of YuniKorn
> > [ ] -1 Do not accept YuniKorn because …
> >
> > The vote will be open at least for 72 hours.
> >
> > Incubator PMC member votes are binding. Everyone else is welcomed to vote
> > too (mark them as non-binding if you can)!
> >
> > Thanks
> > +Vinod
> >
> > [1] [DISCUSS] YuniKorn Proposal
> >
> https://lists.apache.org/thread.html/59a3fc019119352f06e75a2bae5c25cd1b652282d7a59b85ed2188cf%40%3Cgeneral.incubator.apache.org%3E
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>


Re: Fail to use SparkR of 3.0 preview 2

2019-12-26 Thread Felix Cheung
Maybe it’s the reverse - the package is built to run on the latest R but is not 
compatible with slightly older versions (3.5.2 was released in Dec 2018).


From: Jeff Zhang 
Sent: Thursday, December 26, 2019 5:36:50 PM
To: Felix Cheung 
Cc: user.spark 
Subject: Re: Fail to use SparkR of 3.0 preview 2

I use R 3.5.2

Felix Cheung mailto:felixcheun...@hotmail.com>> 
于2019年12月27日周五 上午4:32写道:
It looks like a change in the method signature in R base packages.

Which version of R are you running on?


From: Jeff Zhang mailto:zjf...@gmail.com>>
Sent: Thursday, December 26, 2019 12:46:12 AM
To: user.spark mailto:user@spark.apache.org>>
Subject: Fail to use SparkR of 3.0 preview 2

I tried SparkR of spark 3.0 preview 2, but hit the following issue.

Error in rbind(info, getNamespaceInfo(env, "S3methods")) :
  number of columns of matrices must match (see arg 2)
Error: package or namespace load failed for ‘SparkR’ in rbind(info, 
getNamespaceInfo(env, "S3methods")):
 number of columns of matrices must match (see arg 2)
During startup - Warning messages:
1: package ‘SparkR’ was built under R version 3.6.2
2: package ‘SparkR’ in options("defaultPackages") was not found

Does anyone know what might be wrong ? Thanks



--
Best Regards

Jeff Zhang


--
Best Regards

Jeff Zhang


Re: Fail to use SparkR of 3.0 preview 2

2019-12-26 Thread Felix Cheung
It looks like a change in the method signature in R base packages.

Which version of R are you running on?


From: Jeff Zhang 
Sent: Thursday, December 26, 2019 12:46:12 AM
To: user.spark 
Subject: Fail to use SparkR of 3.0 preview 2

I tried SparkR of spark 3.0 preview 2, but hit the following issue.

Error in rbind(info, getNamespaceInfo(env, "S3methods")) :
  number of columns of matrices must match (see arg 2)
Error: package or namespace load failed for ‘SparkR’ in rbind(info, 
getNamespaceInfo(env, "S3methods")):
 number of columns of matrices must match (see arg 2)
During startup - Warning messages:
1: package ‘SparkR’ was built under R version 3.6.2
2: package ‘SparkR’ in options("defaultPackages") was not found

Does anyone know what might be wrong ? Thanks



--
Best Regards

Jeff Zhang


Re: [VOTE] Apache Crail 1.2-incubating (rc2)

2019-12-18 Thread Felix Cheung
+1 binding

Checked
Name
Disclaimer
Signature
License
Header
Compile from source


On Thu, Dec 12, 2019 at 6:52 PM Justin Mclean 
wrote:

> Hi,
>
> +1 (binding)
>
> I checked:
> - incubating in name
> - signatures and hashes good
> - DISCLAIMER exists
> - LICENSE is fine
> - NOTICE has incorrect year
> - all source file have ASF headers
> - no unexpected binary files
> - can compile from source
>
> The CREDITS file is a little unusual, as what’s in that file normally goes in
> NOTICE.
>
> Along with the NOTICE file, this file also has the incorrect copyright
> year [1].
>
> Thanks,
> Justin
>
> 1. ./doc/source/conf.py
>
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [VOTE] Apache Crail 1.2-incubating (rc2)

2019-12-17 Thread Felix Cheung
dist.apache.org is having some issues. Let me check back later.



On Mon, Dec 16, 2019 at 11:57 PM Adrian Schuepbach <
adrian.schuepb...@gribex.net> wrote:

> Hi Felix
>
> Would you be available to vote on this Apache Crail release?
>
> Thanks
> Adrian
>
>
>
> On 12/14/19 00:32, Adrian Schüpbach wrote:
> > Dear all
> >
> > Thanks to Justin and Julian for having voted already.
> >
> > We need at least one more vote and would like to
> > kindly ask for another vote on the release of
> > Apache Crail 1.2-incubating.
> >
> > Thanks a lot
> > Adrian
> >
> >
> >
> > On 09.12.19 23:20, Adrian Schuepbach wrote:
> >> Please vote to approve the source release of Apache Crail 1.2-incubating
> >> (rc2).
> >>
> >> The podling dev vote thread:
> >>
> >> https://www.mail-archive.com/dev@crail.apache.org/msg00745.html
> >>
> >> The result:
> >>
> >> https://www.mail-archive.com/dev@crail.apache.org/msg00751.html
> >>
> >> Commit hash: 5597a87e9d6eab1877f18bf5e4e5935f172f866c
> >>
> >>
> https://gitbox.apache.org/repos/asf?p=incubator-crail.git;a=commit;h=5597a87e9d6eab1877f18bf5e4e5935f172f866c
> >>
> >> Release files can be found at:
> >> https://dist.apache.org/repos/dist/dev/incubator/crail/1.2-rc2/
> >>
> >> The Nexus Staging URL:
> >> https://repository.apache.org/content/repositories/orgapachecrail-1010
> >>
> >> Release artifacts are signed with the following key:
> >> https://www.apache.org/dist/incubator/crail/KEYS
> >>
> >> For information about the contents of this release, see:
> >>
> https://gitbox.apache.org/repos/asf?p=incubator-crail.git;a=blob;f=HISTORY.md;h=e68ec2546e4fa353c68c7c7940a8804c6968cd23;hb=5597a87e9d6eab1877f18bf5e4e5935f172f866c
> >> or https://github.com/apache/incubator-crail/blob/v1.2-rc2/HISTORY.md
> >>
> >> The vote is open for at least 72 hours and passes if a majority of at
> >> least 3 +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Crail 1.0-incubating
> >> [ ] -1 Do not release this package because ...
> >>
> >> Thanks,
> >> Adrian
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >> For additional commands, e-mail: general-h...@incubator.apache.org
> >>
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> --
> Adrian Schüpbach, Dr. sc. ETH Zürich
>
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [VOTE] Apache Crail 1.2-incubating (rc2)

2019-12-17 Thread Felix Cheung
I will look into it later today.

On Mon, Dec 16, 2019 at 11:57 PM Adrian Schuepbach <
adrian.schuepb...@gribex.net> wrote:

> Hi Felix
>
> Would you be available to vote on this Apache Crail release?
>
> Thanks
> Adrian
>
>
>
> On 12/14/19 00:32, Adrian Schüpbach wrote:
> > Dear all
> >
> > Thanks to Justin and Julian for having voted already.
> >
> > We need at least one more vote and would like to
> > kindly ask for another vote on the release of
> > Apache Crail 1.2-incubating.
> >
> > Thanks a lot
> > Adrian
> >
> >
> >
> > On 09.12.19 23:20, Adrian Schuepbach wrote:
> >> Please vote to approve the source release of Apache Crail 1.2-incubating
> >> (rc2).
> >>
> >> The podling dev vote thread:
> >>
> >> https://www.mail-archive.com/dev@crail.apache.org/msg00745.html
> >>
> >> The result:
> >>
> >> https://www.mail-archive.com/dev@crail.apache.org/msg00751.html
> >>
> >> Commit hash: 5597a87e9d6eab1877f18bf5e4e5935f172f866c
> >>
> >>
> https://gitbox.apache.org/repos/asf?p=incubator-crail.git;a=commit;h=5597a87e9d6eab1877f18bf5e4e5935f172f866c
> >>
> >> Release files can be found at:
> >> https://dist.apache.org/repos/dist/dev/incubator/crail/1.2-rc2/
> >>
> >> The Nexus Staging URL:
> >> https://repository.apache.org/content/repositories/orgapachecrail-1010
> >>
> >> Release artifacts are signed with the following key:
> >> https://www.apache.org/dist/incubator/crail/KEYS
> >>
> >> For information about the contents of this release, see:
> >>
> https://gitbox.apache.org/repos/asf?p=incubator-crail.git;a=blob;f=HISTORY.md;h=e68ec2546e4fa353c68c7c7940a8804c6968cd23;hb=5597a87e9d6eab1877f18bf5e4e5935f172f866c
> >> or https://github.com/apache/incubator-crail/blob/v1.2-rc2/HISTORY.md
> >>
> >> The vote is open for at least 72 hours and passes if a majority of at
> >> least 3 +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Crail 1.0-incubating
> >> [ ] -1 Do not release this package because ...
> >>
> >> Thanks,
> >> Adrian
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >> For additional commands, e-mail: general-h...@incubator.apache.org
> >>
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> --
> Adrian Schüpbach, Dr. sc. ETH Zürich
>
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [DISCUSS] YuniKorn Proposal

2019-12-11 Thread Felix Cheung
> > - gotest.tools, Test library, Apache License 2.0
> > - github.com/stretchr/testify, Test library, MIT License
> > - Karma, Unit test library, MIT License
> > - Protractor, End2End test library, MIT License
> > - Json-server, Test server, MIT License
> > - Yarn, Dependency manager, BSD 2-Clause License
> >
> > 2.8.2 Cryptography
> > YuniKorn does not currently include any cryptography-related code.
> >
> > 2.9 Required Resources
> >
> > 2.9.1 Mailing lists:
> > - priv...@yunikorn.incubator.apache.org (PMC list)
> > - comm...@yunikorn.incubator.apache.org (git push emails)
> > - iss...@yunikorn.incubator.apache.org (JIRA issue feed)
> > - d...@yunikorn.incubator.apache.org (Dev discussion)
> > - u...@yunikorn.incubator.apache.org (User questions)
> >
> > 2.9.2 Git Repositories
> > Git is the preferred source control system
> > - git://git.apache.org/yunikorn-* (We have multiple git repositories)
> >
> > 2.9.3 Issue Tracking
> > JIRA YuniKorn (*YUNIKORN-*)
> >
> > 2.9.4 Other Resources
> > We have published a series of demo videos on the Youtube channel:
> > https://www.youtube.com/channel/UCDSJ2z-lEZcjdK27tTj_hGw
> >
> > 2.10 Initial Committers and Affinities
> > Initial committers and affinities are listed as below:
> > - Akhil PB (a...@cloudera.com) (Cloudera)
> > - Sunil Govindan (sun...@apache.org) (Cloudera)
> > - Vinod Kumar Vavilapalli (vino...@apache.org) (Cloudera)
> > - Wangda Tan (wan...@apache.org) (Cloudera)
> > - Weiwei Yang (w...@apache.org) (Cloudera)
> > - Wilfred Spiegelenburg (wspiegelenb...@cloudera.com) (Cloudera)
> > - Carlo Curino (cur...@apache.org) (Microsoft)
> > - Subramaniam Krishnan (su...@apache.org) (Microsoft)
> > - Arun Suresh (asur...@apache.org) (Microsoft)
> > - Konstantinos Karanasos (kkarana...@apache.org) (Microsoft)
> > - Jonathan Hung (jh...@apache.org) (LinkedIn)
> > - DB Tsai (dbt...@apache.org) (Apple)
> > - Junping Du (junping...@apache.org) (Tencent)
> > - Tao Yang (taoy...@apache.org) (Alibaba)
> > - Jason Lowe (jl...@apache.org) (Nvidia)
> >
> > 2.11 Sponsors
> > Champion
> > - Vinod Kumar Vavilapalli (vino...@apache.org)
> >
> > Nominated Mentors
> > - Junping Du (Tencent), (junping...@apache.org)
> > - Felix Cheung (Uber), (felixche...@apache.org)
> > - Jason Lowe (Nvidia), (jl...@apache.org)
> > - Holden Karau (Apple), (hol...@apache.org)
> >
> > Sponsoring Entity
> > - The Apache Incubator
> >
> > [1]
> https://cwiki.apache.org/confluence/display/INCUBATOR/YuniKornProposal
> >
> >  END OF THE PROPOSAL
> > ---
> >
> > Thanks
> > Weiwei
>
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: Podling Crail Report Reminder - December 2019

2019-12-02 Thread Felix Cheung
Thanks. We could also report that a release is ongoing.


From: bernard metzler 
Sent: Monday, December 2, 2019 9:24:04 AM
To: dev@crail.apache.org ; Felix Cheung 

Subject: Re: Podling Crail Report Reminder - December 2019

Hi Felix,

Sure we will do. Just trying to get another release in shape to
have something great to report ;) OK we will likely fail that
release before December 5th, but we will report!

Thanks
Bernard.


On 12/2/2019 18:04, Felix Cheung wrote:
> Hello- reminder on the report this week. Thanks!
>
>
> 
> From: jmcl...@apache.org 
> Sent: Saturday, November 30, 2019 3:49:40 PM
> To: d...@crail.incubator.apache.org 
> Subject: Podling Crail Report Reminder - December 2019
>
> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 18 December 2019, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, December 04).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Candidate names should not be made public before people are actually
> elected, so please do not include the names of potential committers or
> PPMC members in your report.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
>  the project or necessarily of its field
> *   A list of the three most important issues to address in the move
>  towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
>  aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://cwiki.apache.org/confluence/display/INCUBATOR/December2019
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Note: The format of the report has changed to use markdown.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>



Re: Podling Crail Report Reminder - December 2019

2019-12-02 Thread Felix Cheung
Hello- reminder on the report this week. Thanks!



From: jmcl...@apache.org 
Sent: Saturday, November 30, 2019 3:49:40 PM
To: d...@crail.incubator.apache.org 
Subject: Podling Crail Report Reminder - December 2019

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 18 December 2019, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, December 04).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/December2019

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


Re: SparkR integration with Hive 3 spark-r

2019-11-24 Thread Felix Cheung
I think you will get more answers if you ask without mentioning SparkR.

Your question is independent of SparkR.

Spark support for Hive 3.x (3.1.2) was added here

https://github.com/apache/spark/commit/1b404b9b9928144e9f527ac7b1caa15f932c2649

You should be able to connect Spark to Hive metastore.
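For reference, a hedged sketch of that wiring in spark-defaults.conf; the version, jar path, and metastore host below are placeholders, not values from this thread:

```properties
# spark-defaults.conf -- pointing Spark at an external Hive metastore.
# Requires the matching Hive client jars on disk at the path given.
spark.sql.catalogImplementation    hive
spark.sql.hive.metastore.version   3.1.2
spark.sql.hive.metastore.jars      /opt/hive-3.1.2/lib/*
```

The metastore host itself is normally configured in hive-site.xml via hive.metastore.uris (e.g. thrift://metastore-host:9083).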




From: Alfredo Marquez 
Sent: Friday, November 22, 2019 4:26:49 PM
To: user@spark.apache.org 
Subject: Re: SparkR integration with Hive 3 spark-r

Does anyone else have some insight to this question?

Thanks,

Alfredo

On Mon, Nov 18, 2019, 3:00 PM Alfredo Marquez 
mailto:alfredo.g.marq...@gmail.com>> wrote:
Hello Nicolas,

Well the issue is that with Hive 3, Spark gets its own metastore, separate 
from the Hive 3 metastore.  So how do you reconcile this separation of 
metastores?

Can you continue to "enableHivemetastore" and be able to connect to Hive 3? 
Does this connection take advantage of Hive's LLAP?

Our team doesn't believe that it's possible to make the connection as you would 
in the past.  But if it is that simple, I would be ecstatic.

Thanks,

Alfredo

On Mon, Nov 18, 2019, 12:53 PM Nicolas Paris 
mailto:nicolas.pa...@riseup.net>> wrote:
Hi Alfredo

my 2 cents:
To my knowlegde and reading the spark3 pre-release note, it will handle
hive metastore 2.3.5 - no mention of hive 3 metastore. I made several
tests on this in the past[1] and it seems to handle any hive metastore
version.

However spark cannot read hive managed table AKA transactional tables.
So I would say you should be able to read any hive 3 regular table with
any of spark, pyspark or sparkR.


[1] https://parisni.frama.io/posts/playing-with-hive-spark-metastore-versions/

On Mon, Nov 18, 2019 at 11:23:50AM -0600, Alfredo Marquez wrote:
> Hello,
>
> Our company is moving to Hive 3, and they are saying that there is no SparkR
> implementation in Spark 2.3.x + that will connect to Hive 3.  Is this true?
>
> If it is true, will this be addressed in the Spark 3 release?
>
> I don't use python, so losing SparkR to get work done on Hadoop is a huge 
> loss.
>
> P.S. This is my first email to this community; if there is something I should
> do differently, please let me know.
>
> Thank you
>
> Alfredo

--
nicolas

-
To unsubscribe e-mail: 
user-unsubscr...@spark.apache.org



Re: Enabling fully disaggregated shuffle on Spark

2019-11-20 Thread Felix Cheung
Great!

Due to a number of constraints I won’t be sending the link directly here, but 
please reply to me and I will add you.



From: Ben Sidhom 
Sent: Wednesday, November 20, 2019 9:10:01 AM
To: John Zhuge 
Cc: bo yang ; Amogh Margoor ; Ryan Blue 
; Ben Sidhom ; Spark Dev List 
; Christopher Crosbie ; Griselda 
Cuevas ; Holden Karau ; Mayank Ahuja 
; Kalyan Sivakumar ; alfo...@fb.com 
; Felix Cheung ; Matt Cheah 
; Yifei Huang (PD) 
Subject: Re: Enabling fully disaggregated shuffle on Spark

That sounds great!

On Wed, Nov 20, 2019 at 9:02 AM John Zhuge 
mailto:jzh...@apache.org>> wrote:
That will be great. Please send us the invite.

On Wed, Nov 20, 2019 at 8:56 AM bo yang 
mailto:bobyan...@gmail.com>> wrote:
Cool, thanks Ryan, John, Amogh for the reply! Great to see you interested! 
Felix will have a Spark Scalability & Reliability Sync meeting on Dec 4 1pm 
PST. We could discuss more details there. Do you want to join?

On Tue, Nov 19, 2019 at 4:23 PM Amogh Margoor 
mailto:amo...@qubole.com>> wrote:
We at Qubole are also looking at disaggregating shuffle on Spark. Would love to 
collaborate and share learnings.

Regards,
Amogh

On Tue, Nov 19, 2019 at 4:09 PM John Zhuge 
mailto:jzh...@apache.org>> wrote:
Great work, Bo! Would love to hear the details.


On Tue, Nov 19, 2019 at 4:05 PM Ryan Blue  wrote:
I'm interested in remote shuffle services as well. I'd love to hear about what 
you're using in production!

rb

On Tue, Nov 19, 2019 at 2:43 PM bo yang 
mailto:bobyan...@gmail.com>> wrote:
Hi Ben,

Thanks for the writing up! This is Bo from Uber. I am in Felix's team in 
Seattle, and working on disaggregated shuffle (we called it remote shuffle 
service, RSS, internally). We have put RSS into production for a while, and 
learned a lot during the work (tried quite a few techniques to improve the 
remote shuffle performance). We could share our learning with the community, 
and also would like to hear feedback/suggestions on how to further improve 
remote shuffle performance. We could chat more details if you or other people 
are interested.

Best,
Bo

On Fri, Nov 15, 2019 at 4:10 PM Ben Sidhom  wrote:

I would like to start a conversation about extending the Spark shuffle manager 
surface to support fully disaggregated shuffle implementations. This is closely 
related to the work in 
SPARK-25299<https://issues.apache.org/jira/browse/SPARK-25299>, which is 
focused on refactoring the shuffle manager API (and in particular, 
SortShuffleManager) to use a pluggable storage backend. The motivation for that 
SPIP is further enabling Spark on Kubernetes.


The motivation for this proposal is enabling full externalized (disaggregated) 
shuffle service implementations. (Facebook’s Cosco 
shuffle<https://databricks.com/session/cosco-an-efficient-facebook-scale-shuffle-service>
 is one example of such a disaggregated shuffle service.) These changes allow 
the bulk of the shuffle to run in a remote service such that minimal state 
resides in executors and local disk spill is minimized. The net effect is 
increased job stability and performance improvements in certain scenarios. 
These changes should work well with or are complementary to SPARK-25299. Some 
or all points may be merged into that issue as appropriate.


Below is a description of each component of this proposal. These changes can 
ideally be introduced incrementally. I would like to gather feedback and gauge 
interest from others in the community to collaborate on this. There are likely 
more points that would  be useful to disaggregated shuffle services. We can 
outline a more concrete plan after gathering enough input. A working session 
could help us kick off this joint effort; maybe something in the mid-January to 
mid-February timeframe (depending on interest and availability. I’m happy to 
host at our Sunnyvale, CA offices.


Proposal
Scheduling and re-executing tasks

Allow coordination between the service and the Spark DAG scheduler as to 
whether a given block/partition needs to be recomputed when a task fails or 
when shuffle block data cannot be read. Having such coordination is important, 
e.g., for suppressing recomputation after aborted executors or for forcing late 
recomputation if the service internally acts as a cache. One catchall solution 
is to have the shuffle manager provide an indication of whether shuffle data is 
external to executors (or nodes). Another option: allow the shuffle manager 
(likely on the driver) to be queried for the existence of shuffle data for a 
given executor ID (or perhaps map task, reduce task, etc). Note that this is at 
the level of data the scheduler is aware of (i.e., map/reduce partitions) 
rather than block IDs, which are internal details for some shuffle managers.
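The coordination point above can be sketched as follows; the class and method names are hypothetical, not part of any existing Spark API:

```python
# Hedged sketch of the "query the shuffle manager for surviving shuffle
# output" idea. DisaggShuffleTracker is a hypothetical name: it models a
# scheduler asking whether a map partition's output survives an executor
# loss (external storage) or must be recomputed (executor-local storage).
class DisaggShuffleTracker:
    def __init__(self, external: bool):
        self.external = external
        self.outputs = set()                  # (shuffle_id, map_partition)

    def register(self, shuffle_id: int, map_partition: int) -> None:
        self.outputs.add((shuffle_id, map_partition))

    def lose_executor(self) -> None:
        # With disaggregated shuffle, losing an executor loses no data.
        if not self.external:
            self.outputs.clear()

    def needs_recompute(self, shuffle_id: int, map_partition: int) -> bool:
        return (shuffle_id, map_partition) not in self.outputs

tracker = DisaggShuffleTracker(external=True)
tracker.register(0, 0)
tracker.lose_executor()
print(tracker.needs_recompute(0, 0))  # -> False: data lives outside executors
```

With executor-local shuffle (external=False), the same executor loss would force recomputation of the registered partition.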

ShuffleManager API

Add a heartbeat (keep-alive) mechanism to RDD shuffle output so that the 
service knows that data is still active. This is one way to enable 
time-/j

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-20 Thread Felix Cheung
Just to add - the hive 1.2 fork is definitely not more stable. We know of a few 
critical bug fixes that we cherry-picked into a fork of that fork, which we 
maintain ourselves.



From: Dongjoon Hyun 
Sent: Wednesday, November 20, 2019 11:07:47 AM
To: Sean Owen 
Cc: dev 
Subject: Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

Thanks. That will be a giant step forward, Sean!

> I'd prefer making it the default in the POM for 3.0.

Bests,
Dongjoon.

On Wed, Nov 20, 2019 at 11:02 AM Sean Owen 
mailto:sro...@gmail.com>> wrote:
Yeah 'stable' is ambiguous. It's old and buggy, but at least it's the
same old and buggy that's been there a while. "stable" in that sense
I'm sure there is a lot more delta between Hive 1 and 2 in terms of
bug fixes that are important; the question isn't just 1.x releases.

What I don't know is how much affects Spark, as it's a Hive client
mostly. Clearly some do.

I'd prefer making it the default in the POM for 3.0. Mostly on the
grounds that its effects are on deployed clusters, not apps. And
deployers can still choose a binary distro with 1.x or make the choice
they want. Those that don't care should probably be nudged to 2.x.
Spark 3.x is already full of behavior changes and 'unstable', so I
think this is minor relative to the overall risk question.

On Wed, Nov 20, 2019 at 12:53 PM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
>
> Hi, All.
>
> I'm sending this email because it's important to discuss this topic narrowly
> and make a clear conclusion.
>
> `The forked Hive 1.2.1 is stable`? It sounds like a myth we created
> by ignoring the existing bugs. If you want to say the forked Hive 1.2.1 is
> stabler than XXX, please give us the evidence. Then, we can fix it.
> Otherwise, let's stop making `The forked Hive 1.2.1` invincible.
>
> Historically, the following forked Hive 1.2.1 has never been stable.
> It's just frozen. Since the forked Hive is out of our control, we ignored 
> bugs.
> That's all. The reality is a way far from the stable status.
>
> https://mvnrepository.com/artifact/org.spark-project.hive/
> 
> https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark
>  (2015 August)
> 
> https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark2
>  (2016 April)
>
> First, let's begin Hive itself by comparing with Apache Hive 1.2.2 and 1.2.3,
>
> Apache Hive 1.2.2 has 50 bug fixes.
> Apache Hive 1.2.3 has 9 bug fixes.
>
> I will not cover all of them, but Apache Hive community also backports
> important patches like Apache Spark community.
>
> Second, let's move to SPARK issues because we aren't exposed to all Hive 
> issues.
>
> SPARK-19109 ORC metadata section can sometimes exceed protobuf message 
> size limit
> SPARK-22267 Spark SQL incorrectly reads ORC file when column order is 
> different
>
> These were reported since Apache Spark 1.6.x because the forked Hive doesn't 
> have
> a proper upstream patch like HIVE-11592 (fixed at Apache Hive 1.3.0).
>
> Since we couldn't update the frozen forked Hive, we added Apache ORC 
> dependency
> at SPARK-20682 (2.3.0), added a switching configuration at SPARK-20728 
> (2.3.0),
> and turned on `spark.sql.hive.convertMetastoreOrc` by default at SPARK-22279 
> (2.4.0).
> However, if you turn off the switch and start to use the forked hive,
> you will be exposed to the buggy forked Hive 1.2.1 again.
>
> Third, let's talk about the new features like Hadoop 3 and JDK11.
> No one believe that the ancient forked Hive 1.2.1 will work with this.
> I saw that the following issue is mentioned as an evidence of Hive 2.3.6 bug.
>
> SPARK-29245 ClassCastException during creating HiveMetaStoreClient
>
> Yes. I know that issue because I reported it and verified HIVE-21508.
> It's fixed already and will be released in Apache Hive 2.3.7.
>
> Can we imagine something like this in the forked Hive 1.2.1?
> 'No'. There is no future on it. It's frozen.
>
> From now, I want to claim that the forked Hive 1.2.1 is the unstable one.
> I welcome all your positive and negative opinions.
> Please share your concerns and problems and fix them together.
> Apache Spark is an open source project we shared.
>
> Bests,
> Dongjoon.
>


Re: [VOTE] Release Apache Pinot (incubating) 0.2.0 RC0

2019-11-20 Thread Felix Cheung
+1

Built and ran tests.

I’ve been trying to run some tools in the last few days for checking license
headers in files; so far the tools are not working out. Oh well.

Btw, rat check doesn’t seem to cover all files in the project?



On Mon, Nov 18, 2019 at 11:35 AM Subbu Subramaniam 
wrote:

> Hi all,
>
> This is a call for vote to the release Apache Pinot (incubating) version
> 0.2.0.
>
> Apache Pinot (incubating) is a distributed columnar storage engine that can
> ingest data in realtime and serve analytical queries at low latency.
>
> Pinot community has voted and approved this release.
>
> Vote threads:
>
> https://lists.apache.org/thread.html/302ea5dd91e731eaaf9ce3881e027613ebc76a8c01472c0517b56e89@%3Cdev.pinot.apache.org%3E
>
> https://lists.apache.org/thread.html/3d785c502905b23f1d4d1da8718501a7cb4ef8245dd665fe99bf6daf@%3Cdev.pinot.apache.org%3E
>
> Result thread:
>
> https://lists.apache.org/thread.html/c425825f7d9da198adfbc5c5633dd0300f77f0ce3f9b88dba92dbe60@%3Cdev.pinot.apache.org%3E
>
> The release candidate:
>
> https://dist.apache.org/repos/dist/dev/incubator/pinot/apache-pinot-incubating-0.2.0-rc0
>
> Git tag for this release:
> https://github.com/apache/incubator-pinot/tree/release-0.2.0-rc0
>
> Git hash for this release:
> f8e1980c4160ac7fd2686d9edefab9ac0a825c5b
>
> The artifacts have been signed with key: B530034C, which can be
> found in the following KEYS file.
> https://dist.apache.org/repos/dist/release/incubator/pinot/KEYS
>
> Release notes:
> https://github.com/apache/incubator-pinot/releases/tag/release-0.2.0-rc0
>
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachepinot-1003
>
> Documentation on verifying a release candidate:
>
> https://cwiki.apache.org/confluence/display/PINOT/Validating+a+release+candidate
>
>
> The vote will be open for at least 72 hours or until necessary number of
> votes are reached.
>
> Please vote accordingly,
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Thanks,
>
> -Subbu Subramaniam
> (on behalf of Apache Pinot (incubating) team)
>


Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-18 Thread Felix Cheung
1000% with Steve: the org.spark-project hive 1.2 will need a solution. It is 
old and rather buggy, and it’s been *years*.

I think we should decouple the hive change from everything else if people are 
concerned.


From: Steve Loughran 
Sent: Sunday, November 17, 2019 9:22:09 AM
To: Cheng Lian 
Cc: Sean Owen ; Wenchen Fan ; Dongjoon 
Hyun ; dev ; Yuming Wang 

Subject: Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

Can I take this moment to remind everyone that the version of hive which spark 
has historically bundled (the org.spark-project one) is an orphan project put 
together to deal with Hive's shading issues and a source of unhappiness in the 
Hive project. Whatever gets shipped should do its best to avoid including that 
file.

Postponing a switch to hadoop 3.x until after spark 3.0 is probably the safest 
move from a risk minimisation perspective. If something has broken, then you 
can start with the assumption that it is in the o.a.s packages without having 
to debug o.a.hadoop and o.a.hive first. There is a cost: if there are problems 
with the hadoop / hive dependencies, those teams will inevitably ignore filed 
bug reports for the same reason the spark team will probably close 1.6-related 
JIRAs as WONTFIX. WONTFIX responses for the Hadoop 2.x line include any 
compatibility issues with Java 9+. Do bear that in mind. It's not been tested, 
it has dependencies on artifacts we know are incompatible, and as far as the 
Hadoop project is concerned: people should move to branch 3 if they want to run 
on a modern version of Java.

It would be really really good if the published spark maven artefacts (a) 
included the spark-hadoop-cloud JAR and (b) were dependent upon hadoop 3.x. 
That way people doing things with their own projects will get up-to-date 
dependencies and don't get WONTFIX responses themselves.

-Steve

PS: Discussion on hadoop-dev @ making Hadoop 2.10 the official "last ever" 
branch-2 release and then declare its predecessors EOL; 2.10 will be the 
transition release.

On Sun, Nov 17, 2019 at 1:50 AM Cheng Lian <lian.cs@gmail.com> wrote:
Dongjoon, I didn't follow the original Hive 2.3 discussion closely. I thought 
the original proposal was to replace Hive 1.2 with Hive 2.3, which seemed 
risky, and therefore we only introduced Hive 2.3 under the hadoop-3.2 profile 
without removing Hive 1.2. But maybe I'm totally wrong here...

Sean, Yuming's PR https://github.com/apache/spark/pull/26533 showed that Hadoop 
2 + Hive 2 + JDK 11 looks promising. My major motivation is not about demand, 
but risk control: coupling Hive 2.3, Hadoop 3.2, and JDK 11 upgrade together 
looks too risky.

On Sat, Nov 16, 2019 at 4:03 AM Sean Owen <sro...@gmail.com> wrote:
I'd prefer simply not making Hadoop 3 the default until 3.1+, rather
than introduce yet another build combination. Does Hadoop 2 + Hive 2
work and is there demand for it?

On Sat, Nov 16, 2019 at 3:52 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
> Do we have a limitation on the number of pre-built distributions? Seems this 
> time we need
> 1. hadoop 2.7 + hive 1.2
> 2. hadoop 2.7 + hive 2.3
> 3. hadoop 3 + hive 2.3
>
> AFAIK we always built with JDK 8 (but make it JDK 11 compatible), so don't 
> need to add JDK version to the combination.
>
> On Sat, Nov 16, 2019 at 4:05 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>> Thank you for suggestion.
>>
>> Having `hive-2.3` profile sounds good to me because it's orthogonal to 
>> Hadoop 3.
>> IIRC, originally, it was proposed in that way, but we put it under 
>> `hadoop-3.2` to avoid adding new profiles at that time.
>>
>> And, I'm wondering if you are considering additional pre-built distribution 
>> and Jenkins jobs.
>>
>> Bests,
>> Dongjoon.
>>


Re: Adding JIRA ID as the prefix for the test case name

2019-11-14 Thread Felix Cheung
This is about the test description and not the test file name, right?

If yes, I don’t see a problem.


From: Hyukjin Kwon 
Sent: Thursday, November 14, 2019 6:03:02 PM
To: Shixiong(Ryan) Zhu 
Cc: dev ; Felix Cheung ; 
Shivaram Venkataraman 
Subject: Re: Adding JIRA ID as the prefix for the test case name

Yeah, sounds good to have it.

In the case of R, it seems not quite common to write down the JIRA ID [1], but 
it looks like some tests have the prefix in their names in general.
In the case of Python and Java, it seems we write a JIRA ID from time to time 
in the comment right under the test method [2][3].

Given this pattern, I would like to suggest using the same format, but:

1. For Python and Java, write a single comment that starts with the JIRA ID and 
a short description, e.g. (SPARK-X: test blah blah)
2. For R, use the JIRA ID as a prefix for the test name.

[1] git grep -r "SPARK-" -- '*test*.R'
[2] git grep -r "SPARK-" -- '*Suite.java'
[3] git grep -r "SPARK-" -- '*test*.py'

Does that make sense? Adding Felix and Shivaram too.


On Fri, Nov 15, 2019 at 3:13 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
Should we also add a guideline for non-Scala tests? Other languages (Java, 
Python, R) don't support using a string as a test name.

Best Regards,

Ryan


On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
I opened a PR - https://github.com/apache/spark-website/pull/231

On Wed, Nov 13, 2019 at 10:43 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
> In general a test should be self descriptive and I don't think we should be 
> adding JIRA ticket references wholesale. Any action that the reader has to 
> take to understand why a test was introduced is one too many. However in some 
> cases the thing we are trying to test is very subtle and in that case a 
> reference to a JIRA ticket might be useful, I do still feel that this should 
> be a backstop and that properly documenting your tests is a much better way 
> of dealing with this.

Yeah, the test should be self-descriptive. I don't think adding a JIRA prefix 
harms this point. Probably I should add this sentence to the guidelines as well.
Adding a JIRA prefix just adds one extra hint to track down details. I think 
it's fine to stick to this practice and make it simpler and clearer to follow.

> 1. what if multiple JIRA IDs relating to the same test? we just take the very 
> first JIRA ID?
Ideally one JIRA should describe one issue and one PR should fix one JIRA with 
a dedicated test.
Yeah, I think I would take the very first JIRA ID.

> 2. are we going to have a full scan of all existing tests and attach a JIRA 
> ID to it?
Yea, let's not do this.

> It's a nice-to-have, not super essential, just because ...
It's been asked multiple times, and each committer seems to have a different 
understanding of this.
It's not a biggie, but I wanted to make it clear and conclude this.

> I'd add this only when a test specifically targets a certain issue.
Yes, so this one I am not sure about. From what I heard, people add the JIRA in 
the cases below:

- Whenever the JIRA type is a bug
- When a PR adds a couple of tests
- Only when a test specifically targets a certain issue.
- ...

Which one do we prefer and simpler to follow?

Or I can combine them as below (I'm going to reword this when I actually document it):
1. In general, we should add a JIRA ID as a prefix of a test name when a PR 
targets fixing a specific issue.
In practice, this usually happens when the JIRA type is a bug or a PR adds a 
couple of tests.
2. Use the "SPARK-: test name" format.

If there is no objection to ^, let me go with this.
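The convention proposed above is simple enough to check mechanically. The sketch below is purely illustrative (it is not part of any Spark tooling, and the helper name is hypothetical); it assumes the agreed "SPARK-<number>: description" form:

```python
import re

# Hypothetical lint helper for the proposed convention: a test name either
# has no JIRA reference at all, or starts with "SPARK-<digits>: " followed
# by a short description.
JIRA_PREFIX = re.compile(r"^SPARK-\d+: \S.*")

def follows_convention(test_name: str) -> bool:
    """Return True if the name has no JIRA ID, or uses the agreed format."""
    if "SPARK-" not in test_name:
        return True  # a plain descriptive name needs no prefix
    return bool(JIRA_PREFIX.match(test_name))

# The variants currently found in the code base:
assert follows_convention("SPARK-12345: reads empty partitions")     # agreed format
assert not follows_convention("SPARK-12345 reads empty partitions")  # missing colon
assert not follows_convention("[SPARK-12345] reads empty partitions")
assert follows_convention("reads empty partitions")                  # no JIRA ID needed
```

A check like this could run as part of a style linter, though the discussion above only proposes the convention itself.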

On Wed, Nov 13, 2019 at 8:14 AM, Sean Owen <sro...@gmail.com> wrote:
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases.
I'd add this only when a test specifically targets a certain issue.
It's a nice-to-have, not super essential, just because in the rare
case you need to understand why a test asserts something, you can go
back and find what added it in the git history without much trouble.

On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
> Hi all,
>
> Maybe it's not a big deal but it brought some confusions time to time into 
> Spark dev and community. I think it's time to discuss about when/which format 
> to add a JIRA ID as a prefix for the test case name in Scala test cases.
>
> Currently we have many test case names with prefixes as below:
>
> test("SPARK-X blah blah")
> test("SPARK-X: blah blah")
> test("SPARK-X - blah blah")
> test("[SPARK-X] blah blah")
> …
>
> It is a good practice to have the JIRA ID in general because, for instance,
> it makes us put less efforts to track commit histories (or even when the files
> are totally moved), or to track related 

Re: [VOTE] Accept StreamPipes into the Apache Incubator

2019-11-07 Thread Felix Cheung
+1
Great project!

On Thu, Nov 7, 2019 at 11:40 AM Christofer Dutz 
wrote:

> A big +1 (binding) from me
>
> Chris
> 
> Von: Julian Feinauer 
> Gesendet: Donnerstag, 7. November 2019 20:19:43
> An: general@incubator.apache.org 
> Betreff: Re: [VOTE] Accept StreamPipes into the Apache Incubator
>
> Hi Dominik,
>
> I like StreamPipes, as you know, thus my vote is
>
> +1 (binding)
>
> Julian
>
> Am 07.11.19, 20:00 schrieb "Dominik Riemer" :
>
> Hi all,
>
> following up the [DISCUSS] thread on StreamPipes (
> https://lists.apache.org/thread.html/1cf79ef65888f695b4b925fd67ef8a2b845f6b0931c251a0ff1115e1@%3Cgeneral.incubator.apache.org%3E),
> I would like to call a VOTE to accept StreamPipes into the Apache Incubator.
>
> Please cast your vote:
>
>   [ ] +1, bring StreamPipes into the Incubator
>   [ ] +0, I don't care either way
>   [ ] -1, do not bring StreamPipes into the Incubator, because...
>
> The vote will open at least for 72 hours and only votes from the
> Incubator PMC are binding, but votes from everyone are welcome.
>
> Dominik
>
> 
> StreamPipes Proposal (
> https://cwiki.apache.org/confluence/display/INCUBATOR/StreamPipesProposal)
>
> == Abstract ==
> StreamPipes is a self-service (Industrial) IoT toolbox to enable
> non-technical users to connect, analyze and explore (Industrial) IoT data
> streams.
>
> = Proposal =
>
> The goal of StreamPipes (www.streampipes.org) is to provide an easy-to-use toolbox for
> non-technical users, e.g., domain experts, to exploit data streams coming
> from (Industrial) IoT devices. Such users are provided with an intuitive
> graphical user interface with the Pipeline Editor at its core. Users are
> able to graphically model processing pipelines based on data sources
> (streams), data processors and data sinks. Data processors and sinks are
> self-contained microservices, which implement either stateful or stateless
> processing logic (e.g., a trend detection or image classifier). Their
> processing logic is implemented using one of several provided wrappers (we
> currently have wrappers for standalone/Edge-based processing, Apache Flink,
> Siddhi and working wrapper prototypes for Apache Kafka Streams and Spark,
> in the future we also plan to integrate with Apache Beam). An SDK allows
> developers to easily create new pipeline elements. Pipeline elements can be installed at
> runtime. To support users in creating pipelines, an underlying
> semantics-based data model enables pipeline elements to express
> requirements on incoming data streams that need to be fulfilled, thus
> reducing modeling errors.
> Data streams are integrated by using StreamPipes Connect, which allows
> to connect data sources (based on standard protocols, such as MQTT, Kafka,
> Pulsar, OPC-UA and further PLC4X-supported protocols) without further
> programming using a graphical wizard. Additional user-faced modules of
> StreamPipes are a Live dashboard to quickly explore IoT data streams and a
> wizard that generates code templates for new pipeline elements, a Pipeline
> Element Installer used to extend the algorithm feature set at runtime.
>
> === Background ===
> StreamPipes was started in 2014 by researchers from FZI Research
> Center for Information Technology in Karlsruhe, Germany. The original
> prototype was funded by an EU project centered around predictive analytics
> for the manufacturing domain. Since then, StreamPipes was constantly
> improved and extended by public funding mainly from federal German
> ministries. In early 2018, the source code was officially released under
> the Apache License 2.0. At the same time, while we focused on bringing the
> research prototype to a production-grade tool, the first companies started
> to use StreamPipes. Currently, the primary goal is to widen the user and
> developer base. At ApacheCon NA 2019, after having talked to many people
> from the Apache Community, we finally decided that we would like to bring
> StreamPipes to the Apache Incubator.
>
> === Rationale ===
> The (Industrial) IoT domain is a highly relevant and emerging sector.
> Currently, IoT platforms are offered by many vendors ranging from SMEs up
> to large enterprises. We believe that open source alternatives are an
> important cornerstone for manufacturing companies to easily adopt
> data-driven decision making. From our point of view, StreamPipes fits very
> well into the existing (I)IoT ecosystem within the ASF, with projects such
> as Apache PLC4X focusing on connecting machine data from PLCs, or other
> tools we are also using either in the core of StreamPipes or with
> integrations (Apache Kafka, Apache IoTDB, Apache Pulsar). StreamPipes
> itself focuses on enabling self-service IoT data analytics for
> non-technical users.
> The whole StreamPipes code is currently on Github. To get a rough
> estimate of the project size:
> * 

Re: Podling Pinot Report Reminder - November 2019

2019-10-28 Thread Felix Cheung
Hi,

Reminder the report is due shortly.



From: jmcl...@apache.org 
Sent: Sunday, October 20, 2019 5:27 AM
To: d...@pinot.incubator.apache.org
Subject: Podling Pinot Report Reminder - November 2019

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 20 November 2019, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, November 06).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/November2019

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Re: Intellij announce Big Data Tools – Integration with Zeppelin

2019-10-16 Thread Felix Cheung
Cool!


From: Jeff Zhang 
Sent: Wednesday, October 16, 2019 6:52:02 AM
To: users 
Subject: Intellij announce Big Data Tools – Integration with Zeppelin


Here's the related details

https://blog.jetbrains.com/scala/2019/10/16/meet-big-data-tools-spark-integration-and-zeppelin-notebooks-in-intellij-idea/

https://plugins.jetbrains.com/plugin/12494-big-data-tools?_ga=2.41180706.1434705875.1571232785-884153734.1558949232

--
Best Regards

Jeff Zhang


[jira] [Updated] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-12 Thread Felix Cheung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-29042:
-
Labels: correctness  (was: )

> Sampling-based RDD with unordered input should be INDETERMINATE
> ---
>
> Key: SPARK-29042
> URL: https://issues.apache.org/jira/browse/SPARK-29042
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>  Labels: correctness
>
> We have found and fixed the correctness issue when RDD output is 
> INDETERMINATE. One missing part is sampling-based RDDs. This kind of RDD is 
> order-sensitive to its input. A sampling-based RDD with unordered input 
> should be INDETERMINATE.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-09-11 Thread Felix Cheung
+1


From: Thomas graves 
Sent: Wednesday, September 4, 2019 7:24:26 AM
To: dev 
Subject: [VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration 
and scheduling

Hey everyone,

I'd like to call for a vote on SPARK-27495 SPIP: Support Stage level
resource configuration and scheduling

This is for supporting stage level resource configuration and
scheduling.  The basic idea is to allow the user to specify executor
and task resource requirements for each stage to allow the user to
control the resources required at a finer grain. One good example here
is doing some ETL to preprocess your data in one stage and then feed
that data into an ML algorithm (like tensorflow) that would run as a
separate stage.  The ETL could need totally different resource
requirements for the executors/tasks than the ML stage does.
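The ETL-vs-ML example can be made concrete with a minimal sketch. This is plain Python, not the proposed Spark API (which was still under design in the linked SPIP); all names here are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical model of the SPIP's idea: each stage declares its own
# executor/task requirements instead of one global configuration.
@dataclass(frozen=True)
class StageResources:
    executor_cores: int
    executor_memory_gb: int
    executor_gpus: int = 0
    task_cpus: int = 1

# Many small, cheap tasks for the ETL stage ...
etl = StageResources(executor_cores=4, executor_memory_gb=4)
# ... and a few large, GPU-backed executors for the ML stage.
ml = StageResources(executor_cores=8, executor_memory_gb=64,
                    executor_gpus=2, task_cpus=8)

def executors_needed(stage: StageResources, total_task_cpus: int) -> int:
    """Rough executor count: total task CPUs divided by slots per executor."""
    slots = stage.executor_cores // stage.task_cpus
    return -(-total_task_cpus // slots)  # ceiling division

assert executors_needed(etl, total_task_cpus=400) == 100
assert executors_needed(ml, total_task_cpus=64) == 64
```

Sizing the whole job for the ML stage would waste memory and GPUs during ETL; sizing it for ETL would starve the ML stage, which is exactly the motivation for per-stage configuration.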

The text for the SPIP is in the jira description:

https://issues.apache.org/jira/browse/SPARK-27495

I split the API and Design parts into a google doc that is linked to
from the jira.

This vote is open until next Fri (Sept 13th).

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don't think this is a good idea because ...

I'll start with my +1

Thanks,
Tom

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-09-08 Thread Felix Cheung
I’d prefer strict mode and fail fast (analysis check)

Also, I like what Alastair suggested about standard clarification.

I think we can revisit this proposal and restart the vote


From: Ryan Blue 
Sent: Friday, September 6, 2019 5:28 PM
To: Alastair Green
Cc: Reynold Xin; Wenchen Fan; Spark dev list; Gengliang Wang
Subject: Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table 
insertion by default


We discussed this thread quite a bit in the DSv2 sync up and Russell brought up 
a really good point about this.

The ANSI rule used here specifies how to store a specific value, V, so this is 
a runtime rule — an earlier case covers when V is NULL, so it is definitely 
referring to a specific value. The rule requires that if the type doesn’t match 
or if the value cannot be truncated, an exception is thrown for “numeric value 
out of range”.

That runtime error guarantees that even though the cast is introduced at 
analysis time, unexpected NULL values aren’t inserted into a table in place of 
data values that are out of range. Unexpected NULL values are the problem that 
was concerning to many of us in the discussion thread, but it turns out that 
real ANSI behavior doesn’t have the problem. (In the sync, we validated this by 
checking Postgres and MySQL behavior, too.)

In Spark, the runtime check is a separate configuration property from this one, 
but in order to actually implement ANSI semantics, both need to be set. So I 
think it makes sense to change both defaults to ANSI. The analysis check 
alone does not implement the ANSI standard.

In the sync, we also agreed that it makes sense to be able to turn off the 
runtime check in order to avoid job failures. Another, safer way to avoid job 
failures is to require an explicit cast, i.e., strict mode.

I think that we should amend this proposal to change the default for both the 
runtime check and the analysis check to ANSI.

As this stands now, I vote -1. But I would support this if the vote were to set 
both runtime and analysis checks to ANSI mode.

rb
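The runtime behavior described above can be illustrated with a small standalone simulation. This is plain Python, not Spark internals; it only models the contrast between the ANSI runtime rule and the "return null on error" behavior discussed in this thread:

```python
# With the ANSI runtime check on, storing an out-of-range value raises
# "numeric value out of range" instead of silently becoming NULL.
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # 32-bit int column bounds

def store_as_int(value, ansi_runtime_check: bool):
    if value is None:
        return None                   # NULL is covered by an earlier ANSI rule
    v = int(value)                    # the analysis-time cast already inserted
    if INT_MIN <= v <= INT_MAX:
        return v
    if ansi_runtime_check:
        raise OverflowError("numeric value out of range")
    return None                       # legacy behavior: an unexpected NULL

assert store_as_int(42, ansi_runtime_check=True) == 42
assert store_as_int(2**40, ansi_runtime_check=False) is None  # the surprise NULL
try:
    store_as_int(2**40, ansi_runtime_check=True)
    assert False, "expected an error"
except OverflowError:
    pass
```

This is the guarantee Russell's point rests on: with both checks set to ANSI, out-of-range data fails loudly rather than corrupting the table with NULLs.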

On Fri, Sep 6, 2019 at 3:12 AM Alastair Green 
 wrote:
Makes sense.

While the ISO SQL standard automatically becomes an American national  (ANSI) 
standard, changes are only made to the International (ISO/IEC) Standard, which 
is the authoritative specification.

These rules are specified in SQL/Foundation (ISO/IEC SQL Part 2), section 9.2.

Could we rename the proposed default to “ISO/IEC (ANSI)”?

— Alastair

On Thu, Sep 5, 2019 at 17:17, Reynold Xin <r...@databricks.com> wrote:

Having three modes is a lot. Why not just use ansi mode as default, and legacy 
for backward compatibility? Then over time there's only the ANSI mode, which is 
standard compliant and easy to understand. We also don't need to invent a 
standard just for Spark.


On Thu, Sep 05, 2019 at 12:27 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
+1

To be honest I don't like the legacy policy. It's too loose and makes it easy 
for users to make mistakes, especially when Spark returns null if a function 
hits errors like overflow.

The strict policy is not good either. It's too strict and stops valid use cases 
like writing timestamp values to a date type column. Users do expect truncation 
to happen without adding a cast manually in this case. It's also weird to use a 
Spark-specific policy that no other database is using.

The ANSI policy is better. It stops invalid use cases like writing string 
values to an int type column, while keeping valid use cases like timestamp -> 
date.

I think it's no doubt that we should use ANSI policy instead of legacy policy 
for v1 tables. Except for backward compatibility, ANSI policy is literally 
better than the legacy policy.

The v2 table is arguable here. Although the ANSI policy is better than strict 
policy to me, this is just the store assignment policy, which only partially 
controls the table insertion behavior. With Spark's "return null on error" 
behavior, the table insertion is more likely to insert invalid null values with 
the ANSI policy compared to the strict policy.

I think we should use ANSI policy by default for both v1 and v2 tables, because
1. End-users don't care how the table is implemented. Spark should provide 
consistent table insertion behavior between v1 and v2 tables.
2. Data Source V2 is unstable in Spark 2.x so there is no backward 
compatibility issue. That said, the baseline to judge which policy is better 
should be the table insertion behavior in Spark 2.x, which is the legacy policy 
+ "return null on error". ANSI policy is better than the baseline.
3. We expect more and more users to migrate their data sources to the V2 API. 
The strict policy can be a stopper, as it's too big a breaking change, which may 
break many existing queries.

Thanks,
Wenchen
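The difference between the three policies can be summarized with a toy model. This is an illustrative simplification, not Spark's actual analysis code, and the type pairs shown are only examples:

```python
# Toy model of the three analysis-time store assignment policies, for a
# write of a value of `src` type into a column of `dst` type.
WIDENING = {("int", "long"), ("float", "double"), ("date", "timestamp")}
ANSI_EXTRA = {("timestamp", "date"), ("long", "int"), ("double", "float")}

def allowed(src: str, dst: str, policy: str) -> bool:
    if src == dst:
        return True
    if policy == "strict":
        return (src, dst) in WIDENING          # only safe widening casts
    if policy == "ansi":
        # widening plus numeric/datetime narrowing, but never e.g. string->int
        return (src, dst) in WIDENING | ANSI_EXTRA
    if policy == "legacy":
        return True                            # anything goes, may yield NULLs
    raise ValueError(policy)

assert allowed("timestamp", "date", "ansi")        # valid use case kept
assert not allowed("timestamp", "date", "strict")  # strict needs an explicit cast
assert not allowed("string", "int", "ansi")        # invalid use case stopped
assert allowed("string", "int", "legacy")          # legacy lets it through
```

The model shows why ANSI sits between the other two: it rejects clearly invalid writes while keeping the truncating conversions users expect.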


On Wed, Sep 4, 2019 at 1:59 PM Gengliang Wang <gengliang.w...@databricks.com> wrote:

Hi everyone,

I'd like to call for a vote on 

Re: maven 3.6.1 removed from apache maven repo

2019-09-03 Thread Felix Cheung
(Hmm, what is spark-...@apache.org?)


From: Sean Owen 
Sent: Tuesday, September 3, 2019 11:58:30 AM
To: Xiao Li 
Cc: Tom Graves ; spark-...@apache.org 

Subject: Re: maven 3.6.1 removed from apache maven repo

It's because build/mvn only queries ASF mirrors, and they remove non-current 
releases from mirrors regularly (we do the same).
This may help avoid this in the future: 
https://github.com/apache/spark/pull/25667
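The general fix pattern is to try a current-release location first and fall back to archive.apache.org, which keeps every release ever published. The sketch below is a hypothetical illustration of that fallback; the actual logic in build/mvn differs in detail:

```python
# Build candidate download URLs for a Maven release: the live ASF download
# host only carries current releases, while archive.apache.org keeps old
# ones, so a build script should try both in order.
def maven_candidate_urls(version: str):
    path = f"maven/maven-3/{version}/binaries/apache-maven-{version}-bin.tar.gz"
    return [
        f"https://downloads.apache.org/{path}",     # current releases only
        f"https://archive.apache.org/dist/{path}",  # everything ever released
    ]

urls = maven_candidate_urls("3.6.1")
assert urls[0].startswith("https://downloads.apache.org/")
assert urls[-1].startswith("https://archive.apache.org/dist/")
```

With a fallback like this, a release being dropped from the mirrors no longer breaks the build outright.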

On Tue, Sep 3, 2019 at 1:41 PM Xiao Li <lix...@databricks.com> wrote:
Hi, Tom,

To unblock the build, I merged the upgrade to master. 
https://github.com/apache/spark/pull/25665

Thanks!

Xiao


On Tue, Sep 3, 2019 at 10:58 AM Tom Graves  wrote:
It looks like maven 3.6.1 was removed from the repo - see SPARK-28960.  It 
looks like they pushed 3.6.2, but I don't see any release notes for 3.6.2 on 
the maven page.

Seems like we had this happen before, can't remember if it was maven or 
something else, anyone remember or know if they are about to release 3.6.2?

Tom




Re: Podling Report Reminder - September 2019

2019-09-02 Thread Felix Cheung
Hello - reminder again. The report is due in 2 days


From: Felix Cheung 
Sent: Wednesday, August 28, 2019 12:07:53 PM
To: dev@crail.apache.org 
Subject: Re: Podling Report Reminder - September 2019

Hi - reminder of the report due within a week.



From: jmcl...@apache.org 
Sent: Thursday, August 22, 2019 6:29:04 PM
To: d...@crail.incubator.apache.org 
Subject: Podling Report Reminder - September 2019

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 18 September 2019, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, September 04).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.
*   How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/September2019

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


[jira] [Commented] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling

2019-09-01 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920479#comment-16920479
 ] 

Felix Cheung commented on SPARK-27495:
--

+1 on this.

 

I've reviewed this. A few questions/comments:
 # In the description above there is a passage on "Spark internal use by 
catalyst" - looking at the rest of the material, the google doc, etc., is this 
out of scope? If so, we should clarify.
 # "different resources in multiple RDDs that get combined into a single stage" 
- this merge can be complicated, and I'm not sure taking the max etc. is going 
to be right all the time. At the least it will be very confusing to the user 
as to how much resource is used. Instead of a heuristic like the max, how about, 
in the event of a mismatch involving multiple RDDs, we detect it and fail (fail 
fast) and ask the user to do a "repartition" operation before that stage?
 # In a later comment, "resource requirement as a hint" - I am actually unsure 
about that. In many ML or DL/TensorFlow use cases where MPI or allreduce are 
involved, the exact number of GPUs, processes, and machines is required or else 
they fail to start. I am in favor of a strict mode for that purpose.

> SPIP: Support Stage level resource configuration and scheduling
> ---
>
> Key: SPARK-27495
> URL: https://issues.apache.org/jira/browse/SPARK-27495
> Project: Spark
>  Issue Type: Epic
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
>
> *Q1.* What are you trying to do? Articulate your objectives using absolutely 
> no jargon.
> Objectives:
>  # Allow users to specify task and executor resource requirements at the 
> stage level. 
>  # Spark will use the stage level requirements to acquire the necessary 
> resources/executors and schedule tasks based on the per stage requirements.
> Many times users have different resource requirements for different stages of 
> their application so they want to be able to configure resources at the stage 
> level. For instance, you have a single job that has 2 stages. The first stage 
> does some  ETL which requires a lot of tasks, each with a small amount of 
> memory and 1 core each. Then you have a second stage where you feed that ETL 
> data into an ML algorithm. The second stage only requires a few executors but 
> each executor needs a lot of memory, GPUs, and many cores.  This feature 
> allows the user to specify the task and executor resource requirements for 
> the ETL Stage and then change them for the ML stage of the job.  
> Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and 
> extra Resources (GPU/FPGA/etc). It has the potential to allow for other 
> things like limiting the number of tasks per stage, specifying other 
> parameters for things like shuffle, etc. Initially I would propose we only 
> support resources as they are now. So Task resources would be cpu and other 
> resources (GPU, FPGA), that way we aren't adding in extra scheduling things 
> at this point.  Executor resources would be cpu, memory, and extra 
> resources(GPU,FPGA, etc). Changing the executor resources will rely on 
> dynamic allocation being enabled.
> Main use cases:
>  # ML use case where user does ETL and feeds it into an ML algorithm where 
> it’s using the RDD API. This should work with barrier scheduling as well once 
> it supports dynamic allocation.
>  # Spark internal use by catalyst. Catalyst could control the stage level 
> resources as it finds the need to change it between stages for different 
> optimizations. For instance, with the new columnar plugin to the query 
> planner we can insert stages into the plan that would change running 
> something on the CPU in row format to running it on the GPU in columnar 
> format. This API would allow the planner to make sure the stages that run on 
> the GPU get the corresponding GPU resources it needs to run. Another possible 
> use case for catalyst is that it would allow catalyst to add in more 
> optimizations to where the user doesn’t need to configure container sizes at 
> all. If the optimizer/planner can handle that for the user, everyone wins.
> This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I 
> think the DataSet API will require more changes because it specifically hides 
> the RDD from the users via the plans and catalyst can optimize the plan and 
> insert things into the plan. The only way I’ve found to make this work with 
> the Dataset API would be modifying all the plans to be able 

[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-09-01 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920476#comment-16920476
 ] 

Felix Cheung commented on SPARK-28594:
--

Reviewed. Looks reasonable to me. I can help shepherd this work.

 

ping [~srowen] [~vanzin] [~irashid] for feedback.

> Allow event logs for running streaming apps to be rolled over.
> --
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: This has been reported on 2.0.2.22 but affects all 
> currently available versions.
>Reporter: Stephen Levett
>Priority: Major
>
> At all current Spark releases when event logging on spark streaming is 
> enabled the event logs grow massively.  The files continue to grow until the 
> application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> Addresses .inprogress files but not event log files that are still running.
> Identify a mechanism to set a "max file" size so that the file is rolled over 
> when it reaches this size?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-09-01 Thread Felix Cheung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-28594:
-
Shepherd: Felix Cheung

> Allow event logs for running streaming apps to be rolled over.
> --
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: This has been reported on 2.0.2.22 but affects all 
> currently available versions.
>Reporter: Stephen Levett
>Priority: Major
>
> At all current Spark releases when event logging on spark streaming is 
> enabled the event logs grow massively.  The files continue to grow until the 
> application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> Addresses .inprogress files but not event log files that are still running.
> Identify a mechanism to set a "max file" size so that the file is rolled over 
> when it reaches this size?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Design review of SPARK-28594

2019-09-01 Thread Felix Cheung
I did review it and solving this problem makes sense. I will comment in the 
JIRA.


From: Jungtaek Lim 
Sent: Sunday, August 25, 2019 3:34:22 PM
To: dev 
Subject: Design review of SPARK-28594

Hi devs,

I have been working on designing SPARK-28594 [1] (though I've started with this 
via different requests) and design doc is now available [2].

Let me describe SPARK-28594 briefly - a single, ever-growing event log file per 
application has been a major issue for streaming applications: the event log 
just grows for as long as the application is running, and lots of issues occur from 
there. The only viable workaround has been disabling the event log, which is not 
easily acceptable. Stopping and rerunning the application would be 
another approach, but it sounds really odd to stop an application because of its 
event log. SPARK-28594 enables rolling the event log files and compacting 
old event log files without losing the ability to replay the whole log.
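To make the rolling-with-compaction idea concrete, here is a toy, self-contained Python sketch (not Spark's actual implementation; the class name and file layout are made up): the writer rolls to a new file once the active file passes a size threshold, and compaction merges the rolled files while replay still reconstructs the full event history.

```python
import os

class RollingEventLogWriter:
    """Toy sketch of the SPARK-28594 idea: roll the event log to a new file
    once the active file passes a size threshold, and compact old rolled
    files so that replaying the remaining files still reconstructs the full
    history. The real design also drops events for finished jobs during
    compaction; here we only merge."""

    def __init__(self, log_dir: str, max_bytes: int):
        self.log_dir = log_dir
        self.max_bytes = max_bytes
        self.index = 0  # suffix of the currently active file
        os.makedirs(log_dir, exist_ok=True)

    def _path(self, i: int) -> str:
        return os.path.join(self.log_dir, f"events_{i}.log")

    def write(self, event: str) -> None:
        path = self._path(self.index)
        with open(path, "a") as f:
            f.write(event + "\n")
        if os.path.getsize(path) >= self.max_bytes:
            self.index += 1  # roll: later events go to a fresh file

    def compact(self) -> None:
        """Merge every rolled (non-active) file into one compact file."""
        rolled = [self._path(i) for i in range(self.index)
                  if os.path.exists(self._path(i))]
        if not rolled:
            return
        with open(os.path.join(self.log_dir, "compact_0.log"), "a") as out:
            for p in rolled:
                with open(p) as f:
                    out.write(f.read())
                os.remove(p)

    def replay(self) -> list:
        """Read compact files first, then rolled/active files, in order."""
        names = sorted(os.listdir(self.log_dir),
                       key=lambda n: (not n.startswith("compact"), n))
        events = []
        for n in names:
            with open(os.path.join(self.log_dir, n)) as f:
                events.extend(line.rstrip("\n") for line in f)
        return events
```

The key property the design doc cares about is preserved even in this sketch: compaction shrinks the number of files on disk, but a full replay before and after compaction yields the same event sequence.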

While I'll break the issue down into subtasks and start from the easier ones, in parallel 
I'd like to ask for a review of the design, to get better ideas and find 
possible defects in the design.

Please note that the doc is intended to describe the detailed changes (closer 
to the implementation details) and is not a SPIP, because I don't 
feel this improvement warrants going through the SPIP process - the change is 
not huge and the proposal works orthogonally to the current feature. Please 
let me know if that's not the case and the SPIP process is necessary.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-28594
2. 
https://docs.google.com/document/d/12bdCC4nA58uveRxpeo8k7kGOI2NRTXmXyBOweSi4YcY/edit?usp=sharing



Re: Zeppelin Studio Proposal

2019-09-01 Thread Felix Cheung
This is cool!


From: CHALLA 
Sent: Wednesday, August 28, 2019 4:39:46 AM
To: users@zeppelin.apache.org 
Subject: Re: Zeppelin Studio Proposal

 great.

On Wed, Aug 28, 2019 at 4:04 PM Ivan Shapovalov <shapovalov.iva...@gmail.com> wrote:
Hey All,

The idea of refactoring UI sounds great!
Ever considered Theia (https://github.com/theia-ide/theia)? It may save the 
community a lot of effort.

Regards,
Ivan

On Wed, Aug 28, 2019 at 11:31, Jongyoul Lee <jongy...@gmail.com> wrote:
Sounds great!!

On Tue, Aug 27, 2019 at 1:31 AM ieglonewolf ieglonewolf <ieglonew...@gmail.com> wrote:
I would like to add

Our key motive in starting ZEPPELIN-4138
 here is to build a UI
system on top of the Zeppelin service which is intuitive and very
functional.
Moreover, doing it the right way is the key to scalability. I would like to
request each and every member of this community to help us develop this.

Please provide your valuable feedback

Thanks folks!

On Mon, Aug 26, 2019 at 9:24 PM Xun Liu <neliu...@163.com> wrote:

> Zeppelin is very much in need of a front end developed using Vue.js
> front-end technology.
> Thank you for your contribution. :-)
>
> Xun Liu
> Best Regards
>
> On Aug 26, 2019, at 9:21 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> + user mail list
>
>
> Thanks Malay for the proposal. The Zeppelin frontend does need some rework.
>
> Overall, your proposal makes sense to me. User experience and performance
> are the 2 key things we need to improve in the frontend.
> I left some comments in the design doc.
>
>
> Malay Majithia <malay.majit...@gmail.com> wrote on Fri, Aug 23, 2019 at 6:10 PM:
>
>> Hey Folks,
>>
>> Regarding ZEPPELIN-4138, we have come up
>> with the design document (draft) and the task list for the same:
>>
>>
>>  Zeppelin Studio - Design Document
>> 
>>
>>  Zeppelin Studio - Task list
>> 
>>
>> POC code is available on GitHub.
>>
>> Sneak peek of the proposed interface:
>> 
>>
>>
>> Please review it and provide your valuable feedback.
>>
>> Best Regards
>> Malay Majithia
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


--
Ivan Shapovalov
Kharkov, Ukraine



Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-30 Thread Felix Cheung
+1

Run tests, R tests, r-hub Debian, Ubuntu, mac, Windows


From: Hyukjin Kwon 
Sent: Wednesday, August 28, 2019 9:14 PM
To: Takeshi Yamamuro
Cc: dev; Dongjoon Hyun
Subject: Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

+1 (from the last blocker PR)

On Thu, Aug 29, 2019 at 8:20 AM, Takeshi Yamamuro <linguin@gmail.com> wrote:
I checked the tests passed again on the same env.
It looks ok.


On Thu, Aug 29, 2019 at 6:15 AM Marcelo Vanzin  
wrote:
+1

On Tue, Aug 27, 2019 at 4:06 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.4.
>
> The vote is open until August 30th 5PM PST and passes if a majority +1 PMC 
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.4
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.4-rc3 (commit 
> 7955b3962ac46b89564e0613db7bea98a1478bf2):
> https://github.com/apache/spark/tree/v2.4.4-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.4-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1332/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.4-rc3-docs/
>
> The list of bug fixes going into 2.4.4 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12345466
>
> This release is using the release script of the tag v2.4.4-rc3.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.4?
> ===
>
> The current list of open tickets targeted at 2.4.4 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.4
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.



--
Marcelo

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org



--
---
Takeshi Yamamuro


Re: JDK11 Support in Apache Spark

2019-08-24 Thread Felix Cheung
That’s great!


From: ☼ R Nair 
Sent: Saturday, August 24, 2019 10:57:31 AM
To: Dongjoon Hyun 
Cc: d...@spark.apache.org ; user @spark/'user 
@spark'/spark users/user@spark 
Subject: Re: JDK11 Support in Apache Spark

Finally!!! Congrats

On Sat, Aug 24, 2019, 11:11 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
Hi, All.

Thanks to your many many contributions,
Apache Spark master branch starts to pass on JDK11 as of today.
(with `hadoop-3.2` profile: Apache Hadoop 3.2 and Hive 2.3.6)


https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/326/
(JDK11 is used for building and testing.)

We already verified all UTs (including PySpark/SparkR) before.

Please feel free to use JDK11 in order to build/test/run `master` branch and
share your experience including any issues. It will help Apache Spark 3.0.0 
release.

For the follow-ups, please follow 
https://issues.apache.org/jira/browse/SPARK-24417 .
The next step is `how to support JDK8/JDK11 together in a single artifact`.

Bests,
Dongjoon.



Re: [VOTE] Accept DolphinScheduler(was EasyScheduler) into Apache Incubator

2019-08-23 Thread Felix Cheung
+1

On Fri, Aug 23, 2019 at 8:11 AM ShaoFeng Shi  wrote:

> +1 (binding)
>
> I believe the DolphinScheduler project will bring value to ASF. The team is
> very open and the community is already very active. Glad to see it to join
> the incubator.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
>
>
> Furkan KAMACI wrote on Fri, Aug 23, 2019 at 5:32 PM:
>
> > Hi,
> >
> > +1!
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Fri, Aug 23, 2019 at 12:25, Sheng Wu <wu.sheng.841...@gmail.com> wrote:
> >
> > > Julian Feinauer wrote on Fri, Aug 23, 2019 at 5:20 PM:
> > >
> > > > Hi,
> > > >
> > > > Your proposal looks good and the initial PPMC already looks
> 'diverse'.
> > > > Furthermore, it seems like you have a good mentoring team on board.
> > > >
> > > > One 'minor' concern is that I think it is best to use Apache's Infra
> for
> > > CI
> > > > and Issue tracking.
> > > > Which I would greatly prefer over using Github issues.
> > > >
> > >
> > > Hi Julian
> > >
> > > Thanks for your supports.
> > >
> > > In the proposal, Jenkins means Apache INFRA Jenkins. I just changed the
> > > proposal text to `Apache Jenkins`.
> > >
> > > I think the GitHub issue tracker is an open option, as many ASF
> > > projects are using it already, and GitHub issue notifications are
> > > already mirrored to the mailing list.
> > > Since the team wants to use it, I think it should be OK.
> > >
> > > Sheng Wu 吴晟
> > >
> > > Apache SkyWalking, Apache ShardingSphere(Incubating), Zipkin
> > > Twitter, wusheng1108
> > >
> > >
> > >
> > > >
> > > > But overall, a clear +1 (binding) from my side.
> > > >
> > > > Julian
> > > >
> > > > On 23.08.19 at 11:14, "Kevin Ratnasekera" <djkevincr1...@gmail.com> wrote:
> > > >
> > > > +1
> > > >
> > > > On Fri, Aug 23, 2019 at 7:09 AM Sheng Wu wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > After the discussion of DolphinScheduler(was EasyScheduler)
> > > proposal
> > > > > (discussion thread:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d3ac53bddf91391e54f63d042a0b3d60f2aecfbb99780bcc00b4db6e@%3Cgeneral.incubator.apache.org%3E
> > > > > ),
> > > > > I would like to call a VOTE to accept it into the Apache
> > Incubator.
> > > > >
> > > > > Please cast your vote:
> > > > >
> > > > >   [ ] +1, bring DolphinScheduler into Incubator
> > > > >   [ ] +0, I don't care either way
> > > > >   [ ] -1, do not bring DolphinScheduler into Incubator,
> > because...
> > > > >
> > > > > The vote will open at least for 72 hours and only votes from
> the
> > > > Incubator
> > > > > PMC are binding.
> > > > >
> > > > > ==
> > > > > Abstract
> > > > >
> > > > > DolphinScheduler is a distributed ETL scheduling engine with a
> > > > > powerful DAG visualization interface. DolphinScheduler focuses on
> > > > > solving the problem of 'complex task dependencies & triggers' in
> > > > > data processing. Just as its name suggests, we are dedicated to
> > > > > making the scheduling system work out of the box.
> > > > >
> > > > > *Current project name of DolphinScheduler is EasyScheduler,
> will
> > > > change it
> > > > > after it is accepted by Incubator.*
> > > > > Proposal
> > > > >
> > > > > DolphinScheduler provides many easy-to-use features to accelerate
> > > > > engineering efficiency for data ETL workflow jobs. We propose a new
> > > > > concept of 'instance of process' and 'instance of task' to let
> > > > > developers tune their jobs against the running state of a workflow
> > > > > instead of changing the task's template. Its main objectives are as
> > > > > follows:
> > > > >
> > > > >- Define the complex tasks' dependencies & triggers in a DAG
> > > > graph by
> > > > >dragging and dropping.
> > > > >- Support cluster HA.
> > > > >- Support multi-tenant and parallel or serial backfilling
> > data.
> > > > >- Support automatical failure job retry and recovery.
> > > > >- Support many data task types and process priority, task
> > > > priority and
> > > > >relative task timeout alarm.
> > > > >
> > > > > For now, DolphinScheduler has a fairly huge community in China. It
> > > > > is also widely adopted by many companies and organizations
> > > > > as their ETL scheduling tool.
> > > > >
> > > > > We believe that bringing DolphinScheduler into the ASF could
> > > > > advance the development of a much stronger and more diverse open
> > > > > source community.
> > > > >
> > 

Re: [DISCUSS] IPMC votes on releases

2019-08-09 Thread Felix Cheung
Option (D) combined with (E),
and encouraging mentors to vote on dev@, makes sense to me.


On Fri, Aug 9, 2019 at 3:24 PM Justin Mclean  wrote:

> Hi,
>
> > (D) will still require 2 more IPMC vote?
> > (E) will be like (B) in that it will need mentors or other IPMC to vote
> in
> > podling dev@?
>
> All releases require 3 (or more) +1 votes by a PMC so yes they would
> require 3 IPMC votes.
>
> Thanks,
> Justin
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [DISCUSS] IPMC votes on releases

2019-08-09 Thread Felix Cheung
(D) will still require 2 more IPMC vote?
(E) will be like (B) in that it will need mentors or other IPMC to vote in
podling dev@?


On Thu, Aug 8, 2019 at 10:04 PM Justin Mclean 
wrote:

> Hi,
>
> One of the incubator pain points is the double voting on releases first by
> the podling and then by the IPMC.
>
> Historically there has been a lot of discussion about this and a couple of
> proposals to try and change it, but none have been accepted. There have
> been proposals on alternative ways to vote and to ask for guidance which
> have been accepted, but podlings don’t seem to take these options up. I’m
> hoping the recent DISCLAIMER-WIP is one that is used more by podlings, and
> goes some way to helping podlings get releases out, but time will tell.
>
> When consider what to do about this, please keep this in mind:
> 1. Only PMC members can have binding votes on releases (it’s in our
> bylaws) so a minimum of 3 IPMC votes are required to make a release.
> 2. Podlings are not TLP and don’t have a PMC and PPMC members votes are
> not binding on releases.
> 3. The IPMC picks up some serious issues in (about 1 in 5) releases by
> checking this way. This is mostly, but not always, on the early releases.
>
> So option (A) would be to get the Bylaws changed and treat podlings as
> TLPs.
>
> Another option (B) is to get all mentors to vote on every release. We’ve
> tried this via various means and it seems only a couple of podlings can
> manage this.
>
> One (perhaps not carefully considered) option (C) would be to vote in all
> PPMC members as IPMC and make PPMC members IPMC members when projects are
> first created rather than incubator committers. If we did this we could
> optionally gate graduation on a review of a podlings releases but that may
> be unpopular. There have also been complaints in the past that he IPMC is
> too large, so increasing the IPMC size this way may also not be popular.
>
> A variation on (C), let's call it option (D), would be to vote podling
> release managers into the IPMC after they have done a number of releases,
> along with podling committers that provide good feedback on a number of
> release candidates. That way, when starting out, a podling is likely to need
> the IPMC's help, but once they have a few releases under their belts they will
> have enough IPMC votes without having to rely on mentors or other IPMC
> members. It would also encourage more careful voting on releases: if you
> just go +1 without giving any detail, you're not going to be voted into the
> IPMC. This wouldn't require any bylaws or policy change, we could just go
> ahead and do it. It would require the mentors' help in identifying good
> candidates.
>
> One further idea I have, (E), is that if a podling does have 3 IPMC votes
> on its dev list and is using the DISCLAIMER-WIP disclaimer, they can just
> notify the IPMC that they are making a release; the IPMC can review it,
> and any issues or feedback found can be incorporated into the next release or
> before graduation as per [1]. This may mean that there’s a risk that a
> release has to be taken down and redone (see the issues that are blockers in
> that ticket), but for most issues found after the notification it would be
> business as usual.
>
> So IMO options (A) and (C) above seem unlikely to happen, and (B) isn’t
> really working, but option (D) combined with (E) along with the recent
> DISCLAIMER-WIP I think would improve the situation.
>
> Does anyone have any other ideas they care to share?
>
> Thanks,
> Justin
>
> 1. https://issues.apache.org/jira/projects/LEGAL/issues/LEGAL-469
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: Fwd: Incubator report timeline August 2019

2019-08-07 Thread Felix Cheung
Subbu,

A healthy community growth would be a big factor.

May I suggest you check out the Maturity Model - please note it is a 
guideline, not a checklist or a rule:

https://community.apache.org/apache-way/apache-project-maturity-model.html




From: Subbu Subramaniam 
Sent: Wednesday, August 7, 2019 9:19 AM
To: dev@pinot.apache.org
Subject: Re: Fwd: Incubator report timeline August 2019

Thank you, Justin and Felix, for your guidance and suggestions. I have added 
more material as suggested, do take a look.

On that note, I would also like to know from your side what may be the factors 
that can help us graduate sooner.

We expect to add more committers next few months.

thanks

-Subbu

From: Subbu Subramaniam 
Sent: Wednesday, August 7, 2019 8:54 AM
To: dev@pinot.apache.org 
Subject: Re: Fwd: Incubator report timeline August 2019

Hi Justin,

Thanks for your suggestion.

Really, we believe the only thing that is blocking us from graduation is the 
releases. And, we have been blocked by helix bugs in that regard. I would like 
your suggestions on anything else that we may be unaware of, in terms of 
getting to graduation.

The open source contributions have come in well, and we expect more in the next 
few months.

I thought I made a note that we have not added any PPMC member, but I will 
double-check. Btw, is that one of the things we need to do for graduation? (add 
more members?)

thanks

-Subbu

From: Justin Mclean 
Sent: Wednesday, August 7, 2019 12:33 AM
To: dev@pinot.apache.org 
Subject: Re: Fwd: Incubator report timeline August 2019

Hi,

The report is a bit minimal; is there any possibility you can expand on the points 
there? In particular, it would be good to list three things that need to 
be done before you graduate, and the date the last committer or PPMC member 
was added.

Thanks,
Justin

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Re: Fwd: Incubator report timeline August 2019

2019-08-07 Thread Felix Cheung
Subbu, can you add a summary on community interaction numbers (dev@ threads, 
PRs)?


From: Subbu Subramaniam 
Sent: Tuesday, August 6, 2019 8:30:49 PM
To: dev@pinot.apache.org ; Verandah2 . 

Subject: Re: Fwd: Incubator report timeline August 2019

Hi Justin

I had updated the podling report in this page a few days back.

https://cwiki.apache.org/confluence/display/INCUBATOR/August2019#pinot

Is there anything else needed?

thanks

-Subbu

From: Justin Mclean 
Sent: Tuesday, August 6, 2019 8:04 PM
To: dev@pinot.apache.org 
Subject: Re: Fwd: Incubator report timeline August 2019

Hi,

Just a friendly reminder that the report is due in a day, is anyone working on 
it?

Thanks,
Justin

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Re: [MENTORS] IPMC Policy change work in progress disclaimer

2019-08-02 Thread Felix Cheung
+1


On Thu, Aug 1, 2019 at 11:17 PM Paul King  wrote:

> Kudos for progressing this idea. I really like it. It should allow
> improvements from the perspective of both podlings and reviewers.
>
> Cheers, Paul.
>
> On Fri, Aug 2, 2019 at 12:25 PM Justin Mclean 
> wrote:
>
> > Hi,
> >
> > The work-in-progress disclaimer (DISCLAIMER-WIP) has been added
> > to the IPMC policy page. [1]
> >
> > Podlings can now select which disclaimer they want to use. If they want
> to
> > use the standard disclaimer then their releases must comply with all ASF
> > policy. If they want the incubator to be more lenient in voting on their
> > release or know that the release has some issues not in line with ASF
> > policy then the work in progress disclaimer can be used.
> >
> > Unless the release is simple and straightforward, I would recommend that
> a
> > podling uses the work in progress disclaimer for the first couple of
> > releases.
> >
> > For what is allowed under the work in progress disclaimer please see this
> > legal JIRA [2]. This allows a lot more in a release but not everything.
> > Certain things are required like having a LICENSE, NOTICE and
> > DISCLAIMER-WIP, and while it would be OK to include compiled code you
> still
> > need to comply with any licensing for that code.
> >
> > By the time a podling graduates it's expected that they are making
> > releases with the standard disclaimer.
> >
> > Here is the text of the DISCLAIMER-WIP where the Incubator is the
> sponsor:
> > 
> > Apache  #Podling-Name# is an effort
> > undergoing incubation at The Apache Software Foundation (ASF),
> > sponsored by the Apache Incubator. Incubation is required of all
> > newly accepted projects until a further review indicates that the
> > infrastructure, communications, and decision making process have
> > stabilized in a manner consistent with other successful ASF projects.
> > While incubation status is not necessarily a reflection of the
> > completeness or stability of the code, it does indicate that the
> > project has yet to be fully endorsed by the ASF.
> >
> > Some of the incubating project's releases may not be fully compliant
> > with ASF policy. For example, releases may have incomplete or
> > un-reviewed licensing conditions. What follows is a list of known
> > issues the project is currently aware of (note that this list, by
> > definition, is likely to be incomplete):
> > #List of known issues go here#
> >
> > If you are planning to incorporate this work into your
> > product/project, please be aware that you will need to conduct a
> > thorough licensing review to determine the overall implications of
> > including this work. For the current status of this project through the
> > Apache
> > Incubator visit:
> > http://incubator.apache.org/project/#Podling-Name#.html
> > 
> >
> > Just fill in #Podling-Name# with your podling name and list the known
> > issues in the correct place.
> >
> > Thanks,
> > Justin
> >
> > 1. https://incubator.apache.org/policy/incubation.html#disclaimers
> > 2. https://issues.apache.org/jira/projects/LEGAL/issues/LEGAL-469
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>


Re: Incubator policy wording

2019-07-31 Thread Felix Cheung
I think rewording the termination spew could help as there has been
feedback on incubator coming across as “unwelcoming” etc.

Just my 2c

On Wed, Jul 31, 2019 at 2:22 PM Justin Mclean 
wrote:

> Hi,
>
> Thanks for the feedback.
>
> > 1. Should we move disclaimer and release up instead of being the last?
> > Seems like good to be upfront with these
>
> For now I’ve just left thing in the order they were.
>
> > 2. About “It MAY consider the termination of a Podling if violations are
> > not corrected.“
>
> In practice this is probably not going to happen, and there are going to be
> conversations with the podling to correct things way before we consider
> this. Each situation is going to be different, so I don’t think we can put a
> general time frame on it. In general it needs to happen before graduation,
> and as long as some progress is being made, all is fine. Also, no timeline
> was specified before, so adding one would change policy, which I think would
> need further discussion.
>
> Thanks,
> Justin
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: Incubator policy wording

2019-07-31 Thread Felix Cheung
Hi Justin, two thoughts

1. Should we move the disclaimer and release sections up instead of leaving them last?
It seems good to be upfront with these.

2. About “It MAY consider the termination of a Podling if violations are
not corrected.“
Should we put a timeframe on the correction? It seems like violations in
some cases do not prevent continuation of the incubation (eg so long as
they are corrected before graduation etc)

Thanks!

On Wed, Jul 31, 2019 at 12:25 AM Justin Mclean 
wrote:

> Hi,
>
> Anyone have any other feedback or should I just commit the changes?
>
> Thanks,
> Justin
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [PySpark] [SparkR] Is it possible to invoke a PySpark function with a SparkR DataFrame?

2019-07-16 Thread Felix Cheung
Not currently in Spark.

However, there are systems out there that can share DataFrames between languages 
on top of Spark - they don't call the Python UDF directly, but you can pass the 
DataFrame to Python and then .map(UDF) that way.



From: Fiske, Danny 
Sent: Monday, July 15, 2019 6:58:32 AM
To: user@spark.apache.org
Subject: [PySpark] [SparkR] Is it possible to invoke a PySpark function with a 
SparkR DataFrame?

Hi all,

Forgive this naïveté, I’m looking for reassurance from some experts!

In the past we created a tailored Spark library for our organisation, 
implementing Spark functions in Scala with Python and R “wrappers” on top, but 
the focus on Scala has alienated our analysts/statisticians/data scientists and 
collaboration is important for us (yeah… we’re aware that your SDKs are very 
similar across languages… :/ ). We’d like to see if we could forego the Scala 
facet in order to present the source code in a language more familiar to users 
and internal contributors.

We’d ideally write our functions with PySpark and potentially create a SparkR 
“wrapper” over the top, leading to the question:

Given a function written with PySpark that accepts a DataFrame parameter, is 
there a way to invoke this function using a SparkR DataFrame?

Is there any reason to pursue this? Is it even possible?

Many thanks,

Danny

For the latest data on the economy and society, consult our website at 
http://www.ons.gov.uk

***
Please Note:  Incoming and outgoing email messages are routinely monitored for 
compliance with our policy on the use of electronic communications

***

Legal Disclaimer:  Any views expressed by the sender of this message are not 
necessarily those of the Office for National Statistics
***


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-17 Thread Felix Cheung
+1

Glad to see the progress in this space - it’s been more than a year since the 
original discussion and effort started.


From: Yinan Li 
Sent: Monday, June 17, 2019 7:14:42 PM
To: rb...@netflix.com
Cc: Dongjoon Hyun; Saisai Shao; Imran Rashid; Ilan Filonenko; bo yang; Matt 
Cheah; Spark Dev List; Yifei Huang (PD); Vinoo Ganesh; Imran Rashid
Subject: Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

+1 (non-binding)

On Mon, Jun 17, 2019 at 1:58 PM Ryan Blue  wrote:
+1 (non-binding)

On Sun, Jun 16, 2019 at 11:11 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
+1

Bests,
Dongjoon.


On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
+1 (binding)

Thanks
Saisai

Imran Rashid <im...@therashids.com> wrote on Sat, Jun 15, 2019 at 3:46 AM:
+1 (binding)

I think this is a really important feature for spark.

First, there is already a lot of interest in alternative shuffle storage in the 
community, from dynamic allocation in kubernetes, to even just improving 
stability in standard on-premise use of Spark.  However, they're often stuck doing this in
forks of Spark, and in ways that are not maintainable (because they copy-paste 
many spark internals) or are incorrect (for not correctly handling speculative 
execution & stage retries).

Second, I think the specific proposal is good for finding the right balance 
between flexibility and too much complexity, to allow incremental improvements. 
 A lot of work has been put into this already to try to figure out which pieces 
are essential to make alternative shuffle storage implementations feasible.

Of course, that means it doesn't include everything imaginable; some things 
still aren't supported, and some will still choose to use the older 
ShuffleManager api to give total control over all of shuffle.  But we know 
there are a reasonable set of things which can be implemented behind the api as 
the first step, and it can continue to evolve.
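As an illustration of what "behind the api" means here, a pluggable shuffle storage layer boils down to a small writer/reader contract that alternative backends implement. The sketch below is hypothetical and far simpler than the actual SPARK-25299 interfaces — an in-memory backend standing in for local disk, DFS, or a remote shuffle service:

```python
from abc import ABC, abstractmethod


class ShuffleStorage(ABC):
    """Hypothetical pluggable contract: where shuffle blocks live is up to
    the implementation (local disk, DFS, remote service, ...)."""

    @abstractmethod
    def write_block(self, shuffle_id: int, map_id: int, data: bytes) -> None: ...

    @abstractmethod
    def read_block(self, shuffle_id: int, map_id: int) -> bytes: ...


class InMemoryShuffleStorage(ShuffleStorage):
    """Toy backend: keeps blocks in a dict, keyed like real shuffle files."""

    def __init__(self):
        self._blocks = {}

    def write_block(self, shuffle_id, map_id, data):
        self._blocks[(shuffle_id, map_id)] = data

    def read_block(self, shuffle_id, map_id):
        return self._blocks[(shuffle_id, map_id)]


storage: ShuffleStorage = InMemoryShuffleStorage()
storage.write_block(shuffle_id=0, map_id=3, data=b"partition bytes")
block = storage.read_block(0, 3)
```

Speculative execution and stage retries are exactly the hard part called out above: a real backend must tolerate the same (shuffle_id, map_id) block being written more than once.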

On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <i...@cornell.edu> wrote:
+1 (non-binding). This API is versatile and flexible enough to handle 
Bloomberg's internal use-cases. The ability for us to vary implementation 
strategies is quite appealing. It is also worth noting the minimal changes to 
Spark core in order to make it work. This is a very much needed addition within 
the Spark shuffle story.

On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyan...@gmail.com> wrote:
+1 This is great work, allowing plugin of different sort shuffle write/read 
implementation! Also great to see it retain the current Spark configuration 
(spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).


On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mch...@palantir.com> wrote:
Hi everyone,

I would like to call a vote for the SPIP for 
SPARK-25299, which proposes 
to introduce a pluggable storage API for temporary shuffle data.

You may find the SPIP document 
here.

The discussion thread for the SPIP was conducted 
here.

Please vote on whether or not this proposal is agreeable to you.

Thanks!

-Matt Cheah


--
Ryan Blue
Software Engineer
Netflix


Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Felix Cheung
How about pyArrow?


From: Holden Karau 
Sent: Friday, June 14, 2019 11:06:15 AM
To: Felix Cheung
Cc: Bryan Cutler; Dongjoon Hyun; Hyukjin Kwon; dev; shane knapp
Subject: Re: [DISCUSS] Increasing minimum supported version of Pandas

Are there other Python dependencies we should consider upgrading at the same 
time?

On Fri, Jun 14, 2019 at 7:45 PM Felix Cheung <felixcheun...@hotmail.com> wrote:
So to be clear, min version check is 0.23
Jenkins test is 0.24

I’m ok with this. I hope someone will test 0.23 on releases though before we 
sign off?
We should maybe add this to the release instruction notes?


From: shane knapp <skn...@berkeley.edu>
Sent: Friday, June 14, 2019 10:23:56 AM
To: Bryan Cutler
Cc: Dongjoon Hyun; Holden Karau; Hyukjin Kwon; dev
Subject: Re: [DISCUSS] Increasing minimum supported version of Pandas

excellent.  i shall not touch anything.  :)

On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler <cutl...@gmail.com> wrote:
Shane, I think 0.24.2 is probably more common right now, so if we were to pick 
one to test against, I still think it should be that one. Our Pandas usage in 
PySpark is pretty conservative, so it's pretty unlikely that we will add 
something that would break 0.23.X.

On Fri, Jun 14, 2019 at 10:10 AM shane knapp <skn...@berkeley.edu> wrote:
ah, ok...  should we downgrade the testing env on jenkins then?  any specific 
version?

shane, who is loathe (and i mean LOATHE) to touch python envs ;)

On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler <cutl...@gmail.com> wrote:
I should have stated this earlier, but when the user does something that 
requires Pandas, the minimum version is checked against what was imported and 
will raise an exception if it is a lower version. So I'm concerned that using 
0.24.2 might be a little too new for users running older clusters. To give some 
release dates, 0.23.2 was released about a year ago, 0.24.0 in January and 
0.24.2 in March.
I think, given that we’re switching to requiring Python 3 and are still a bit of 
a way from cutting a release, 0.24 could be OK as a minimum version requirement
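The check described above can be sketched in a few lines — compare the imported version against a minimum and raise on mismatch. This is a simplified stand-in for what PySpark does, not its actual code (it assumes plain dotted version strings, and takes the version as an argument so it runs without pandas installed):

```python
def parse_version(version: str) -> tuple:
    """Turn '0.23.2' into (0, 23, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def require_minimum_pandas_version(imported: str, minimum: str = "0.23.2") -> None:
    """Raise ImportError when the imported pandas is older than the minimum.

    In real PySpark the imported version would come from pandas.__version__;
    here it is passed in explicitly so the sketch needs no pandas install.
    """
    if parse_version(imported) < parse_version(minimum):
        raise ImportError(
            f"Pandas >= {minimum} must be installed; found {imported}"
        )


require_minimum_pandas_version("0.24.2")  # fine: 0.24.2 >= 0.23.2
```

Deferring the check to the point where Pandas is first needed (rather than at import) is what lets users without Pandas keep using the rest of PySpark.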


On Fri, Jun 14, 2019 at 9:27 AM shane knapp <skn...@berkeley.edu> wrote:
just so everyone knows, our python 3.6 testing infra is currently on 0.24.2...

On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
+1

Thank you for this effort, Bryan!

Bests,
Dongjoon.

On Fri, Jun 14, 2019 at 4:24 AM Holden Karau <hol...@pigscanfly.ca> wrote:
I’m +1 for upgrading, although since this is probably the last easy chance 
we’ll have to bump version numbers easily I’d suggest 0.24.2


On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow and pandas 
combinations. Spark 3 should be a good time to increase.

On Fri, Jun 14, 2019 at 9:46 AM, Bryan Cutler <cutl...@gmail.com> wrote:
Hi All,

We would like to discuss increasing the minimum supported version of Pandas in 
Spark, which is currently 0.19.2.

Pandas 0.19.2 was released nearly 3 years ago and there are some workarounds in 
PySpark that could be removed if such an old version is not required. This will 
help to keep code clean and reduce maintenance effort.

The change is targeted for Spark 3.0.0 release, see 
https://issues.apache.org/jira/browse/SPARK-28041. The current thought is to 
bump the version to 0.23.2, but we would like to discuss before making a 
change. Does anyone else have thoughts on this?

Regards,
Bryan
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Felix Cheung
So to be clear, min version check is 0.23
Jenkins test is 0.24

I’m ok with this. I hope someone will test 0.23 on releases though before we 
sign off?


From: shane knapp 
Sent: Friday, June 14, 2019 10:23:56 AM
To: Bryan Cutler
Cc: Dongjoon Hyun; Holden Karau; Hyukjin Kwon; dev
Subject: Re: [DISCUSS] Increasing minimum supported version of Pandas

excellent.  i shall not touch anything.  :)

On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler <cutl...@gmail.com> wrote:
Shane, I think 0.24.2 is probably more common right now, so if we were to pick 
one to test against, I still think it should be that one. Our Pandas usage in 
PySpark is pretty conservative, so it's pretty unlikely that we will add 
something that would break 0.23.X.

On Fri, Jun 14, 2019 at 10:10 AM shane knapp <skn...@berkeley.edu> wrote:
ah, ok...  should we downgrade the testing env on jenkins then?  any specific 
version?

shane, who is loathe (and i mean LOATHE) to touch python envs ;)

On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler <cutl...@gmail.com> wrote:
I should have stated this earlier, but when the user does something that 
requires Pandas, the minimum version is checked against what was imported and 
will raise an exception if it is a lower version. So I'm concerned that using 
0.24.2 might be a little too new for users running older clusters. To give some 
release dates, 0.23.2 was released about a year ago, 0.24.0 in January and 
0.24.2 in March.

On Fri, Jun 14, 2019 at 9:27 AM shane knapp <skn...@berkeley.edu> wrote:
just so everyone knows, our python 3.6 testing infra is currently on 0.24.2...

On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
+1

Thank you for this effort, Bryan!

Bests,
Dongjoon.

On Fri, Jun 14, 2019 at 4:24 AM Holden Karau <hol...@pigscanfly.ca> wrote:
I’m +1 for upgrading, although since this is probably the last easy chance 
we’ll have to bump version numbers easily I’d suggest 0.24.2


On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow and pandas 
combinations. Spark 3 should be a good time to increase.

On Fri, Jun 14, 2019 at 9:46 AM, Bryan Cutler <cutl...@gmail.com> wrote:
Hi All,

We would like to discuss increasing the minimum supported version of Pandas in 
Spark, which is currently 0.19.2.

Pandas 0.19.2 was released nearly 3 years ago and there are some workarounds in 
PySpark that could be removed if such an old version is not required. This will 
help to keep code clean and reduce maintenance effort.

The change is targeted for Spark 3.0.0 release, see 
https://issues.apache.org/jira/browse/SPARK-28041. The current thought is to 
bump the version to 0.23.2, but we would like to discuss before making a 
change. Does anyone else have thoughts on this?

Regards,
Bryan
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 

YouTube Live Streams: https://www.youtube.com/user/holdenkarau


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu

