I am +1 for going to 0.23.2 - testing many PyArrow and pandas combinations
brings some overhead. Spark 3 should be a good time to increase the minimum.
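For context, the gate involved is essentially a version check at import time. A minimal sketch of such a check; the constant and function names here are illustrative, not PySpark's actual code:

```python
# Hypothetical sketch of a minimum-pandas-version gate.
# MINIMUM_PANDAS_VERSION and the function names are illustrative.
MINIMUM_PANDAS_VERSION = "0.23.2"

def version_tuple(version):
    # "0.23.2" -> (0, 23, 2), so versions compare numerically as tuples
    return tuple(int(part) for part in version.split("."))

def pandas_version_ok(installed):
    """Return True if the installed pandas version meets the minimum."""
    return version_tuple(installed) >= version_tuple(MINIMUM_PANDAS_VERSION)
```

Raising the minimum to 0.23.2 would simply move the constant, letting the workarounds for older releases be deleted.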
On Fri, Jun 14, 2019 at 9:46 AM, Bryan Cutler wrote:
> Hi All,
>
> We would like to discuss increasing the minimum supported version of
> Pandas in Spark, which is
Hi All,
We would like to discuss increasing the minimum supported version of Pandas
in Spark, which is currently 0.19.2.
Pandas 0.19.2 was released nearly 3 years ago and there are some
workarounds in PySpark that could be removed if such an old version is not
required. This will help to keep
Hi everyone,
I would like to call a vote for the SPIP for SPARK-25299, which proposes to
introduce a pluggable storage API for temporary shuffle data.
You may find the SPIP document here.
The discussion thread for the SPIP was conducted here.
Please vote on whether or not this
Thank you for the feedback and requirements, Hyukjin, Reynold, and Marco.
Sure, we can do whatever we want.
I'll wait for more feedback and proceed to the next steps.
Bests,
Dongjoon.
On Wed, Jun 12, 2019 at 11:51 PM Marco Gaido wrote:
> Hi Dongjoon,
> Thanks for the proposal! I like the
If you control the codebase, you control when an RDD goes out of scope. Or
am I missing something?
(Note that finalize will not necessarily be executed when an object goes out
of scope, but when the GC runs at some indeterminate point in the future.
Please avoid using finalize for the kind of task
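The same pitfall can be demonstrated in Python with an object in a reference cycle: its finalizer runs only when the cycle collector happens to run, not when the object goes out of scope. A hypothetical illustration (not Spark code):

```python
import gc

log = []

class Tracked:
    def __del__(self):
        # Finalizer: records when the object is actually collected.
        log.append("finalized")

def scope():
    t = Tracked()
    t.self_ref = t  # reference cycle: refcount never hits zero on scope exit

gc.disable()          # no automatic cycle collection
scope()
assert log == []      # NOT finalized when it went out of scope
gc.collect()          # finalization happens only when the collector runs
assert log == ["finalized"]
gc.enable()
```

This is why tying resource cleanup (like shuffle file removal) to finalization gives no guarantee about *when* cleanup happens.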
I think maybe we could start a vote on this SPIP.
This has been discussed for a while, and the current doc is pretty complete
as of now. We have also seen a lot of demand in the community for building
their own shuffle storage.
Thanks
Saisai
On Tue, Jun 11, 2019 at 3:27 AM, Imran Rashid wrote:
> I would be
Hi Dongjoon,
Thanks for the proposal! I like the idea. Maybe we can extend it to
components too, and to some JIRA labels, such as correctness, which may be
worth highlighting in PRs as well. My only concern is that in many cases
JIRAs are not created very carefully, so they may be incorrect at the moment
Seems like a good idea. Can we test this with a component first?
On Thu, Jun 13, 2019 at 6:17 AM Dongjoon Hyun
wrote:
> Hi, All.
>
> Since we use both Apache JIRA and GitHub actively for Apache Spark
> contributions, we have lots of JIRAs and PRs consequently. One specific
> thing I've been
Yea, I think we can automate this process via, for instance,
https://github.com/apache/spark/blob/master/dev/github_jira_sync.py
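A minimal sketch of the kind of mapping such a sync script could apply, matching JIRA components to PR labels. The component-to-label table and function name here are hypothetical, not the actual contents of github_jira_sync.py:

```python
# Hypothetical component-to-label mapping; entries are illustrative only.
COMPONENT_TO_LABEL = {
    "SQL": "SQL",
    "PySpark": "PYTHON",
    "Structured Streaming": "STRUCTURED STREAMING",
}

def labels_for_components(components):
    """Return GitHub PR labels for a list of JIRA components, skipping unknowns."""
    return [COMPONENT_TO_LABEL[c] for c in components if c in COMPONENT_TO_LABEL]
```

Skipping unknown components keeps carelessly filed JIRAs from producing bogus labels, which is the concern raised above.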
+1 for this sort of automatic categorization and matching of metadata
between JIRA and GitHub
Adding Josh and Sean as well.
On Thu, 13 Jun 2019, 13:17 Dongjoon Hyun,