Hi Xiao,
that is the right attitude, thanks a ton :)
Hi Kalin,
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-5x.html#emr-5281-relnotes
The latest EMR version should be available right out of the box; perhaps you
can raise a quick AWS ticket and find out in case its release it
Hi all,
@Enrico, I've added just the SQL query pages (+ JS dependencies, etc.) to
the Google Drive:
https://drive.google.com/drive/folders/12pNc5uqhHtCoeCO3nHS3eQ3X7cFzUAQL?usp=sharing
That is what you had in mind, right? They are indeed different. (For some
reason after I saved them off of the
If you can confirm that this is caused by Apache Spark, feel free to open a
JIRA. I would not expect your queries to hit such a major performance
regression in any release. Also, please try the 3.0 preview releases.
Thanks,
Xiao
On Wed, Jan 15, 2020 at 10:53 AM Kalin Stoyanov wrote:
> Hi Xiao,
>
>
Hi,
I am pretty sure that AWS released 5.28.1 with some bug fixes the day
before yesterday.
Also, please make sure that you are using s3:// instead of s3a:// or anything
like that.
On another note, Xiao is not entirely right in saying that issues with
EMR should not be posted here; a large group of
Hi Xiao,
Thanks, I didn't know that. This
https://aws.amazon.com/about-aws/whats-new/2019/11/announcing-emr-runtime-for-apache-spark/
implies that their fork is not used in EMR 5.27. I tried that version and it
has the same issue. But then again, in their article they were comparing EMR
5.27 vs. 5.16, so I
Thanks Xiao, a more up-to-date publication at a conference like VLDB would
certainly turn the tide for many of us trying to defend Spark's
optimizer.
On Wed, Jan 15, 2020 at 9:39 AM Xiao Li wrote:
> In the upcoming Spark 3.0, we introduced a new framework for Adaptive
> Query Execution in
EMR has its own fork of Spark, called the EMR runtime. It is not
Apache Spark. You might need to talk with them instead of posting questions
in the Apache Spark community.
Cheers,
Xiao
On Wed, Jan 15, 2020 at 9:53 AM Kalin Stoyanov wrote:
> Hi all,
>
> First of all let me say that I am pretty new to
Hi all,
First of all, let me say that I am pretty new to Spark, so this could be
entirely my fault somehow...
I noticed this when I was running a job on an Amazon EMR cluster with Spark
2.4.4, and it finished more slowly than when I had run it locally (on Spark
2.4.1). I checked out the event logs, and
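To make an EMR-vs-local comparison of event logs more concrete, here is a minimal Python sketch that pulls per-query wall-clock times out of two Spark event logs and lines them up. It assumes uncompressed JSON-lines event logs in which each SQL query is bracketed by `SparkListenerSQLExecutionStart` and `SparkListenerSQLExecutionEnd` events carrying `executionId` and `time` (epoch milliseconds); please verify that against your own log files. It also assumes both runs issued the same queries in the same order, so that execution IDs correspond. The helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Assumed event names in Spark 2.4 JSON event logs; check them against your own files.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"
SQL_END = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd"


def sql_durations_ms(lines: Iterable[str]) -> Dict[int, int]:
    """Hypothetical helper: map each SQL executionId to its wall-clock duration in ms."""
    starts: Dict[int, int] = {}
    durations: Dict[int, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("Event")
        if kind == SQL_START:
            starts[event["executionId"]] = event["time"]
        elif kind == SQL_END and event["executionId"] in starts:
            exec_id = event["executionId"]
            durations[exec_id] = event["time"] - starts[exec_id]
    return durations


def compare(emr_lines: Iterable[str], local_lines: Iterable[str]) -> List[Tuple[int, int, int, float]]:
    """Hypothetical helper: rows of (executionId, emr_ms, local_ms, emr/local ratio),
    largest slowdown first. Assumes both runs issued the same queries in the same order."""
    emr, local = sql_durations_ms(emr_lines), sql_durations_ms(local_lines)
    rows = []
    for exec_id in sorted(emr.keys() & local.keys()):
        ratio = emr[exec_id] / local[exec_id] if local[exec_id] else float("inf")
        rows.append((exec_id, emr[exec_id], local[exec_id], ratio))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Both functions accept any iterable of lines, so you can pass two file objects opened with `with open(...)`. Rows with a ratio well above 1 point at the queries worth opening side by side in the SQL tab.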
In the upcoming Spark 3.0, we have introduced a new framework for Adaptive
Query Execution (AQE) in Catalyst. It can adjust query plans based on runtime
statistics, which, to my understanding, Calcite does not support.
Catalyst is also very easy to enhance. We also use the dynamic programming
approach
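To illustrate what adjusting a plan based on runtime statistics means, here is a toy Python model (not Spark code) of one decision AQE can revisit: choosing between a sort-merge join and a broadcast hash join once the real size of a join input is known. The 10 MB threshold mirrors the default of `spark.sql.autoBroadcastJoinThreshold`; the table names, sizes, and the `choose_join` helper are made up for illustration. In Spark 3.0 itself, the feature is switched on with `spark.sql.adaptive.enabled`.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model only; mirrors the 10 MB default of spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class JoinInput:
    name: str
    estimated_bytes: int                # static estimate available at planning time
    actual_bytes: Optional[int] = None  # known only after the upstream stage has run


def choose_join(left: JoinInput, right: JoinInput, use_runtime_stats: bool) -> str:
    """Hypothetical helper: pick a join strategy from whichever size info is available."""

    def size(side: JoinInput) -> int:
        if use_runtime_stats and side.actual_bytes is not None:
            return side.actual_bytes
        return side.estimated_bytes

    smaller = min(left, right, key=size)
    if size(smaller) <= BROADCAST_THRESHOLD_BYTES:
        return f"BroadcastHashJoin(build={smaller.name})"
    return "SortMergeJoin"


if __name__ == "__main__":
    GB, MB = 1024 ** 3, 1024 ** 2
    # Hypothetical tables: the static estimate for `customers` ignored a very selective filter.
    orders = JoinInput("orders", estimated_bytes=5 * GB, actual_bytes=5 * GB)
    customers = JoinInput("customers", estimated_bytes=2 * GB, actual_bytes=3 * MB)
    print("static plan:  ", choose_join(orders, customers, use_runtime_stats=False))
    print("adaptive plan:", choose_join(orders, customers, use_runtime_stats=True))
```

With only the static estimates, both inputs look too large to broadcast, so the toy keeps a sort-merge join; once the filtered side is measured at a few megabytes, it switches to broadcasting that side.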
Thanks all, and Matei.
TL;DR of the conclusion for my particular case:
Qualitatively, while Catalyst[1] tries to mitigate the learning curve and
maintenance burden, it lacks the dynamic programming approach used by
Calcite[2] and risks falling into local minima.
Quantitatively, there is no
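The local-minimum risk mentioned in the TL;DR can be shown with a toy join-ordering problem. The Python sketch below compares a greedy heuristic (commit to the cheapest pair, then always add the relation that keeps the next intermediate result smallest) with a Selinger-style dynamic program over subsets of relations. Everything here is an assumption made for illustration: the four relations and their sizes and selectivities, left-deep plans only, independent predicates, and a cost model that simply sums intermediate result sizes. It is not how Catalyst or Calcite actually cost plans.

```python
from itertools import combinations
from math import prod

# Hypothetical toy catalog: row counts and pairwise join selectivities.
# Pairs with no join predicate are treated as cross products (selectivity 1).
SIZES = {"A": 10, "B": 10, "C": 1000, "D": 10}
SELECTIVITY = {frozenset("AB"): 0.1, frozenset("BC"): 0.01, frozenset("CD"): 0.002}


def result_size(rels):
    """Estimated rows of joining `rels`, assuming independent predicates."""
    rows = prod(SIZES[r] for r in rels)
    for pair in combinations(sorted(rels), 2):
        rows *= SELECTIVITY.get(frozenset(pair), 1.0)
    return rows


def plan_cost(order):
    """Toy cost of a left-deep plan: sum of every intermediate (and final) result size."""
    return sum(result_size(order[:k]) for k in range(2, len(order) + 1))


def greedy_order(rels):
    """Start with the cheapest pair, then add whichever relation keeps the next
    intermediate result smallest; earlier choices are never revisited."""
    order = list(min(combinations(sorted(rels), 2), key=result_size))
    remaining = sorted(set(rels) - set(order))
    while remaining:
        nxt = min(remaining, key=lambda r: result_size(order + [r]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def dp_order(rels):
    """Selinger-style DP: best[S] holds (cost, order) of the cheapest left-deep plan for S."""
    best = {frozenset([r]): (0.0, [r]) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in combinations(sorted(rels), k):
            s = frozenset(subset)
            size = result_size(subset)
            # Try every relation as the last one joined onto the best plan for the rest.
            best[s] = min(
                ((best[s - {last}][0] + size, best[s - {last}][1] + [last]) for last in subset),
                key=lambda cand: cand[0],
            )
    return best[frozenset(rels)][1]


if __name__ == "__main__":
    rels = sorted(SIZES)
    for label, order in (("greedy", greedy_order(rels)), ("dp", dp_order(rels))):
        print(label, order, round(plan_cost(order), 3))
```

With these made-up numbers, the greedy plan starts with the A-B join (the smallest pair, about 10 rows) and totals roughly 112 rows of intermediate results, while the DP plan starts from the C-D join and totals roughly 24: the cheapest first step is not part of the cheapest plan. The price of the DP is that it visits every subset of relations, which grows exponentially with the number of joined tables.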