A work-in-progress PR: https://github.com/apache/spark/pull/21822
The PR also adds the infrastructure to throw exceptions in test mode when
the various transform methods are used as part of analysis. Unfortunately
there are a couple of edge cases that do need that, and as a result there is
this ugly
Sure, I can wait for this and create another RC then.
Thanks,
Saisai
Xiao Li wrote on Fri, Jul 20, 2018 at 9:11 AM:
Yes. https://issues.apache.org/jira/browse/SPARK-24867 is the one I
created. The PR has been created. Since this is not rare, let us merge it
to 2.3.2?
Reynold's PR is to get rid of AnalysisBarrier. That is better than the multiple
patches we added for AnalysisBarrier after the 2.3.0 release. We can
I see, thanks Reynold.
Reynold Xin wrote on Fri, Jul 20, 2018 at 8:46 AM:
Looking at the list of pull requests it looks like this is the ticket:
https://issues.apache.org/jira/browse/SPARK-24867
On Thu, Jul 19, 2018 at 5:25 PM Reynold Xin wrote:
I don't think my ticket should block this release. It's a big general
refactoring.
Xiao do you have a ticket for the bug you found?
On Thu, Jul 19, 2018 at 5:24 PM Saisai Shao wrote:
Hi Xiao,
Are you referring to this JIRA (
https://issues.apache.org/jira/browse/SPARK-24865)?
Xiao Li wrote on Fri, Jul 20, 2018 at 2:41 AM:
> dfWithUDF.cache()
> dfWithUDF.write.saveAsTable("t")
> dfWithUDF.write.saveAsTable("t1")
>
>
> Cached data is not being used. It causes a big performance regression.
We have had multiple bugs introduced by AnalysisBarrier. In hindsight I
think the original design before the analysis barrier was much simpler and
required less developer knowledge of the infrastructure.
As long as the analysis barrier is there, developers writing various code in
the analyzer will have to be
Yeah, I was mostly thinking that, if the normal Spark PR tests were set up
to check the sigs (every time? some of the time?), then this could serve as
an automatic check that nothing funny has been done to the archives. There
shouldn't be any difference between the cache and the archive; but if
Yeah, if the test code keeps around the archive and/or a digest of what it
unpacked. A release should never be modified though, so that would be highly rare.
If the worry is hacked mirrors then we might have bigger problems, but
there the issue is verifying the download sigs in the first place. Those
would have
Is there or should there be some checking of digests just to make sure that
we are really testing against the same thing in /tmp/test-spark that we are
distributing from the archive?
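The digest check being discussed could be sketched like this. Note the file names, paths, and digest-file layout here are assumptions for illustration (the common `sha512sum` output format, with the hex digest as the first token), not necessarily the exact format of every published Apache digest file:

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_digest(archive_path, digest_path):
    """Compare a local archive against a published .sha512 file.

    Assumes the digest file's first whitespace-separated token is the
    hex digest (sha512sum-style output); adjust parsing for other formats.
    """
    with open(digest_path) as f:
        expected = f.read().split()[0].lower()
    return sha512_of(archive_path) == expected
```

Running something like this over /tmp/test-spark before the tests use the unpacked tree would catch a cached archive that no longer matches what is being distributed.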
On Thu, Jul 19, 2018 at 11:15 AM Sean Owen wrote:
> Ideally, that list is updated with each release, yes.
dfWithUDF.cache()
dfWithUDF.write.saveAsTable("t")
dfWithUDF.write.saveAsTable("t1")
Cached data is not being used. It causes a big performance regression.
2018-07-19 11:32 GMT-07:00 Sean Owen:
What regression are you referring to here? A -1 vote really needs a
rationale.
On Thu, Jul 19, 2018 at 1:27 PM Xiao Li wrote:
I would first vote -1.
I might find another regression caused by the analysis barrier. Will keep
you posted.
Xiao
2018-07-18 18:05 GMT-07:00 Takeshi Yamamuro:
> +1 (non-binding)
>
> I ran tests on an EC2 m4.2xlarge instance;
> [ec2-user]$ java -version
> openjdk version "1.8.0_171"
> OpenJDK
Ideally, that list is updated with each release, yes. Non-current releases
will now always download from archive.apache.org though. But we run into
rate-limiting problems if that gets pinged too much. So yes, it's good to keep
the list limited to current branches.
It looks like the download is cached in
Hi Team - Is there a good calculator/Excel sheet to estimate compute and
storage requirements for new Spark jobs to be developed?
Capacity planning based on: job, data type, etc.
Thanks,
Deepu Raj
+1 this has been problematic.
Also, does this list need to be updated every time we make a new release?
Plus, can we cache them on Jenkins? Maybe we can avoid downloading the same
thing from the Apache archive on every test run.
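A cache along those lines could be sketched as follows. The function name, directory layout, and digest handling are all hypothetical, not what the Spark tests actually do: the idea is simply to fetch from the archive only on a miss, and to discard a cached copy whose digest no longer matches:

```python
import hashlib
import os


def cached_archive(name, cache_dir, fetch, expected_sha512=None):
    """Return a path to `name` under `cache_dir`, calling `fetch(path)`
    (e.g. a download from archive.apache.org) only on a cache miss.

    If `expected_sha512` is given, a cached copy with a mismatched
    digest is treated as stale or corrupt and re-fetched.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name)
    if os.path.exists(path) and expected_sha512 is not None:
        h = hashlib.sha512()
        with open(path, "rb") as f:
            h.update(f.read())
        if h.hexdigest() != expected_sha512:
            os.remove(path)  # stale or corrupt cache entry; force re-fetch
    if not os.path.exists(path):
        fetch(path)  # cache miss: download the archive
    return path
```

This would address both concerns raised in the thread at once: repeated test runs stop hammering the archive, and the digest check guards against a modified or corrupted cached copy.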
From: Marco Gaido
Sent: Monday, July 16,
I use the state function flatMapGroupsWithState to track the state of a Kafka
stream. To further customize the state function, I'd like to use a static
datasource (JDBC) in the state function. This datasource contains data I'd
like to join with the stream (as an Iterator) within flatMapGroupsWithState.