Hi,
We are building an internal analytics application, kind of an event store. We
have all the basic analytics use cases: filtering, aggregation, segmentation,
etc. So far our architecture has used Elasticsearch extensively, but that is
no longer scaling. One unique requirement we have is an event
+1 on a longer release cycle schedule and more maintenance releases.
From: Mark Hamstra
Sent: Tuesday, September 27, 2016 2:01 PM
Subject: Re: [discuss] Spark 2.x release cadence
To: Reynold Xin
So technically the vote has passed, but IMHO it does not make sense to
release this and then immediately release 2.0.2. I will work on a new RC
once SPARK-17666 and SPARK-17673 are fixed.
Please shout if you disagree.
On Tue, Sep 27, 2016 at 2:05 PM, Mark Hamstra wrote:
If we're going to cut another RC, then it would be good to get this in as
well (assuming that it is merged shortly):
https://github.com/apache/spark/pull/15213
It's not a regression, and it shouldn't happen too often, but when failed
stages don't get resubmitted it is a fairly significant issue.
+1
And I'll dare say that for those with Spark in production, what is more
important is that maintenance releases come out in a timely fashion than
that new features are released one month sooner or later.
On Tue, Sep 27, 2016 at 12:06 PM, Reynold Xin wrote:
Actually I'm going to have to -1 the release myself. Sorry for crashing the
party, but I saw two super critical issues discovered in the last 2 days:
https://issues.apache.org/jira/browse/SPARK-17666 -- this would eventually
hang Spark when running against S3 (and many other storage systems)
+1 -- the minor releases were taking more like 4 months than 3 anyway, and
that was good for the reasons you give. Formalizing a 4-month cadence
reflects reality and is a good thing. All the better if we can then follow
the timeline more comfortably.
On Tue, Sep 27, 2016 at 3:06 PM, Reynold Xin
+1 I think having a 4 month window instead of a 3 month window sounds good.
However, I think figuring out a timeline for maintenance releases would
also be good. This is a common concern that comes up in many user
threads, and it would be better to have some structure around it. It
doesn't need to
We are 2 months past releasing Spark 2.0.0, an important milestone for the
project. Spark 2.0.0 deviated (it took 6 months) from the regular release
cadence we had for the 1.x line, and we never explicitly discussed what the
release cadence should look like for 2.x. Hence this email.
During Spark 1.x,
Hi Asaf,
The current collect_list/collect_set implementations have room for
improvement. We did not implement partial aggregation for these because
the idea of partial aggregation is that we can reduce network traffic (by
shipping fewer partially aggregated buffers); this does not really apply
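A toy sketch of the tradeoff being described (my own illustration, not Spark's actual code): partial aggregation pays off when a merged buffer is smaller than the raw rows it summarizes, as for `sum`; for `collect_list` the partial buffers carry every input row, so nothing is saved.

```scala
// Sketch: why map-side partial aggregation helps sum but not collect_list.
object PartialAggSketch extends App {
  val partition1 = Seq(1, 2, 3)
  val partition2 = Seq(4, 5)

  // sum: each partition collapses to a single value -- 2 values ship
  val sumBuffers = Seq(partition1.sum, partition2.sum)
  assert(sumBuffers == Seq(6, 9))

  // collect_list: each partial buffer carries all of its rows -- the
  // buffers together are exactly as large as the raw data
  val listBuffers = Seq(partition1, partition2)
  assert(listBuffers.map(_.length).sum == partition1.length + partition2.length)

  println(s"sum ships ${sumBuffers.length} values; collect ships ${listBuffers.map(_.length).sum}")
}
```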
Hi,
I wanted to try to implement https://issues.apache.org/jira/browse/SPARK-17691.
So I started by looking at the implementation of collect_list. My idea was to
do the same as it does, but when adding a new element, if the buffer already
holds more than the threshold, remove one element instead.
The problem with
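A hypothetical sketch of the idea described above (the names are mine, not SPARK-17691's actual API): a collect-style buffer capped at `limit` that evicts an element instead of growing. Here it drops the oldest, though which element to drop, absent an ordering, is exactly the open question.

```scala
import scala.collection.mutable

// Toy bounded collect buffer: never holds more than `limit` elements.
final class BoundedCollect[T](limit: Int) {
  private val buffer = mutable.Queue.empty[T]

  def add(elem: T): Unit = {
    buffer.enqueue(elem)
    if (buffer.size > limit) buffer.dequeue() // evict the oldest instead of growing
  }

  def result: Seq[T] = buffer.toSeq
}

object BoundedCollectDemo extends App {
  val c = new BoundedCollect[Int](limit = 3)
  (1 to 5).foreach(c.add)
  assert(c.result == Seq(3, 4, 5)) // only the last 3 elements survive
  println(c.result)
}
```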
Yes - same thing with children in UnaryExpression, BinaryExpression.
Although I have to say the utility isn't that big here.
On Tue, Sep 27, 2016 at 12:53 AM, Jacek Laskowski wrote:
> Hi,
>
> Perhaps nitpicking...you've been warned.
>
> While reviewing expressions in Catalyst
Hi,
Perhaps nitpicking...you've been warned.
While reviewing expressions in Catalyst I've noticed some inconsistency:
the Nondeterministic trait marks its two methods, deterministic and
foldable, as final override, while LeafExpression does not make
children final (at the very least).
My thinking is that
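A simplified sketch of the inconsistency being discussed (toy stand-ins, not Catalyst's real classes): a Nondeterministic-style trait locks down derived properties with `final override` so no subclass can contradict them, and a LeafExpression analogue could give the same guarantee for `children`, which is by definition empty for a leaf.

```scala
trait Expression {
  def children: Seq[Expression]
  def deterministic: Boolean = children.forall(_.deterministic)
  def foldable: Boolean = false
}

trait Nondeterministic extends Expression {
  // `final` guarantees no subclass can claim to be deterministic or foldable
  final override def deterministic: Boolean = false
  final override def foldable: Boolean = false
}

abstract class LeafExpression extends Expression {
  // Making this `final` too (as suggested above) would be the consistent choice
  final override def children: Seq[Expression] = Nil
}

// A toy leaf, nondeterministic expression
case class CurrentRandom() extends LeafExpression with Nondeterministic

object FinalDemo extends App {
  val e = CurrentRandom()
  assert(!e.deterministic && !e.foldable && e.children.isEmpty)
  println(e)
}
```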
+1 (non-binding)
-suresh
> On Sep 26, 2016, at 11:11 PM, Jagadeesan As wrote:
>
> +1 (non binding)
>
> Cheers,
> Jagadeesan A S
>
> From: Jean-Baptiste Onofré
> To: dev@spark.apache.org
> Date: 27-09-16 11:27 AM
> Subject: