Hi everybody,
At the moment, Catalyst rules are defined using two different types of rules:
`Rule[LogicalPlan]` and `Strategy` (which in turn maps to
`GenericStrategy[SparkPlan]`).
I propose to introduce utility methods to
a) reduce the boilerplate to define rewrite rules
b) turn them
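To make the proposal concrete, here is a minimal sketch of what such a utility could look like. The `LogicalPlan`/`Rule` types below are simplified stand-ins, not the real Catalyst classes (in particular, `transform` here only rewrites the root node rather than recursing), so treat this as an illustration of the reduced boilerplate, not Spark internals:

```scala
// Simplified stand-ins for the Catalyst types (illustration only).
trait LogicalPlan {
  // Real Catalyst transform recurses over the tree; this applies at the root only.
  def transform(pf: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan =
    if (pf.isDefinedAt(this)) pf(this) else this
}
abstract class Rule[T] {
  def ruleName: String
  def apply(plan: T): T
}

object Rules {
  // The proposed utility: lift a partial function into a named rule,
  // so each rewrite no longer needs its own class declaration.
  def rule(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
    new Rule[LogicalPlan] {
      val ruleName: String = name
      def apply(plan: LogicalPlan): LogicalPlan = plan.transform(pf)
    }
}

// Toy plan nodes to exercise the helper.
case class Filter(cond: Boolean, child: LogicalPlan) extends LogicalPlan
case object Empty extends LogicalPlan

// A rule is now a one-liner instead of a class.
val removeTrueFilter = Rules.rule("RemoveTrueFilter") {
  case Filter(true, child) => child
}
```

Keeping the `name` argument also preserves the `ruleName` debugging utility mentioned elsewhere in the thread.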
Dear List,
We have run into serious problems trying to run a larger than average number of
aggregations in a GROUP BY query. Symptoms of this problem are OutOfMemory
exceptions and unreasonably long processing times due to GC. The problem occurs
when the following two conditions are met:
- The
Quick tests from my side - looks OK. The results are the same as or very similar
to 1.3.1's. Will add DataFrames et al. in future tests.
+1 (non-binding, of course)
1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 17:42 min
mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
Hi all,
I've created another release repository where the release is
identified with the version 1.4.0-rc1:
https://repository.apache.org/content/repositories/orgapachespark-1093/
On Tue, May 19, 2015 at 5:36 PM, Krishna Sankar ksanka...@gmail.com wrote:
Quick tests from my side - looks OK.
There are most likely advantages and disadvantages to Tarek's algorithm
against the current implementation, and different scenarios where each is
more appropriate.
Would we not offer multiple PCA algorithms and let the user choose?
Trevor
Trevor Grant
Data Scientist
Fortunate is he, who is
Sean,
Did the JIRA get created? If so I can't find it so a pointer would be
helpful.
Regards,
Tim
On 06/05/15 06:59, Reynold Xin wrote:
Sean - Please do.
On Tue, May 5, 2015 at 10:57 PM, Sean Owen so...@cloudera.com wrote:
OK to file a JIRA to scrape out a few Java 6-specific things in
To do it in one pass, conceptually what you would need to do is to consume
the entire parent iterator and store the values either in memory or on
disk, which is generally something you want to avoid given that the parent
iterator length is unbounded. If you need to start spilling to disk, you
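The constraint described above can be seen in a tiny sketch (illustrative only, not Spark code): any operation whose first output element can depend on the last input element forces full materialization of the parent iterator before anything is emitted.

```scala
// Illustrative sketch: sorting cannot be done in a streaming fashion,
// because the smallest element may arrive last. The parent iterator must
// therefore be consumed entirely (buffered in memory here; spilled to
// disk in general) before the first output element can be produced.
def sortedInOnePass(parent: Iterator[Int]): Iterator[Int] = {
  val buffer = parent.toArray   // unbounded memory if the parent is unbounded
  scala.util.Sorting.quickSort(buffer)
  buffer.iterator
}
```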
Punya,
Let me see if I can publish these under rc1 as well. In the future
this will all be automated, but currently it's a somewhat manual task.
- Patrick
On Tue, May 19, 2015 at 9:32 AM, Punyashloka Biswal
punya.bis...@gmail.com wrote:
When publishing future RCs to the staging repository, would
Thanks! I realize that manipulating the published version in the pom is a
bit inconvenient but it's really useful to have clear version identifiers
when we're juggling different versions and testing them out. For example,
this will come in handy when we compare 1.4.0-rc1 and 1.4.0-rc2 in a couple
Hey All,
Since we are now voting, please tread very carefully with branch-1.4 merges.
For instance, bug fixes that don't represent regressions from 1.3.X
probably shouldn't be merged unless they are extremely simple
and well reviewed.
As usual, mature/core components (e.g. Spark core)
Overall this seems like a reasonable proposal to me. Here are a few
thoughts:
- There is some debugging utility to the ruleName, so we would probably
want to at least make that an argument to the rule function.
- We also have had rules that operate on SparkPlan, though since there is
only one
Please vote on releasing the following candidate as Apache Spark version 1.4.0!
The tag to be voted on is v1.4.0-rc1 (commit 777a081):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e
The release files, including signatures, digests, etc.
There's an open PR to fix it. If you could try it and report back on the PR
it'd be great. More likely to get in fast.
https://github.com/apache/spark/pull/6260
On Mon, May 18, 2015 at 6:43 PM, Fernando O. fot...@gmail.com wrote:
I just noticed I sent this to users instead of dev:
Hi,
I vaguely remember issues with using float/double as keys in MR (and spark ?).
But I can't seem to find any documentation or analysis about it.
Does anyone have some resource/link I can refer to ?
Thanks,
Mridul
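For what it's worth, the usual hazards with floating-point grouping keys are NaN and signed zero, which may be what those MR-era warnings were about (an assumption on my part). A small self-contained illustration of how primitive and boxed semantics disagree:

```scala
// NaN is never == to itself as a primitive, so naive key comparison
// would split a NaN group...
val nan1: Double = 0.0 / 0.0
val nan2: Double = 0.0 / 0.0
assert(!(nan1 == nan2))    // primitive IEEE comparison says "not equal"
// ...but boxed Doubles (what a hash map actually sees) compare NaN equal:
assert(java.lang.Double.valueOf(nan1).equals(java.lang.Double.valueOf(nan2)))

// The opposite trap with signed zero: 0.0 == -0.0 as primitives,
// yet boxed they are unequal and hash differently, so a hash-based
// aggregation could end up with two groups for the "same" key.
assert(0.0 == -0.0)
val posZero = java.lang.Double.valueOf(0.0)
val negZero = java.lang.Double.valueOf(-0.0)
assert(!posZero.equals(negZero))
assert(posZero.hashCode != negZero.hashCode)
```

Both follow from `java.lang.Double.equals`/`hashCode` comparing bit patterns rather than IEEE values.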
Hi Peter, a few months ago I was using MetricsSystem to export to Graphite
and then view in Grafana; relevant scripts and some instructions are here
https://github.com/hammerlab/grafana-spark-dashboards/ if you want to
take a look.
On Sun, May 17, 2015 at 8:48 AM Peter Prettenhofer
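For reference, the Graphite export is configured through `conf/metrics.properties`; a minimal fragment looks like the following (the host value is a placeholder, and the other keys are the standard GraphiteSink properties):

```properties
# Route all instances' metrics to Spark's bundled Graphite sink
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
```

Grafana then reads the metrics out of Graphite; the dashboards in the repository linked above build on that setup.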
A couple of other process things:
1. Please *keep voting* (+1/-1) on this thread even if we find some
issues, until we cut RC2. This lets us pipeline the QA.
2. The SQL team owes a JIRA clean-up (forthcoming shortly)... there
are still a few Blockers that aren't.
On Tue, May 19, 2015 at 9:10
Before I vote, I wanted to point out there are still 9 Blockers for 1.4.0.
I'd like to use this status to really mean must happen before the
release. Many of these may already be fixed, or aren't really blockers --
can just be updated accordingly.
I bet at least one will require further work if