, 2014 at 10:54 AM, Mark Hamstra m...@clearstorydata.com
wrote:
Evan,
Have you actually tried to build Spark using its POM file and
sbt-pom-reader?
I just made a first, naive attempt, and I'm still sorting through just
what this did and didn't produce. It looks like the basic jar files
Couple of comments: 1) Whether the Spark POM is produced by SBT or Maven
shouldn't matter for those who just need to link against published
artifacts, but right now SBT and Maven do not produce equivalent POMs for
Spark -- I think. 2) Incremental builds using Maven are trivially more
difficult
'to' is an exception to the usual rule, so (1 to folds).map { ... } would be
the best form.
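For concreteness, a minimal sketch of that preferred style (folds is just an assumed Int here):

val folds = 10
val splits = (1 to folds).map { i => s"fold-$i" }   // infix 'to', rather than 1.to(folds)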
On Mon, Mar 3, 2014 at 1:02 AM, holdenk g...@git.apache.org wrote:
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/18#discussion_r10203849
A basic Debian package can already be created from the Maven build: mvn
-Pdeb ...
On Tue, Apr 1, 2014 at 11:24 AM, Evan Chan e...@ooyala.com wrote:
Also, I understand this is the last week / merge window for 1.0, so if
folks are interested I'd like to get in a PR quickly.
thanks,
Evan
wrote:
Ya there is already some fragmentation here. Maven has some dist
targets
and there is also ./make-distribution.sh.
On Tue, Apr 1, 2014 at 11:31 AM, Mark Hamstra m...@clearstorydata.com
wrote:
A basic Debian package can already be created from the Maven build:
mvn
...or at least you could do that if the Maven build wasn't broken right now.
On Tue, Apr 1, 2014 at 6:01 PM, Mark Hamstra m...@clearstorydata.comwrote:
What the ... is kind of depends on what you're trying to accomplish.
You could be setting Hadoop version and other stuff
Whoops! Looks like it was just my brain that was broken.
On Tue, Apr 1, 2014 at 6:03 PM, Mark Hamstra m...@clearstorydata.comwrote:
...or at least you could do that if the Maven build wasn't broken right
now.
On Tue, Apr 1, 2014 at 6:01 PM, Mark Hamstra m...@clearstorydata.comwrote
I'm trying to decide whether attacking the underlying issue of
RangePartitioner running eager jobs in rangeBounds (i.e. SPARK-1021) is a
better option than a messy workaround for some async job-handling stuff
that I am working on. It looks like there have been a couple of aborted
attempts to
There were a few early/test RCs this cycle that were never put to a vote.
On Tue, May 13, 2014 at 8:07 AM, Nan Zhu zhunanmcg...@gmail.com wrote:
just curious, where is rc4 VOTE?
I searched my gmail but didn't find that?
On Tue, May 13, 2014 at 9:49 AM, Sean Owen so...@cloudera.com
Sorry, looks like an extra line got inserted in there. One more try:
val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
On Fri, May 16, 2014 at 12:36 PM, Mark Hamstra m
Sorry for the duplication, but I think this is the current VOTE candidate
-- we're not voting on rc8 yet?
+1, but just barely. We've got quite a number of outstanding bugs
identified, and many of them have fixes in progress. I'd hate to see those
efforts get lost in a post-1.0.0 flood of new
+1, but just barely. We've got quite a number of outstanding bugs
identified, and many of them have fixes in progress. I'd hate to see those
efforts get lost in a post-1.0.0 flood of new features targeted at 1.1.0 --
in other words, I'd like to see 1.0.1 retain a high priority relative to
1.1.0.
+1
On Fri, May 16, 2014 at 2:16 AM, Patrick Wendell pwend...@gmail.com wrote:
[Due to ASF e-mail outage, I'm not sure if anyone will actually receive this.]
Please vote on releasing the following candidate as Apache Spark version
1.0.0!
This has only minor changes on top of rc7.
The tag to be
features and 1.x. I think there are a few steps that could
streamline triage of this flood of contributions, and make all of this
easier, but that's for another thread.
On Fri, May 16, 2014 at 8:50 PM, Mark Hamstra m...@clearstorydata.com
wrote:
+1, but just barely. We've got quite
opinion known but left it to the wisdom of larger
group of committers to decide ... I did not think it was critical enough to
do a binding -1 on.
Regards
Mridul
On 17-May-2014 9:43 pm, Mark Hamstra m...@clearstorydata.com wrote:
Which of the unresolved bugs in spark-core do you think
,
then I'm listening.
On Sat, May 17, 2014 at 11:59 AM, Mridul Muralidharan mri...@gmail.comwrote:
On 17-May-2014 11:40 pm, Mark Hamstra m...@clearstorydata.com wrote:
That is a past issue that we don't need to be re-opening now. The
present
Huh ? If we need to revisit based on changed
That's the crude way to do it. If you run `sbt/sbt publishLocal`, then you
can resolve the artifact from your local cache in the same way that you
would resolve it if it were deployed to a remote cache. That's just the
build step. Actually running the application will require the necessary
jars
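For concreteness, a minimal sketch of the downstream build side, assuming you have run sbt/sbt publishLocal in your Spark checkout (the version string is an assumption -- use whatever that publish actually produced):

// build.sbt in the downstream project
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT"

sbt resolves that from the local Ivy cache (~/.ivy2/local) with no extra resolver configuration.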
That's all very old functionality in Spark terms, so it shouldn't have
anything to do with your installation being out-of-date. There is also no
need to cast as long as the relevant implicit conversions are in scope:
import org.apache.spark.SparkContext._
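To spell that out with a minimal sketch (the SparkContext construction and data are just illustrative):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // brings the relevant implicit conversions into scope

val sc = new SparkContext("local", "implicits-example")
// With the import above, pair-RDD methods like reduceByKey are available directly,
// with no cast to PairRDDFunctions:
val counts = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1))).reduceByKey(_ + _)
counts.collect().foreach(println)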
On Tue, May 20, 2014 at 1:00 PM,
+1
On Tue, May 20, 2014 at 11:09 PM, Henry Saputra henry.sapu...@gmail.comwrote:
Signature and hash for source looks good
No external executable package with source - good
Compiled with git and maven - good
Ran examples and sample programs locally and standalone -good
+1
- Henry
On
+1
On Tue, May 27, 2014 at 9:26 AM, Ankur Dave ankurd...@gmail.com wrote:
0
OK, I withdraw my downvote.
Ankur http://www.ankurdave.com/
+1
On Fri, Jul 4, 2014 at 12:40 PM, Patrick Wendell pwend...@gmail.com wrote:
I'll start the voting with a +1 - ran tests on the release candidate
and ran some basic programs. RC1 passed our performance regression
suite, and there are no major changes from that RC.
On Fri, Jul 4, 2014 at
) by the same
Mr. Zaharia:
https://github.com/apache/spark/commit/bb1bce79240da22c2677d9f8159683cdf73158c2#diff-776a630ac2b2ec5fe85c07ca20a58fc0
So I'd say it's safe to delete it.
On Wed, Jul 9, 2014 at 2:36 PM, Mark Hamstra m...@clearstorydata.com
wrote:
Doesn't look to me like this is used
project mllib
...
clean
...
compile
...
test
...all works fine for me @2a732110d46712c535b75dd4f5a73761b6463aa8
On Sat, Jul 19, 2014 at 11:10 AM, Debasish Das debasish.da...@gmail.com
wrote:
I am at the reservoir sampling commit:
commit 586e716e47305cd7c2c3ff35c0e828b63ef2f6a8
Sure, drop() would be useful, but breaking the "transformations are lazy;
only actions launch jobs" model is abhorrent -- which is not to say that we
haven't already broken that model for useful operations (cf.
RangePartitioner, which is used for sorted RDDs), but rather that each such
exception to
Rather than embrace non-lazy transformations and add more of them, I'd
rather we 1) try to fully characterize the needs that are driving their
creation/usage; and 2) design and implement new Spark abstractions that
will allow us to meet those needs and eliminate existing non-lazy
transformation.
Where and how is that fork being maintained? I'm not seeing an obviously
correct branch or tag in the main asf hive repo github mirror.
On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote:
It would be great if the hive team can fix that issue. If not, we'll
have to
.
- Patrick
On Mon, Jul 28, 2014 at 10:02 AM, Mark Hamstra m...@clearstorydata.com
wrote:
Where and how is that fork being maintained? I'm not seeing an obviously
correct branch or tag in the main asf hive repo github mirror.
On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend
Of late, I've been coming across quite a few pull requests and associated
JIRA issues that contain nothing indicating their purpose beyond a pretty
minimal description of what the pull request does. On the pull request
itself, a reference to the corresponding JIRA in the title combined with a
See https://issues.apache.org/jira/browse/SPARK-3530 and this doc,
referenced in that JIRA:
https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing
On Wed, Sep 17, 2014 at 2:00 AM, Egor Pahomov pahomov.e...@gmail.com
wrote:
I have problems using Oozie.
Yours are in the same ballpark as mine, where Maven builds with zinc
take about 1.4x as long as builds with SBT.
On Fri, Oct 24, 2014 at 4:24 PM, Sean Owen so...@cloudera.com wrote:
Here's a crude benchmark on a Linux box (GCE n1-standard-4). zinc gets
the assembly build in range of SBT's
+1 (binding)
On Wed, Nov 5, 2014 at 6:29 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
+1 on this proposal.
On Wed, Nov 5, 2014 at 8:55 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
Will these maintainers do a cleanup of those pending PRs once we start
to apply this model?
I
The console mode of sbt (just run
sbt/sbt and then a long running console session is started that will accept
further commands) is great for building individual subprojects or running
single test suites. In addition to being faster since it's a long-running
JVM, it's got a lot of nice
Ok, strictly speaking, that's equivalent to your second class of
examples, development
console, not the first sbt console
On Sun, Nov 16, 2014 at 1:47 PM, Mark Hamstra m...@clearstorydata.com
wrote:
The console mode of sbt (just run
sbt/sbt and then a long running console session is started
More or less correct, but I'd add that there are an awful lot of software
systems out there that use Maven. Integrating with those systems is
generally easier if you are also working with Spark in Maven. (And I
wouldn't classify all of those Maven-built systems as legacy, Michael :)
What that
- Start the SBT interactive console with sbt/sbt
- Build your assembly by running the assembly target in the assembly
project: assembly/assembly
- Run all the tests in one module: core/test
- Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this
also supports tab
And that is no different from how Hive has worked for a long time.
On Fri, Dec 5, 2014 at 11:42 AM, Michael Armbrust mich...@databricks.com
wrote:
The command ran fine for me on master. Note that Hive does print an
exception in the logs, but that exception does not propagate to user code.
`zipWithIndex` is both compute-intensive and breaks Spark's "transformations
are lazy" model, so it is probably not appropriate to add
this to the public RDD API. If `zipWithIndex` weren't already what I
consider to be broken, I'd be much friendlier to building something more on
top of it, but I
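To make the eagerness concrete, a minimal sketch (assuming sc is a SparkContext):

// zipWithIndex has to know the size of every partition before it can assign indices,
// so on a multi-partition RDD it launches a job immediately -- before any action runs.
val rdd = sc.parallelize(1 to 1000, numSlices = 8)
val indexed = rdd.zipWithIndex()   // a job runs here, just to count partition sizes
indexed.take(3)                    // the "real" action only comes later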
+1
On Fri, Dec 12, 2014 at 8:00 PM, Josh Rosen rosenvi...@gmail.com wrote:
+1. Tested using spark-perf and the Spark EC2 scripts. I didn’t notice
any performance regressions that could not be attributed to changes of
default configurations. To be more specific, when running Spark 1.2.0
SPARK-2992 is a good start, but it's not exhaustive. For example,
zipWithIndex is also an eager transformation, and we occasionally see PRs
suggesting additional eager transformations.
On Thu, Dec 18, 2014 at 12:14 PM, Reynold Xin r...@databricks.com wrote:
Alessandro was probably referring to
In master, Reynold has already taken care of moving Row
into org.apache.spark.sql; so, even though the implementation of Row (and
GenericRow et al.) is in Catalyst (which is more optimizer than parser),
that needn't be of concern to users of the API in its most recent state.
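A minimal sketch of what that looks like from the user side (the values are illustrative):

import org.apache.spark.sql.Row

val row = Row(1, "alpha")
val id = row.getInt(0)   // no reference to Catalyst internals needed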
On Tue, Jan 27, 2015
policy as can be: a priority queue.
Alex
On Sat, Jan 10, 2015 at 5:00 PM, Mark Hamstra m...@clearstorydata.com
wrote:
-dev, +user
http://spark.apache.org/docs/latest/job-scheduling.html
On Sat, Jan 10, 2015 at 4:40 PM, Alessandro Baretta
alexbare...@gmail.com wrote:
Is it possible
"it sounds like nobody intends these to be used to actually deploy Spark"
I wouldn't go quite that far. What we have now can serve as useful input
to a deployment tool like Chef, but the user is then going to need to add
some customization or configuration within the context of that tooling to
A LOADING Executor is on the way to RUNNING, but hasn't yet been registered
with the Master, so it isn't quite ready to do useful work.
On Mar 29, 2015, at 9:09 PM, Niranda Perera niranda.per...@gmail.com wrote:
Hi,
I have noticed in the Spark UI, workers and executors run on several
, but I haven't run it to ground yet.)
On Mon, Feb 23, 2015 at 12:18 PM, Michael Armbrust mich...@databricks.com
wrote:
On Sun, Feb 22, 2015 at 11:20 PM, Mark Hamstra m...@clearstorydata.com
wrote:
So what are we expecting of Hive 0.12.0 builds with this RC? I know not
every combination
So what are we expecting of Hive 0.12.0 builds with this RC? I know not
every combination of Hadoop and Hive versions, etc., can be supported, but
even an example build from the Building Spark page isn't looking too good
to me.
Working from f97b0d4, the example build command works: mvn -Pyarn
Agreed. The Spark project and community that Vinod describes do not
resemble the ones with which I am familiar.
On Wed, Apr 22, 2015 at 1:20 PM, Patrick Wendell pwend...@gmail.com wrote:
Hi Vinod,
Thanks for your thoughts - However, I do not agree with your sentiment
and implications. Spark
https://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn
On Sun, May 3, 2015 at 2:54 PM, Pramod Biligiri pramodbilig...@gmail.com
wrote:
This is great. I didn't know about the mvn script in the build directory.
Pramod
On Fri, May 1, 2015 at 9:51 AM, York, Brennon
+1
On Fri, Apr 10, 2015 at 11:05 PM, Patrick Wendell pwend...@gmail.com
wrote:
Please vote on releasing the following candidate as Apache Spark version
1.3.1!
The tag to be voted on is v1.3.1-rc2 (commit 3e83913):
+1
On Sat, Apr 4, 2015 at 5:09 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.3.1!
The tag to be voted on is v1.3.1-rc1 (commit 0dcb5d9f):
+1
On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.4.0!
The tag to be voted on is v1.4.0-rc3 (commit 22596c5):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
This discussion belongs on the dev list. Please post any replies there.
On Sat, May 23, 2015 at 10:19 PM, Cheolsoo Park piaozhe...@gmail.com
wrote:
Hi,
I've been testing SparkSQL in 1.4 rc and found two issues. I wanted to
confirm whether these are bugs or not before opening a jira.
*1)*
Please keep in mind that you are also ASF people, as is the entire Spark
community (users and all)[4]. Phrasing things in terms of "us" and "them" by
drawing a distinction on "[they] get in a fight on our mailing list" is not
helpful.
<whine>But they started it!</whine>
A bit more seriously, my
Should 1.5.2 wait for Josh's fix of SPARK-11293?
On Sun, Oct 25, 2015 at 2:25 PM, Sean Owen wrote:
> The signatures and licenses are fine. I continue to get failures in
> these tests though, with "-Pyarn -Phadoop-2.6 -Phive
> -Phive-thriftserver" on Ubuntu 15 / Java 7.
>
> -
Yes, that's clearer -- at least to me.
But before going any further, let me note that we are already sliding past
Sean's opening question of "Should we start talking about Spark 2.0?" to
actually start talking about Spark 2.0. I'll try to keep the rest of this
post at a higher- or meta-level in
The place of the RDD API in 2.0 is also something I've been wondering
about. I think it may be going too far to deprecate it, but changing
emphasis is something that we might consider. The RDD API came well before
DataFrames and DataSets, so programming guides, introductory how-to
articles and
n we know that the source relation
> (/RDD) is already partitioned on the grouping expressions. AFAIK the spark
> sql still does not allow that knowledge to be applied to the optimizer - so
> a full shuffle will be performed. However in the native RDD we can use
> preservesPartitioning=tru
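For reference, a minimal sketch of that RDD-level knob (pairs is assumed to be an RDD[(String, Int)] already partitioned by key via partitionBy):

// preservesPartitioning = true tells Spark the keys are untouched, so a later
// reduceByKey with the same partitioner needs no further shuffle.
val scaled = pairs.mapPartitions(
  iter => iter.map { case (k, v) => (k, v * 2) },
  preservesPartitioning = true)
val totals = scaled.reduceByKey(_ + _)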
d in DF/DS.
>>
>>
>>
>> I mean, we need to think about what kind of RDD APIs we have to provide
>> to developer, maybe the fundamental API is enough, like, the ShuffledRDD
>> etc.. But PairRDDFunctions probably not in this category, as we can do the
>> sam
FiloDB is also closely related. https://github.com/tuplejump/FiloDB
On Mon, Nov 16, 2015 at 12:24 AM, Nick Pentreath
wrote:
> Cloudera's Kudu also looks interesting here (getkudu.io) - Hadoop
> input/output format support:
>
For more than a small number of files, you'd be better off using
SparkContext#union instead of RDD#union. That will avoid building up a
lengthy lineage.
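For concreteness, a minimal sketch (sc and paths are assumed to exist):

val rdds = paths.map(sc.textFile(_))
val combined = sc.union(rdds)             // one union over all inputs: shallow lineage
// val chained = rdds.reduce(_ union _)   // pairwise RDD#union: lineage grows with each file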
On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky wrote:
> Hey Jeff,
> Do you mean reading from multiple text files? In
Really, Sandy? "Extra consideration" even for already-deprecated API? If
we're not going to remove these with a major version change, then just when
will we remove them?
On Tue, Nov 10, 2015 at 4:53 PM, Sandy Ryza wrote:
> Another +1 to Reynold's proposal.
>
> Maybe
I'm liking the way this is shaping up, and I'd summarize it this way (let
me know if I'm misunderstanding or misrepresenting anything):
- New features are not at all the focus of Spark 2.0 -- in fact, a
release with no new features might even be best.
- Remove deprecated API that we
think we are in agreement, although I wouldn't go to the extreme and say
> "a release with no new features might even be best."
>
> Can you elaborate "anticipatory changes"? A concrete example or so would
> be helpful.
>
> On Tue, Nov 10, 2015 at 5:19 PM, Mark Ham
at 7:04 PM, Mark Hamstra <m...@clearstorydata.com>
wrote:
> Heh... ok, I was intentionally pushing those bullet points to be extreme
> to find where people would start pushing back, and I'll agree that we do
> probably want some new features in 2.0 -- but I think we've got good
>
+1
On Tue, Nov 3, 2015 at 3:22 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.5.2. The vote is open until Sat Nov 7, 2015 at 00:00 UTC and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release
There was a lot of discussion that preceded our arriving at this statement
in the Spark documentation: "Maven is the official build tool recommended
for packaging Spark, and is the build of reference."
https://spark.apache.org/docs/latest/building-spark.html#building-with-sbt
I'm not aware of
HiveSparkSubmitSuite is fine for me, but I do see the same issue with
DataFrameStatSuite
-- OSX 10.10.4, java
1.7.0_75, -Phive -Phive-thriftserver -Phadoop-2.4 -Pyarn
On Wed, Jul 8, 2015 at 4:18 AM, Sean Owen so...@cloudera.com wrote:
The POM issue is resolved and the build succeeds. The
+1
On Wed, Jul 8, 2015 at 10:55 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.4.1!
This release fixes a handful of known issues in Spark 1.4.0, listed here:
http://s.apache.org/spark-1.4.1
The tag to be voted on is
While we're at it, adding endpoints that get results by jobGroup (cf.
SparkContext#setJobGroup) instead of just for a single Job would also be
very useful to some of us.
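For reference, a minimal sketch of the jobGroup mechanism (assuming sc is a SparkContext; the group id is illustrative):

sc.setJobGroup("nightly-etl", "nightly ETL jobs", interruptOnCancel = true)
sc.parallelize(1 to 100).count()    // this job is tagged with the group set above
// ...and from another thread, everything in the group can be cancelled together:
sc.cancelJobGroup("nightly-etl")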
On Thu, Sep 17, 2015 at 7:30 AM, Imran Rashid wrote:
> Hi Kevin,
>
> I think it would be great if you
/spark/spark-core_2.10/
>
> Mark Hamstra <m...@clearstorydata.com>于2015年9月22日周二 下午12:55写道:
>
>> There is no 1.5.0-SNAPSHOT because 1.5.0 has already been released. The
>> current head of branch-1.5 is 1.5.1-SNAPSHOT -- soon to be 1.5.1 release
>> candidates and then
There is no 1.5.0-SNAPSHOT because 1.5.0 has already been released. The
current head of branch-1.5 is 1.5.1-SNAPSHOT -- soon to be 1.5.1 release
candidates and then the 1.5.1 release.
On Mon, Sep 21, 2015 at 9:51 PM, Bin Wang wrote:
> I'd like to use some important bug fixes
Try to read this before Marcelo gets to you.
https://issues.apache.org/jira/browse/SPARK-11157
On Thu, Dec 3, 2015 at 5:27 PM, Matt Cheah wrote:
> Hi everyone,
>
> A very brief question out of curiosity – is there any particular reason
> why we don’t publish the Spark
+1
On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust
wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are
I'm afraid you're correct, Krishna:
core/src/main/scala/org/apache/spark/package.scala: val SPARK_VERSION =
"1.6.0-SNAPSHOT"
docs/_config.yml:SPARK_VERSION: 1.6.0-SNAPSHOT
On Mon, Dec 14, 2015 at 6:51 PM, Krishna Sankar wrote:
> Guys,
>The sc.version gives
+1
On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust
wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are
You can, but you shouldn't. Using backdoors to mutate the data in an RDD
is a good way to produce confusing and inconsistent results when, e.g., an
RDD's lineage needs to be recomputed or a Task is resubmitted on fetch
failure.
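A minimal, hypothetical sketch of the kind of trouble that invites (assuming sc is a SparkContext):

case class Box(var value: Int)
val boxes = Seq(Box(1), Box(2), Box(3))
val rdd = sc.parallelize(boxes).cache()

rdd.map(_.value).sum()         // 6.0
boxes.foreach(_.value += 10)   // mutate the underlying data through a "backdoor"
// Whether later actions still see 6.0 now depends on whether the cached partitions
// survive, get evicted and recomputed, or a task is resubmitted after a fetch failure.
rdd.map(_.value).sum()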
On Tue, Dec 29, 2015 at 11:24 AM, ai he wrote:
+1
On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust
wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are
; APIs but can't move to Spark 2.0 because of the backwards incompatible
> changes, like removal of deprecated APIs, Scala 2.11 etc.
>
> Kostas
>
>
> On Fri, Nov 13, 2015 at 12:26 PM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> Why does stabili
make this consistent.
>
> But, I think the resolution is simple: it's not 'dangerous' to release
> this and I don't think people who say they think this really do. So
> just finish this release normally, and we're done. Even if you think
> there's an argument against it, weig
This is not a Databricks vs. The World situation, and the fact that some
persist in forcing every issue into that frame is getting annoying. There
are good engineering and project-management reasons not to populate the
long-term, canonical repository of Maven artifacts with what are known to
be
It's not a question of whether the preview artifacts can be made available
on Maven central, but rather whether they must be or should be. I've got
no problems leaving these unstable, transitory artifacts out of the more
permanent, canonical repository.
On Fri, Jun 3, 2016 at 1:53 AM, Steve
SPARK-15893 is resolved as a duplicate of SPARK-15899. SPARK-15899 is
Unresolved.
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander wrote:
> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.
>
>
It's also marked as Minor, not Blocker.
On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin wrote:
> On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
> wrote:
> > -1
> >
> > Spark Unit tests fail on Windows. Still not resolved, though marked as
> >
e are people that develop for Spark on Windows. The
> referenced issue is indeed Minor and has nothing to do with unit tests.
>
>
>
> *From:* Mark Hamstra [mailto:m...@clearstorydata.com]
> *Sent:* Wednesday, June 22, 2016 4:09 PM
> *To:* Marcelo Vanzin <van...@cloudera.com>
&
https://github.com/apache/spark/pull/10608
On Fri, Jan 29, 2016 at 11:50 AM, Jakob Odersky wrote:
> I'm not an authoritative source but I think it is indeed the plan to
> move the default build to 2.11.
>
> See this discussion for more detail
>
>
Do all of those thousands of Stages end up being actual Stages that need to
be computed, or are the vast majority of them eventually "skipped" Stages?
If the latter, then there is the potential to modify the DAGScheduler to
avoid much of this behavior:
I don't entirely agree. You're best off picking the right size :). That's
almost impossible, though, since at the input end of the query processing
you often want a large number of partitions to get sufficient parallelism
for both performance and to avoid spilling or OOM, while at the output end
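A minimal sketch of that trade-off (the paths and expensiveTransform are illustrative assumptions):

val input = sc.textFile("hdfs:///data/events", minPartitions = 2000)  // many partitions while processing
val processed = input.map(expensiveTransform)
processed.coalesce(64).saveAsTextFile("hdfs:///data/out")             // fewer, larger partitions at the output end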
d about we can cleanup those warnings
>>> once we get there.
>>>
>>> On Fri, Apr 1, 2016 at 10:00 PM, Raymond Honderdors <
>>> raymond.honderd...@sizmek.com> wrote:
>>>
>>>> What about a seperate branch for scala 2.10?
>>>>
I agree with your general logic and understanding of semver. That is why
if we are going to violate the strictures of semver, I'd only be happy
doing so if support for Java 7 and/or Scala 2.10 were clearly understood to
be deprecated already in the 2.0.0 release -- i.e. from the outset not to
be
Why would the Executors shutdown when the Job is terminated? Executors are
bound to Applications, not Jobs. Furthermore,
unless spark.job.interruptOnCancel is set to true, canceling the Job at the
Application and DAGScheduler level won't actually interrupt the Tasks
running on the Executors. If
r 6, 2016 at 2:57 PM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
> ... My concern is that either of those options will take more resources
>> than some Spark users will have available in the ~3 months remaining before
>> Spark 2.0.0, which will cause fragmentation
It's a pain in the ass. Especially if some of your transitive dependencies
never upgraded from 2.10 to 2.11.
On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin wrote:
> If you want to go down that route, you should also ask somebody who has
> had experience managing a large
There aren't many such libraries, but there are a few. When faced with one
of those dependencies that still doesn't go beyond 2.10, you essentially
have the choice of taking on the maintenance burden to bring the library up
to date, or you do what is potentially a fairly larger refactoring to use
+1
On Wed, Mar 2, 2016 at 2:45 PM, Michael Armbrust
wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.1!
>
> The vote is open until Saturday, March 5, 2016 at 20:00 UTC and passes if
> a majority of at least 3+1 PMC votes are cast.
>
Yes, it works in standalone mode.
On Mon, Mar 7, 2016 at 4:25 PM, Eugene Morozov
wrote:
> Hi, the feature looks like the one I'd like to use, but there are two
> different descriptions in the docs of whether it's available.
>
> I'm on a standalone deployment mode and
Dropping Scala 2.10 support has to happen at some point, so I'm not
fundamentally opposed to the idea; but I've got questions about how we go
about making the change and what degree of negative consequences we are
willing to accept. Until now, we have been saying that 2.10 support will
be
t for all of spark 2.x
>>
>> Regarding Koert's comment on akka, I thought all akka dependencies
>> have been removed from spark after SPARK-7997 and the recent removal
>> of external/akka
>>
>> On Wed, Mar 30, 2016 at 9:36 AM, Mark Hamstra <m...@clearstorydat
ith Mark in that I don't see how supporting scala 2.10 for
>> spark 2.0 implies supporting it for all of spark 2.x
>>
>> Regarding Koert's comment on akka, I thought all akka dependencies
>> have been removed from spark after SPARK-7997 and the recent remo
I don't believe the Scala compiler understands the difference between your
two examples the same way that you do. Looking at a few similar cases,
I've only found the bytecode produced to be the same regardless of which
style is used.
On Wed, Apr 13, 2016 at 7:46 PM, Hyukjin Kwon
Yes, replicated and distributed shuffle materializations are key
requirement to maintain performance in a fully elastic cluster where
Executors aren't just reallocated across an essentially fixed number of
Worker nodes, but rather the number of Workers itself is dynamic.
Retaining the file
after a work-load burst your cluster dynamically changes from 1
> workers to 1000, will the typical HDFS replication factor be sufficient to
> retain access to the shuffle files in HDFS
>
> HDFS isn't resizing. Spark is. HDFS files should be HA and durable.
>
> On Thu,