In an ideal situation, +1 on removing all vendor-specific builds and
making them just Hadoop-version-specific - that is what we should
depend on anyway.
Though I hope Sean is correct in assuming that vendor-specific builds
for Hadoop 2.4 are just that, and not 2.4- or 2.4+, which would cause
incompatibilities
Hey All,
Today there was a JIRA posted with an observed regression around Spark
Streaming during certain recovery scenarios:
https://issues.apache.org/jira/browse/SPARK-6222
My preference is to go ahead and ship this release (RC3) as-is, and if
this issue is isolated and resolved soon, we can make a
Does the Apache project team have any ability to measure download counts of
the various releases? That data could be useful when it comes time to
sunset vendor-specific releases, like CDH4 for example.
On Mon, Mar 9, 2015 at 5:34 AM, Mridul Muralidharan mri...@gmail.com
wrote:
Krishna, I tested your linear regression example. For linear
regression, we changed its objective function from 1/n * \|A x -
b\|_2^2 to 1/(2n) * \|Ax - b\|_2^2 to be consistent with common least
squares formulations. It means you could reproduce the same result by
multiplying the step size by 2.
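To illustrate Xiangrui's point outside of Spark, here is a minimal sketch in plain Python (the toy data and step sizes are made up for the illustration): gradient descent on the 1/n objective with step size s takes exactly the same steps as on the 1/(2n) objective with step size 2s, because the latter's gradient is half the former's.

```python
# Toy 1-D least squares: fit a in min ||a*x - y||^2 under two scalings
# of the objective. Data and step sizes are illustrative only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
n = len(xs)

def gd(grad, step, iters=500):
    a = 0.0
    for _ in range(iters):
        a -= step * grad(a)
    return a

# Old objective (1/n) * sum (a*x - y)^2: gradient is (2/n) * sum x*(a*x - y)
old_grad = lambda a: (2.0 / n) * sum(x * (a * x - y) for x, y in zip(xs, ys))
# New objective (1/(2n)) * sum (a*x - y)^2: gradient is (1/n) * sum x*(a*x - y)
new_grad = lambda a: (1.0 / n) * sum(x * (a * x - y) for x, y in zip(xs, ys))

a_old = gd(old_grad, step=0.01)
a_new = gd(new_grad, step=0.02)  # doubled step size compensates for the 1/2

print(abs(a_old - a_new) < 1e-9)  # prints True: the iterates coincide
```

Since new_grad(a) is exactly old_grad(a) / 2 at every point, step * grad is identical in both runs, so the two optimizers trace the same trajectory and converge to the same solution.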
+1 (non-binding)
- Verified signatures
- Built on Mac OS X and Fedora 21.
On Mon, Mar 9, 2015 at 11:01 PM, Krishna Sankar ksanka...@gmail.com wrote:
Excellent, Thanks Xiangrui. The mystery is solved.
Cheers
k/
On Mon, Mar 9, 2015 at 3:30 PM, Xiangrui Meng men...@gmail.com wrote:
+1
Tested on Mac OS X
Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
Distributions X ...
Maybe one option is to have a minimum basic set (which I know is what we
are discussing) and move the rest to spark-packages.org. There the vendors
can add the latest downloads - for example when 1.4 is
Yeah, interesting question of what is the better default for the
single set of artifacts published to Maven. I think there's an
argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
and cons discussed more at
https://issues.apache.org/jira/browse/SPARK-5134
We probably want to revisit the way we do binaries in general for
1.4+. IMO, something worth forking a separate thread for.
I've been hesitating to add new binaries because people
(understandably) complain if you ever stop packaging older ones, but
on the other hand the ASF has complained that we
+1
Tested it on Mac OS X.
One small issue I noticed is that the Scala 2.11 build is using Hadoop 1
without Hive, which is kind of weird because people are more likely to
want Hadoop 2 with Hive. So it would be good to publish a build for that
configuration instead. We can do it if we do a new
Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
Maven artifacts.
Patrick I see you just commented on SPARK-5134 and will follow up
there. Sounds like this may accidentally not be a problem.
On binary
Yeah, my concern is that people should get Apache Spark from *Apache*, not from
a vendor. It helps everyone use the latest features no matter where they are.
In the Hadoop distro case, Hadoop made all this effort to have standard APIs
(e.g. YARN), so it should be easy. But it is a problem if
I think it's important to separate the goals from the implementation.
I agree with Matei on the goal - I think the goal needs to be to allow
people to download Apache Spark and use it with CDH, HDP, MapR,
whatever... This is the whole reason why HDFS and YARN have stable
API's, so that other
Our goal is to let people use the latest Apache release even if vendors fall
behind or don't want to package everything, so that's why we put out releases
for vendors' versions. It's fairly low overhead.
Matei
On Mar 8, 2015, at 5:56 PM, Sean Owen so...@cloudera.com wrote:
Yeah it's not much overhead, but here's an example of where it causes
a little issue.
I like that reasoning. However, the released builds don't track the
later versions of Hadoop that vendors would be distributing -- there's
no Hadoop 2.6 build for example. CDH4 is here, but not the
far-more-used
There are still three JIRAs marked as blockers for 1.3.0:
SPARK-5310 Update SQL programming guide for 1.3
SPARK-5183 Document data source API
SPARK-6128 Update Spark Streaming Guide for Spark 1.3
As a matter of hygiene, let's either mark them resolved if they're
resolved, or push them /
Given the title and tagging, it sounds like there could be some
must-have doc changes to go with what is being released as 1.3. It can
be finished later, and published later, but then the docs source
shipped with the release doesn't match the site, and until then, 1.3
is released without some
Hey Sean,
SPARK-5310 Update SQL programming guide for 1.3
SPARK-5183 Document data source API
SPARK-6128 Update Spark Streaming Guide for Spark 1.3
For these, the issue is that they are documentation JIRAs, which
don't need to be timed exactly with the release vote, since we can
update the
+1 (non-binding, doc issues aside)
Ran batch of tests against yarn and standalone, including tests for
rc2 blockers, all looks fine.
On Thu, Mar 5, 2015 at 6:52 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.3.0!
The
+1 (non-binding, of course)
1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 13:55 min
mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11
2. Tested pyspark, MLlib - running as well as comparing results with 1.1.x
and 1.2.x
pyspark
Sean,
The docs are distributed and consumed in a fundamentally different way
than Spark code itself. So we've always considered the deadline for
doc changes to be when the release is finally posted.
If there are small inconsistencies with the docs present in the source
code for that release tag,
To add to what Patrick said, the only reason that those JIRAs are marked as
Blockers (at least I can say for myself) is so that they are at the top of
the JIRA list signifying that these are more *immediate* issues than all
the Critical issues. To make it less confusing for the community voting,
Although the problem is small, especially if the essential docs
changes are indeed following just a couple of days behind the final
release. I mean, why the rush if they're essential? Wait a couple of
days, finish them, and make the release.
Answer is, I think these changes aren't actually essential given
For now, I'll just put this as critical. We can discuss the
documentation stuff offline or in another thread.
On Fri, Mar 6, 2015 at 1:36 PM, Sean Owen so...@cloudera.com wrote:
Although the problem is small, especially if indeed the essential docs
changes are following just a couple days
I'll kick it off with a +1.
On Thu, Mar 5, 2015 at 6:52 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.3.0!
The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4):