Ah, ok. It was missing in the list of JIRAs. So +1.
Thanks, Hari
On Mon, Apr 6, 2015 at 11:36 AM, Patrick Wendell pwend...@gmail.com
wrote:
I believe TD just forgot to set the fix version on the JIRA. There is
a fix for this in 1.3:
Cc'ing Chris Fregly, who wrote the Kinesis integration. Maybe he can help.
On Mon, Apr 6, 2015 at 9:23 AM, Vadim Bichutskiy vadim.bichuts...@gmail.com
wrote:
Hi all,
I am wondering, has anyone on this list been able to successfully
implement Spark on top of Kinesis?
Best,
Vadim
+1
On Sat, Apr 4, 2015 at 5:09 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.3.1!
The tag to be voted on is v1.3.1-rc1 (commit 0dcb5d9f):
+1
On Sun, Apr 5, 2015 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.2.2!
The tag to be voted on is v1.2.2-rc1 (commit 7531b50):
I believe TD just forgot to set the fix version on the JIRA. There is
a fix for this in 1.3:
https://github.com/apache/spark/commit/03e263f5b527cf574f4ffcd5cd886f7723e3756e
- Patrick
On Mon, Apr 6, 2015 at 2:31 PM, Mark Hamstra m...@clearstorydata.com wrote:
Is that correct, or is the JIRA
+1 too
On Sun, Apr 5, 2015 at 4:24 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.2.2!
The tag to be voted on is v1.2.2-rc1 (commit 7531b50):
Note that we can do this in DataFrames and use Catalyst to push Sample down
beneath Projection :)
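For illustration, a minimal sketch of how this looks at the API level, assuming the Spark 1.3-era DataFrame API; the sqlContext and "people" table here are illustrative assumptions, not from the thread:

    import org.apache.spark.sql.SQLContext

    // select() and sample() both become Catalyst logical operators, so the
    // optimizer is free to reorder them, e.g. pushing Sample beneath
    // Projection so the projection only runs on rows that survive sampling.
    def sampled(sqlContext: SQLContext) =
      sqlContext.table("people")
        .select("name", "age")
        .sample(withReplacement = false, fraction = 0.01, seed = 42L)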
On Mon, Apr 6, 2015 at 12:42 PM, Xiangrui Meng men...@gmail.com wrote:
The gap sampling is triggered when the sampling probability is small
and the directly underlying storage has constant time lookups.
+1
On Apr 4, 2015, at 6:11 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.3.1!
The tag to be voted on is v1.3.1-rc1 (commit 0dcb5d9f):
The gap sampling is triggered when the sampling probability is small
and the directly underlying storage has constant time lookups, in
particular, ArrayBuffer. This is a very strict requirement. If an RDD is
cached in memory, we use ArrayBuffer to store its elements and
rdd.sample will trigger gap sampling.
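To make the mechanism concrete, here is a small standalone sketch of the gap-sampling idea; it is illustrative only, not Spark's internal implementation:

    import scala.util.Random

    // Gap sampling: instead of a Bernoulli coin flip per element, draw the
    // size of the gap to the next accepted element from a geometric
    // distribution. This only pays off when the backing storage supports
    // constant time indexed access, e.g. an ArrayBuffer.
    def gapSample[T](data: IndexedSeq[T], p: Double,
                     rng: Random = new Random): Seq[T] = {
      require(p > 0.0 && p < 1.0, "sampling probability must be in (0, 1)")
      val lnQ = math.log1p(-p) // ln(1 - p), negative
      val out = scala.collection.mutable.ArrayBuffer.empty[T]
      var i = 0L
      while (i < data.length) {
        val u = 1.0 - rng.nextDouble() // u in (0, 1]
        i += math.floor(math.log(u) / lnQ).toLong // geometric gap
        if (i < data.length) out += data(i.toInt)
        i += 1
      }
      out.toSeq
    }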
This is being discussed in
https://issues.apache.org/jira/browse/SPARK-6407. Let's move the
discussion there. Thanks for providing references! -Xiangrui
On Sun, Apr 5, 2015 at 11:48 PM, Chunnan Yao yaochun...@gmail.com wrote:
On-line Collaborative Filtering (CF) has been widely used and studied.
This would be great for those of us running on HDP. At eBay we recently ran
into a few problems using the generic Hadoop lib. Two off the top of my
head:
* Needed to include our custom Hadoop client due to custom Kerberos
integration
* Minor difference in the HDFS protocol causing the following
Similar problem on 1.2 branch:
[ERROR] Failed to execute goal on project spark-core_2.11: Could not resolve
dependencies for project
org.apache.spark:spark-core_2.11:jar:1.2.3-SNAPSHOT: The following artifacts
could not be resolved:
org.apache.spark:spark-network-common_2.10:jar:1.2.3-SNAPSHOT,
I don't think it's required. This looks like zinc is running (it seems
to find the process on port 3030), but something is wrong with zinc
then. If you aren't running your own zinc, then it's the copy
downloaded by Spark. Maybe try deleting that, shutting down the
zinc process, and trying again.
Hi all,
Joseph proposed an idea: use builder methods instead of static train()
methods for Scala/Java. I agree with that idea, because we have many
duplicated static train() methods. If you have any thoughts on that,
please share them with us.
[SPARK-6682] Deprecate static train and
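For context, a rough sketch of the two styles using MLlib's KMeans (Spark 1.3-era API); the particular parameter values are illustrative:

    import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    def fit(data: RDD[Vector]): KMeansModel = {
      // Static style, one of many overloads:
      //   KMeans.train(data, 10, 20)
      // Builder style, a single entry point with setters:
      new KMeans()
        .setK(10)
        .setMaxIterations(20)
        .run(data)
    }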
I'm killing zinc (if it's running) before running each build attempt.
Trying to build as clean as possible.
On Mon, Apr 6, 2015 at 7:31 PM Patrick Wendell pwend...@gmail.com wrote:
What if you don't run zinc? I.e. just download maven and run that: mvn
package. It might take longer, but I wonder if it will work.
The issue is that if you invoke build/mvn it will start zinc again
if it sees that it is killed.
The absolute most sterile thing to do is this:
1. Kill any zinc processes.
2. Clean up Spark: git clean -fdx (WARNING: this will delete any
staged changes you have; if you have code modifications or
untracked files, they will be removed)
One thing that I think can cause issues is if you run build/mvn with
Scala 2.10, then try to run it with 2.11, since I think we may store
some downloaded jars relating to zinc that will get screwed up. Not
sure that's what is happening, just an idea.
On Mon, Apr 6, 2015 at 10:54 PM, Patrick
I resorted to deleting the spark directory between each build earlier today
(attempting maximum sterility) and then re-cloning from github and switching
to the 1.2 or 1.3 branch.
Does anything persist outside of the spark directory?
Are you able to build either 1.2 or 1.3 w/ Scala-2.11?
$ dev/change-version-to-2.11.sh
$ build/mvn -e -DskipTests clean package
[ERROR] Failed to execute goal on project spark-core_2.11: Could not resolve
dependencies for project
org.apache.spark:spark-core_2.11:jar:1.3.2-SNAPSHOT: The following artifacts
could not be resolved:
Killing zinc resolved the problem building with Scala 2.10 - thank you.
(Adding that to my build script.)
Having problems building with Scala 2.11 - will post separately if
reproducible.
What if you don't run zinc? I.e. just download maven and run that: mvn
package. It might take longer, but I wonder if it will work.
On Mon, Apr 6, 2015 at 10:26 PM, mjhb sp...@mjhb.com wrote:
Similar problem on 1.2 branch:
[ERROR] Failed to execute goal on project spark-core_2.11: Could not
Today I cannot build the 1.2 branch:
[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Networking 1.2.3-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[snipped]
[INFO] ------------------------------------------------------------------------
The only thing that can persist outside of Spark is a still-live Zinc
process. We took care to make sure this was a generally
stateless mechanism.
Both the 1.2.X and 1.3.X releases are built with Scala 2.11 for
packaging purposes. And these have been built as recently as in the
last
Hmm.. Make sure you are building with the right flags. I think you need to
pass -Dscala-2.11 to maven. Take a look at the upstream docs - on my phone
now so can't easily access.
On Apr 7, 2015 1:01 AM, mjhb sp...@mjhb.com wrote:
I even deleted my local Maven repository (.m2) but am still stuck.
On-line Collaborative Filtering (CF) has been widely used and studied.
Re-training a CF model from scratch every time new data comes in is very
inefficient
(http://stackoverflow.com/questions/27734329/apache-spark-incremental-training-of-als-model).
However, in the Spark community we see few
I think those are great to have. I would put them in the DataFrame API,
though, since this applies to structured data. Many of the advanced
functions on PairRDDFunctions should really go into the DataFrame API
now that we have it.
One thing that would be great to understand is what
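As a hedged illustration of that direction, contrasting a PairRDDFunctions aggregation with a DataFrame equivalent (Spark 1.3-era API; the pairs and df inputs are assumed):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.sum

    // RDD style: the aggregation logic is an opaque closure.
    def rddCounts(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
      pairs.reduceByKey(_ + _)

    // DataFrame style: the same aggregation as a structured expression,
    // which Catalyst can analyze and optimize.
    def dfCounts(df: DataFrame): DataFrame =
      df.groupBy("key").agg(sum("value"))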
Hi!
I'd like to get the community's opinion on implementing a generic quantile
approximation algorithm for Spark that is O(n) and requires limited memory.
I would find it useful, and I haven't found any existing implementation. The
plan was basically to wrap t-digest
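A rough sketch of what such a wrapper could look like; this is hypothetical, not a committed API, and it assumes the com.tdunning t-digest library with digests that can be serialized for the reduce:

    import com.tdunning.math.stats.TDigest
    import org.apache.spark.rdd.RDD

    // Build one t-digest per partition, merge them, then query: one pass
    // over the data (O(n)) with memory bounded by the compression setting.
    // NOTE: assumes TDigest instances are serializable, which is an
    // assumption here, not something the library guarantees.
    def approxQuantile(data: RDD[Double], q: Double,
                       compression: Double = 100.0): Double = {
      val merged = data.mapPartitions { iter =>
        val digest = TDigest.createDigest(compression)
        iter.foreach(x => digest.add(x))
        Iterator(digest)
      }.reduce { (a, b) => a.add(b); a }
      merged.quantile(q)
    }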
See now: https://issues.apache.org/jira/browse/SPARK-6710
On Mon, Apr 6, 2015 at 4:27 AM, Reynold Xin r...@databricks.com wrote:
Adding Jianping Wang to the thread, since he contributed the SVDPlusPlus
implementation.
Jianping,
Can you take a look at this message? Thanks.
On Fri, Apr 3,
SPARK-6673 is not, in the end, relevant for 1.3.x, I believe; we just
resolved it for 1.4 anyway. False alarm there.
I back-ported SPARK-6205 into the 1.3 branch for next time. We'll pick
it up if there's another RC, but by itself it is not something that needs
a new RC. (I will give the same