An RDD can hold objects of any type. If you generally think of it as a
distributed Collection, then you won't ever be that far off.
As far as serialization goes, the contents of an RDD must be serializable.
There are two serialization libraries you can use with Spark: normal Java
serialization or Kryo.
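A minimal sketch of switching Spark over to Kryo (config keys from the Spark 1.x tuning docs; `com.example.MyRegistrator` is a hypothetical class that would register application types):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: switch Spark from the default Java serializer to Kryo.
// "com.example.MyRegistrator" is a placeholder, not a real class here.
val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.example.MyRegistrator")

val sc = new SparkContext(conf)
```

Note that this only changes how RDD data is serialized; as discussed later in this thread, closures are still shipped with Java serialization.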
As far as I know, the upstream doesn't release binaries, only source code.
The downloads page https://mesos.apache.org/downloads/ for 0.18.0 only
has a source tarball. Is there a binary release somewhere from Mesos that
I'm missing?
On Sun, May 11, 2014 at 2:16 PM, Patrick Wendell
The main reason is that it doesn't always work (e.g. sometimes an application
already has custom Java serialization / externalization logic written, which
doesn't work with Kryo).
On Mon, May 12, 2014 at 5:47 PM, Anand Avati av...@gluster.org wrote:
Hi,
Can someone share the reason why Kryo
Thanks for the experiments and analysis!
I think Michael already submitted a patch that avoids scanning all columns
for count(*) or count(1).
On Mon, May 12, 2014 at 9:46 PM, Andrew Ash and...@andrewash.com wrote:
Hi Spark devs,
First of all, huge congrats on the parquet integration with
I'm trying to run spark-shell on Windows against Hadoop YARN running on Linux.
Specifically, the environment is as follows:
- Client
- OS: Windows 7
- Spark version: 1.0.0-SNAPSHOT (git cloned 2014.5.8)
- Server
- Platform: Hortonworks Sandbox 2.1
I had to modify the Spark source code to apply
I’ll ask the Mesos folks about this. Unfortunately it might be tough to link
only to a company’s builds; but we can perhaps include them in addition to
instructions for building Mesos from Apache.
Matei
On May 12, 2014, at 11:55 PM, Gerard Maas gerard.m...@gmail.com wrote:
Andrew,
On Mon, May 12, 2014 at 2:47 PM, Anand Avati av...@gluster.org wrote:
Hi,
Can someone share the reason why Kryo serializer is not the default?
why should it be?
On top of that, the only way to serialize a closure into the backend (even
now) is Java serialization (which means Java serialization
Reposting here on dev since I didn't see a response on user:
I'm seeing different Serializable behavior in Spark Shell vs. Scala Shell. In
the Spark Shell, equals() fails when I use the canonical equals() pattern of
match{}, but works when I substitute isInstanceOf[]. I am using Spark
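For reference, the two equals() styles being contrasted above look roughly like this (class names are illustrative, not from the original message):

```scala
// Canonical equals() via pattern matching.
class C(val i: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: C => this.i == that.i
    case _       => false
  }
  override def hashCode: Int = i.hashCode
}

// Equivalent logic written with isInstanceOf / asInstanceOf.
class D(val i: Int) {
  override def equals(other: Any): Boolean =
    other.isInstanceOf[D] && other.asInstanceOf[D].i == i
  override def hashCode: Int = i.hashCode
}
```

In a compiled program both forms behave identically; the report above is that only the pattern-match form misbehaves inside the Spark REPL after a serialization round-trip.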
Hi Deb,
For K possible outcomes in multinomial logistic regression, we can have
K-1 independent binary logistic regression models, in which one outcome is
chosen as a pivot and then the other K-1 outcomes are separately
regressed against the pivot outcome. See my presentation for technical
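The pivot formulation described above can be written out as follows (standard textbook notation, not taken from the original message), with class K chosen as the pivot:

```latex
\Pr(Y = k \mid x) = \frac{\exp(\beta_k^{\top} x)}
                         {1 + \sum_{j=1}^{K-1} \exp(\beta_j^{\top} x)},
\qquad k = 1, \dots, K-1,
\qquad
\Pr(Y = K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(\beta_j^{\top} x)}.
```

Each $\beta_k$ is the coefficient vector of one of the K-1 binary regressions against the pivot, and the probabilities sum to one by construction.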
There were a few early/test RCs this cycle that were never put to a vote.
On Tue, May 13, 2014 at 8:07 AM, Nan Zhu zhunanmcg...@gmail.com wrote:
just curious, where is rc4 VOTE?
I searched my gmail but didn't find that?
On Tue, May 13, 2014 at 9:49 AM, Sean Owen so...@cloudera.com
Thank you for your investigation into this!
Just for completeness, I've confirmed it's a problem only in REPL, not in
compiled Spark programs.
But within REPL, a direct consequence of non-same classes after
serialization/deserialization also means that lookup() doesn't work:
scala class C(val
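The class definition above is truncated in the archive; a hypothetical reconstruction of the REPL session being described (runnable only inside spark-shell, where `sc` is provided) might look like:

```scala
// Hypothetical reconstruction, assuming a simple key class in the REPL.
class C(val s: String) extends Serializable {
  override def equals(o: Any): Boolean = o match {
    case that: C => that.s == s
    case _       => false
  }
  override def hashCode: Int = s.hashCode
}

val rdd = sc.parallelize(Seq((new C("a"), 1)))
// In the REPL this lookup can come back empty, because the key class
// deserialized on the driver is not the same class object as the C
// defined above, so equality checks against it fail.
rdd.lookup(new C("a"))
```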
-1
The following bug should be fixed:
https://issues.apache.org/jira/browse/SPARK-1817
https://issues.apache.org/jira/browse/SPARK-1712
-- Original --
From: Patrick Wendell;pwend...@gmail.com;
Date: Wed, May 14, 2014 04:07 AM
To:
On Tue, May 13, 2014 at 8:26 AM, Michael Malak michaelma...@yahoo.com wrote:
Reposting here on dev since I didn't see a response on user:
I'm seeing different Serializable behavior in Spark Shell vs. Scala Shell.
In the Spark Shell, equals() fails when I use the canonical equals()
pattern of
I just built rc5 on Windows 7 and tried to reproduce the problem described in
https://issues.apache.org/jira/browse/SPARK-1712
It works on my machine:
14/05/13 21:06:47 INFO DAGScheduler: Stage 1 (sum at &lt;console&gt;:17) finished
in 4.548 s
14/05/13 21:06:47 INFO TaskSchedulerImpl: Removed TaskSet
Thanks for filing -- I'm keeping my eye out for updates on that ticket.
Cheers!
Andrew
On Tue, May 13, 2014 at 2:40 PM, Michael Armbrust mich...@databricks.com wrote:
It looks like currently the .count() on parquet is handled incredibly
inefficiently and all the columns are materialized.