Hey Marcelo,
Yes - I agree. That one trickled in just as I was packaging this RC.
However, I still put this out here to allow people to test the
existing fixes, etc.
- Patrick
On Wed, Mar 4, 2015 at 9:26 AM, Marcelo Vanzin van...@cloudera.com wrote:
I haven't tested the rc2 bits yet, but I'd consider
https://issues.apache.org/jira/browse/SPARK-6144 a serious regression
from 1.2 (since it affects existing addFile() functionality if the
URL is hdfs:...).
Will test other parts separately.
On Tue, Mar 3, 2015 at 8:19 PM, Patrick Wendell
-1 (non-binding) because of SPARK-6144.
But aside from that I ran a set of tests on top of standalone and yarn
and things look good.
On Tue, Mar 3, 2015 at 8:19 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.3.0!
The
the master and workers need some system and package updates, and i'll also
be rebooting the machines as well.
this shouldn't take very long to perform, and i expect jenkins to be back
up and building by 9am at the *latest*.
important note: i will NOT be updating jenkins or any of the plugins
#4 with a preference for CamelCaseEnums
On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley jos...@databricks.com
wrote:
another vote for #4
People are already used to adding () in Java.
On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch java...@gmail.com wrote:
#4 but with MemoryOnly (more
Hi all,
There are many places where we use enum-like types in Spark, but in
different ways. Every approach has both pros and cons. I wonder
whether there should be an “official” approach for enum-like types in
Spark.
1. Scala’s Enumeration (e.g., SchedulingMode, WorkerState, etc)
* All types
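For context on option 1, here is a minimal sketch of Scala's built-in Enumeration in the style of SchedulingMode; the values mirror that class but this is an illustration, not the actual Spark source:

```scala
// Sketch of option 1: Scala's built-in Enumeration.
// Illustrative only, not the real Spark definition.
object SchedulingMode extends Enumeration {
  type SchedulingMode = Value
  val FAIR, FIFO, NONE = Value
}

// Values can be enumerated and parsed by name:
val mode = SchedulingMode.withName("FIFO")
```

One upside of this approach is the free `values` iterator and `withName` parsing; a commonly cited downside is that all enumerations erase to the same `Enumeration#Value` type at runtime.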
Hey Mingyu,
I think it's broken out separately so we can record the time taken to
serialize the result. Once we've serialized it once, the second
serialization should be really cheap, since it's just wrapping
something that has already been turned into a byte buffer. Do you see
a specific issue
another vote for #4
People are already used to adding () in Java.
On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch java...@gmail.com wrote:
#4 but with MemoryOnly (more scala-like)
http://docs.scala-lang.org/style/naming-conventions.html
Constants, Values, Variable and Methods
Constant
The concern is really just the runtime overhead and memory footprint of
Java-serializing an already-serialized byte array again. We originally
noticed this when we were using RDD.toLocalIterator() which serializes the
entire 64MB partition. We worked around this issue by kryo-serializing and
Hi all,
It looks like the result of a task is serialized twice: once by the serializer
(i.e. Java/Kryo depending on configuration) and once again by the closure
serializer (i.e. Java). To link the actual code,
The first one:
I'm cool with #4 as well, but make sure we dictate that the values should
be defined within an object with the same name as the enumeration (like we
do for StorageLevel). Otherwise we may pollute a higher namespace.
e.g. we SHOULD do:
trait StorageLevel
object StorageLevel {
case object
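Filling in the shape Aaron describes, here is a hedged sketch of option 4 with case objects nested in a same-named object; the value names are illustrative only, borrowed from Stephen's MemoryOnly suggestion, and do not reproduce the real StorageLevel:

```scala
// Sketch of option 4: sealed trait + case objects inside an
// object with the same name, so values don't pollute a higher
// namespace. Names are illustrative, not Spark's actual API.
sealed trait StorageLevel
object StorageLevel {
  case object MemoryOnly extends StorageLevel
  case object DiskOnly extends StorageLevel
  case object MemoryAndDisk extends StorageLevel
}

// `sealed` gives exhaustive pattern-match checking:
def describe(level: StorageLevel): String = level match {
  case StorageLevel.MemoryOnly    => "memory only"
  case StorageLevel.DiskOnly      => "disk only"
  case StorageLevel.MemoryAndDisk => "memory, spilling to disk"
}
```

A side benefit of this pattern is that each value is its own singleton type, so the compiler warns on a non-exhaustive match, which Enumeration cannot do.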
I think we will have to fix
https://issues.apache.org/jira/browse/SPARK-5143 as well before the
final 1.3.x.
But yes everything else checks out for me, including sigs and hashes
and building the source release.
I have been following JIRA closely and am not aware of other blockers
besides the
#4 but with MemoryOnly (more scala-like)
http://docs.scala-lang.org/style/naming-conventions.html
Constants, Values, Variable and Methods
Constant names should be in upper camel case. That is, if the member is
final, immutable and it belongs to a package object or an object, it may be
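To make the quoted convention concrete, a small illustrative snippet (all names here are made up):

```scala
// Illustrating the style-guide rule quoted above: a constant
// (final, immutable, inside an object) uses upper camel case,
// while ordinary values use lower camel case.
object Defaults {
  final val MaxRetries = 3    // constant: UpperCamelCase
  val defaultTimeoutMs = 500  // value: lowerCamelCase
}
```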
I like #4 as well and agree with Aaron's suggestion.
- Patrick
On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson ilike...@gmail.com wrote:
I'm cool with #4 as well, but make sure we dictate that the values should
be defined within an object with the same name as the enumeration (like we
do for
I am trying to read RDD avro, transform and write.
I am able to run it locally fine, but when I run it on the cluster, I see
issues with Avro.
export SPARK_HOME=/home/dvasthimal/spark/spark-1.0.2-bin-2.4.1
export SPARK_YARN_USER_ENV=CLASSPATH=/apache/hadoop/conf
export
Thanks for your reply, Evan.
It may make sense to have a more general Gibbs sampling
framework, but it might be good to have a few desired applications
in mind (e.g. higher level models that rely on Gibbs) to help API
design, parallelization strategy, etc.
I think I'm more interested in a
Yeah, it will result in a second serialized copy of the array (costing
some memory). But the computational overhead should be very small. The
absolute worst case here will be when doing a collect() or something
similar that just bundles the entire partition.
- Patrick
On Wed, Mar 4, 2015 at 5:47
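The overhead Patrick describes can be seen with plain JDK serialization, outside Spark entirely. A sketch (no Spark APIs, just java.io) showing that Java-serializing an already-serialized byte array adds only a small framing header, but does hold a second copy of the bytes:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Pretend `payload` is a task result that has already been
// serialized once into a byte array.
val payload = Array.fill[Byte](1024)(1)

// "Serialize" it a second time with Java serialization, as a
// closure serializer would when wrapping the result.
val bos = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(bos)
oos.writeObject(payload)
oos.close()
val wrapped = bos.toByteArray

// The second pass is cheap in CPU (mostly an array copy) and
// adds only a few dozen bytes of framing, but the copy itself
// means a second serialized copy of the data is held in memory.
val overhead = wrapped.length - payload.length
```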
It is the LR over car-data at https://github.com/xsankar/cloaked-ironman.
1.2.0 gives Mean Squared Error = 40.8130551358
1.3.0 gives Mean Squared Error = 105.857603953
I will verify it one more time tomorrow.
Cheers
k/
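For reference, a self-contained sketch of how a mean squared error like the figures above is computed from (prediction, label) pairs; the actual benchmark is the car-data LR in the cloaked-ironman repo linked above, which does this over an RDD:

```scala
// Plain-Scala MSE over (prediction, label) pairs; the real
// benchmark uses an RDD, but the arithmetic is the same.
def meanSquaredError(pairs: Seq[(Double, Double)]): Double = {
  require(pairs.nonEmpty, "need at least one (prediction, label) pair")
  pairs.map { case (p, l) => (p - l) * (p - l) }.sum / pairs.size
}
```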
On Tue, Mar 3, 2015 at 11:28 PM, Xiangrui Meng men...@gmail.com wrote:
On
Hi, in the roadmap of Spark in 2015 (link:
http://files.meetup.com/3138542/Spark%20in%202015%20Talk%20-%20Wendell.pptx),
I saw SchemaRDD is designed to be the basis of BOTH Spark Streaming and
Spark SQL.
My question is: what's the typical usage of SchemaRDD in a Spark
Streaming application?
+1 (subject to comments on ec2 issues below)
machine 1: Macbook Air, OSX 10.10.2 (Yosemite), Java 8
machine 2: iMac, OSX 10.8.4, Java 7
1. mvn clean package -DskipTests (33min/13min)
2. ran SVM benchmark https://github.com/insidedctm/spark-mllib-benchmark
EC2 issues:
1) Unable to
Hi Manoj,
this question is best asked on the Spark mailing lists (copied). From a formal
point of view, all that counts is your proposal in Melange once applications
start, but your mentor or the project you wish to contribute to may have
additional requirements.
Cheers,
Uli
On 2015-03-03