There are 44 issues still targeted for 1.4.1. None are Blockers; 12
are Critical. About 80% were opened and/or targeted by committers. Compare
with 90 issues resolved for 1.4.1.
I'm concerned that committers are targeting far more for a release,
even in the short term, than can realistically go in. On its
Fair points; I also like simpler solutions.
The overhead of a Python task could be a few milliseconds, which
means we should also evaluate them as batches (one Python task per batch).
Decreasing the batch size for UDFs sounds reasonable to me, together
with other tricks to reduce the data in
Hi Michael Armbrust,
I have filed an issue on JIRA for this,
https://issues.apache.org/jira/browse/SPARK-8588
Hi Davies,
In general, do we expect people to use CPython only for heavyweight UDFs
that invoke an external library? Are there any examples of using Jython,
especially performance comparisons to Java/Scala and CPython? When using
Jython, do you expect the driver to send code to the executor as a
How does Spark guarantee that no RDD will fail or be lost during its life cycle?
Is there something like acks in Storm, or does it do this by default?
--
Thanks & Regards,
Anshu Shukla
Correct, I was running with a batch size of about 100 when I did the tests,
because I was worried about deadlocks. Do you have any concerns regarding
the batched synchronous version of communication between the Java and
Python processes, and if not, should I file a ticket and start writing
it?
From your comment, the 2x improvement only happens when you have a
batch size of 1, right?
On Wed, Jun 24, 2015 at 12:11 PM, Justin Uang justin.u...@gmail.com wrote:
FYI, just submitted a PR to Pyrolite to remove their StopException.
https://github.com/irmen/Pyrolite/pull/30
With my
We have a large file and we used to read chunks and then use the parallelize
method (distData = sc.parallelize(chunk)) and then do the map/reduce chunk
by chunk. Recently we read the whole file using the textFile method and found
the map/reduce job is much faster. Can anybody help us understand why? We
Hi Sean,
I'm running a Mesos cluster. My driver app is built using Maven against the
Spark 1.4.0 dependency.
The Mesos slave machines have the Spark distribution installed from the
distribution link.
I have a hard time understanding how this isn't a standard app deployment
but maybe I'm missing
If you read the file chunk by chunk and then use parallelize, it is read by a
single thread on a single machine.
On Wednesday, June 24, 2015, xing ehomec...@gmail.com wrote:
We have a large file and we used to read chunks and then use the parallelize
method (distData = sc.parallelize(chunk)) and then
When we compare the performance, we already excluded this part of time
difference.
They are different classes, even. Your problem isn't class-not-found, though.
You're also really comparing different builds. You should not be including
Spark code in your app.
On Wed, Jun 24, 2015, 9:48 PM jimfcarroll jimfcarr...@gmail.com wrote:
These jars are simply incompatible. You can see
Hi Ryan,
If you can get past the paperwork, I'm sure this can make a great Spark
Package (http://spark-packages.org). People can then use it for
benchmarking purposes, and I'm sure people will be looking for graph
generators!
Best,
Burak
On Wed, Jun 24, 2015 at 7:55 AM, Carr, J. Ryan
+1
(partially b/c I would like jira admin myself)
On Tue, Jun 23, 2015 at 3:47 AM, Sean Owen so...@cloudera.com wrote:
There are some committers who are active on JIRA and sometimes need to
do things that require JIRA admin access -- in particular thinking of
adding a new person as
The SparkR code is in the `R` directory i.e.
https://github.com/apache/spark/tree/master/R
Shivaram
On Wed, Jun 24, 2015 at 8:45 AM, Vasili I. Galchin vigalc...@gmail.com
wrote:
Matei,
Last night I downloaded the Spark bundle.
In order to save me time, can you give me the name of the
Hi Spark Devs,
As part of a project at work, I have written a graph generator for RMAT
graphs consistent with the specifications in the Graph 500 benchmark
(http://www.graph500.org/specifications). We had originally planned to use the
rmatGenerator function in GraphGenerators, but found that
Matei,
Last night I downloaded the Spark bundle.
In order to save me time, can you give me the name of the SparkR example
and where it is in the Spark tree?
Thanks,
Bill
On Tuesday, June 23, 2015, Matei Zaharia matei.zaha...@gmail.com wrote:
Just FYI, it would be easiest to follow
Thanks,
I am talking about streaming.
On 25 Jun 2015 5:37 am, ayan guha guha.a...@gmail.com wrote:
Can you elaborate little more? Are you talking about receiver or streaming?
On 24 Jun 2015 23:18, anshu shukla anshushuk...@gmail.com wrote:
How does Spark guarantee that no RDD will fail or be lost
Hi all,
I'm trying to implement a custom StandaloneRecoveryModeFactory in the Java
environment. Please find the implementation here [1]. I'm new to Scala,
hence I'm trying to use the Java environment as much as possible.
When I start a master with the spark.deploy.recoveryMode.factory property set to
Hi,
I have an Impala-created table with the following IO format and serde:
inputFormat:parquet.hive.DeprecatedParquetInputFormat,
outputFormat:parquet.hive.DeprecatedParquetOutputFormat,
serdeInfo:SerDeInfo(name:null,
serializationLib:parquet.hive.serde.ParquetHiveSerDe, parameters:{})
I am trying
Hey Sean,
This is being shipped now because there is a severe bug in 1.4.0 that
can cause data corruption for Parquet users.
There are no blockers targeted for 1.4.1 - so I don't see that JIRA is
inconsistent with shipping a release now. The goal of having every
single targeted JIRA cleared by
Hello all,
I have a strange problem. I have a Mesos Spark cluster with Spark
1.4.0/Hadoop 2.4.0 installed, and a client application that uses Maven to
include the same versions.
However, I'm getting a serialVersionUID problem on:
ERROR Remoting -
Have you tried shuffle compression?
spark.shuffle.compress (true|false)
If your filesystem is capable, I've also noticed that file consolidation helps
disk usage a bit.
spark.shuffle.consolidateFiles (true|false)
Steve
On Jun 24, 2015, at 3:27 PM, Ulanov, Alexander
These jars are simply incompatible. You can see this by looking at that class
in both the Maven repo for 1.4.0 here:
http://central.maven.org/maven2/org/apache/spark/spark-core_2.10/1.4.0/spark-core_2.10-1.4.0.jar
as well as the spark-assembly jar inside the .tgz file you can get from the