Hi TD,
I have sent more information now, using 8 workers. The gap is now 27 seconds.
Have you seen it?
Thanks,
BR
Hi Deb, feel free to add accuracy along with precision and recall. -Xiangrui
On Mon, May 12, 2014 at 1:26 PM, Debasish Das debasish.da...@gmail.com wrote:
Hi,
I see precision and recall but no accuracy in mllib.evaluation.binary.
Is it already under development, or does it need to be added?
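In the meantime, a minimal sketch of computing accuracy yourself from an RDD of (prediction, label) pairs; the helper below is hypothetical, not part of MLlib:

import org.apache.spark.rdd.RDD

// Fraction of predictions that exactly match their labels.
def accuracy(predictionAndLabels: RDD[(Double, Double)]): Double = {
  val total = predictionAndLabels.count()
  val correct = predictionAndLabels.filter { case (pred, label) => pred == label }.count()
  correct.toDouble / total
}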
Are you setting a core limit with spark.cores.max? If you don't, in coarse-grained
mode each Spark job uses all available cores on Mesos and doesn't let them
go until the job terminates, at which point the other job can access the cores.
https://spark.apache.org/docs/latest/running-on-mesos.html --
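For example, a minimal sketch of capping cores for a coarse-grained Mesos job; the master URL, app name, and numbers are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("mesos://zk://host:2181/mesos")  // placeholder Mesos master URL
  .setAppName("capped-app")
  .set("spark.mesos.coarse", "true")          // coarse-grained mode
  .set("spark.cores.max", "8")                // don't hold every core in the cluster
val sc = new SparkContext(conf)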
Dear all,
The definition of fetch wait time:
 * Time the task spent waiting for remote shuffle blocks. This only includes the time
 * blocking on shuffle input data. For instance if block B is being fetched while the task is
 * still not finished processing block A, it is not considered to be blocking on block B.
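If you want to see that metric per task, a rough sketch of a listener that logs it, assuming the Spark 1.x listener API where shuffleReadMetrics is an Option:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class FetchWaitListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // Log how long each finished task spent blocked on remote shuffle fetches.
    for (metrics <- Option(taskEnd.taskMetrics); shuffle <- metrics.shuffleReadMetrics) {
      println(s"task ${taskEnd.taskInfo.taskId} fetch wait: ${shuffle.fetchWaitTime} ms")
    }
  }
}
// register with: sc.addSparkListener(new FetchWaitListener)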
Thanks Nicholas! I looked at those docs several times without noticing that
critical part you highlighted.
We are running into the same issue. After 700 or so files the stack overflows;
cache, persist, and checkpointing don't help.
Basically, checkpointing only saves the RDD when it is materialized, and it
only materializes at the end, by which point the stack has already overflowed.
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
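One pattern sometimes used for the long-lineage case Mayur describes, sketched under the assumption that the job unions many per-file RDDs; the paths, directory, and interval are illustrative:

import org.apache.spark.rdd.RDD

sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")        // illustrative directory

val files = (1 to 700).map(i => s"hdfs:///data/part-$i")    // illustrative input paths
var merged: RDD[String] = sc.textFile(files.head)
for ((file, i) <- files.tail.zipWithIndex) {
  merged = merged.union(sc.textFile(file))
  if (i % 50 == 0) {
    merged.cache()
    merged.checkpoint()
    merged.count()  // force materialization now, so the checkpoint truncates the lineage
  }
}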
The table will be cached, but 10 GB (most likely more) will be on disk. You
can check that in the Storage tab of the Shark application.
The Java out-of-memory error could be because your worker memory is too low or
the memory allocated to Shark is too low.
Mayur Rustagi
Ph: +1 (760) 203 3257
Unfortunately it's very difficult to get uncaching right with GraphX due to
the complicated internal dependency structure that it creates. It's
necessary to know exactly what operations you're doing on the graph in order
to unpersist correctly (i.e., in a way that avoids recomputation).
I have a
We are using spray + Akka + spark stack at Alpine data labs
Chester
Sent from my iPhone
On May 4, 2014, at 8:37 PM, ZhangYi yizh...@thoughtworks.com wrote:
Hi all,
Currently, our project is planning to adopt Spark as its big data platform.
For the client side, we decided to expose a REST
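For what it's worth, a minimal sketch of that kind of front end with spray 1.x's routing DSL; the endpoint, port, and the Spark job behind it are all made up for illustration:

import akka.actor.ActorSystem
import org.apache.spark.{SparkConf, SparkContext}
import spray.routing.SimpleRoutingApp

object SparkRestServer extends App with SimpleRoutingApp {
  implicit val system = ActorSystem("spark-rest")
  val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("rest-frontend"))

  startServer(interface = "0.0.0.0", port = 8080) {
    path("linecount") {
      get {
        // Illustrative endpoint: run a small Spark job and return the result as text.
        complete {
          sc.textFile("hdfs:///data/sample.txt").count().toString
        }
      }
    }
  }
}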
I think so; there have been fewer questions and answers these three days.
In general, you can find out exactly what's not serializable by adding
-Dsun.io.serialization.extendedDebugInfo=true to SPARK_JAVA_OPTS.
Since a 'this' reference to the enclosing class is often what's causing the
problem, a general workaround is to move the mapPartitions call to a static
method
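A rough sketch of that workaround in Scala, where the per-partition logic lives in a standalone object so no outer 'this' is captured; the class and function names are made up:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class Driver(sc: SparkContext) {            // suppose Driver itself is not serializable
  def run(rdd: RDD[String]): RDD[Int] =
    rdd.mapPartitions(Helpers.lineLengths)  // passes a function from an object, not a closure over `this`
}

object Helpers {
  // Defined on an object ("static"), so the closure does not drag in the enclosing class.
  def lineLengths(iter: Iterator[String]): Iterator[Int] = iter.map(_.length)
}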
Hi Zhen,
Thanks a lot for sharing. I'm sure it will be useful for new users.
A small note on the 'checkpoint' explanation: for
sc.setCheckpointDir(my_directory_name)
it would be useful to specify that 'my_directory_name' should exist on all
slaves. As an alternative, you could use an HDFS directory
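For example (both paths below are placeholders):

// A local path must exist on every slave node:
sc.setCheckpointDir("/mnt/spark-checkpoints")
// ...or point at a directory on HDFS that all nodes can reach:
sc.setCheckpointDir("hdfs://namenode:8020/user/spark/checkpoints")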
Hi, I'm writing this post because I would like to know a caching approach for
iterative algorithms in GraphX. So far I have not been able to keep the
execution time of each iteration stable. Have you been able to achieve this?
The code I used is this:
var g = ... // my graph
var prevG: Graph[VD, ED] = null
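The usual shape of that loop, sketched under the assumption that each iteration derives a new graph from the previous one and that only the current and previous graphs should stay cached; the input path, iteration count, and per-iteration update are placeholders:

import org.apache.spark.graphx.{Graph, GraphLoader}

// Illustrative starting graph with Int vertex and edge attributes.
var g: Graph[Int, Int] = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")
  .mapVertices((_, _) => 0).cache()
var prevG: Graph[Int, Int] = null

for (i <- 1 to 10) {
  prevG = g
  g = g.mapVertices((_, attr) => attr + 1).cache()  // placeholder per-iteration update
  g.vertices.count(); g.edges.count()               // materialize before dropping the old graph
  prevG.unpersistVertices(blocking = false)
  prevG.edges.unpersist(blocking = false)
}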
Great work! Thanks!
On May 13, 2014 3:16 AM, zhen z...@latrobe.edu.au wrote:
Hi Everyone,
I found it quite difficult to find good examples for Spark RDD API calls.
So my student and I decided to go through the entire API and write examples
for the vast majority of API calls (basically
Scala's for-loop is not just looping; it's not a native loop at the bytecode
level. It creates a couple of objects at runtime and performs a truckload of
method calls on them. As a result, if you refer to variables outside the
for-loop, the whole for-loop object and any variable inside
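To make that concrete, here is roughly what the compiler does with a simple for-loop; the desugared form below is approximate:

val data = Seq(1, 2, 3)
var sum = 0

for (x <- data) {
  sum += x
}

// The loop above compiles to roughly this: a higher-order call with a closure
// that captures `sum` (and anything else from the enclosing scope it touches).
data.foreach { x => sum += x }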
Hi wxhsdp,
See https://github.com/scalanlp/breeze/issues/142 and
https://github.com/fommil/netlib-java/issues/60 for details.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Tue, May
Which Hadoop version did you use? I'm not sure whether Hadoop v2 fixes
the problem you described, but it does contain several fixes to the bzip2
format. -Xiangrui
On Wed, May 7, 2014 at 9:19 PM, Andrew Ash and...@andrewash.com wrote:
Hi all,
Is anyone reading and writing to .bz2 files stored in
Hi All,
We are also waiting for this. Does anyone know a tentative date for this
release?
We are on Spark 0.8.0 right now. Should we wait for Spark 1.0 or upgrade
to Spark 0.9.1?
Thanks,
Anurag Tangri
On Tue, May 13, 2014 at 9:40 AM, bhusted brian.hus...@gmail.com wrote:
Can anyone
Hi,
How do I load native BLAS libraries on a Mac?
I am getting the following warnings while running LR and SVM with SGD:
14/05/07 10:48:13 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemBLAS
14/05/07 10:48:13 WARN BLAS: Failed to load implementation from:
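In case it helps, one way this is often resolved is to put netlib-java's "all" artifact, which bundles the native system wrappers, on the application classpath so NativeSystemBLAS can bind to the Mac's system BLAS. Sketched here as sbt dependencies; version numbers are illustrative:

// build.sbt (sketch)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-mllib" % "1.0.0",
  "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly()
)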
On Mon, May 12, 2014 at 12:14 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
That API is something the HDFS administrator uses outside of any application
to tell HDFS to cache certain files or directories. But once you’ve done
that, any existing HDFS client accesses them directly from the
Thanks Xiangrui. After some debugging efforts, it turns out that the
problem results from a bug in my code. But it's good to know that a long
lineage could also lead to this problem. I will also try checkpointing to
see whether the performance can be improved.
Best regards,
- Guanhua
On 5/13/14
Count causes the overall performance to drop drastically. In fact, beyond 50
or so files it starts to hang if I force materialization.
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Tue, May 13, 2014 at 9:34 PM,
Great to know that! Thank you, Matei.
Best regards,
-chanwit
--
Chanwit Kaewkasi
linkedin.com/in/chanwit
On Tue, May 13, 2014 at 2:14 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
That API is something the HDFS administrator uses outside of any application
to tell HDFS to cache certain