I have a Spark cluster with 3 worker nodes.
- *Workers:* 3
- *Cores:* 48 Total, 48 Used
- *Memory:* 469.8 GB Total, 72.0 GB Used
I want to process a single compressed file (*.gz) on HDFS. The file is 1.5 GB
compressed and 11 GB uncompressed.
When I try to read the compressed file from HDFS
On Tue, May 6, 2014 at 10:07 PM, kamatsuoka ken...@gmail.com wrote:
I was using s3n:// but I got frustrated by how
slow it is at writing files.
I'm curious: How slow is slow? How long does it take you, for example, to
save a 1GB file to S3 using s3n vs s3?
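For concreteness, the comparison being asked about would look something like this minimal sketch, assuming an existing SparkContext sc as in spark-shell (bucket and paths are hypothetical):

// The same RDD written through the two S3 filesystem schemes.
val rdd = sc.textFile("hdfs:///data/1gb-sample.txt")  // ~1 GB of text
rdd.saveAsTextFile("s3n://my-bucket/out-via-s3n")     // S3 native filesystem
rdd.saveAsTextFile("s3://my-bucket/out-via-s3")       // S3 block filesystem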
Dear all,
Recently we released a distributed extension of LIBLINEAR at
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/distributed-liblinear/
Currently, TRON for logistic regression and L2-loss SVM is supported.
We provided both MPI and Spark implementations.
This is very preliminary so your
Try adding a hostname-to-IP mapping in /etc/hosts; it's not able to resolve
the IP to a hostname.
Try this:
192.168.10.220 CHBM220
On Wed, May 7, 2014 at 12:50 PM, Sophia sln-1...@163.com wrote:
[root@CHBM220 spark-0.9.1]#
I have just resolved the problem by running the master and worker daemons
individually on the machines where they belong.
If I execute sbin/start-all.sh, the problem always exists.
From: Francis.Hu [mailto:francis...@reachjunction.com]
Sent: Tuesday, May 06, 2014 10:31
To: user@spark.apache.org
Nick,
I have encountered strange things like this before (usually when
programming with mutable structures and side-effects), and for me, the
answer was that, until .count (or .first, or similar) is called, your
variable 'a' refers to a set of instructions that only get executed to form
the
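A minimal sketch of that behavior, assuming a SparkContext sc as in spark-shell (variable names are hypothetical):

// Transformations are only recorded, not executed, until an action runs.
var factor = 2
val a = sc.parallelize(1 to 5).map(_ * factor)  // nothing executes here
factor = 10                                     // mutated before any action
println(a.first())  // prints 10, not 2: the map only ran when first() was called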
Yep. I figured that out. I uncompressed the file and it looks much faster
now. Thanks.
On Sun, May 11, 2014 at 8:14 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
.gz files are not splittable, hence harder to process. The easiest fix is to
move to a splittable compression like LZO and break the file into chunks.
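A minimal sketch of the usual workaround while the file is still gzipped, assuming a SparkContext sc (path and partition count are hypothetical):

// A .gz file is not splittable, so it loads as a single partition;
// repartition right after reading to spread the work across the cluster.
val lines = sc.textFile("hdfs:///data/big-file.gz")  // one partition
val spread = lines.repartition(48)                   // e.g. one per core
spread.count()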
Got it.
But it doesn't prove that everyone can receive this test.
The mailing list has been unstable recently.
Sent from my iPhone5s
On May 10, 2014, at 13:31, Matei Zaharia matei.zaha...@gmail.com wrote:
This message has no content.
There was an outage: https://blogs.apache.org/infra/entry/mail_outage
On Fri, May 9, 2014 at 1:27 PM, wxhsdp wxh...@gmail.com wrote:
I think so; fewer questions and answers these past three days.
Svend,
I built it on my iMac, and it was about the same speed as on Windows 7, on a
RHEL 6 VM under Windows 7, and on Linux on EC2. Spark is pleasantly easy to
build on all of these platforms, which is wonderful.
How long does it take to start spark-shell?
Maybe it's a JVM memory setting problem on your
resending... my email somehow never made it to the user list.
On Fri, May 9, 2014 at 2:11 PM, Koert Kuipers ko...@tresata.com wrote:
In writing my own RDD I ran into a few issues with respect to stuff being
private in Spark.
In compute I would like to return an iterator that respects task
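For anyone following along, a minimal skeleton of the two members every custom RDD must supply; this is an illustrative sketch, not the actual code under discussion:

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// A named partition class keeps the partitions cheaply serializable.
case class TinyPartition(index: Int) extends Partition

class TinyRDD(sc: SparkContext, numParts: Int) extends RDD[Int](sc, Nil) {
  override def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numParts)(i => TinyPartition(i))

  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator.single(split.index)  // real code would stream records here
}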
For what it is worth, our team here at MediaCrossing
(http://mediacrossing.com) has been using the Spark/Mesos combination since
last summer with much success (low operations overhead, high developer
productivity).
IMO, Hadoop is overcomplicated from both a development and an operations
perspective, so I
Hello Prof. Lin,
Awesome news! I am curious whether you have any benchmarks comparing the
C++ MPI and Scala Spark LIBLINEAR implementations...
Is Spark LIBLINEAR Apache-licensed, or are there any specific restrictions
on using it?
Except for using native BLAS libraries (which each user has to manage by
Will sbt-pack and the maven solution work for the Scala REPL?
I need the REPL because it saves a lot of time when I'm playing with large
data sets: I load them once, cache them, and then try things out interactively
before putting them in a standalone driver.
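That workflow, as a minimal spark-shell sketch (path and queries are hypothetical):

// Load once, cache, then iterate interactively without re-reading HDFS.
val data = sc.textFile("hdfs:///data/large-events.tsv")
data.cache()                              // first action populates the cache
data.count()                              // materialize it
data.filter(_.contains("ERROR")).count()  // later queries hit memory
data.map(_.split('\t').length).first()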
I've got sbt working for my own
On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger jeis...@us.ibm.com wrote:
In a nutshell, Spark opens up a couple of well-known ports, and then the
workers and the shell open up dynamic ports for each job. These dynamic
ports make securing the Spark network difficult.
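For reference, a sketch of how far configuration can pin things down: spark.driver.port and spark.ui.port are real settings, while the per-job executor ports described above stay dynamic (the values are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

// Pins the configurable ports; the dynamic per-job ports are exactly
// the part this cannot lock down.
val conf = new SparkConf()
  .setAppName("port-pinning-sketch")
  .set("spark.driver.port", "51000")  // driver RPC endpoint
  .set("spark.ui.port", "4040")       // application web UI
val sc = new SparkContext(conf)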
Indeed.
Judging by
Will do.
On May 11, 2014 6:44 PM, Aaron Davidson ilike...@gmail.com wrote:
You've got a good point there; those APIs should probably be marked as
@DeveloperApi. Would you mind filing a JIRA for that (
https://issues.apache.org/jira/browse/SPARK)?
On Sun, May 11, 2014 at 11:51 AM, Koert Kuipers
Hi Sonal,
Yes, I am working towards that same idea. How did you go about creating
the non-Spark-jar dependencies? The way I am doing it is a separate
straw-man project that does not include Spark but has the external
third-party jars included. Then running sbt compile:managedClasspath and
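For illustration, such a straw-man build definition might look like this, assuming sbt 0.13 (the dependency shown is a hypothetical placeholder):

// build.sbt for a dependencies-only project: it declares the external
// third-party jars but not Spark itself, so
//   sbt "show compile:managedClasspath"
// prints exactly the jars that need to ship with the application.
name := "deps-only"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "joda-time" % "joda-time" % "2.3"  // hypothetical third-party dependency
)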
I didn't get the original message, only the reply. Ruh-roh.
On Sun, May 11, 2014 at 8:09 AM, Azuryy azury...@gmail.com wrote:
When I put 200 PNG files into HDFS, I found Spark Streaming could detect the
200 files, but the sum of rdd.count() is less than 200, always between 130
and 170. I don't know why... Is this a bug?
PS: When I put the 200 files into HDFS before the streaming job runs, it gets
the correct count and the right result.
Here is
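One likely cause worth checking: Spark Streaming's file stream expects files to appear atomically in the watched directory (write them elsewhere on the same filesystem, then rename them in), so files still being written when a batch fires can be missed. A minimal sketch of this kind of file-watching job, assuming an existing SparkContext sc (paths and batch interval are hypothetical; it uses text files for simplicity):

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Watch an HDFS directory and count the records in each batch of new files.
val ssc = new StreamingContext(sc, Seconds(10))
val batches = ssc.textFileStream("hdfs:///watched/incoming")
batches.foreachRDD(rdd => println("records in this batch: " + rdd.count()))
ssc.start()
ssc.awaitTermination()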
I have built Shark with sbt, but this sbt exception turns up:
[error] sbt.ResolveException: unresolved dependency:
org.apache.hadoop#hadoop-client;2.0.0: not found.
What can I do to build it correctly?
I haven't been getting mail either. This was the last message I received:
http://apache-spark-user-list.1001560.n3.nabble.com/master-attempted-to-re-register-the-worker-and-then-took-all-workers-as-unregistered-tp553p5491.html
Hi all,
I am now using Spark in production, but I notice that the Spark driver
includes the RDD and DAG handling...
and the executors will try to register with the driver.
I think the driver should run on the cluster, and the client should run on
the gateway.
Something like: