Have you checked reactive monitoring (https://github.com/eigengo/monitor) or
Kamon (https://github.com/kamon-io/Kamon)?
Instrumenting requires absolutely no code changes; all you do is weaving.
In our environment we use Graphite to collect the statsd events (you can also
get dtrace events) and display them.
On Sat, Oct 4, 2014 at 5:28 PM, Abraham Jacob abe.jac...@gmail.com wrote:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
Good. There is also a
Hi,
I am trying to call some C code; let's say the compiled file is /path/code,
and it has chmod +x. When I call it directly, it works. Now I want to call
it from Spark 1.1. My problem is not building it into Spark, but making sure
Spark can find it.
I have tried:
Are you putting those into spark-env.sh? Try setting LD_LIBRARY_PATH as
well; that might help.
Also, where is the exception coming from? You have to set this properly for
both the cluster and the driver, which are configured independently.
Cheers!
Andrew
On Sun, Oct 5, 2014 at 1:06 PM, Tom
Hi Michael,
I couldn't find anything in Jira for it --
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20%22window%22%20AND%20component%20%3D%20Streaming
Could you or Adrian please file a Jira ticket explaining the functionality
and maybe a proposed API? This
Hi Mingyu,
Maybe we should be limiting our heaps to 32GB max and running multiple
workers per machine to avoid large GC issues.
For a machine with 128GB of memory and 32 cores, this could look like:
SPARK_WORKER_INSTANCES=4
SPARK_WORKER_MEMORY=32g
SPARK_WORKER_CORES=8
Are people running with large (32GB+)
You may also be writing your algorithm in a way that requires high peak
memory usage. An example of this could be using .groupByKey() where
.reduceByKey() might suffice instead. Maybe you can express the algorithm
in a different, more efficient way?
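As a plain-Scala sketch of the difference (standing in for the RDD operations; the data here is made up):

```scala
val pairs = Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4))

// groupByKey analogue: materializes the full list of values per key
// before reducing, so peak memory grows with the number of values.
val viaGroup = pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

// reduceByKey analogue: folds each value in as it arrives, keeping
// only one running total per key.
val viaReduce = pairs.foldLeft(Map.empty[String, Int]) {
  case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v)
}

println(viaGroup == viaReduce)  // same result, different peak memory
```

In Spark, reduceByKey additionally combines values map-side before the shuffle, which is where most of the memory and network savings come from.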
On Thu, Oct 2, 2014 at 4:30 AM, Sean
Arko,
On Sat, Oct 4, 2014 at 1:40 AM, Arko Provo Mukherjee
arkoprovomukher...@gmail.com wrote:
Apologies if this is a stupid question, but I am trying to understand
why this can or cannot be done. As far as I understand, streaming
algorithms need to be different from batch algorithms as
Hi, all.
I'm in an unusual situation.
The code,
...
val cell = dataSet.flatMap(parse(_)).cache
val distinctCell = cell.keyBy(_._1).reduceByKey(removeDuplication(_, _)).mapValues(_._3).cache
val groupedCellByLine = distinctCell.map(cellToIterableColumn).groupByKey.cache
val result = (1
I understand that GraphX is an immutable RDD.
I'm working on an algorithm that requires a mutable graph. Initially, the
graph starts with just a few nodes and edges; then, over time, it adds more
and more nodes and edges.
What would be the best way to implement this growing graph with