> …order to make them smaller in size.
>
> HTH,
> Deng
>
> On Thu, Oct 29, 2015 at 8:40 PM, Yifan LI <iamyifa...@gmail.com> wrote:
>
>> I have a guess that before scanning that RDD, I sorted it and set
>> partitioning, so the result is not balanced:
>>
)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Do you have any idea?
I have set the partitioning quite high, like 4…
Best,
Yifan LI
I'll try to repartition it to see if it helps.
Best,
Yifan LI
> On 29 Oct 2015, at 12:52, Yifan LI <iamyifa...@gmail.com> wrote:
>
> Hey,
>
> I was just trying to scan a large RDD sortedRdd, ~1billion elements, using
> toLocalIterator api, but an exception retu
Thanks for your advice, Jem. :)
I will increase the partitioning and see if it helps.
Best,
Yifan LI
> On 23 Oct 2015, at 12:48, Jem Tucker <jem.tuc...@gmail.com> wrote:
>
> Hi Yifan,
>
> I think this is a result of Kryo trying to serialize something too large…
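Jem's suggestion is cut off above; hedging, these are two knobs commonly raised for Kryo buffer overflows, with illustrative values only:

// More partitions mean smaller per-task results to serialize.
val finer = sortedRdd.repartition(4000)
// Or raise Kryo's maximum buffer (Spark 1.4+ property name):
val conf = new org.apache.spark.SparkConf()
  .set("spark.kryoserializer.buffer.max", "512m")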
.scala:730)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Best,
Yifan LI
…"))
…
#2:
60 is matched! 60/2 = 30; the collection right now should be:
(3, (53.5, "ccc"))
(4, (48, "ddd"))
(2, (30, "bbb")) <— inserted back here
(5, (29, "eee"))
…
Best,
Yifan LI
that "sort again"? It is too
costly… :(
Anyway thank you again!
Best,
Yifan LI
> On 12 Oct 2015, at 12:19, Adrian Tanase <atan...@adobe.com> wrote:
>
> I think you’re looking for the flatMap (or flatMapValues) operator – you can
> do something like
>
> so
Shiwei, yes, you might be right. Thanks. :)
Best,
Yifan LI
> On 12 Oct 2015, at 12:55, 郭士伟 (Shiwei Guo) <guoshi...@gmail.com> wrote:
>
> I think this is not a problem Spark can solve effectively, because RDDs are
> immutable. Every time you want to change an RDD, you create a new one,
Hi,
I just encountered the same problem, when I ran a PageRank program which has
lots of stages (iterations)…
The master was lost after my program finished.
And the issue still remains even after I increased the driver memory.
Any idea, e.g. how to increase the master memory?
Thanks.
Best,
Yifan
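One hedged possibility for the master-memory question above: the standalone master daemon's heap is sized by SPARK_DAEMON_MEMORY in conf/spark-env.sh, rather than by the driver memory (4g is only an example value):

# On the master node, in conf/spark-env.sh:
export SPARK_DAEMON_MEMORY=4g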
It might be because there was a large shuffle…
Does anyone have an idea how to fix it? Thanks in advance!
Best,
Yifan LI
)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Best,
Yifan LI
Best,
Yifan LI
computation…, so maybe at some point the request
on that node was bigger than the available space.
But is there any way to avoid this kind of error? I am sure that the overall
disk space of all nodes is enough for my application.
Thanks in advance!
Best,
Yifan LI
Thanks, Shao. :-)
I am wondering if Spark will rebalance the storage overhead at
runtime… since there is still some available space on other nodes.
Best,
Yifan LI
On 06 May 2015, at 14:57, Saisai Shao sai.sai.s...@gmail.com wrote:
I think you could configure multiple disks through
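Assuming Shao meant spark.local.dir (the reply is truncated): it accepts a comma-separated list of directories, and Spark spreads shuffle and spill files across all of them. A sketch with example paths:

val conf = new org.apache.spark.SparkConf()
  .set("spark.local.dir", "/mnt/disk1/spark,/mnt/disk2/spark")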
Yes, you are right. For now I have to say the workload/executors are distributed
evenly… so, like you said, it is difficult to improve the situation.
However, do you have any idea of how to make a *skewed* data/executor distribution?
Best,
Yifan LI
On 06 May 2015, at 15:13, Saisai Shao
…
…
Any ideas? Thanks in advance! :)
Best,
Yifan LI
Thanks, Olivier and Franz. :)
Best,
Yifan LI
On 02 May 2015, at 23:23, Olivier Girardot ssab...@gmail.com wrote:
I guess:
val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(identity)
val srdd_s2 = srdd.filter(_.startsWith("s2_")).sortBy(identity)
val srdd_s3 = srdd.filter(_.startsWith("s3_
), will the
repartitioning be inevitable?
Thanks in advance!
Best,
Yifan LI
Hi Kannan,
I am not sure I have understood exactly what your question is, but maybe the
reduceByKey or reduceByKeyLocally functionality is better suited to your need.
Best,
Yifan LI
On 17 Feb 2015, at 17:37, Vijayasarathy Kannan kvi...@vt.edu wrote:
Hi,
I am working on a Spark
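A toy illustration of the two operators suggested above (the data is invented):

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
// reduceByKey combines values per key and returns a distributed RDD.
val summed = pairs.reduceByKey(_ + _)       // RDD[(String, Int)]
// reduceByKeyLocally does the same but returns a Map on the driver.
val local = pairs.reduceByKeyLocally(_ + _) // scala.collection.Map[String, Int]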
Thanks, Kelvin :)
The error seems to disappear after I decreased both
spark.storage.memoryFraction and spark.shuffle.memoryFraction to 0.2,
and increased the driver memory a bit.
Best,
Yifan LI
On 10 Feb 2015, at 18:58, Kelvin Chu 2dot7kel...@gmail.com wrote:
Since the stacktrace
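The settings described above, spelled out as configuration (the fractions are those reported; note that driver memory itself must be set at launch time, e.g. via spark-submit --driver-memory, not through SparkConf):

val conf = new org.apache.spark.SparkConf()
  .set("spark.storage.memoryFraction", "0.2")
  .set("spark.shuffle.memoryFraction", "0.2")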
during Pregel supersteps. So it seems to suffer from
high GC?
Best,
Yifan LI
On 10 Feb 2015, at 10:26, Akhil Das ak...@sigmoidanalytics.com wrote:
You could try increasing the driver memory. Also, can you be more specific
about the data volume?
Thanks
Best Regards
On Mon, Feb 9, 2015
Yes, I have read it, and am trying to find some way to do that… Thanks :)
Best,
Yifan LI
On 10 Feb 2015, at 12:06, Akhil Das ak...@sigmoidanalytics.com wrote:
Did you have a chance to look at this doc
http://spark.apache.org/docs/1.2.0/tuning.html
)
at
org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:128)
at
org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:110)
Best,
Yifan LI
? Where can I get more details about this issue?
Best,
Yifan LI
Does anyone have an idea where I can find the detailed log of that lost executor (why
it was lost)?
Thanks in advance!
On 05 Feb 2015, at 16:14, Yifan LI iamyifa...@gmail.com wrote:
Hi,
I am running a GraphX application with heavy memory/CPU overhead; I think the
memory is sufficient and set
Thanks, Sonal.
But it seems to be an error that happened when "cleaning broadcast"?
BTW, what is the timeout of "[30 seconds]"? Can I increase it?
Best,
Yifan LI
On 02 Feb 2015, at 11:12, Sonal Goyal sonalgoy...@gmail.com wrote:
That may be the cause of your issue. Take a look
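A guess at which timeout "[30 seconds]" refers to, since the pointer above is truncated: in Spark 1.x the Akka ask timeout defaulted to 30 seconds and could be raised:

val conf = new org.apache.spark.SparkConf()
  .set("spark.akka.askTimeout", "120") // seconds; 120 is an example value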
executor 13
15/02/02 11:48:49 ERROR SparkDeploySchedulerBackend: Asked to remove
non-existent executor 13
Does anyone have pointers on this?
Best,
Yifan LI
On 02 Feb 2015, at 11:47, Yifan LI iamyifa...@gmail.com wrote:
Thanks, Sonal.
But it seems to be an error that happened when "cleaning
Yes, I think so, esp. for a Pregel application… do you have any suggestions?
Best,
Yifan LI
On 30 Jan 2015, at 22:25, Sonal Goyal sonalgoy...@gmail.com wrote:
Is your code hitting frequent garbage collection?
Best Regards,
Sonal
Founder, Nube Technologies http://www.nubetech.co/
ERROR SparkDeploySchedulerBackend: Asked to remove
non-existent executor 8
Best,
Yifan LI
: remote Akka client disassociated
15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove
non-existent executor 8
15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove
non-existent executor 8
Best,
Yifan LI
Hi,
I just saw an Edge RDD at "300% Fraction Cached" in the Storage WebUI; what does
that mean? I could understand if the value were under 100%…
Thanks.
Best,
Yifan LI
they communicate
between partitions (and machines)?
Does anyone have pointers on this, or on communication between RDDs?
Thanks, :)
Best,
Yifan LI
, one superset is enough
- by using basic Spark operations (groupByKey, leftJoin, etc.) on the vertices RDD
and its intermediate results.
W.r.t. the communication among machines, and the high cost of
groupByKey/leftJoin, I guess that the 1st option is better?
What's your opinion?
Best,
Yifan LI
Hi,
I have a RDD like below:
(1, (10, 20))
(2, (30, 40, 10))
(3, (30))
…
Is there any way to map it to this:
(10,1)
(20,1)
(30,2)
(40,2)
(10,2)
(30,3)
…
Generally, each element might be mapped to multiple output elements.
Thanks in advance!
Best,
Yifan LI
Thanks, Paolo and Mark. :)
On 04 Dec 2014, at 11:58, Paolo Platter paolo.plat...@agilelab.it wrote:
Hi,
rdd.flatMap( e => e._2.map( i => (i, e._1)))
Should work, but I didn't test it so maybe I'm missing something.
Paolo
Sent from my Windows Phone
From: Yifan LI mailto:iamyifa
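Checking Paolo's one-liner against the sample data from the question (assuming the inner values are collections such as Seq[Int]):

val rdd = sc.parallelize(Seq(
  (1, Seq(10, 20)),
  (2, Seq(30, 40, 10)),
  (3, Seq(30))))
// Emit one pair per inner element: (value, originalKey).
val inverted = rdd.flatMap(e => e._2.map(i => (i, e._1)))
// inverted contains: (10,1), (20,1), (30,2), (40,2), (10,2), (30,3)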
(SparkSubmit.scala)
Does anyone have pointers on this?
Best,
Yifan LI
you chose, w.r.t. the vertex replication factor
- the distribution of partitions on the cluster
...
Best,
Yifan LI
LIP6, UPMC, Paris
On 17 Nov 2014, at 11:59, Hlib Mykhailenko hlib.mykhaile...@inria.fr wrote:
Hello,
I use a Spark standalone cluster and I want to somehow measure internode
Hi Arpit,
Try this:
val graph = GraphLoader.edgeListFile(sc, edgesFile, minEdgePartitions =
numPartitions, edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
Best,
Yifan LI
On 28 Oct 2014, at 11:17, Arpit Kumar arp8...@gmail.com
I am not sure if it works on Spark 1.0, but give it a try.
Or maybe you can try:
1) constructing the edge and vertex RDDs separately, each with the desired storage
level;
2) then obtaining a graph by using Graph(verticesRDD, edgesRDD); see the sketch
after this message.
Best,
Yifan LI
On 28 Oct 2014, at 12:10, Arpit Kumar
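A sketch of the two-step construction suggested above; the edge-file format (two whitespace-separated vertex ids per line) and the Int attributes are assumptions:

import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// 1) Build the edge and vertex RDDs with the desired storage level.
val edges: RDD[Edge[Int]] = sc.textFile(edgesFile).map { line =>
  val Array(src, dst) = line.split("\\s+")
  Edge(src.toLong, dst.toLong, 1)
}.persist(StorageLevel.MEMORY_AND_DISK)

val vertices: RDD[(VertexId, Int)] = edges
  .flatMap(e => Seq((e.srcId, 0), (e.dstId, 0)))
  .distinct()
  .persist(StorageLevel.MEMORY_AND_DISK)

// 2) Assemble the graph from the two RDDs.
val graph = Graph(vertices, edges)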
it using basic Spark table operations (join, etc.), for
instance, in [1])
[1] http://event.cwi.nl/grades2014/03-salihoglu.pdf
Best,
Yifan LI
already been
introduced in the GraphX Pregel API?)
Best,
Yifan LI
On 15 Sep 2014, at 23:07, Ankur Dave ankurd...@gmail.com wrote:
At 2014-09-15 16:25:04 +0200, Yifan LI iamyifa...@gmail.com wrote:
I am wondering if the vertex active/inactive state (corresponding to the change of its
value between two
]) =
Iterator((edge.dstId, hmCal(edge.srcAttr)))
or should I do that with a customised measure function, e.g. by keeping its
change in a vertex attribute after each iteration.
I noticed that there is an optional parameter "skipStale" in the mrTriplets
operator.
Best,
Yifan LI
Dave ankurd...@gmail.com:
At 2014-09-03 17:58:09 +0200, Yifan LI iamyifa...@gmail.com wrote:
val graph = GraphLoader.edgeListFile(sc, edgesFile, minEdgePartitions =
numPartitions).partitionBy(PartitionStrategy.EdgePartition2D).persist(StorageLevel.MEMORY_AND_DISK)
Error
)
Error: java.lang.UnsupportedOperationException: Cannot change storage level
of an RDD after it was already assigned a level
Is there anyone who could help me?
Best,
Yifan
2014-08-18 23:52 GMT+02:00 Ankur Dave ankurd...@gmail.com:
On Mon, Aug 18, 2014 at 6:29 AM, Yifan LI iamyifa...@gmail.com
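The error above typically comes from re-persisting a graph that GraphLoader has already cached; a sketch of the workaround mentioned elsewhere in the thread, passing the storage levels at load time instead (Spark 1.1+; edgesFile and numPartitions as in the snippet above):

import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
import org.apache.spark.storage.StorageLevel

val graph = GraphLoader.edgeListFile(sc, edgesFile,
    minEdgePartitions = numPartitions,
    edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
    vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
  .partitionBy(PartitionStrategy.EdgePartition2D)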
Try this:
./bin/run-example graphx.LiveJournalPageRank edge_list_file…
On Aug 2, 2014, at 5:55 PM, Deep Pradhan pradhandeep1...@gmail.com wrote:
Hi,
I am running Spark in a single-node cluster. I am able to run the example code in
Spark, like SparkPageRank.scala and SparkKMeans.scala, by the following
hubs can receive messages and compute results in
the LAST iteration, since we don't need the final result of non-hub vertices?
Best,
Yifan LI
with Total Tasks?
2) What are the exact meanings of Shuffle Read/Shuffle Write?
Best,
Yifan LI
, Yifan LI iamyifa...@gmail.com wrote:
I don't understand. For instance, if we have 3 edge partition tables (EA: a - b,
a - c; EB: a - d, a - e; EC: d - c) and 2 vertex partition tables (VA: a, b,
c; VB: d, e), will the whole vertex table VA be replicated to all these 3
edge partitions? Since each
Thanks, Abel.
Best,
Yifan LI
On Jul 21, 2014, at 4:16 PM, Abel Coronado Iruegas acoronadoirue...@gmail.com
wrote:
Hi Yifan
This works for me:
export SPARK_JAVA_OPTS="-Xms10g -Xmx40g -XX:MaxPermSize=10g"
export ADD_JARS=/home/abel/spark/MLI/target/MLI-assembly-1.0.jar
export SPARK_MEM
in memory?
How do they communicate when the required data (edges?) is in another partition?
On Jul 15, 2014, at 9:30 PM, Ankur Dave ankurd...@gmail.com wrote:
On Jul 15, 2014, at 12:06 PM, Yifan LI iamyifa...@gmail.com wrote:
Btw, is there any possibility to customise the partition strategy as we
them,
but that way others can benefit as well.
Ankur
On Fri, Jul 11, 2014 at 3:05 AM, Yifan LI iamyifa...@gmail.com wrote:
Hi Ankur,
I am doing graph computation using GraphX on a single multicore machine (not a
cluster).
But it seems that I couldn't find enough docs w.r.t. how GraphX
Hi,
I am doing graph computation using GraphX, but there seems to be an error in the
graph partition strategy specification.
as in GraphX programming guide:
The Graph.partitionBy operator allows users to choose the graph partitioning
strategy, but due to SPARK-1931, this method is broken in Spark
Hi,
I am planning to do graph (social network) computation on a cluster (Hadoop has
been installed), but there is a pre-built package for Hadoop and I am NOT sure
whether GraphX is included in it.
Or should I install another released version (in which GraphX has obviously been
included)?