unsubscribe

2023-08-11 Thread Yifan LI
unsubscribe

Re: [Spark] java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-30 Thread Yifan LI
rder > to make them smaller in size. > > HTH, > Deng > > On Thu, Oct 29, 2015 at 8:40 PM, Yifan LI <iamyifa...@gmail.com> wrote: > >> I have a guess that before scanning that RDD, I sorted it and set >> partitioning, so the result is not balanced: >> >>

[Spark] java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-29 Thread Yifan LI
) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) Do you have any idea? I have set partitioning quite big, like 4 Best, Yifan LI

Re: [Spark] java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-29 Thread Yifan LI
ll try to repartition it to see if it helps. Best, Yifan LI > On 29 Oct 2015, at 12:52, Yifan LI <iamyifa...@gmail.com> wrote: > > Hey, > > I was just trying to scan a large RDD sortedRdd, ~1billion elements, using > toLocalIterator api, but an exception retu

Re: java.lang.NegativeArraySizeException? as iterating a big RDD

2015-10-23 Thread Yifan LI
Thanks for your advice, Jem. :) I will increase the partitioning and see if it helps. Best, Yifan LI > On 23 Oct 2015, at 12:48, Jem Tucker <jem.tuc...@gmail.com> wrote: > > Hi Yifan, > > I think this is a result of Kryo trying to serialize something too lar

java.lang.NegativeArraySizeException? as iterating a big RDD

2015-10-23 Thread Yifan LI
.scala:730) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) Best, Yifan LI

"dynamically" sort a large collection?

2015-10-12 Thread Yifan LI
")) … #2: 60 is matched! 60/2 = 30, so the collection should now be: (3, (53.5, “ccc”)) (4, (48, “ddd”)) (2, (30, “bbb”)) <— inserted back here (5, (29, “eee”)) … Best, Yifan LI
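A minimal sketch of the "halve and insert back" step described above, using a plain Scala Vector rather than an RDD. The starting collection (with “bbb” at 60) and the rule for picking the matched element (by id) are assumptions made for illustration, since the original message is truncated; the object and method names are hypothetical.

```scala
object HalveAndReinsert {
  // (id, (value, label)), kept sorted by value in descending order.
  type Entry = (Int, (Double, String))

  // Assumed state before step #2; only the "after" state appears in the post.
  val before: Vector[Entry] = Vector(
    (2, (60.0, "bbb")),
    (3, (53.5, "ccc")),
    (4, (48.0, "ddd")),
    (5, (29.0, "eee"))
  )

  // Halve the value of the entry with the given id, then put it back at the
  // position that keeps the vector sorted, without re-sorting everything.
  def halveAndReinsert(sorted: Vector[Entry], id: Int): Vector[Entry] =
    sorted.indexWhere(_._1 == id) match {
      case -1 => sorted // id not present: nothing to do
      case i =>
        val (key, (value, label)) = sorted(i)
        val updated: Entry = (key, (value / 2, label))
        val rest = sorted.patch(i, Nil, 1) // drop the old entry
        // Entries with a value >= the new value stay in front of it.
        val (higher, lower) = rest.span(_._2._1 >= updated._2._1)
        (higher :+ updated) ++ lower
    }

  def main(args: Array[String]): Unit =
    halveAndReinsert(before, 2).foreach(println)
}
```

This only illustrates the local logic. On an immutable RDD each such update would still produce a new RDD, which is the cost discussed in the replies.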

Re: "dynamically" sort a large collection?

2015-10-12 Thread Yifan LI
that “sort again”? it is too costly… :( Anyway thank you again! Best, Yifan LI > On 12 Oct 2015, at 12:19, Adrian Tanase <atan...@adobe.com> wrote: > > I think you’re looking for the flatMap (or flatMapValues) operator – you can > do something like > > so

Re: "dynamically" sort a large collection?

2015-10-12 Thread Yifan LI
Shiwei, yes, you might be right. Thanks. :) Best, Yifan LI > On 12 Oct 2015, at 12:55, 郭士伟 <guoshi...@gmail.com> wrote: > > I think this is not a problem Spark can solve effectively, cause RDD in > immutable. Every time you want to change an RDD, you create a new one,

Re: Master dies after program finishes normally

2015-06-26 Thread Yifan LI
Hi, I just encountered the same problem when I ran a PageRank program which has lots of stages (iterations)… The master was lost after my program finished, and the issue remains even after I increased the driver memory. Any ideas, e.g. how to increase the master memory? Thanks. Best, Yifan

large shuffling = executor lost?

2015-06-04 Thread Yifan LI
it might be because there was a large shuffle… Does anyone have an idea how to fix it? Thanks in advance! Best, Yifan LI

com.esotericsoftware.kryo.KryoException: java.io.IOException: Stream is corrupted

2015-05-13 Thread Yifan LI
) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) Best, Yifan LI

applications are still in progress?

2015-05-13 Thread Yifan LI
, Yifan LI

No space left on device??

2015-05-06 Thread Yifan LI
computation…, so maybe sometimes the request on that node was bigger than the available space. But is there any way to avoid this kind of error? I am sure that the overall disk space of all nodes is enough for my application. Thanks in advance! Best, Yifan LI

Re: No space left on device??

2015-05-06 Thread Yifan LI
Thanks, Shao. :-) I am wondering if the spark will rebalance the storage overhead in runtime…since still there is some available space on other nodes. Best, Yifan LI On 06 May 2015, at 14:57, Saisai Shao sai.sai.s...@gmail.com wrote: I think you could configure multiple disks through

Re: No space left on device??

2015-05-06 Thread Yifan LI
Yes, you are right. For now I have to say the workload/executor is distributed evenly…so, like you said, it is difficult to improve the situation. However, have you any idea of how to make a *skew* data/executor distribution? Best, Yifan LI On 06 May 2015, at 15:13, Saisai Shao

to split an RDD to multiple ones?

2015-05-02 Thread Yifan LI
… … Have any idea? Thanks in advance! :) Best, Yifan LI

Re: to split an RDD to multiple ones?

2015-05-02 Thread Yifan LI
Thanks, Olivier and Franz. :) Best, Yifan LI On 02 May 2015, at 23:23, Olivier Girardot ssab...@gmail.com wrote: I guess : val srdd_s1 = srdd.filter(_.startsWith(s1_)).sortBy(_) val srdd_s2 = srdd.filter(_.startsWith(s2_)).sortBy(_) val srdd_s3 = srdd.filter(_.startsWith(s3_
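A minimal sketch of the filter-per-prefix idea quoted above, using a plain Scala Seq as a stand-in for the RDD. The sample records, the prefix strings and the object and method names are hypothetical; on an RDD the same shape would use filter followed by a sort with an explicit key function.

```scala
object SplitByPrefix {
  // Hypothetical sample records; each starts with one of the prefixes.
  val records: Seq[String] = Seq("s2_b", "s1_z", "s3_k", "s1_a", "s2_a")

  // Build one sorted sub-collection per prefix by filtering the input once per prefix.
  def split(data: Seq[String], prefixes: Seq[String]): Map[String, Seq[String]] =
    prefixes.map(p => p -> data.filter(_.startsWith(p)).sorted).toMap

  def main(args: Array[String]): Unit =
    split(records, Seq("s1_", "s2_", "s3_")).foreach(println)
}
```

Each prefix costs one full pass over the input, so with an RDD it would be worth caching the source before filtering it several times.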

How to avoid the repartitioning in graph construction

2015-03-27 Thread Yifan LI
), the repartitioning will be inevitable?? Thanks in advance! Best, Yifan LI

Re: Processing graphs

2015-02-17 Thread Yifan LI
Hi Kannan, I am not sure I have understood exactly what your question is, but maybe the reduceByKey or reduceByKeyLocally functionality is better suited to your needs. Best, Yifan LI On 17 Feb 2015, at 17:37, Vijayasarathy Kannan kvi...@vt.edu wrote: Hi, I am working on a Spark

Re: OutofMemoryError: Java heap space

2015-02-12 Thread Yifan LI
Thanks, Kelvin :) The error seems to have disappeared after I decreased both spark.storage.memoryFraction and spark.shuffle.memoryFraction to 0.2 and increased the driver memory somewhat. Best, Yifan LI On 10 Feb 2015, at 18:58, Kelvin Chu 2dot7kel...@gmail.com wrote: Since the stacktrace

Re: OutofMemoryError: Java heap space

2015-02-10 Thread Yifan LI
during pregel supersteps. so, it seems to suffer from high GC? Best, Yifan LI On 10 Feb 2015, at 10:26, Akhil Das ak...@sigmoidanalytics.com wrote: You could try increasing the driver memory. Also, can you be more specific about the data volume? Thanks Best Regards On Mon, Feb 9, 2015

Re: OutofMemoryError: Java heap space

2015-02-10 Thread Yifan LI
Yes, I have read it, and am trying to find some way to do that… Thanks :) Best, Yifan LI On 10 Feb 2015, at 12:06, Akhil Das ak...@sigmoidanalytics.com wrote: Did you have a chance to look at this doc http://spark.apache.org/docs/1.2.0/tuning.html

OutofMemoryError: Java heap space

2015-02-09 Thread Yifan LI
) at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:128) at org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:110) Best, Yifan LI

how to debug this kind of error, e.g. lost executor?

2015-02-05 Thread Yifan LI
? Where I can get more details for this issue? Best, Yifan LI

Re: how to debug this kind of error, e.g. lost executor?

2015-02-05 Thread Yifan LI
Does anyone have an idea where I can find the detailed log of that lost executor (why it was lost)? Thanks in advance! On 05 Feb 2015, at 16:14, Yifan LI iamyifa...@gmail.com wrote: Hi, I am running a heavy memory/cpu overhead graphx application, I think the memory is sufficient and set

Re: [Graphx Spark] Error of Lost executor and TimeoutException

2015-02-02 Thread Yifan LI
Thanks, Sonal. But it seems to be an error that happened while “cleaning broadcast”? BTW, what is the timeout of “[30 seconds]”? Can I increase it? Best, Yifan LI On 02 Feb 2015, at 11:12, Sonal Goyal sonalgoy...@gmail.com wrote: That may be the cause of your issue. Take a look

Re: [Graphx Spark] Error of Lost executor and TimeoutException

2015-02-02 Thread Yifan LI
executor 13 15/02/02 11:48:49 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 13 Anyone has points on this? Best, Yifan LI On 02 Feb 2015, at 11:47, Yifan LI iamyifa...@gmail.com wrote: Thanks, Sonal. But it seems to be an error happened when “cleaning

Re: [Graphx Spark] Error of Lost executor and TimeoutException

2015-01-30 Thread Yifan LI
Yes, I think so, esp. for a pregel application… have any suggestion? Best, Yifan LI On 30 Jan 2015, at 22:25, Sonal Goyal sonalgoy...@gmail.com wrote: Is your code hitting frequent garbage collection? Best Regards, Sonal Founder, Nube Technologies http://www.nubetech.co/ http

[Graphx Spark] Error of Lost executor and TimeoutException

2015-01-30 Thread Yifan LI
ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 8 Best, Yifan LI

[Graphx Spark] Error of Lost executor and TimeoutException

2015-01-30 Thread Yifan LI
: remote Akka client disassociated 15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 8 15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 8 Best, Yifan LI

300% Fraction Cached?

2014-12-19 Thread Yifan LI
Hi, I just saw that an Edge RDD is “300% Fraction Cached” in the Storage WebUI, what does that mean? I could understand it if the value were under 100%… Thanks. Best, Yifan LI

[Graphx] the communication cost of leftJoin

2014-12-12 Thread Yifan LI
they communicate between partitions(and machines)? Anyone has some points on this, or communication between RDDs? Thanks, :) Best, Yifan LI

[Graphx] which way is better to access faraway neighbors?

2014-12-05 Thread Yifan LI
, one superset is enough - by using spark basic operations(groupByKey, leftJoin, etc) on vertices RDD and its intermediate results. w.r.t the communication among machines, and the high cost of groupByKey/leftJoin, I guess that 1st option is better? what’s your idea? Best, Yifan LI

map function

2014-12-04 Thread Yifan LI
Hi, I have an RDD like below: (1, (10, 20)) (2, (30, 40, 10)) (3, (30)) … Is there any way to map it to this: (10,1) (20,1) (30,2) (40,2) (10,2) (30,3) … Generally, each element might be mapped to multiple pairs. Thanks in advance! Best, Yifan LI

Re: map function

2014-12-04 Thread Yifan LI
Thanks, Paolo and Mark. :) On 04 Dec 2014, at 11:58, Paolo Platter paolo.plat...@agilelab.it wrote: Hi, rdd.flatMap( e => e._2.map( i => ( i, e._1))) Should work, but I didn't test it so maybe I'm missing something. Paolo Sent from my Windows Phone From: Yifan LI mailto:iamyifa
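A minimal sketch of the flatMap suggestion above, run on a plain Scala Seq standing in for the RDD so that it needs no Spark installation. The object and method names are hypothetical; the sample data is taken from the question.

```scala
object InvertPairs {
  // Stand-in for the RDD in the question: each key carries a collection of values.
  val input: Seq[(Int, Seq[Int])] = Seq(
    (1, Seq(10, 20)),
    (2, Seq(30, 40, 10)),
    (3, Seq(30))
  )

  // For every (key, values) element, emit one (value, key) pair per value.
  def invert(data: Seq[(Int, Seq[Int])]): Seq[(Int, Int)] =
    data.flatMap { case (key, values) => values.map(v => (v, key)) }

  def main(args: Array[String]): Unit =
    invert(input).foreach(println)
  // prints (10,1) (20,1) (30,2) (40,2) (10,2) (30,3), one pair per line
}
```

flatMap is used instead of map because each input element can yield several output pairs, and the nested results need to be flattened into one collection.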

[graphx] failed to submit an application with java.lang.ClassNotFoundException

2014-11-27 Thread Yifan LI
(SparkSubmit.scala) anyone has some points on this? Best, Yifan LI

Re: How to measure communication between nodes in Spark Standalone Cluster?

2014-11-17 Thread Yifan LI
you chose, wrt the vertices replication factor - the distribution of partitions on cluster ... Best, Yifan LI LIP6, UPMC, Paris On 17 Nov 2014, at 11:59, Hlib Mykhailenko hlib.mykhaile...@inria.fr wrote: Hello, I use Spark Standalone Cluster and I want to measure somehow internode

Re: How to set persistence level of graph in GraphX in spark 1.0.0

2014-10-28 Thread Yifan LI
Hi Arpit, To try this: val graph = GraphLoader.edgeListFile(sc, edgesFile, minEdgePartitions = numPartitions, edgeStorageLevel = StorageLevel.MEMORY_AND_DISK, vertexStorageLevel = StorageLevel.MEMORY_AND_DISK) Best, Yifan LI On 28 Oct 2014, at 11:17, Arpit Kumar arp8...@gmail.com

Re: How to set persistence level of graph in GraphX in spark 1.0.0

2014-10-28 Thread Yifan LI
I am not sure if it can work on Spark 1.0, but give it a try. or, Maybe you can try: 1) to construct the edges and vertices RDDs respectively with desired storage level. 2) then, to obtain a graph by using Graph(verticesRDD, edgesRDD). Best, Yifan LI On 28 Oct 2014, at 12:10, Arpit Kumar

how to send message to specific vertex by Pregel api

2014-10-02 Thread Yifan LI
it using basic spark table operations(join, etc), for instance, in [1]) [1] http://event.cwi.nl/grades2014/03-salihoglu.pdf Best, Yifan LI

Re: vertex active/inactive feature in Pregel API ?

2014-09-16 Thread Yifan LI
already been introduced in graphx pregel api? ) Best, Yifan LI On 15 Sep 2014, at 23:07, Ankur Dave ankurd...@gmail.com wrote: At 2014-09-15 16:25:04 +0200, Yifan LI iamyifa...@gmail.com wrote: I am wondering if the vertex active/inactive(corresponding the change of its value between two

vertex active/inactive feature in Pregel API ?

2014-09-15 Thread Yifan LI
]) = Iterator((edge.dstId, hmCal(edge.srcAttr))) or, I should do that by a customised measure function, e.g. by keeping its change in vertex attribute after each iteration. I noticed that there is an optional parameter “skipStale” in the mrTriplets operator. Best, Yifan LI

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-05 Thread Yifan LI
Dave ankurd...@gmail.com: At 2014-09-03 17:58:09 +0200, Yifan LI iamyifa...@gmail.com wrote: val graph = GraphLoader.edgeListFile(sc, edgesFile, minEdgePartitions = numPartitions).partitionBy(PartitionStrategy.EdgePartition2D).persist(StorageLevel.MEMORY_AND_DISK) Error

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-03 Thread Yifan LI
) Error: java.lang.UnsupportedOperationException: Cannot change storage level of an RDD after it was already assigned a level Could anyone help me? Best, Yifan 2014-08-18 23:52 GMT+02:00 Ankur Dave ankurd...@gmail.com: On Mon, Aug 18, 2014 at 6:29 AM, Yifan LI iamyifa...@gmail.com

Re: GraphX

2014-08-02 Thread Yifan LI
Try this: ./bin/run-example graphx.LiveJournalPageRank edge_list_file… On Aug 2, 2014, at 5:55 PM, Deep Pradhan pradhandeep1...@gmail.com wrote: Hi, I am running Spark in a single node cluster. I am able to run the codes in Spark like SparkPageRank.scala, SparkKMeans.scala by the following

[GraphX] how to compute only a subset of vertices in the whole graph?

2014-08-02 Thread Yifan LI
hubs can receive messages and compute results in the LAST iteration? since we don't need the final result of non-hub vertices. Best, Yifan LI

the implications of some items in webUI

2014-07-22 Thread Yifan LI
with Total Tasks? 2) what are the exact meanings of Shuffle Read/Shuffle Write? Best, Yifan LI

Re: the default GraphX graph-partition strategy on multicore machine?

2014-07-21 Thread Yifan LI
, Yifan LI iamyifa...@gmail.com wrote: I don't understand, for instance, we have 3 edge partition tables(EA: a -> b, a -> c; EB: a -> d, a -> e; EC: d -> c ), 2 vertex partition tables(VA: a, b, c; VB: d, e), the whole vertex table VA will be replicated to all these 3 edge partitions? since each

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-07-21 Thread Yifan LI
Thanks, Abel. Best, Yifan LI On Jul 21, 2014, at 4:16 PM, Abel Coronado Iruegas acoronadoirue...@gmail.com wrote: Hi Yifan This works for me: export SPARK_JAVA_OPTS="-Xms10g -Xmx40g -XX:MaxPermSize=10g" export ADD_JARS=/home/abel/spark/MLI/target/MLI-assembly-1.0.jar export SPARK_MEM

Re: the default GraphX graph-partition strategy on multicore machine?

2014-07-18 Thread Yifan LI
in memory? how do they communicate when the required data(edges?) in another partition? On Jul 15, 2014, at 9:30 PM, Ankur Dave ankurd...@gmail.com wrote: On Jul 15, 2014, at 12:06 PM, Yifan LI iamyifa...@gmail.com wrote: Btw, is there any possibility to customise the partition strategy as we

Re: the default GraphX graph-partition strategy on multicore machine?

2014-07-15 Thread Yifan LI
them, but that way others can benefit as well. Ankur On Fri, Jul 11, 2014 at 3:05 AM, Yifan LI iamyifa...@gmail.com wrote: Hi Ankur, I am doing graph computation using GraphX on a single multicore machine(not a cluster). But It seems that I couldn't find enough docs w.r.t how GraphX

GraphX: how to specify partition strategy?

2014-07-10 Thread Yifan LI
Hi, I am doing graph computation using GraphX, but there seems to be an error when specifying the graph partition strategy. As in the GraphX programming guide: The Graph.partitionBy operator allows users to choose the graph partitioning strategy, but due to SPARK-1931, this method is broken in Spark

which Spark package(wrt. graphX) I should install to do graph computation on cluster?

2014-07-07 Thread Yifan LI
Hi, I am planning to do graph (social network) computation on a cluster (Hadoop has been installed), but it seems there is a pre-built package for Hadoop, and I am NOT sure whether GraphX is included in it. Or should I install another released version (one in which GraphX is obviously included)?