Well, you can use a [Key, Value] RDD and partition it with a hash function on 
the Key, optionally specifying the number of partitions (and hence cluster nodes). 
This will a) index the data and b) divide it and send it to multiple nodes. Re 
your last requirement - in a cluster programming environment/framework, your app 
code should not be concerned with exactly which physical node a partition resides on.
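A minimal sketch of what that could look like (assuming a comma-separated text 
file keyed on its first field - the file path, key extraction and number of 
partitions below are just placeholders):

import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}

object PartitionByKeyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partition-by-key").setMaster("local[4]"))

    // Hypothetical input: one record per line, keyed on the first field
    val keyed = sc.textFile("hdfs:///path/to/big-data-file").map { line =>
      val fields = line.split(",")
      (fields(0), line)                      // (key, full record)
    }

    // Hash-partition into a fixed number of partitions; records with the
    // same key always end up in the same partition
    val partitioner = new HashPartitioner(8)
    val partitioned = keyed.partitionBy(partitioner).cache()

    // The partitioner itself tells you which partition holds a given key,
    // so a lookup only has to touch that one partition
    val someKey = "user42"
    println(s"Key $someKey maps to partition ${partitioner.getPartition(someKey)}")

    // lookup() on a pair RDD with a partitioner scans only the relevant partition
    partitioned.lookup(someKey).foreach(println)

    sc.stop()
  }
}

Note that none of this ties a partition to a particular physical node - Spark 
decides where partitions are placed (and may move them), which is exactly why 
the app code shouldn't depend on it.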
 

 

Regards

Evo Eftimov

 

From: MUHAMMAD AAMIR [mailto:mas.ha...@gmail.com] 
Sent: Thursday, April 16, 2015 4:20 PM
To: Evo Eftimov
Cc: user@spark.apache.org
Subject: Re: Data partitioning and node tracking in Spark-GraphX

 

I want to use Spark functions/APIs to do this task. My basic purpose is to 
index the data, divide it, and send it to multiple nodes. Then, at access time, 
I want to reach the right node and data partition. I don't have any clue how to 
do this.

Thanks,

 

On Thu, Apr 16, 2015 at 5:13 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

How do you intend to "fetch the required data" - from within Spark, or using
an app / code / module outside Spark?

-----Original Message-----
From: mas [mailto:mas.ha...@gmail.com]
Sent: Thursday, April 16, 2015 4:08 PM
To: user@spark.apache.org
Subject: Data partitioning and node tracking in Spark-GraphX

I have a big data file and I aim to create an index on the data. I want to
partition the data based on a user-defined function in Spark-GraphX (Scala).
Further, I want to keep track of the node on which a particular data partition
is sent and processed, so I can fetch the required data by accessing the right
node and data partition.
How can I achieve this?
Any help in this regard will be highly appreciated.



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Data-partitioning-and-node-tracking-in-Spark-GraphX-tp22527.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org







 

-- 

Regards,
Muhammad Aamir


CONFIDENTIALITY: This email is intended solely for the person(s) named and may 
be confidential and/or privileged. If you are not the intended recipient, please 
delete it, notify me and do not copy, use, or disclose its content.
