Re: Data partitioning and node tracking in Spark-GraphX

2015-05-17 Thread MUHAMMAD AAMIR
Can you please elaborate on how to fetch the records from a particular
partition (a node, in our case)? For example, my RDD is distributed across 10
nodes and I want to fetch the data of one particular node/partition, i.e. the
partition/node with index 5.
How can I do this?
I have tried mapPartitionsWithIndex as well as partitions.foreach.
However, these are expensive. Does anybody know a more efficient way?

Thanks in anticipation.


On Thu, Apr 16, 2015 at 5:49 PM, Evo Eftimov evo.efti...@isecc.com wrote:

 Well, you can have a two-level index structure, still without any need for
 physical cluster node awareness.

 The Level 1 index is the previously described partitioned [K,V] RDD. This
 gets you to the value (RDD element) you need on the respective cluster node.

 The Level 2 index is built and resides within the Value of each [K,V] RDD
 element. After you retrieve the appropriate element from the appropriate
 cluster node based on the Level 1 index, you query the Value in the element
 based on the Level 2 index.







-- 
Regards,
Muhammad Aamir


*CONFIDENTIALITY: This email is intended solely for the person(s) named and
may be confidential and/or privileged. If you are not the intended
recipient, please delete it, notify me and do not copy, use, or disclose its
content.*


RE: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread Evo Eftimov
How do you intend to fetch the required data: from within Spark, or using
an app / code / module outside Spark?

-----Original Message-----
From: mas [mailto:mas.ha...@gmail.com] 
Sent: Thursday, April 16, 2015 4:08 PM
To: user@spark.apache.org
Subject: Data partitioning and node tracking in Spark-GraphX

I have a big data file and I aim to create an index on the data. I want to
partition the data based on a user-defined function in Spark-GraphX (Scala).
Further, I want to keep track of the node to which a particular data partition
is sent and on which it is processed, so that I can fetch the required data by
accessing the right node and data partition.
How can I achieve this?
Any help in this regard will be highly appreciated.



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Data-partitioning-and-node-tracking-in-Spark-GraphX-tp22527.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread MUHAMMAD AAMIR
Thanks a lot for the reply. Indeed it is useful, but to be more precise, I
have 3D data and want to index it using an octree. Thus I aim to build a
two-level indexing mechanism: first, at the global level, I want to partition
the data and send it to the nodes; then, at the node level, I again want to use
an octree to index my data locally.
Could you please elaborate on the solution in this context?
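
For illustration only, the two-level octree scheme described above could be
keyed roughly as follows. This is a plain-Python sketch, not Spark code; the
function names `octree_path` and `split_key`, the unit bounding box and the
depths are all assumptions made for the example. The first octant digits of a
point's octree path serve as the global (partitioning) key and the remaining
digits as the local key inside a partition.

```python
def octree_path(point, lo, hi, depth):
    """Return the octant numbers (0-7), root first, of the cell containing point.

    point, lo and hi are (x, y, z) tuples; lo/hi are the bounding-box corners.
    """
    lo, hi = list(lo), list(hi)
    path = []
    for _ in range(depth):
        octant = 0
        for axis in range(3):
            mid = (lo[axis] + hi[axis]) / 2.0
            if point[axis] >= mid:
                octant |= 1 << axis  # bit 0 = x, bit 1 = y, bit 2 = z
                lo[axis] = mid       # descend into the upper half on this axis
            else:
                hi[axis] = mid       # descend into the lower half on this axis
        path.append(octant)
    return path


def split_key(path, global_depth):
    """Split an octree path into (global key, local key)."""
    return tuple(path[:global_depth]), tuple(path[global_depth:])


# Hypothetical point inside the unit cube, indexed to depth 3.
path = octree_path((0.9, 0.1, 0.6), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), depth=3)
global_key, local_key = split_key(path, global_depth=1)
```

With a global depth of 1 there are at most 8 distinct global keys, so points in
the same top-level octant share a global key; a larger global depth gives finer
partitioning.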

On Thu, Apr 16, 2015 at 5:23 PM, Evo Eftimov evo.efti...@isecc.com wrote:

 Well, you can use a [Key, Value] RDD and partition it with a hash function
 on the Key, with a specific number of partitions (which Spark then spreads
 across the cluster nodes). This will a) index the data and b) divide it and
 send it to multiple nodes. Re your last requirement: in a cluster programming
 environment/framework, your app code should not need to care which physical
 node a partition resides on.



 Regards

 Evo Eftimov







-- 
Regards,
Muhammad Aamir




Re: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread MUHAMMAD AAMIR
I want to use Spark functions/APIs to do this task. My basic purpose is to
index the data, divide it, and send it to multiple nodes. Then, at access
time, I want to reach the right node and data partition. I don't have any
clue how to do this.
Thanks,

On Thu, Apr 16, 2015 at 5:13 PM, Evo Eftimov evo.efti...@isecc.com wrote:

 How do you intend to fetch the required data: from within Spark, or using
 an app / code / module outside Spark?






-- 
Regards,
Muhammad Aamir




RE: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread Evo Eftimov
Well, you can use a [Key, Value] RDD and partition it with a hash function on
the Key, with a specific number of partitions (which Spark then spreads across
the cluster nodes). This will a) index the data and b) divide it and send it to
multiple nodes. Re your last requirement: in a cluster programming
environment/framework, your app code should not need to care which physical
node a partition resides on.
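
As a rough illustration of the hash-partitioning idea above, here is a minimal
plain-Python sketch with no Spark involved. The helper names `partition_for`,
`partition_by` and `lookup` are hypothetical and only model the concept; in
Spark itself the corresponding step would be `partitionBy` with a
`HashPartitioner`. CRC32 stands in for the key's hash code here so that the
result is stable across runs.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition index in [0, num_partitions)."""
    # crc32 is used instead of hash() so the index is the same on every run.
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partition_by(pairs, num_partitions):
    """Group (key, value) pairs into per-partition lists."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def lookup(partitions, key, num_partitions):
    """Return all values for key, scanning only the partition the key maps to."""
    target = partition_for(key, num_partitions)
    return [v for k, v in partitions.get(target, []) if k == key]


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = partition_by(pairs, 10)
values_for_a = lookup(parts, "a", 10)
```

Because the key alone determines the partition index, a lookup never needs to
know which physical node holds that partition.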
 

 

Regards

Evo Eftimov

 

From: MUHAMMAD AAMIR [mailto:mas.ha...@gmail.com] 
Sent: Thursday, April 16, 2015 4:20 PM
To: Evo Eftimov
Cc: user@spark.apache.org
Subject: Re: Data partitioning and node tracking in Spark-GraphX

 

I want to use Spark functions/APIs to do this task. My basic purpose is to
index the data, divide it, and send it to multiple nodes. Then, at access
time, I want to reach the right node and data partition. I don't have any
clue how to do this.

Thanks,

 




RE: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread Evo Eftimov
Well, you can have a two-level index structure, still without any need for
physical cluster node awareness.

The Level 1 index is the previously described partitioned [K,V] RDD. This gets
you to the value (RDD element) you need on the respective cluster node.

The Level 2 index is built and resides within the Value of each [K,V] RDD
element. After you retrieve the appropriate element from the appropriate
cluster node based on the Level 1 index, you query the Value in the element
based on the Level 2 index.
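
The two-level structure above can be modelled in a few lines of plain Python
(no Spark involved). All names here (`NUM_PARTITIONS`, `partition_for`,
`build_two_level_index`, `query`, and the sample region/cell keys) are
hypothetical; an inner dictionary stands in for whatever local index, such as
an octree, would be stored inside each Value.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key):
    """Level 1 routing: map a key to a partition index (stable across runs)."""
    return zlib.crc32(repr(key).encode("utf-8")) % NUM_PARTITIONS


def build_two_level_index(records):
    """Build {partition index: {level1_key: {level2_key: [payloads]}}}.

    records is an iterable of (level1_key, level2_key, payload) tuples.
    The innermost dict plays the role of the index kept inside the Value.
    """
    index = {}
    for k1, k2, payload in records:
        element = index.setdefault(partition_for(k1), {}).setdefault(k1, {})
        element.setdefault(k2, []).append(payload)
    return index


def query(index, k1, k2):
    # Level 1: route to the partition, then to the element stored under k1.
    element = index.get(partition_for(k1), {}).get(k1, {})
    # Level 2: search inside that element's value.
    return element.get(k2, [])


records = [
    ("region-A", "cell-1", "p1"),
    ("region-A", "cell-2", "p2"),
    ("region-B", "cell-1", "p3"),
    ("region-A", "cell-1", "p4"),
]
index = build_two_level_index(records)
hits = query(index, "region-A", "cell-1")
```

The query touches exactly one partition and one element, and at no point does
the caller refer to a physical node.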

 

From: MUHAMMAD AAMIR [mailto:mas.ha...@gmail.com] 
Sent: Thursday, April 16, 2015 4:32 PM
To: Evo Eftimov
Cc: user@spark.apache.org
Subject: Re: Data partitioning and node tracking in Spark-GraphX

 

Thanks a lot for the reply. Indeed it is useful, but to be more precise, I
have 3D data and want to index it using an octree. Thus I aim to build a
two-level indexing mechanism: first, at the global level, I want to partition
the data and send it to the nodes; then, at the node level, I again want to use
an octree to index my data locally.

Could you please elaborate on the solution in this context?

 
