Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-06 Thread Dino Fancellu
Ok, thanks, just wanted to make sure I wasn't missing something
obvious. I've worked with Neo4j cypher as well, where it was rather
more obvious.

e.g. http://neo4j.com/docs/milestone/query-match.html#_shortest_path
http://neo4j.com/docs/stable/cypher-refcard/

Dino.

On 6 October 2015 at 06:43, Robineast [via Apache Spark User List]
 wrote:
> GraphX doesn't implement Tinkerpop functionality but there is an external
> effort to provide an implementation. See
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4279
> Robin East
> Spark GraphX in Action Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
>





Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Dino Fancellu
Ah thanks, got it working with that.

e.g.

val (_,smap)=shortest.vertices.filter(_._1==src).first
smap.contains(dest)
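
(For context: shortest above is presumably the result of GraphX's ShortestPaths
run with dest as the only landmark. A minimal end-to-end sketch, untested,
assuming a Graph named graph and vertex ids src and dest are in scope:)

import org.apache.spark.graphx.lib.ShortestPaths

// Each vertex's landmark map holds its distance to each landmark, so if
// src's map contains dest there is a directed path from src to dest.
val shortest = ShortestPaths.run(graph, Seq(dest))
val (_, smap) = shortest.vertices.filter(_._1 == src).first
smap.contains(dest)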

Is there anything a little less eager?

i.e. something that doesn't compute the distances from every source node, where
I can supply the source vertex id and the dest vertex id, and just get an int back.
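
(Something along these lines is roughly what I mean; an untested sketch with
the GraphX Pregel API, propagating a single reachability flag out from src
instead of landmark maps for every vertex, stopping when the frontier stops
growing.)

import org.apache.spark.graphx._

// Rough sketch: single-source BFS. Each vertex carries a Boolean "reached"
// flag; messages only flow across edges whose source is reached and whose
// destination is not, so Pregel terminates once no new vertices are reached.
def reachable[VD, ED](graph: Graph[VD, ED], src: VertexId, dest: VertexId): Boolean = {
  val init = graph.mapVertices((id, _) => id == src)
  val bfs = init.pregel(initialMsg = false)(
    (id, reached, msg) => reached || msg,                              // vertex program
    t => if (t.srcAttr && !t.dstAttr) Iterator((t.dstId, true)) else Iterator.empty,
    (a, b) => a || b)                                                  // merge messages
  bfs.vertices.filter { case (id, reached) => id == dest && reached }.count() > 0
}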

Thanks 






GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Dino Fancellu
Is there an existing api to see if 2 nodes in a graph are connected?

e.g. a->b, b->c, c->d

can I get to d, starting from a? (yes I hope!)

I'm not asking the route, just want to know if there is a route.

Thanks.






Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Dino Fancellu
I'm using Windows.

Are you saying it works with Windows?

Dino.

On 29 August 2015 at 09:04, Akhil Das ak...@sigmoidanalytics.com wrote:
 You can also mount HDFS through the NFS gateway and access it that way, I think.

 Thanks
 Best Regards

 On Tue, Aug 25, 2015 at 3:43 AM, Dino Fancellu d...@felstar.com wrote:

 http://hortonworks.com/blog/windows-explorer-experience-hdfs/

 Seemed to exist, now no sign.

 Anything similar to tie HDFS into Windows Explorer?

 Thanks,






Re: Local Spark talking to remote HDFS?

2015-08-25 Thread Dino Fancellu
Tried adding 50010, 50020 and 50090. Still no difference.

I can't imagine I'm the only person on the planet wanting to do this.

Anyway, thanks for trying to help.

Dino.

On 25 August 2015 at 08:22, Roberto Congiu roberto.con...@gmail.com wrote:
 Port 8020 is not the only port you need tunnelled for HDFS to work. If you
 only list the contents of a directory, port 8020 is enough... for instance,
 using something like

 val p = new org.apache.hadoop.fs.Path("hdfs://localhost:8020/")
 val fs = p.getFileSystem(sc.hadoopConfiguration)
 fs.listStatus(p)

 you should see the file list.
 But then, when actually reading a file, the client needs to fetch its blocks,
 so it has to connect to the DataNode.
 The error 'could not obtain block' means it can't get that block from the
 DataNode.
 Refer to
 http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_reference/content/reference_chap2_1.html
 to see the complete list of ports that also need to be tunnelled.
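
 (For instance, an untested sketch on top of the snippet above to tell the two
 cases apart: listing only talks to the NameNode on 8020, while opening the
 file forces a connection to a DataNode, which is where 'could not obtain
 block' comes from.)

 val file = new org.apache.hadoop.fs.Path("hdfs://localhost:8020/tmp/people.txt")
 val fs2 = file.getFileSystem(sc.hadoopConfiguration)
 fs2.listStatus(file.getParent).foreach(s => println(s.getPath)) // NameNode only
 val in = fs2.open(file)  // needs a DataNode; fails if its ports aren't reachable
 println(in.read())       // read one byte to force an actual block fetch
 in.close()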



 2015-08-24 13:10 GMT-07:00 Dino Fancellu d...@felstar.com:

 Changing the IP to the guest IP address just never connects.

 The VM has port tunnelling, and it passes through all the main ports,
 8020 included to the host VM.

 You can tell that it was talking to the guest VM before, simply
 because it reported an error when the file was not found.

 Error is:

 Exception in thread "main" org.apache.spark.SparkException: Job
 aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
 recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost):
 org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
 BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098
 file=/tmp/people.txt

 but I have no idea what it means by that. It certainly can find the
 file and knows it exists.



 On 24 August 2015 at 20:43, Roberto Congiu roberto.con...@gmail.com
 wrote:
  When you launch your HDP guest VM, most likely it gets launched with NAT
  and
  an address on a private network (192.168.x.x) so on your windows host
  you
  should use that address (you can find out using ifconfig on the guest
  OS).
  I usually add an entry to my /etc/hosts for VMs that I use often... if
  you
  use vagrant, there's also a vagrant module that can do that
  automatically.
  Also, I am not sure how the default HDP VM is set up, that is, if it
  only
  binds HDFS to 127.0.0.1 or to all addresses. You can check that with
  netstat
  -a.
 
  R.
 
  2015-08-24 11:46 GMT-07:00 Dino Fancellu d...@felstar.com:
 
  I have a file in HDFS inside my HortonWorks HDP 2.3_1 VirtualBox VM.
 
  If I go into the guest spark-shell and refer to the file thus, it works
  fine
 
    val words=sc.textFile("hdfs:///tmp/people.txt")
    words.count
 
  However if I try to access it from a local Spark app on my Windows
  host,
  it
  doesn't work
 
    val conf = new SparkConf().setMaster("local").setAppName("My App")
    val sc = new SparkContext(conf)

    val words=sc.textFile("hdfs://localhost:8020/tmp/people.txt")
    words.count
 
  Emits
 
 
 
  The port 8020 is open, and if I choose the wrong file name, it will
  tell
  me
 
 
 
  My pom has
 
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.4.1</version>
    <scope>provided</scope>
  </dependency>
 
  Am I doing something wrong?
 
  Thanks.
 
 
 
 



Local Spark talking to remote HDFS?

2015-08-24 Thread Dino Fancellu
I have a file in HDFS inside my HortonWorks HDP 2.3_1 VirtualBox VM.

If I go into the guest spark-shell and refer to the file thus, it works fine

  val words=sc.textFile("hdfs:///tmp/people.txt")
  words.count

However if I try to access it from a local Spark app on my Windows host, it
doesn't work

  val conf = new SparkConf().setMaster("local").setAppName("My App")
  val sc = new SparkContext(conf)

  val words=sc.textFile("hdfs://localhost:8020/tmp/people.txt")
  words.count

Emits



The port 8020 is open, and if I choose the wrong file name, it will tell me



My pom has

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>1.4.1</version>
  <scope>provided</scope>
</dependency>

Am I doing something wrong?

Thanks.







Re: Local Spark talking to remote HDFS?

2015-08-24 Thread Dino Fancellu
Changing the IP to the guest IP address just never connects.

The VM has port tunnelling, and it passes through all the main ports,
8020 included to the host VM.

You can tell that it was talking to the guest VM before, simply
because it reported an error when the file was not found.

Error is:

Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost):
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098
file=/tmp/people.txt

but I have no idea what it means by that. It certainly can find the
file and knows it exists.



On 24 August 2015 at 20:43, Roberto Congiu roberto.con...@gmail.com wrote:
 When you launch your HDP guest VM, most likely it gets launched with NAT and
 an address on a private network (192.168.x.x) so on your windows host you
 should use that address (you can find out using ifconfig on the guest OS).
 I usually add an entry to my /etc/hosts for VMs that I use often... if you
 use vagrant, there's also a vagrant module that can do that automatically.
 Also, I am not sure how the default HDP VM is set up, that is, if it only
 binds HDFS to 127.0.0.1 or to all addresses. You can check that with netstat
 -a.

 R.

 2015-08-24 11:46 GMT-07:00 Dino Fancellu d...@felstar.com:

 I have a file in HDFS inside my HortonWorks HDP 2.3_1 VirtualBox VM.

 If I go into the guest spark-shell and refer to the file thus, it works
 fine

   val words=sc.textFile("hdfs:///tmp/people.txt")
   words.count

 However if I try to access it from a local Spark app on my Windows host,
 it
 doesn't work

   val conf = new SparkConf().setMaster("local").setAppName("My App")
   val sc = new SparkContext(conf)

   val words=sc.textFile("hdfs://localhost:8020/tmp/people.txt")
   words.count

 Emits



 The port 8020 is open, and if I choose the wrong file name, it will tell
 me



 My pom has

 <dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-core_2.11</artifactId>
   <version>1.4.1</version>
   <scope>provided</scope>
 </dependency>

 Am I doing something wrong?

 Thanks.







Where is Redgate's HDFS explorer?

2015-08-24 Thread Dino Fancellu
http://hortonworks.com/blog/windows-explorer-experience-hdfs/

Seemed to exist, now no sign.

Anything similar to tie HDFS into Windows Explorer?

Thanks,


