That's what I'd suggest too. Furthermore, if you use Vagrant to spin up
VMs, there's a plugin that can do that automatically for you.

R.

2015-08-25 10:11 GMT-07:00 Steve Loughran <ste...@hortonworks.com>:

> I wouldn't try to play with forwarding & tunnelling; it's always hard to
> work out what ports get used everywhere, and the services expect the
> hostname in your URLs and paths to match the one they advertise.
>
> Can't you just set up an entry in the Windows hosts file
> (C:\Windows\System32\drivers\etc\hosts)? It's what I do (in /etc/hosts on
> Unix) to talk to VMs.
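>
> For example (hypothetical addresses; use whatever the guest actually
> reports and whatever hostname the NameNode advertises), one line in the
> hosts file on the Windows side is enough:
>
>   192.168.56.101   sandbox.hortonworks.com
>
> and then you'd point Spark at hdfs://sandbox.hortonworks.com:8020/...
> rather than localhost.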
>
>
> > On 25 Aug 2015, at 04:49, Dino Fancellu <d...@felstar.com> wrote:
> >
> > Tried adding 50010, 50020 and 50090. Still no difference.
> >
> > I can't imagine I'm the only person on the planet wanting to do this.
> >
> > Anyway, thanks for trying to help.
> >
> > Dino.
> >
> > On 25 August 2015 at 08:22, Roberto Congiu <roberto.con...@gmail.com>
> wrote:
> >> Port 8020 is not the only port you need tunnelled for HDFS to work. If
> >> you only list the contents of a directory, port 8020 is enough... for
> >> instance, using something like
> >>
> >> val p = new org.apache.hadoop.fs.Path("hdfs://localhost:8020/")
> >> val fs = p.getFileSystem(sc.hadoopConfiguration)
> >> fs.listStatus(p)
> >>
> >> you should see the file list.
> >> But then, when you actually access a file, it needs to fetch the file's
> >> blocks, so it has to connect to the DataNode as well.
> >> The error 'could not obtain block' means it can't get that block from
> >> the DataNode.
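> >>
> >> For instance (a rough sketch, reusing the fs from above and the file
> >> from your example), it's the actual read that has to reach the DataNode:
> >>
> >> val f = new org.apache.hadoop.fs.Path("hdfs://localhost:8020/tmp/people.txt")
> >> val in = fs.open(f)        // the NameNode only returns block locations
> >> val firstByte = in.read()  // reading block data needs the DataNode port (50010 by default)
> >> in.close()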
> >> Refer to
> >> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_reference/content/reference_chap2_1.html
> >> to see the complete list of ports that also need to be tunnelled.
> >>
> >>
> >>
> >> 2015-08-24 13:10 GMT-07:00 Dino Fancellu <d...@felstar.com>:
> >>>
> >>> Changing the ip to the guest IP address just never connects.
> >>>
> >>> The VM has port tunnelling set up, and it passes all the main ports,
> >>> 8020 included, through to the host.
> >>>
> >>> You can tell that it was talking to the guest VM before, simply
> >>> because it told me when a file was not found.
> >>>
> >>> Error is:
> >>>
> >>> Exception in thread "main" org.apache.spark.SparkException: Job
> >>> aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
> >>> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost):
> >>> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
> >>> BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098
> >>> file=/tmp/people.txt
> >>>
> >>> but I have no idea what it means by that. It certainly can find the
> >>> file and knows it exists.
> >>>
> >>>
> >>>
> >>> On 24 August 2015 at 20:43, Roberto Congiu <roberto.con...@gmail.com>
> >>> wrote:
> >>>> When you launch your HDP guest VM, most likely it gets launched with
> >>>> NAT and an address on a private network (192.168.x.x), so on your
> >>>> Windows host you should use that address (you can find it out using
> >>>> ifconfig on the guest OS).
> >>>> I usually add an entry to my /etc/hosts for VMs that I use often... if
> >>>> you use Vagrant, there's also a Vagrant plugin that can do that
> >>>> automatically.
> >>>> Also, I am not sure how the default HDP VM is set up, that is, whether
> >>>> it binds HDFS only to 127.0.0.1 or to all addresses. You can check
> >>>> that on the guest with netstat -a.
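> >>>>
> >>>> For example, on the guest, something like
> >>>>
> >>>>   netstat -an | grep 8020
> >>>>
> >>>> should show whether the NameNode is listening on 0.0.0.0:8020 (all
> >>>> addresses) or only on 127.0.0.1:8020 (loopback only).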
> >>>>
> >>>> R.
> >>>>
> >>>> 2015-08-24 11:46 GMT-07:00 Dino Fancellu <d...@felstar.com>:
> >>>>>
> >>>>> I have a file in HDFS inside my HortonWorks HDP 2.3_1 VirtualBox VM.
> >>>>>
> >>>>> If I go into the guest spark-shell and refer to the file thus, it
> >>>>> works fine:
> >>>>>
> >>>>>  val words=sc.textFile("hdfs:///tmp/people.txt")
> >>>>>  words.count
> >>>>>
> >>>>> However, if I try to access it from a local Spark app on my Windows
> >>>>> host, it doesn't work:
> >>>>>
> >>>>>  val conf = new SparkConf().setMaster("local").setAppName("My App")
> >>>>>  val sc = new SparkContext(conf)
> >>>>>
> >>>>>  val words=sc.textFile("hdfs://localhost:8020/tmp/people.txt")
> >>>>>  words.count
> >>>>>
> >>>>> Emits
> >>>>>
> >>>>>
> >>>>>
> >>>>> The port 8020 is open, and if I choose the wrong file name, it will
> >>>>> tell me:
> >>>>>
> >>>>>
> >>>>>
> >>>>> My pom has
> >>>>>
> >>>>>   <dependency>
> >>>>>     <groupId>org.apache.spark</groupId>
> >>>>>     <artifactId>spark-core_2.11</artifactId>
> >>>>>     <version>1.4.1</version>
> >>>>>     <scope>provided</scope>
> >>>>>   </dependency>
> >>>>>
> >>>>> Am I doing something wrong?
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >>
> >
> >
> >
>
>
