Spark <-> HDFS integration works great. Just confirmed it.

1) Build Spark (./gradlew spark-yum)
2) vagrant up (after adding spark to the vagrant conf file)
3) hadoop fs -put /etc/passwd /tmp/passwd
4) val lines = sc.textFile("/tmp/passwd")

and then you can do lines.collect, which prints out the file contents.
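For anyone else who wants to reproduce this, the whole session looks roughly like the sketch below (the gradle task, vagrant conf change, and paths are just the ones from the steps above; your setup may differ):

```
# build the Spark packages and bring up the cluster
./gradlew spark-yum
vagrant up            # with spark added to the vagrant conf file

# stage a small test file in HDFS
hadoop fs -put /etc/passwd /tmp/passwd

# then, inside spark-shell:
val lines = sc.textFile("/tmp/passwd")
lines.collect.foreach(println)   # prints the file contents line by line
lines.count                      # should equal the line count of /etc/passwd
```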



On Fri, Feb 20, 2015 at 7:36 PM, jay vyas <[email protected]>
wrote:

> no prob, I'm verifying!
>
> On Fri, Feb 20, 2015 at 7:17 PM, Konstantin Boudnik <[email protected]>
> wrote:
>
>> On Fri, Feb 20, 2015 at 02:07PM, jay vyas wrote:
>> > yes, I think that's what he means, b/c when running on YARN, you read
>> > in the conf from hadoop_conf, and you manually send jars like
>> > spark-examples.jar (which would, otherwise, be available to workers if
>> > you had Spark installed on all nodes).
>> >
>> > I'm okay w/ either (standalone, YARN, Mesos, whatever) Spark
>> > deployment, but we should probably pick one :)
>> >
>> > for now, at a minimum, we want to make sure we are able to at least
>> > leverage HDFS properly, even if we just run standalone Spark
>>
>> I don't think it is an issue, really. Or at least it wasn't when we did
>> Spark initially. Would be great if someone is willing to verify - I have
>> no cycles for Spark anymore, honestly.
>>
>> Cos
>>
>> > On Fri, Feb 20, 2015 at 1:58 PM, Konstantin Boudnik <[email protected]>
>> wrote:
>> >
>> > > On Fri, Feb 20, 2015 at 02:21PM, Evans Ye wrote:
>> > > > I don't have Spark expertise, but here are some points I'm
>> > > > thinking about. IIRC, Spark standalone does not support Kerberos.
>> > > > And the benefit of deploying Spark on YARN should be that you
>> > > > don't need to maintain packages on your own across a cluster of
>> > > > hundreds of nodes.
>> > >
>> > > Could you clarify what you mean by this? Are you saying that you
>> > > won't need to install spark-worker on the cluster's nodes?
>> > >
>> > > Cos
>> > >
>> > > > Not sure if there are downsides. Just want to add some points :)
>> > > >
>> > > > 2015-02-20 9:49 GMT+08:00 Konstantin Boudnik <[email protected]>:
>> > > >
>> > > > > The way we're deploying Spark is in standalone mode - I've
>> > > > > never seen any value in using YARN for that, but I guess it's
>> > > > > just me.
>> > > > >
>> > > > > HDFS use comes with no hassle, AFAIR, the way we set it up. But
>> > > > > my knowledge might be a bit outdated...
>> > > > >
>> > > > > Cos
>> > > > >
>> > > > > On Thu, Feb 19, 2015 at 08:45PM, jay vyas wrote:
>> > > > > > hi folks.
>> > > > > >
>> > > > > > is anyone planning to use Spark on YARN or Spark w/ HDFS in
>> > > > > > bigtop? I haven't tried either...
>> > > > > >
>> > > > > > - anyone using Spark <-> HDFS in bigtop? Do we need to update
>> > > > > > any Spark configs to do so?
>> > > > > > - do we want Spark to run on YARN? standalone?
>> > > > > >
>> > > > > > I'm spinning some VMs up now, I'll let folks know if it works.
>> > > > > > --
>> > > > > > jay vyas
>> > > > >
>> > >
>> >
>> >
>> >
>> > --
>> > jay vyas
>>
>
>
>
> --
> jay vyas
>



-- 
jay vyas
