Thanks for the pointer -- I guess I should have checked Spark's build script again while debugging. This might be useful to include in a documentation page about how to write and run Spark apps. I think there's a bunch of know-how like this just floating around right now.
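As an example of what such a page could show -- a minimal build.sbt sketch of the full merge-strategy block (sbt-assembly 0.9.x syntax assumed; only the services case differs from the usual defaults):

    import AssemblyKeys._

    assemblySettings

    mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
      {
        // Concatenate service registration files so that every provider
        // (e.g. both the "file" and "hdfs" FileSystem implementations)
        // survives the merge, instead of one jar's copy winning.
        case m if m.toLowerCase.matches("meta-inf/services.*$") =>
          MergeStrategy.concat
        // Everything else keeps the plugin's default behaviour.
        case x => old(x)
      }
    }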
Shivaram

On Mon, Feb 17, 2014 at 9:27 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> BTW my fix in Spark was later generalized to be equivalent to what you
> did, which is to do this for the entire services directory rather than
> just FileSystem.
>
> On Mon, Feb 17, 2014 at 9:26 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>> Ya, I ran into this a few months ago. We actually patched the Spark
>> build back then. It took me a long time to figure it out.
>>
>> https://github.com/apache/incubator-spark/commit/0c1985b153a2dc2c891ae61c1ee67506926384ae
>>
>> On Mon, Feb 17, 2014 at 6:47 PM, Shivaram Venkataraman
>> <shiva...@eecs.berkeley.edu> wrote:
>>> Thanks a lot, Jey! That fixes things. For reference, I had to add the
>>> following line to build.sbt:
>>>
>>> case m if m.toLowerCase.matches("meta-inf/services.*$") =>
>>>   MergeStrategy.concat
>>>
>>> Should we also add this to Spark's assembly build?
>>>
>>> Thanks
>>> Shivaram
>>>
>>> On Mon, Feb 17, 2014 at 6:27 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>>> We ran into this issue with ADAM, and it came down to not merging the
>>>> "META-INF/services" files correctly. Here's the change we made to our
>>>> Maven build files to fix it; you can probably do something similar
>>>> under SBT too:
>>>> https://github.com/bigdatagenomics/adam/commit/b0997760b23c4284efe32eeb968ef2744af8be82
>>>>
>>>> -Jey
>>>>
>>>> On Mon, Feb 17, 2014 at 6:15 PM, Shivaram Venkataraman
>>>> <shiva...@eecs.berkeley.edu> wrote:
>>>>> I ran into a weird bug today where trying to read a file from an HDFS
>>>>> cluster built with Hadoop 2 gives an error saying "No FileSystem for
>>>>> scheme: hdfs". Specifically, this only seems to happen when building
>>>>> an assembly jar in the application and not when using sbt's run-main.
>>>>>
>>>>> The project's setup[0] is pretty simple and is only a slight
>>>>> modification of the project used by the release audit tool. The sbt
>>>>> assembly instructions[1] are mostly copied from Spark's sbt build
>>>>> files.
>>>>>
>>>>> We run into this in SparkR as well, so it'll be great if anybody has
>>>>> an idea on how to debug this.
>>>>>
>>>>> To reproduce, you can do the following:
>>>>>
>>>>> 1. Launch a Spark EC2 cluster with 0.9.0 with --hadoop-major-version=2
>>>>> 2. Clone https://github.com/shivaram/spark-utils
>>>>> 3. Run release-audits/sbt_app_core/run-hdfs-test.sh
>>>>>
>>>>> Thanks
>>>>> Shivaram
>>>>>
>>>>> [0] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/src/main/scala/SparkHdfsApp.scala
>>>>> [1] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/build.sbt
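For completeness, a runtime-side workaround sketch (not from this thread): if changing the assembly build isn't an option, the FileSystem implementations can instead be registered directly on the Hadoop Configuration. This assumes a Hadoop 2.x client on the classpath; the namenode URI and path below are hypothetical.

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Register the implementations explicitly, so the lookup succeeds even
    // when the META-INF/services entries were clobbered during assembly.
    val conf = new Configuration()
    conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
    conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")

    val fs = FileSystem.get(new URI("hdfs://namenode:9000/"), conf)
    println(fs.exists(new Path("/tmp/test.txt")))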