Thanks for the pointer -- I guess I should have checked Spark's build script again while debugging. This might be useful to include in a documentation page about how to write and run Spark apps. I think there's a bunch of know-how like this just floating around right now.
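As an example of what such a page could show -- a minimal build.sbt sketch of the full merge-strategy block (sbt-assembly 0.9.x syntax assumed; only the services case differs from the usual defaults):

    import AssemblyKeys._

    assemblySettings

    mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
      {
        // Concatenate service registration files so that every provider
        // (e.g. both the "file" and "hdfs" FileSystem implementations)
        // survives the merge, instead of one jar's copy winning.
        case m if m.toLowerCase.matches("meta-inf/services.*$") =>
          MergeStrategy.concat
        // Everything else keeps the plugin's default behaviour.
        case x => old(x)
      }
    }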
Shivaram

On Mon, Feb 17, 2014 at 9:27 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> BTW my fix in Spark was later generalized to be equivalent to what you
> did, which is to do this for the entire services directory rather than
> just FileSystem.
>
> On Mon, Feb 17, 2014 at 9:26 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>> Ya, I ran into this a few months ago. We actually patched the Spark
>> build back then. It took me a long time to figure it out.
>>
>> https://github.com/apache/incubator-spark/commit/0c1985b153a2dc2c891ae61c1ee67506926384ae
>>
>> On Mon, Feb 17, 2014 at 6:47 PM, Shivaram Venkataraman
>> <shiva...@eecs.berkeley.edu> wrote:
>>> Thanks a lot, Jey! That fixes things. For reference, I had to add the
>>> following line to build.sbt:
>>>
>>> case m if m.toLowerCase.matches("meta-inf/services.*$") =>
>>>   MergeStrategy.concat
>>>
>>> Should we also add this to Spark's assembly build?
>>>
>>> Thanks
>>> Shivaram
>>>
>>> On Mon, Feb 17, 2014 at 6:27 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>>> We ran into this issue with ADAM, and it came down to not merging the
>>>> "META-INF/services" files correctly. Here's the change we made to our
>>>> Maven build files to fix it; you can probably do something similar
>>>> under SBT too:
>>>> https://github.com/bigdatagenomics/adam/commit/b0997760b23c4284efe32eeb968ef2744af8be82
>>>>
>>>> -Jey
>>>>
>>>> On Mon, Feb 17, 2014 at 6:15 PM, Shivaram Venkataraman
>>>> <shiva...@eecs.berkeley.edu> wrote:
>>>>> I ran into a weird bug today where trying to read a file from an HDFS
>>>>> cluster built with Hadoop 2 gives an error saying "No FileSystem for
>>>>> scheme: hdfs". Specifically, this only seems to happen when building
>>>>> an assembly jar in the application and not when using sbt's run-main.
>>>>>
>>>>> The project's setup[0] is pretty simple and is only a slight
>>>>> modification of the project used by the release audit tool. The sbt
>>>>> assembly instructions[1] are mostly copied from Spark's sbt build
>>>>> files.
>>>>>
>>>>> We run into this in SparkR as well, so it'll be great if anybody has
>>>>> an idea on how to debug this.
>>>>>
>>>>> To reproduce, you can do the following:
>>>>>
>>>>> 1. Launch a Spark EC2 cluster with 0.9.0 with --hadoop-major-version=2
>>>>> 2. Clone https://github.com/shivaram/spark-utils
>>>>> 3. Run release-audits/sbt_app_core/run-hdfs-test.sh
>>>>>
>>>>> Thanks
>>>>> Shivaram
>>>>>
>>>>> [0] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/src/main/scala/SparkHdfsApp.scala
>>>>> [1] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/build.sbt
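For completeness, a runtime-side workaround sketch (not from this thread): if changing the assembly build isn't an option, the FileSystem implementations can instead be registered directly on the Hadoop Configuration. This assumes a Hadoop 2.x client on the classpath; the namenode URI and path below are hypothetical.

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Register the implementations explicitly, so the lookup succeeds even
    // when the META-INF/services entries were clobbered during assembly.
    val conf = new Configuration()
    conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
    conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")

    val fs = FileSystem.get(new URI("hdfs://namenode:9000/"), conf)
    println(fs.exists(new Path("/tmp/test.txt")))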