Re: What else is needed to set up native support of BLAS/LAPACK with Spark?
Thanks for the additional info. I tried to follow that and went ahead and directly added netlib to my application POM/JAR -- that should be sufficient to make it work, and it is at least definitely on the executor classpath? Still got the same warning, so not sure where else to take it. Not sure it's worth still pursuing, or what else to try. Thanks for all the help, everyone!

Thanks,
Arun

On Tue, Jul 21, 2015 at 11:16 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:

FWIW I've run into similar BLAS-related problems before and wrote up a document on how to do this for Spark EC2 clusters at https://github.com/amplab/ml-matrix/blob/master/EC2.md -- note that this works with a vanilla Spark build (you only need to link to netlib-lgpl in your app) but requires the app jar to be present on all the machines.

Thanks
Shivaram

On Tue, Jul 21, 2015 at 7:37 AM, Arun Ahuja aahuj...@gmail.com wrote:

Yes, I imagine it's the driver's classpath - I'm pulling those screenshots straight from the Spark UI environment page. Is there somewhere else to grab the executor classpath? Also, the warning is only printing once, so it's also not clear whether the warning is from the driver or executor - would you know?

Thanks,
Arun

On Tue, Jul 21, 2015 at 7:52 AM, Sean Owen so...@cloudera.com wrote:

Great, and that file exists on HDFS and is world-readable? Just double-checking. What classpath is this -- your driver or executor? This is the driver, no? I assume so just because it looks like it references the assembly you built locally and from which you're launching the driver. I think we're concerned with the executors and what they have on the classpath. I suspect there is still a problem somewhere in there.

On Mon, Jul 20, 2015 at 4:59 PM, Arun Ahuja aahuj...@gmail.com wrote:

Cool, I tried that as well, and it doesn't seem different: spark.yarn.jar seems set [image: Inline image 1] This actually doesn't change the classpath -- not sure if it should: [image: Inline image 3] But same netlib warning. Thanks for the help!

- Arun

On Fri, Jul 17, 2015 at 3:18 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

Can you try setting the spark.yarn.jar property to make sure it points to the jar you're thinking of?

-Sandy

On Fri, Jul 17, 2015 at 11:32 AM, Arun Ahuja aahuj...@gmail.com wrote:

Yes, it's a YARN cluster, using spark-submit to run. I have SPARK_HOME set to the directory above and am using the spark-submit script from there.

    bin/spark-submit --master yarn-client \
      --executor-memory 10g --driver-memory 8g \
      --num-executors 400 --executor-cores 1 \
      --class org.hammerlab.guacamole.Guacamole \
      --conf spark.default.parallelism=4000 \
      --conf spark.storage.memoryFraction=0.15

libgfortran.so.3 is also there:

    $ ls /usr/lib64/libgfortran.so.3
    /usr/lib64/libgfortran.so.3

These are the jniloader files in the jar:

    $ jar tf /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep jniloader
    META-INF/maven/com.github.fommil/jniloader/
    META-INF/maven/com.github.fommil/jniloader/pom.xml
    META-INF/maven/com.github.fommil/jniloader/pom.properties

Thanks,
Arun

On Fri, Jul 17, 2015 at 1:30 PM, Sean Owen so...@cloudera.com wrote:

Make sure /usr/lib64 contains libgfortran.so.3; that's really the issue. I'm pretty sure the answer is 'yes', but make sure the assembly has jniloader too. I don't see why it wouldn't, but that's needed. What is your env like -- local, standalone, YARN? How are you running? Just want to make sure you are using this assembly across your cluster.

On Fri, Jul 17, 2015 at 6:26 PM, Arun Ahuja aahuj...@gmail.com wrote:

Hi Sean,

Thanks for the reply! I did double-check that the jar is the one I think I am running: [image: Inline image 2]

    $ jar tf /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib | grep Native
    com/github/fommil/netlib/NativeRefARPACK.class
    com/github/fommil/netlib/NativeRefBLAS.class
    com/github/fommil/netlib/NativeRefLAPACK.class
    com/github/fommil/netlib/NativeSystemARPACK.class
    com/github/fommil/netlib/NativeSystemBLAS.class
    com/github/fommil/netlib/NativeSystemLAPACK.class

Also, I checked the gfortran version on the cluster nodes; it is available and is 5.1:

    $ gfortran --version
    GNU Fortran (GCC) 5.1.0
    Copyright (C) 2015 Free Software Foundation, Inc.

and still see:

    15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
    15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
    15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
    15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK

Does anything need to be adjusted in my application POM?
Re: What else is needed to set up native support of BLAS/LAPACK with Spark?
Yes, I imagine it's the driver's classpath - I'm pulling those screenshots straight from the Spark UI environment page. Is there somewhere else to grab the executor classpath? Also, the warning is only printing once, so it's also not clear whether the warning is from the driver or executor - would you know?

Thanks,
Arun

On Tue, Jul 21, 2015 at 7:52 AM, Sean Owen so...@cloudera.com wrote:

Great, and that file exists on HDFS and is world-readable? Just double-checking. What classpath is this -- your driver or executor? This is the driver, no? I assume so just because it looks like it references the assembly you built locally and from which you're launching the driver. I think we're concerned with the executors and what they have on the classpath. I suspect there is still a problem somewhere in there.
Re: What else is needed to set up native support of BLAS/LAPACK with Spark?
Cool, I tried that as well, and it doesn't seem different: spark.yarn.jar seems set [image: Inline image 1] This actually doesn't change the classpath -- not sure if it should: [image: Inline image 3] But same netlib warning. Thanks for the help!

- Arun

On Fri, Jul 17, 2015 at 3:18 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

Can you try setting the spark.yarn.jar property to make sure it points to the jar you're thinking of?

-Sandy
Re: What else is needed to set up native support of BLAS/LAPACK with Spark?
Hi Sean,

Thanks for the reply! I did double-check that the jar is the one I think I am running: [image: Inline image 2]

    $ jar tf /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib | grep Native
    com/github/fommil/netlib/NativeRefARPACK.class
    com/github/fommil/netlib/NativeRefBLAS.class
    com/github/fommil/netlib/NativeRefLAPACK.class
    com/github/fommil/netlib/NativeSystemARPACK.class
    com/github/fommil/netlib/NativeSystemBLAS.class
    com/github/fommil/netlib/NativeSystemLAPACK.class

Also, I checked the gfortran version on the cluster nodes; it is available and is 5.1:

    $ gfortran --version
    GNU Fortran (GCC) 5.1.0
    Copyright (C) 2015 Free Software Foundation, Inc.

and still see:

    15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
    15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
    15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
    15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK

Does anything need to be adjusted in my application POM?

Thanks,
Arun

On Thu, Jul 16, 2015 at 5:26 PM, Sean Owen so...@cloudera.com wrote:

Yes, that's most of the work - just getting the native libs into the assembly. netlib can find them from there even if you don't have BLAS libs on your OS, since it includes a reference implementation as a fallback. One common reason it won't load is not having libgfortran installed on your OSes, though. It has to be 4.6+ too. That can't be shipped even in netlib and has to exist on your hosts. The other thing I'd double-check is whether you are really using the assembly you built for your job -- that is, whether it's actually the assembly the executors are using.
Re: What else is needed to set up native support of BLAS/LAPACK with Spark?
Yes, it's a YARN cluster, using spark-submit to run. I have SPARK_HOME set to the directory above and am using the spark-submit script from there.

    bin/spark-submit --master yarn-client \
      --executor-memory 10g --driver-memory 8g \
      --num-executors 400 --executor-cores 1 \
      --class org.hammerlab.guacamole.Guacamole \
      --conf spark.default.parallelism=4000 \
      --conf spark.storage.memoryFraction=0.15

libgfortran.so.3 is also there:

    $ ls /usr/lib64/libgfortran.so.3
    /usr/lib64/libgfortran.so.3

These are the jniloader files in the jar:

    $ jar tf /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep jniloader
    META-INF/maven/com.github.fommil/jniloader/
    META-INF/maven/com.github.fommil/jniloader/pom.xml
    META-INF/maven/com.github.fommil/jniloader/pom.properties

Thanks,
Arun

On Fri, Jul 17, 2015 at 1:30 PM, Sean Owen so...@cloudera.com wrote:

Make sure /usr/lib64 contains libgfortran.so.3; that's really the issue. I'm pretty sure the answer is 'yes', but make sure the assembly has jniloader too. I don't see why it wouldn't, but that's needed. What is your env like -- local, standalone, YARN? How are you running? Just want to make sure you are using this assembly across your cluster.
What else is needed to set up native support of BLAS/LAPACK with Spark?
Is there more documentation on what is needed to set up BLAS/LAPACK native support with Spark? I’ve built Spark with the -Pnetlib-lgpl flag and see that the netlib classes are in the assembly jar.

    $ jar tvf spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib | grep Native
      6625 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefARPACK.class
     21123 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefBLAS.class
    178334 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefLAPACK.class
      6640 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemARPACK.class
     21138 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemBLAS.class
    178349 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemLAPACK.class

Also, I see the following in /usr/lib64:

    $ ls /usr/lib64/libblas*
    libblas.a  libblas.so  libblas.so.3  libblas.so.3.2  libblas.so.3.2.1
    $ ls /usr/lib64/liblapack*
    liblapack.a  liblapack_pic.a  liblapack.so  liblapack.so.3  liblapack.so.3.2  liblapack.so.3.2.1

But I still see the following in the Spark logs:

    15/07/07 15:36:25 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
    15/07/07 15:36:25 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
    15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
    15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK

Anything in this process I missed?

Thanks,
Arun
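The pair of warnings per interface reflects netlib-java's load order: it first tries the NativeSystem implementation (backed by the system's BLAS/LAPACK and libgfortran), then the bundled NativeRef reference build, and only then silently falls back to the pure-Java F2J implementation. A minimal sketch of that fallback chain, with the JNI loadability check replaced by a simple set-membership stand-in:

```python
def pick_blas(loadable):
    """Return the first BLAS implementation that loads, mimicking
    netlib-java's fallback order; F2J is pure Java and always works."""
    candidates = [
        "com.github.fommil.netlib.NativeSystemBLAS",  # system libblas + libgfortran
        "com.github.fommil.netlib.NativeRefBLAS",     # bundled reference natives
    ]
    for name in candidates:
        if name in loadable:
            return name
        # Mirrors the WARN lines seen in the Spark logs above
        print(f"WARN BLAS: Failed to load implementation from: {name}")
    return "com.github.fommil.netlib.F2jBLAS"  # pure-Java fallback


# With no native libraries loadable, both warnings fire and F2J is used.
print(pick_blas(set()))
```

So the warnings are not fatal: the job still runs, just on the slower pure-Java path.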
Re: unable to bring up cluster with ec2 script
Sorry, I can't help with this issue, but if you are interested in a simple way to launch a Spark cluster on Amazon, Spark is now offered as an application in Amazon EMR. With this you can have a full cluster with a few clicks: https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/

- Arun

On Tue, Jul 7, 2015 at 4:34 PM, Pagliari, Roberto rpagli...@appcomsci.com wrote:

I'm following the tutorial about Apache Spark on EC2. The output is the following:

    $ ./spark-ec2 -i ../spark.pem -k spark --copy launch spark-training
    Setting up security groups...
    Searching for existing cluster spark-training...
    Latest Spark AMI: ami-19474270
    Launching instances...
    Launched 5 slaves in us-east-1d, regid = r-59a0d4b6
    Launched master in us-east-1d, regid = r-9ba2d674
    Waiting for instances to start up...
    Waiting 120 more seconds...
    Copying SSH key ../spark.pem to master...
    ssh: connect to host ec2-54-152-15-165.compute-1.amazonaws.com port 22: Connection refused
    Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i ../spark.pem r...@ec2-54-152-15-165.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
    ssh: connect to host ec2-54-152-15-165.compute-1.amazonaws.com port 22: Connection refused
    Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i ../spark.pem r...@ec2-54-152-15-165.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
    ssh: Could not resolve hostname ec2-54-152-15-165.compute-1.amazonaws.com: Name or service not known
    Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i ../spark.pem r...@ec2-54-152-15-165.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
    ssh: connect to host ec2-54-152-15-165.compute-1.amazonaws.com port 22: Connection refused
    Traceback (most recent call last):
      File "./spark_ec2.py", line 925, in <module>
        main()
      File "./spark_ec2.py", line 766, in main
        setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, True)
      File "./spark_ec2.py", line 406, in setup_cluster
        ssh(master, opts, 'mkdir -p ~/.ssh')
      File "./spark_ec2.py", line 712, in ssh
        raise e
    subprocess.CalledProcessError: Command 'ssh -t -o StrictHostKeyChecking=no -i ../spark.pem r...@ec2-54-152-15-165.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255

However, I can see the six instances created on my EC2 console, and I could even get the name of the master. I'm not sure how to fix the ssh issue (my region is US EST).
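The log shows spark_ec2.py retrying the SSH command with 30-second sleeps and then re-raising once it runs out of attempts. A simplified sketch of that retry pattern (not the actual spark_ec2.py code; the function name, retry count, and injectable `run` callable are illustrative):

```python
import subprocess
import time


def ssh_with_retries(cmd, tries=3, sleep_s=30, run=subprocess.check_call):
    """Run a command, retrying on failure; re-raise after the last attempt."""
    for attempt in range(tries):
        try:
            return run(cmd)
        except subprocess.CalledProcessError as e:
            if attempt == tries - 1:
                raise  # out of retries: propagate, as in the traceback above
            print(f"Error connecting to host {e}, sleeping {sleep_s}")
            time.sleep(sleep_s)
```

On a freshly launched EC2 instance the first attempts commonly fail with "Connection refused" until sshd comes up, which is why the script sleeps and retries; a cluster that still refuses connections after all retries usually points at security-group rules or the wrong key.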
Re: Spark on YARN memory utilization
Hi Denny,

This is due to the spark.yarn.memoryOverhead parameter. Depending on what version of Spark you are on, the default of this may differ, but it should be the larger of 1024MB per executor or .07 * executorMemory. When you set executor memory, the YARN resource request is executorMemory + yarnOverhead.

- Arun

On Sat, Dec 6, 2014 at 4:27 PM, Denny Lee denny.g@gmail.com wrote:

This is perhaps more of a YARN question than a Spark question, but I was just curious as to how memory is allocated in YARN via the various configurations. For example, if I spin up my cluster with 4GB with a different number of executors, as noted below:

    4GB executor-memory x 10 executors = 46GB (4GB x 10 = 40 + 6)
    4GB executor-memory x 4 executors  = 19GB (4GB x 4 = 16 + 3)
    4GB executor-memory x 2 executors  = 10GB (4GB x 2 = 8 + 2)

The pattern when observing the RM is that there is a container for each executor and one additional container. From the basis of memory, it looks like there is an additional (1GB + (0.5GB x # executors)) that is allocated in YARN. Just wondering why this is - or is this just an artifact of YARN itself? Thanks!
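Denny's numbers are consistent with a fixed per-executor overhead plus one extra container for the YARN application master. A small sketch of that arithmetic; the 512MB overhead and 1GB AM container here are assumptions chosen to reproduce the observations above, not Spark's documented defaults:

```python
def yarn_total_gb(executor_mem_mb, num_executors,
                  overhead_mb=512, am_container_mb=1024):
    """Total YARN allocation: one container per executor
    (executor memory + overhead) plus one application-master container."""
    per_executor_mb = executor_mem_mb + overhead_mb
    return (num_executors * per_executor_mb + am_container_mb) / 1024


# Reproduces the observed totals for 4GB executors:
for n in (10, 4, 2):
    print(n, "executors ->", yarn_total_gb(4096, n), "GB")
```

So the "extra" memory is not mysterious: it is the per-container overhead request plus the AM's own container, exactly the (1GB + 0.5GB per executor) pattern Denny observed.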
Re: spark-submit on YARN is slow
Hey Sandy,

What are those sleeps for, and do they still exist? We have seen about a 1 min to 1:30 executor startup time, which is a large chunk for jobs that run in ~10 min.

Thanks,
Arun

On Fri, Dec 5, 2014 at 3:20 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

Hi Denny,

Those sleeps were only at startup, so if jobs are taking significantly longer on YARN, that should be a different problem. When you ran on YARN, did you use the --executor-cores, --executor-memory, and --num-executors arguments? When running against a standalone cluster, by default Spark will make use of all the cluster resources, but when running against YARN, Spark defaults to a couple of tiny executors.

-Sandy

On Fri, Dec 5, 2014 at 11:32 AM, Denny Lee denny.g@gmail.com wrote:

My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps. If I ran this in standalone cluster mode the query finished in 55s, but on YARN the query was still running 30 min later. Would the hard-coded sleeps potentially be in play here?

On Fri, Dec 5, 2014 at 11:23, Sandy Ryza sandy.r...@cloudera.com wrote:

Hi Tobias,

What version are you using? In some recent versions, we had a couple of large hardcoded sleeps on the Spark side.

-Sandy

On Fri, Dec 5, 2014 at 11:15 AM, Andrew Or and...@databricks.com wrote:

Hey Tobias,

As you suspect, the reason why it's slow is that the resource manager in YARN takes a while to grant resources. This is because YARN needs to first set up the application master container, and then this AM needs to request more containers for Spark executors. I think this accounts for most of the overhead. The remaining source probably comes from how our own YARN integration code polls application state (every second) and cluster resource state (every 5 seconds, IIRC). I haven't explored in detail whether there are optimizations there that can speed this up, but I believe most of the overhead comes from YARN itself.

In other words, no, I don't know of any quick fix on your end that you can do to speed this up.

-Andrew

2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp:

Hi,

I am using spark-submit to submit my application to YARN in yarn-cluster mode. I have both the Spark assembly jar file and my application jar file in HDFS, and can see from the logging output that both files are used from there. However, it still takes about 10 seconds for my application's yarnAppState to switch from ACCEPTED to RUNNING. I am aware that this is probably not a Spark issue but some YARN configuration setting (or YARN-inherent slowness); I was just wondering if anyone has advice for how to speed this up.

Thanks
Tobias
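Andrew's description (the client polls the application state every second until YARN moves it from ACCEPTED to RUNNING) can be sketched as a simple polling loop; the state sequence, interval, and function names here are illustrative, not Spark's actual YARN client code:

```python
import time


def wait_for_running(get_state, poll_interval_s=1.0, sleep=time.sleep):
    """Poll the YARN application state until it reaches RUNNING.

    Returns the number of polls it took; `get_state` stands in for a
    ResourceManager application-report call.
    """
    polls = 0
    while True:
        polls += 1
        if get_state() == "RUNNING":
            return polls
        sleep(poll_interval_s)  # YARN is still setting up the AM container


# Example: an app that sits in ACCEPTED for a few reports before RUNNING.
states = iter(["ACCEPTED", "ACCEPTED", "ACCEPTED", "RUNNING"])
print(wait_for_running(lambda: next(states), sleep=lambda s: None))
```

With a one-second interval, Tobias's ~10 seconds in ACCEPTED corresponds to roughly ten such polls while YARN allocates the AM container.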
Re: Nightly releases
Great - what can we do to make this happen? Should I file a JIRA to track it?

Thanks,
Arun

On Tue, Nov 18, 2014 at 11:46 AM, Andrew Ash and...@andrewash.com wrote:

I can see this being valuable for users wanting to live on the cutting edge without building CI infrastructure themselves, myself included. I think Patrick's recent work on the build scripts for 1.2.0 will make delivering nightly builds to a public Maven repo easier.
Re: Nightly releases
Great - posted here: https://issues.apache.org/jira/browse/SPARK-4542 On Fri, Nov 21, 2014 at 1:03 PM, Andrew Ash and...@andrewash.com wrote: Yes, you should file a JIRA and echo it out here so others can follow and comment on it. Thanks Arun! On Fri, Nov 21, 2014 at 12:02 PM, Arun Ahuja aahuj...@gmail.com wrote: Great - what can we do to make this happen? Should I file a JIRA to track it? Thanks, Arun On Tue, Nov 18, 2014 at 11:46 AM, Andrew Ash and...@andrewash.com wrote: I can see this being valuable for users wanting to live on the cutting edge without building CI infrastructure themselves, myself included. I think Patrick's recent work on the build scripts for 1.2.0 will make delivering nightly builds to a public maven repo easier. On Tue, Nov 18, 2014 at 10:22 AM, Arun Ahuja aahuj...@gmail.com wrote: Of course we can run this as well to get the latest, but the build is fairly long and this seems like a resource many would need. On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja aahuj...@gmail.com wrote: Are nightly releases posted anywhere? There are quite a few vital bug fixes and performance improvements being committed to Spark, and using the latest commits is useful (or even necessary for some jobs). Is there a place to post them? It doesn't seem like it would be difficult to run make-dist nightly and place it somewhere. Is it possible to extract this from Jenkins builds? Thanks, Arun
Nightly releases
Are nightly releases posted anywhere? There are quite a few vital bug fixes and performance improvements being committed to Spark, and using the latest commits is useful (or even necessary for some jobs). Is there a place to post them? It doesn't seem like it would be difficult to run make-dist nightly and place it somewhere. Is it possible to extract this from Jenkins builds? Thanks, Arun
Re: Nightly releases
Of course we can run this as well to get the latest, but the build is fairly long and this seems like a resource many would need. On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja aahuj...@gmail.com wrote: Are nightly releases posted anywhere? There are quite a few vital bug fixes and performance improvements being committed to Spark, and using the latest commits is useful (or even necessary for some jobs). Is there a place to post them? It doesn't seem like it would be difficult to run make-dist nightly and place it somewhere. Is it possible to extract this from Jenkins builds? Thanks, Arun
Re: Increase Executor Memory on YARN
If you are using spark-submit with --master yarn, you can also pass --executor-memory as a flag. On Mon, Nov 10, 2014 at 8:58 AM, Mudassar Sarwar mudassar.sar...@northbaysolutions.net wrote: Hi, How can we increase the executor memory of a running Spark cluster on YARN? We want to increase the executor memory on the addition of new nodes in the cluster. We are running Spark version 1.0.2. Thanks Mudassar
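A minimal sketch of the two equivalent ways to set executor memory at submit time (the class and jar names here are placeholders, not from the thread; the setting applies to the next submission, not to executors of an already-running application):

```shell
# Flag form:
spark-submit --master yarn --executor-memory 8g \
  --class com.example.MyJob myjob.jar

# Equivalent property form:
spark-submit --master yarn --conf spark.executor.memory=8g \
  --class com.example.MyJob myjob.jar
```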
Re: Viewing web UI after fact
We are running our applications through YARN and only sometimes see them in the History Server. Most do not seem to have the APPLICATION_COMPLETE file. Specifically, any job that ends because of yarn application -kill does not show up. For the others, what would be a reason for them not to appear in the Spark UI? Is there any update on this? Thanks, Arun On Mon, Sep 15, 2014 at 4:10 AM, Grzegorz Białek grzegorz.bia...@codilime.com wrote: Hi Andrew, sorry for the late response. Thank you very much for solving my problem. There was no APPLICATION_COMPLETE file in the log directory due to not calling sc.stop() at the end of the program. With the Spark context stopped everything works correctly, so thank you again. Best regards, Grzegorz On Fri, Sep 5, 2014 at 8:06 PM, Andrew Or and...@databricks.com wrote: Hi Grzegorz, Can you verify that there are APPLICATION_COMPLETE files in the event log directories? E.g. does file:/tmp/spark-events/app-name-1234567890/APPLICATION_COMPLETE exist? If not, it could be that your application didn't call sc.stop(), so the ApplicationEnd event is not actually logged. The HistoryServer looks for this special file to identify applications to display. You could also try manually adding the APPLICATION_COMPLETE file to this directory; the HistoryServer should pick this up and display the application, though the information displayed will be incomplete because the log did not capture all the events (sc.stop() does a final close() on the file written). Andrew 2014-09-05 1:50 GMT-07:00 Grzegorz Białek grzegorz.bia...@codilime.com: Hi Andrew, thank you very much for your answer. Unfortunately it still doesn't work. I'm using Spark 1.0.0, and I start the history server by running sbin/start-history-server.sh dir, although I also set SPARK_HISTORY_OPTS=-Dspark.history.fs.logDirectory in conf/spark-env.sh. I also tried a dir other than /tmp/spark-events, which has all possible permissions enabled.
Also, adding file: (and file://) didn't help - the history server still shows: History Server Event Log Location: file:/tmp/spark-events/ No Completed Applications Found. Best regards, Grzegorz On Thu, Sep 4, 2014 at 8:20 PM, Andrew Or and...@databricks.com wrote: Hi Grzegorz, Sorry for the late response. Unfortunately, if the Master UI doesn't know about your applications (they are completed with respect to a different Master), then it can't regenerate the UIs even if the logs exist. You will have to use the history server for that. How did you start the history server? If you are using Spark >= 1.0, you can pass the directory as an argument to the sbin/start-history-server.sh script. Otherwise, you may need to set the following in your conf/spark-env.sh to specify the log directory: export SPARK_HISTORY_OPTS=-Dspark.history.fs.logDirectory=/tmp/spark-events It could also be a permissions thing. Make sure your logs in /tmp/spark-events are accessible by the JVM that runs the history server. Also, there's a chance that /tmp/spark-events is interpreted as an HDFS path depending on which Spark version you're running. To resolve any ambiguity, you may set the log path to file:/tmp/spark-events instead. But first verify whether they actually exist. Let me know if you get it working, -Andrew 2014-08-19 8:23 GMT-07:00 Grzegorz Białek grzegorz.bia...@codilime.com : Hi, Is there any way to view the history of application statistics in the master UI after restarting the master server? I have all logs in /tmp/spark-events/ but when I start the history server on this directory it says No Completed Applications Found. Maybe I could copy these logs to the dir used by the master server, but I couldn't find any. Or maybe I'm doing something wrong launching the history server. Do you have any idea how to solve it? Thanks, Grzegorz On Thu, Aug 14, 2014 at 10:53 AM, Grzegorz Białek grzegorz.bia...@codilime.com wrote: Hi, Thank you both for your answers. Browsing using the Master UI works fine.
Unfortunately History Server shows No Completed Applications Found even if logs exists under given directory, but using Master UI is enough for me. Best regards, Grzegorz On Wed, Aug 13, 2014 at 8:09 PM, Andrew Or and...@databricks.com wrote: The Spark UI isn't available through the same address; otherwise new applications won't be able to bind to it. Once the old application finishes, the standalone Master renders the after-the-fact application UI and exposes it under a different URL. To see this, go to the Master UI (master-url:8080) and click on your application in the Completed Applications table. 2014-08-13 10:56 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com: Take a look at http://spark.apache.org/docs/latest/monitoring.html -- you need to launch a history server to serve the logs. Matei On August 13, 2014 at 2:03:08 AM, grzegorz-bialek ( grzegorz.bia...@codilime.com) wrote: Hi, I wanted to access Spark web UI after application
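The resolution in this thread boils down to two steps - point the history server at the event log directory with an explicit scheme, and make sure each application calls sc.stop() so the APPLICATION_COMPLETE marker gets written. A sketch (the app directory name is a placeholder):

```shell
# In conf/spark-env.sh: use an explicit file: scheme so the path is not
# interpreted as HDFS.
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:/tmp/spark-events"
sbin/start-history-server.sh

# An app only appears once its completion marker exists; this file is
# written when the application calls sc.stop().
ls /tmp/spark-events/app-name-1234567890/APPLICATION_COMPLETE
```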
Re: Larger heap leads to perf degradation due to GC
We have used the strategy that you suggested, Andrew - using many workers per machine and keeping the heaps small (< 20 GB). Using a large heap resulted in workers hanging or not responding (leading to timeouts). The same dataset/job for us will fail (most often due to akka disassociated or fetch failure errors) with 10 cores / 100 executors / 60 GB per executor, while succeeding with 1 core / 1000 executors / 6 GB per executor. When the job does succeed with more cores per executor and a larger heap, it is usually much slower than with the smaller executors (the same 8-10 min job taking 15-20 min to complete). The unfortunate downside of this has been that we have had some large broadcast variables which may not fit into memory (and are unnecessarily duplicated) when using the smaller executors. Most of this is anecdotal, but for the most part we have had more success and consistency with more executors with smaller memory requirements. On Sun, Oct 5, 2014 at 7:20 PM, Andrew Ash and...@andrewash.com wrote: Hi Mingyu, Maybe we should be limiting our heaps to 32GB max and running multiple workers per machine to avoid large GC issues. For a 128GB memory, 32 core machine, this could look like: SPARK_WORKER_INSTANCES=4 SPARK_WORKER_MEMORY=32 SPARK_WORKER_CORES=8 Are people running with large (32GB+) executor heaps in production? I'd be curious to hear if so. Cheers! Andrew On Thu, Oct 2, 2014 at 1:30 PM, Mingyu Kim m...@palantir.com wrote: This issue definitely needs more investigation, but I just wanted to quickly check if anyone has run into this problem or has general guidance around it. We’ve seen a performance degradation with a large heap on a simple map task (i.e., no shuffle). We’ve seen the slowness starting from around a 50GB heap (i.e., spark.executor.memory=50g). And, when we checked the CPU usage, there were just a lot of GCs going on. Has anyone seen a similar problem? Thanks, Mingyu
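The two configurations described above can be sketched as spark-submit invocations (flags only; the trailing application arguments are elided). The trade-off noted in the thread is that broadcast variables are held once per executor, so many small executors multiply the broadcast footprint:

```shell
# Reported as failing intermittently: few executors with large heaps.
spark-submit --num-executors 100 --executor-cores 10 --executor-memory 60g ...

# Reported as succeeding, and often faster: many executors with small
# heaps (shorter GC pauses), at the cost of duplicating each broadcast
# variable across 10x as many JVMs.
spark-submit --num-executors 1000 --executor-cores 1 --executor-memory 6g ...
```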
Re: IOException running streaming job
We are also seeing this PARSING_ERROR(2) error, due to Caused by: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2) at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:362) at org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:159) at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:142) at com.esotericsoftware.kryo.io.Input.fill(Input.java:140) It usually comes from here: at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109) at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610) There is an open issue to investigate this: https://issues.apache.org/jira/browse/SPARK-3630 On Tue, Sep 23, 2014 at 8:18 PM, Emil Gustafsson e...@cellfish.se wrote: I'm trying out some streaming with Spark and I'm getting an error that puzzles me since I'm new to Spark. I get this error all the time; 1-2 batches in the stream are processed before the job stops, but never the complete job, and often no batch is processed at all. I use Spark 1.1.0. The job is started with --master local[4].
The job is doing this: val conf = new SparkConf().setAppName("My Application") val sc = new SparkContext(conf) val ssc = new StreamingContext(conf, Seconds(2)) val queue = new SynchronizedQueue[RDD[(Int, String)]]() val input = ssc.queueStream(queue) //val mapped = input.map(_._2) input.print() ssc.start() var last = 0 for (i <- 1 to 5) { Thread.sleep(1000) if (i != 2) { //val casRdd = cr.where("id = 42 and t > %d and t <= %d".format(last, i)) var l = List[(Int, String)]() for (j <- last + 1 to i) { l = l :+ (j, "foo%d".format(i)) } l.foreach(x => println("*** %s".format(x))) val casRdd = sc.parallelize(l) //casRdd.foreach(println) last = i queue += casRdd } } Thread.sleep(1000) ssc.stop() The error stack I get is: 14/09/24 00:08:56 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.io.IOException: PARSING_ERROR(2) at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84) at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method) at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594) at org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125) at org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88) at org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58) at org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128) at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216) at org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:170) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:163) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 14/09/24 00:08:56 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: PARSING_ERROR(2) org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84) org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
java.io.IOException Error in task deserialization
Has anyone else seen this error in task deserialization? The task is processing a small amount of data and doesn't seem to have much data attached to the closure. I've only seen this with Spark 1.1. Job aborted due to stage failure: Task 975 in stage 8.0 failed 4 times, most recent failure: Lost task 975.3 in stage 8.0 (TID 24777, host.com): java.io.IOException: unexpected exception type java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538) java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744)
Re: java.io.IOException Error in task deserialization
No, for me as well it is non-deterministic. It happens in a piece of code that does many filters and counts on a small set of records (~1k-10k). The original set is persisted in memory and we have a Kryo serializer set for it. The task itself takes in just a few filtering parameters. With the same settings, this has sometimes completed successfully and sometimes failed during this step. Arun On Fri, Sep 26, 2014 at 1:32 PM, Brad Miller bmill...@eecs.berkeley.edu wrote: I've had multiple jobs crash due to java.io.IOException: unexpected exception type; I've been running the 1.1 branch for some time and am now running the 1.1 release binaries. Note that I only use PySpark. I haven't kept detailed notes or the tracebacks around since there are other problems that have caused me greater grief (namely key not found errors). For me the exception seems to occur non-deterministically, which is a bit interesting since the error message shows that the same stage has failed multiple times. Are you able to consistently reproduce the bug across multiple invocations at the same place? On Fri, Sep 26, 2014 at 6:11 AM, Arun Ahuja aahuj...@gmail.com wrote: Has anyone else seen this error in task deserialization? The task is processing a small amount of data and doesn't seem to have much data attached to the closure. 
I've only seen this with Spark 1.1 Job aborted due to stage failure: Task 975 in stage 8.0 failed 4 times, most recent failure: Lost task 975.3 in stage 8.0 (TID 24777, host.com): java.io.IOException: unexpected exception type java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538) java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744)
Specifying Spark Executor Java options using Spark Submit
What is the proper way to specify Java options for the Spark executors when using spark-submit? We had previously done this using export SPARK_JAVA_OPTS='.., for example to attach a debugger to each executor or to add -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. With spark-submit I see --driver-java-options, but is there an equivalent for individual executors? Thanks, Arun
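In Spark 1.0+, SPARK_JAVA_OPTS was deprecated in favor of per-role properties; the executor-side equivalent of --driver-java-options is the spark.executor.extraJavaOptions property. A sketch (class and jar names are placeholders):

```shell
# JVM flags for every executor (quoted so multiple flags stay one value):
spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --driver-java-options "-verbose:gc" \
  --class com.example.MyJob myjob.jar
```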
TorrentBroadcast causes java.io.IOException: unexpected exception type
Since upgrading to Spark 1.1 we have been seeing the following error in the logs: 14/09/23 02:14:42 ERROR executor.Executor: Exception in task 1087.0 in stage 0.0 (TID 607) java.io.IOException: unexpected exception type at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.spark.SparkException: Failed to get broadcast_3_piece0 of broadcast_3 at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:124) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBlocks$1.apply(TorrentBroadcast.scala:104) Does anyone have some background on what change could have caused this? Is TorrentBroadcast now the default broadcast method? Thanks Arun
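To the question at the end: Spark 1.1 did switch the default broadcast implementation to TorrentBroadcast. As a debugging workaround (not a fix for the underlying failure), the broadcast factory can be pinned back to the 1.0-era HTTP implementation:

```shell
# Revert to HTTP broadcast to test whether TorrentBroadcast is implicated:
spark-submit \
  --conf spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory \
  ...
```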
General question on persist
I have a general question on when persisting will be beneficial and when it won't: I have a task that runs as follows keyedRecordPieces = records.flatMap(record => Seq((key, recordPieces))) partitioned = keyedRecordPieces.partitionBy(KeyPartitioner) partitioned.mapPartitions(doComputation).save() Is there value in having a persist somewhere here? For example, if the flatMap step is particularly expensive, will it ever be computed twice when there are no failures? Thanks Arun
Re: General question on persist
Thanks Liquan, that makes sense, but if I am only doing the computation once, there will essentially be no difference, correct? I had a second question related to mapPartitions: 1) All of the records of the Iterator[T] that a single function call in mapPartitions processes must fit into memory, correct? 2) Is there some way to process that iterator in sorted order? Thanks! Arun On Tue, Sep 23, 2014 at 5:21 PM, Liquan Pei liquan...@gmail.com wrote: Hi Arun, Intermediate results like keyedRecordPieces will not be materialized. This means that if you run partitioned = keyedRecordPieces.partitionBy(KeyPartitioner) partitioned.mapPartitions(doComputation).save() again, keyedRecordPieces will be re-computed. In this case, caching or persisting keyedRecordPieces is a good idea to eliminate unnecessary expensive computation. What you can probably do is keyedRecordPieces = records.flatMap(record => Seq((key, recordPieces))).cache() which will cache the RDD referenced by keyedRecordPieces in memory. For more options on cache and persist, take a look at http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD. There are two APIs you can use to persist RDDs, and one allows you to specify the storage level. Thanks, Liquan On Tue, Sep 23, 2014 at 2:08 PM, Arun Ahuja aahuj...@gmail.com wrote: I have a general question on when persisting will be beneficial and when it won't: I have a task that runs as follows keyedRecordPieces = records.flatMap(record => Seq((key, recordPieces))) partitioned = keyedRecordPieces.partitionBy(KeyPartitioner) partitioned.mapPartitions(doComputation).save() Is there value in having a persist somewhere here? For example, if the flatMap step is particularly expensive, will it ever be computed twice when there are no failures? Thanks Arun -- Liquan Pei Department of Physics University of Massachusetts Amherst
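A sketch of the pattern discussed above, with both caveats made explicit: cache() only pays off when the RDD is traversed more than once, and processing a partition in sorted order requires materializing the iterator, so the whole partition must fit in memory. All names here are illustrative, not the original job's code:

```scala
// Illustrative sketch, assuming key/pieces/doComputation/keyPartitioner
// are defined elsewhere.
val keyedRecordPieces = records
  .flatMap(record => Seq((key(record), pieces(record))))
  .cache() // only beneficial if this RDD is used in more than one action

val partitioned = keyedRecordPieces.partitionBy(keyPartitioner)

val results = partitioned.mapPartitions { iter =>
  // Sorting forces the whole partition into memory before iterating:
  iter.toArray.sortBy(_._1).iterator.map(doComputation)
}
```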
Input Field in Spark 1.1 Web UI
Is there more information on what the Input column on the Spark UI means? How is it computed? I am processing a fairly small (but zipped) file and see the value as [image: Inline image 1] which does not seem correct. Thanks, Arun
Re: Failed jobs show up as succeeded in YARN?
We see this all the time as well; I don't believe there is much of a relationship between the Spark job status and what YARN shows as the status. On Mon, Aug 11, 2014 at 3:17 PM, Shay Rojansky r...@roji.org wrote: Spark 1.0.2, Python, Cloudera 5.1 (Hadoop 2.3.0) It seems that Python jobs I'm sending to YARN show up as succeeded even if they failed... Am I doing something wrong, or is this a known issue? Thanks, Shay
spark-submit with Yarn
Is there more documentation on using spark-submit with YARN? Trying to launch a simple job does not seem to work. My run command is as follows: /opt/cloudera/parcels/CDH/bin/spark-submit \ --master yarn \ --deploy-mode client \ --executor-memory 10g \ --driver-memory 10g \ --num-executors 50 \ --class $MAIN_CLASS \ --verbose \ $JAR \ $@ The verbose logging correctly parses the arguments: System properties: spark.executor.memory - 10g spark.executor.instances - 50 SPARK_SUBMIT - true spark.master - yarn-client But when I view the job's 4040 page (the Spark UI), there is a single executor (just the driver node) and I see the following in the environment: spark.master - local[24] Also, when I run with yarn-cluster, how can I access the Spark UI page? Thanks, Arun
Re: spark-submit with Yarn
Yes, the application is overwriting it - I need to pass it as an argument to the application, otherwise it will be set as local. Thanks for the quick reply! Also, yes, now the appTrackingUrl is set properly as well; before it just said unassigned. Thanks! Arun On Tue, Aug 19, 2014 at 5:47 PM, Marcelo Vanzin van...@cloudera.com wrote: On Tue, Aug 19, 2014 at 2:34 PM, Arun Ahuja aahuj...@gmail.com wrote: /opt/cloudera/parcels/CDH/bin/spark-submit \ --master yarn \ --deploy-mode client \ This should be enough. But when I view the job 4040 page, SparkUI, there is a single executor (just the driver node) and I see the following in enviroment spark.master - local[24] Hmmm. Are you sure the app itself is not overwriting spark.master before creating the SparkContext? That's the only explanation I can think of. Also, when I run with yarn-cluster, how can I access the SparkUI page? You can click on the link in the RM application list. The address is also printed to the AM logs, which are also available through the RM web ui. Finally, the link is printed to the output of the launcher process (look for appTrackingUrl). -- Marcelo
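The fix Marcelo points at - let spark-submit supply the master instead of hard-coding it in the application - can be sketched as follows (object and app names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MyJob {
  def main(args: Array[String]): Unit = {
    // Do NOT call setMaster here: a hard-coded setMaster("local[24]")
    // silently overrides the --master yarn flag passed to spark-submit.
    val conf = new SparkConf().setAppName("MyJob")
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop() // also needed for the app to appear in the history server
  }
}
```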
java.net.SocketTimeoutException: Read timed out and java.io.IOException: Filesystem closed on Spark 1.0
Hi all, I'm running a job that seems to continually fail with the following exception: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) ... at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:330) This is running spark-assembly-1.0.0-hadoop2.3.0 through YARN. The only additional error I see is 14/06/20 10:44:15 WARN NewHadoopRDD: Exception in RecordReader.close() net.sf.samtools.util.RuntimeIOException: java.io.IOException: Filesystem closed I had thought this issue of the closed filesystem was resolved in https://issues.apache.org/jira/browse/SPARK-1676. I've also attempted to run with a single core to avoid this issue (which sometimes seems to help, as the failure is intermittent). I saw a previous mail thread, http://apache-spark-user-list.1001560.n3.nabble.com/Filesystem-closed-while-running-spark-job-td4596.html, with a suggestion to disable caching. Has anyone seen this before or know of a resolution? As I mentioned, this is intermittent - sometimes the job runs to completion and sometimes it fails in this way. Thanks, Arun
Re: Yarn configuration file doesn't work when run with yarn-client mode
I was actually able to get this to work. I was NOT setting the classpath properly originally. Simply running java -cp /etc/hadoop/conf/:<yarn and hadoop jars> com.domain.JobClass and setting yarn-client as the Spark master worked for me. Originally I had not put the configuration on the classpath. Also, I now use $SPARK_HOME/bin/compute-classpath.sh to get all of the relevant jars. The job properly connects to the AM at the correct port. Is there any intuition on how Spark executors map to YARN workers, or how the different memory settings interplay, SPARK_MEM vs YARN_WORKER_MEM? Thanks, Arun On Tue, May 20, 2014 at 2:25 PM, Andrew Or and...@databricks.com wrote: Hi Gaurav and Arun, Your settings seem reasonable; as long as YARN_CONF_DIR or HADOOP_CONF_DIR is properly set, the application should be able to find the correct RM port. Have you tried running the examples in yarn-client mode, and your custom application in yarn-standalone (now yarn-cluster) mode? 2014-05-20 5:17 GMT-07:00 gaurav.dasgupta gaurav.d...@gmail.com: A few more details I would like to provide (sorry, I should have provided these with the previous post): *- Spark Version = 0.9.1 (using pre-built spark-0.9.1-bin-hadoop2) - Hadoop Version = 2.4.0 (Hortonworks) - I am trying to execute a Spark Streaming program* Because I am using Hortonworks Hadoop (HDP), YARN is configured with different port numbers than Apache's default configurations. For example, *resourcemanager.address* is IP:8050 in HDP whereas it defaults to IP:8032. When I run the Spark examples using bin/run-example, I can see in the console logs that it is connecting to the right port configured by HDP, i.e., 8050.
Please refer to the console log below: */[root@host spark-0.9.1-bin-hadoop2]# SPARK_YARN_MODE=true SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar SPARK_YARN_APP_JAR=examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar bin/run-example org.apache.spark.examples.HdfsTest yarn-client /user/root/test SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/spark-0.9.1-bin-hadoop2/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/spark-0.9.1-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 14/05/20 06:55:29 INFO slf4j.Slf4jLogger: Slf4jLogger started 14/05/20 06:55:29 INFO Remoting: Starting remoting 14/05/20 06:55:29 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@IP:60988] 14/05/20 06:55:29 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@IP:60988] 14/05/20 06:55:29 INFO spark.SparkEnv: Registering BlockManagerMaster 14/05/20 06:55:29 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140520065529-924f 14/05/20 06:55:29 INFO storage.MemoryStore: MemoryStore started with capacity 4.2 GB. 
14/05/20 06:55:29 INFO network.ConnectionManager: Bound socket to port 35359 with id = ConnectionManagerId(IP,35359) 14/05/20 06:55:29 INFO storage.BlockManagerMaster: Trying to register BlockManager 14/05/20 06:55:29 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager IP:35359 with 4.2 GB RAM 14/05/20 06:55:29 INFO storage.BlockManagerMaster: Registered BlockManager 14/05/20 06:55:29 INFO spark.HttpServer: Starting HTTP Server 14/05/20 06:55:29 INFO server.Server: jetty-7.x.y-SNAPSHOT 14/05/20 06:55:29 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:59418 14/05/20 06:55:29 INFO broadcast.HttpBroadcast: Broadcast server started at http://IP:59418 14/05/20 06:55:29 INFO spark.SparkEnv: Registering MapOutputTracker 14/05/20 06:55:29 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-fc34fdc8-d940-420b-b184-fc7a8a65501a 14/05/20 06:55:29 INFO spark.HttpServer: Starting HTTP Server 14/05/20 06:55:29 INFO server.Server: jetty-7.x.y-SNAPSHOT 14/05/20 06:55:29 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:53425 14/05/20 06:55:29 INFO server.Server: jetty-7.x.y-SNAPSHOT 14/05/20 06:55:29 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null} 14/05/20 06:55:29 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null} 14/05/20 06:55:29 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null} 14/05/20 06:55:29 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null} 14/05/20 06:55:29 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null} 14/05/20 06:55:29 INFO handler.ContextHandler: started
Re: advice on maintaining a production spark cluster?
Hi Matei, Unfortunately, I don't have more detailed information, but we have seen the loss of workers in standalone mode as well. If a job is killed with CTRL-C, we will often see the number of workers and cores decrease on the Spark master page. The workers are still alive and well in the Cloudera Manager page, but not visible to the Spark master; simply restarting the workers usually resolves this, but we often see workers disappear after a failed or killed job. If we see this occur again, I'll try to provide some logs.

On Mon, May 19, 2014 at 10:51 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Which version is this with? I haven't seen standalone masters lose workers. Is there other stuff on the machines that's killing them, or what errors do you see? Matei

On May 16, 2014, at 9:53 AM, Josh Marcus jmar...@meetup.com wrote: Hey folks, I'm wondering what strategies other folks are using for maintaining and monitoring the stability of stand-alone Spark clusters. Our master very regularly loses workers, and they (as expected) never rejoin the cluster. This is the same behavior I've seen using an Akka cluster (if that's what Spark is using in stand-alone mode) -- are there configuration options we could be setting to make the cluster more robust? We have a custom script which monitors the number of workers (through the web interface) and restarts the cluster when necessary, as well as resolving other issues we face (like Spark shells left open, permanently claiming resources). It works, but it's nowhere close to a great solution. What are other folks doing? Is this something that others observe as well? I suspect that the loss of workers is tied to jobs that run out of memory on the client side, or to our use of very large broadcast variables, but I don't have an isolated test case. I'm open to general answers here: for example, perhaps we should simply be using Mesos or YARN instead of stand-alone mode. --j
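A monitoring script like the one Josh describes can read cluster state from the standalone master's web UI, which exposes a JSON status view. A minimal sketch of the counting logic, assuming a payload with a `workers` list whose entries carry a `state` field (the exact field names depend on the Spark version, so treat them as assumptions):

```python
import json

def alive_workers(master_status):
    """Count workers the standalone master reports as ALIVE.

    `master_status` is the parsed JSON from the master web UI; the
    'workers' and 'state' field names are assumptions in this sketch.
    """
    return sum(1 for w in master_status.get("workers", [])
               if w.get("state") == "ALIVE")

# Example payload with one dead worker:
sample = json.loads(
    '{"workers": [{"state": "ALIVE"}, {"state": "DEAD"}, {"state": "ALIVE"}]}')
print(alive_workers(sample))  # 2
```

A cron job could fetch the master's JSON endpoint, apply a check like this, and restart the cluster whenever the count drops below the expected number of workers.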
Re: Yarn configuration file doesn't work when run with yarn-client mode
Yes, we are on Spark 0.9.0, so that explains the first piece, thanks! Also, yes, I meant SPARK_WORKER_MEMORY. Thanks for the hierarchy. Similarly, is there some best practice on setting SPARK_WORKER_INSTANCES and spark.default.parallelism? Thanks, Arun

On Tue, May 20, 2014 at 3:04 PM, Andrew Or and...@databricks.com wrote: I'm assuming you're running Spark 0.9.x, because in the latest version of Spark you shouldn't have to add HADOOP_CONF_DIR to the Java classpath manually. I tested this out on my own YARN cluster and was able to confirm that. In Spark 1.0, SPARK_MEM is deprecated and should not be used. Instead, you should set the per-executor memory through spark.executor.memory, which has the same effect but takes higher priority. By YARN_WORKER_MEM, do you mean SPARK_EXECUTOR_MEMORY? It also does the same thing. In Spark 1.0, the priority hierarchy is as follows:

1. spark.executor.memory (set through spark-defaults.conf)
2. SPARK_EXECUTOR_MEMORY
3. SPARK_MEM (deprecated)

In Spark 0.9, the hierarchy is very similar:

1. spark.executor.memory (set through SPARK_JAVA_OPTS in spark-env)
2. SPARK_MEM

For more information: http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/configuration.html http://spark.apache.org/docs/0.9.1/configuration.html

2014-05-20 11:30 GMT-07:00 Arun Ahuja aahuj...@gmail.com: I was actually able to get this to work. I was NOT setting the classpath properly originally. Simply running java -cp /etc/hadoop/conf/:<yarn, hadoop jars> com.domain.JobClass and setting yarn-client as the Spark master worked for me. Originally I had not put the configuration on the classpath. Also, I now use $SPARK_HOME/bin/compute-classpath.sh to get all of the relevant jars. The job properly connects to the AM at the correct port. Is there any intuition on how Spark executors map to YARN workers, or how the different memory settings interplay, SPARK_MEM vs YARN_WORKER_MEM?
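The Spark 1.0 precedence Andrew describes can be sketched as a simple resolution function. The final "512m" fallback is an assumption for illustration, not a value stated in this thread:

```python
def effective_executor_memory(conf, env):
    """Resolve executor memory using the Spark 1.0 priority order described
    above: spark.executor.memory first, then SPARK_EXECUTOR_MEMORY, then the
    deprecated SPARK_MEM, else a fallback (assumed here to be 512m)."""
    return (conf.get("spark.executor.memory")
            or env.get("SPARK_EXECUTOR_MEMORY")
            or env.get("SPARK_MEM")
            or "512m")

# spark.executor.memory wins over the deprecated SPARK_MEM:
print(effective_executor_memory({"spark.executor.memory": "4g"},
                                {"SPARK_MEM": "2g"}))  # 4g
# With no conf entry, the environment variable is used:
print(effective_executor_memory({}, {"SPARK_EXECUTOR_MEMORY": "8g"}))  # 8g
```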
Thanks, Arun

On Tue, May 20, 2014 at 2:25 PM, Andrew Or and...@databricks.com wrote: Hi Gaurav and Arun, Your settings seem reasonable; as long as YARN_CONF_DIR or HADOOP_CONF_DIR is properly set, the application should be able to find the correct RM port. Have you tried running the examples in yarn-client mode, and your custom application in yarn-standalone (now yarn-cluster) mode?

2014-05-20 5:17 GMT-07:00 gaurav.dasgupta gaurav.d...@gmail.com: A few more details I would like to provide (sorry, I should have included these in the previous post):

- Spark version = 0.9.1 (using pre-built spark-0.9.1-bin-hadoop2)
- Hadoop version = 2.4.0 (Hortonworks)
- I am trying to execute a Spark Streaming program

Because I am using Hortonworks Hadoop (HDP), YARN is configured with different port numbers than Apache's defaults. For example, yarn.resourcemanager.address is IP:8050 in HDP, whereas it defaults to IP:8032. When I run the Spark examples using bin/run-example, I can see in the console logs that it is connecting to the right port configured by HDP, i.e., 8050. Please refer to the console log below:

[root@host spark-0.9.1-bin-hadoop2]# SPARK_YARN_MODE=true SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar SPARK_YARN_APP_JAR=examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar bin/run-example org.apache.spark.examples.HdfsTest yarn-client /user/root/test
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/spark-0.9.1-bin-hadoop2/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/spark-0.9.1-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/05/20 06:55:29 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/05/20 06:55:29 INFO Remoting: Starting remoting
14/05/20 06:55:29 INFO Remoting: Remoting started; listening on addresses: [akka.tcp://spark@IP:60988]
14/05/20 06:55:29 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@IP:60988]
14/05/20 06:55:29 INFO spark.SparkEnv: Registering BlockManagerMaster
14/05/20 06:55:29 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140520065529-924f
14/05/20 06:55:29 INFO storage.MemoryStore: MemoryStore started with capacity 4.2 GB.
14/05/20 06:55:29 INFO network.ConnectionManager: Bound socket to port 35359 with id = ConnectionManagerId(IP,35359)
14/05/20 06:55:29 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/05/20 06:55:29 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager IP:35359 with 4.2 GB RAM
14/05/20 06:55:29 INFO storage.BlockManagerMaster: Registered BlockManager
Re: Yarn configuration file doesn't work when run with yarn-client mode
I am encountering the same thing. Basic YARN apps work, as does the SparkPi example, but my custom application gives this result. I am using compute-classpath.sh to create the proper classpath for my application, the same as for SparkPi. Was there a resolution to this issue? Thanks, Arun

On Wed, Feb 12, 2014 at 1:28 AM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, all, When I run my application in yarn-client mode, it seems that the system didn't load my configuration file correctly, because the local app master always tries to register with the RM at the default address:

14/02/12 05:00:23 INFO SparkContext: Added JAR target/scala-2.10/rec_system_2.10-1.0.jar at http://172.31.37.160:51750/jars/rec_system_2.10-1.0.jar with timestamp 1392181223818
14/02/12 05:00:24 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/12 05:00:25 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/12 05:00:26 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/12 05:00:27 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

However, if I run in standalone mode, everything works fine (YARN_CONF_DIR, SPARK_APP, SPARK_YARN_APP_JAR are all set correctly). Is it a bug? Best, -- Nan Zhu
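The retry loop against 0.0.0.0:8032 is what Hadoop's built-in default looks like when no yarn-site.xml is found on the client side. A simplified sketch of the lookup the YARN client effectively performs (the real resolution goes through Hadoop's Configuration class; this only illustrates the fallback behavior):

```python
import os
import xml.etree.ElementTree as ET

# Hadoop's default for yarn.resourcemanager.address:
DEFAULT_RM_ADDRESS = "0.0.0.0:8032"

def resolve_rm_address(conf_dir):
    """Read yarn.resourcemanager.address from yarn-site.xml under conf_dir,
    falling back to the default when the directory, file, or property is
    missing -- which produces exactly the 'Retrying connect' loop above."""
    if conf_dir:
        path = os.path.join(conf_dir, "yarn-site.xml")
        if os.path.isfile(path):
            root = ET.parse(path).getroot()
            for prop in root.findall("property"):
                if prop.findtext("name") == "yarn.resourcemanager.address":
                    return prop.findtext("value")
    return DEFAULT_RM_ADDRESS

# With HADOOP_CONF_DIR unset (or pointing at a directory without
# yarn-site.xml), the client falls back to 0.0.0.0:8032:
print(resolve_rm_address(os.environ.get("HADOOP_CONF_DIR")))
```

So the usual fix for Nan's symptom is to make sure HADOOP_CONF_DIR or YARN_CONF_DIR points at the directory containing the cluster's actual yarn-site.xml before launching in yarn-client mode.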