Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-24 Thread Arun Ahuja
Thanks for the additional info. I tried to follow that and went ahead and
added netlib directly to my application POM/JAR; that should be sufficient
to make it work, and it is at least definitely on the executor classpath?
I still got the same warning, so I'm not sure where else to take it.

Thanks for all the help, everyone! I'm not sure it's worth pursuing further,
though, since I'm not sure what else to try.
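
For the record, here is a quick way to see which implementation the executors
actually resolve (a hedged sketch, e.g. from spark-shell; the BLAS.getInstance
call and class names come from netlib-java, not from this thread):

val impls = sc.parallelize(1 to 1000, 1000)
  .map(_ => com.github.fommil.netlib.BLAS.getInstance().getClass.getName)
  .distinct()
  .collect()
impls.foreach(println)
// NativeSystemBLAS / NativeRefBLAS mean the natives loaded on that executor;
// F2jBLAS means netlib fell back to the pure-Java implementation.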

Thanks,
Arun

On Tue, Jul 21, 2015 at 11:16 AM, Shivaram Venkataraman 
shiva...@eecs.berkeley.edu wrote:

 FWIW I've run into similar BLAS related problems before and wrote up a
 document on how to do this for Spark EC2 clusters at
 https://github.com/amplab/ml-matrix/blob/master/EC2.md -- Note that this
 works with a vanilla Spark build (you only need to link to netlib-lgpl in
 your App) but requires the app jar to be present on all the machines.
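
 For example, on a spark-ec2 cluster the application jar can be synced to the
 slaves with the copy-dir helper (a hedged sketch; the path is the one the
 spark-ec2 scripts create, so adjust to your setup):

 /root/spark-ec2/copy-dir /root/my-app/   # directory containing the app assembly jar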

 Thanks
 Shivaram

 On Tue, Jul 21, 2015 at 7:37 AM, Arun Ahuja aahuj...@gmail.com wrote:

 Yes, I imagine it's the driver's classpath -  I'm pulling those
 screenshots straight from the Spark UI environment page.  Is there
 somewhere else to grab the executor class path?

 Also, the warning is only printed once, so it's not clear whether the
 warning comes from the driver or the executor; would you know?

 Thanks,
 Arun

 On Tue, Jul 21, 2015 at 7:52 AM, Sean Owen so...@cloudera.com wrote:

 Great, and that file exists on HDFS and is world readable? just
 double-checking.

 What classpath is this -- your driver or executor? this is the driver,
 no? I assume so just because it looks like it references the assembly you
 built locally and from which you're launching the driver.

 I think we're concerned with the executors and what they have on the
 classpath. I suspect there is still a problem somewhere in there.

 On Mon, Jul 20, 2015 at 4:59 PM, Arun Ahuja aahuj...@gmail.com wrote:

 Cool, I tried that as well, and it doesn't seem any different:

 spark.yarn.jar seems set

 [image: Inline image 1]

 This actually doesn't change the classpath, not sure if it should:

 [image: Inline image 3]

 But same netlib warning.

 Thanks for the help!
 - Arun

 On Fri, Jul 17, 2015 at 3:18 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:

 Can you try setting the spark.yarn.jar property to make sure it points
 to the jar you're thinking of?

 -Sandy

 On Fri, Jul 17, 2015 at 11:32 AM, Arun Ahuja aahuj...@gmail.com
 wrote:

 Yes, it's a YARN cluster and using spark-submit to run.  I have
 SPARK_HOME set to the directory above and using the spark-submit script
 from there.

 bin/spark-submit --master yarn-client --executor-memory 10g 
 --driver-memory 8g --num-executors 400 --executor-cores 1 --class 
 org.hammerlab.guacamole.Guacamole --conf spark.default.parallelism=4000 
 --conf spark.storage.memoryFraction=0.15

 ​

 libgfortran.so.3 is also there

 ls  /usr/lib64/libgfortran.so.3
 /usr/lib64/libgfortran.so.3

 These are jniloader files in the jar

 jar tf 
 /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
  | grep jniloader
 META-INF/maven/com.github.fommil/jniloader/
 META-INF/maven/com.github.fommil/jniloader/pom.xml
 META-INF/maven/com.github.fommil/jniloader/pom.properties

 ​

 Thanks,
 Arun

 On Fri, Jul 17, 2015 at 1:30 PM, Sean Owen so...@cloudera.com
 wrote:

 Make sure /usr/lib64 contains libgfortran.so.3; that's really the
 issue.

 I'm pretty sure the answer is 'yes', but, make sure the assembly has
 jniloader too. I don't see why it wouldn't, but, that's needed.

 What is your env like -- local, standalone, YARN? how are you
 running?
 Just want to make sure you are using this assembly across your
 cluster.

 On Fri, Jul 17, 2015 at 6:26 PM, Arun Ahuja aahuj...@gmail.com
 wrote:

 Hi Sean,

 Thanks for the reply! I did double-check that the jar is one I
 think I am running:

 [image: Inline image 2]

 jar tf 
 /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
  | grep netlib | grep Native
 com/github/fommil/netlib/NativeRefARPACK.class
 com/github/fommil/netlib/NativeRefBLAS.class
 com/github/fommil/netlib/NativeRefLAPACK.class
 com/github/fommil/netlib/NativeSystemARPACK.class
 com/github/fommil/netlib/NativeSystemBLAS.class
 com/github/fommil/netlib/NativeSystemLAPACK.class

 Also, I checked the gfortran version on the cluster nodes and it is
 available and is 5.1

 $ gfortran --version
 GNU Fortran (GCC) 5.1.0
 Copyright (C) 2015 Free Software Foundation, Inc.

 and still see:

 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: 
 com.github.fommil.netlib.NativeSystemBLAS
 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: 
 com.github.fommil.netlib.NativeRefBLAS
 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: 
 com.github.fommil.netlib.NativeSystemLAPACK
 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: 
 com.github.fommil.netlib.NativeRefLAPACK

 ​

 Does anything need to be adjusted in my application POM

Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-21 Thread Arun Ahuja
Yes, I imagine it's the driver's classpath -  I'm pulling those screenshots
straight from the Spark UI environment page.  Is there somewhere else to
grab the executor class path?

Also, the warning is only printed once, so it's not clear whether the
warning comes from the driver or the executor; would you know?

Thanks,
Arun

On Tue, Jul 21, 2015 at 7:52 AM, Sean Owen so...@cloudera.com wrote:

 Great, and that file exists on HDFS and is world readable? just
 double-checking.

 What classpath is this -- your driver or executor? this is the driver, no?
 I assume so just because it looks like it references the assembly you built
 locally and from which you're launching the driver.

 I think we're concerned with the executors and what they have on the
 classpath. I suspect there is still a problem somewhere in there.

 On Mon, Jul 20, 2015 at 4:59 PM, Arun Ahuja aahuj...@gmail.com wrote:

 Cool, I tried that as well, and it doesn't seem any different:

 spark.yarn.jar seems set

 [image: Inline image 1]

 This actually doesn't change the classpath, not sure if it should:

 [image: Inline image 3]

 But same netlib warning.

 Thanks for the help!
 - Arun

 On Fri, Jul 17, 2015 at 3:18 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:

 Can you try setting the spark.yarn.jar property to make sure it points
 to the jar you're thinking of?

 -Sandy

 On Fri, Jul 17, 2015 at 11:32 AM, Arun Ahuja aahuj...@gmail.com wrote:

 Yes, it's a YARN cluster and using spark-submit to run.  I have
 SPARK_HOME set to the directory above and using the spark-submit script
 from there.

 bin/spark-submit --master yarn-client --executor-memory 10g 
 --driver-memory 8g --num-executors 400 --executor-cores 1 --class 
 org.hammerlab.guacamole.Guacamole --conf spark.default.parallelism=4000 
 --conf spark.storage.memoryFraction=0.15

 ​

 libgfortran.so.3 is also there

 ls  /usr/lib64/libgfortran.so.3
 /usr/lib64/libgfortran.so.3

 These are jniloader files in the jar

 jar tf 
 /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
  | grep jniloader
 META-INF/maven/com.github.fommil/jniloader/
 META-INF/maven/com.github.fommil/jniloader/pom.xml
 META-INF/maven/com.github.fommil/jniloader/pom.properties

 ​

 Thanks,
 Arun

 On Fri, Jul 17, 2015 at 1:30 PM, Sean Owen so...@cloudera.com wrote:

 Make sure /usr/lib64 contains libgfortran.so.3; that's really the
 issue.

 I'm pretty sure the answer is 'yes', but, make sure the assembly has
 jniloader too. I don't see why it wouldn't, but, that's needed.

 What is your env like -- local, standalone, YARN? how are you running?
 Just want to make sure you are using this assembly across your cluster.

 On Fri, Jul 17, 2015 at 6:26 PM, Arun Ahuja aahuj...@gmail.com
 wrote:

 Hi Sean,

 Thanks for the reply! I did double-check that the jar is one I think
 I am running:

 [image: Inline image 2]

 jar tf 
 /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
  | grep netlib | grep Native
 com/github/fommil/netlib/NativeRefARPACK.class
 com/github/fommil/netlib/NativeRefBLAS.class
 com/github/fommil/netlib/NativeRefLAPACK.class
 com/github/fommil/netlib/NativeSystemARPACK.class
 com/github/fommil/netlib/NativeSystemBLAS.class
 com/github/fommil/netlib/NativeSystemLAPACK.class

 Also, I checked the gfortran version on the cluster nodes and it is
 available and is 5.1

 $ gfortran --version
 GNU Fortran (GCC) 5.1.0
 Copyright (C) 2015 Free Software Foundation, Inc.

 and still see:

 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: 
 com.github.fommil.netlib.NativeSystemBLAS
 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: 
 com.github.fommil.netlib.NativeRefBLAS
 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: 
 com.github.fommil.netlib.NativeSystemLAPACK
 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: 
 com.github.fommil.netlib.NativeRefLAPACK

 ​

 Does anything need to be adjusted in my application POM?

 Thanks,
 Arun

 On Thu, Jul 16, 2015 at 5:26 PM, Sean Owen so...@cloudera.com
 wrote:

 Yes, that's most of the work, just getting the native libs into the
 assembly. netlib can find them from there even if you don't have BLAS
 libs on your OS, since it includes a reference implementation as a
 fallback.

 One common reason it won't load is not having libgfortran installed on
 your OSes though. It has to be 4.6+ too. That can't be shipped even in
 netlib and has to exist on your hosts.

 The other thing I'd double-check is whether you are really using this
 assembly you built for your job -- like, whether it's actually the
 assembly the executors are using.


 On Tue, Jul 7, 2015 at 8:47 PM, Arun Ahuja aahuj...@gmail.com
 wrote:
  Is there more documentation on what is needed to set up BLAS/LAPACK native
  support with Spark?

  I’ve built Spark with the -Pnetlib-lgpl flag and see that the netlib
  classes are in the assembly jar

Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-20 Thread Arun Ahuja
Cool, I tried that as well, and it doesn't seem any different:

spark.yarn.jar seems set

[image: Inline image 1]

This actually doesn't change the classpath, not sure if it should:

[image: Inline image 3]

But same netlib warning.

Thanks for the help!
- Arun

On Fri, Jul 17, 2015 at 3:18 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Can you try setting the spark.yarn.jar property to make sure it points to
 the jar you're thinking of?

 -Sandy

 On Fri, Jul 17, 2015 at 11:32 AM, Arun Ahuja aahuj...@gmail.com wrote:

 Yes, it's a YARN cluster and using spark-submit to run.  I have
 SPARK_HOME set to the directory above and using the spark-submit script
 from there.

 bin/spark-submit --master yarn-client --executor-memory 10g --driver-memory 
 8g --num-executors 400 --executor-cores 1 --class 
 org.hammerlab.guacamole.Guacamole --conf spark.default.parallelism=4000 
 --conf spark.storage.memoryFraction=0.15

 ​

 libgfortran.so.3 is also there

 ls  /usr/lib64/libgfortran.so.3
 /usr/lib64/libgfortran.so.3

 These are jniloader files in the jar

 jar tf 
 /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
  | grep jniloader
 META-INF/maven/com.github.fommil/jniloader/
 META-INF/maven/com.github.fommil/jniloader/pom.xml
 META-INF/maven/com.github.fommil/jniloader/pom.properties

 ​

 Thanks,
 Arun

 On Fri, Jul 17, 2015 at 1:30 PM, Sean Owen so...@cloudera.com wrote:

 Make sure /usr/lib64 contains libgfortran.so.3; that's really the issue.

 I'm pretty sure the answer is 'yes', but, make sure the assembly has
 jniloader too. I don't see why it wouldn't, but, that's needed.

 What is your env like -- local, standalone, YARN? how are you running?
 Just want to make sure you are using this assembly across your cluster.

 On Fri, Jul 17, 2015 at 6:26 PM, Arun Ahuja aahuj...@gmail.com wrote:

 Hi Sean,

 Thanks for the reply! I did double-check that the jar is one I think I
 am running:

 [image: Inline image 2]

 jar tf 
 /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
  | grep netlib | grep Native
 com/github/fommil/netlib/NativeRefARPACK.class
 com/github/fommil/netlib/NativeRefBLAS.class
 com/github/fommil/netlib/NativeRefLAPACK.class
 com/github/fommil/netlib/NativeSystemARPACK.class
 com/github/fommil/netlib/NativeSystemBLAS.class
 com/github/fommil/netlib/NativeSystemLAPACK.class

 Also, I checked the gfortran version on the cluster nodes and it is
 available and is 5.1

 $ gfortran --version
 GNU Fortran (GCC) 5.1.0
 Copyright (C) 2015 Free Software Foundation, Inc.

 and still see:

 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: 
 com.github.fommil.netlib.NativeSystemBLAS
 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: 
 com.github.fommil.netlib.NativeRefBLAS
 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: 
 com.github.fommil.netlib.NativeSystemLAPACK
 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: 
 com.github.fommil.netlib.NativeRefLAPACK

 ​

 Does anything need to be adjusted in my application POM?

 Thanks,
 Arun

 On Thu, Jul 16, 2015 at 5:26 PM, Sean Owen so...@cloudera.com wrote:

 Yes, that's most of the work, just getting the native libs into the
 assembly. netlib can find them from there even if you don't have BLAS
 libs on your OS, since it includes a reference implementation as a
 fallback.

 One common reason it won't load is not having libgfortran installed on
 your OSes though. It has to be 4.6+ too. That can't be shipped even in
 netlib and has to exist on your hosts.

 The other thing I'd double-check is whether you are really using this
 assembly you built for your job -- like, whether it's actually the
 assembly the executors are using.


 On Tue, Jul 7, 2015 at 8:47 PM, Arun Ahuja aahuj...@gmail.com wrote:
  Is there more documentation on what is needed to set up BLAS/LAPACK native
  support with Spark?

  I’ve built Spark with the -Pnetlib-lgpl flag and see that the netlib
  classes are in the assembly jar.
 
  jar tvf spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib | grep Native
    6625 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefARPACK.class
   21123 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefBLAS.class
  178334 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefLAPACK.class
    6640 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemARPACK.class
   21138 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemBLAS.class
  178349 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemLAPACK.class
 
  Also I see the following in /usr/lib64
 
  ls /usr/lib64/libblas.
  libblas.a  libblas.so  libblas.so.3  libblas.so.3.2  libblas.so.3.2.1

  ls /usr/lib64/liblapack
  liblapack.a  liblapack_pic.a  liblapack.so  liblapack.so.3  liblapack.so.3.2  liblapack.so.3.2.1

Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-17 Thread Arun Ahuja
Hi Sean,

Thanks for the reply! I did double-check that the jar is the one I think I am
running:

[image: Inline image 2]

jar tf 
/hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
| grep netlib | grep Native
com/github/fommil/netlib/NativeRefARPACK.class
com/github/fommil/netlib/NativeRefBLAS.class
com/github/fommil/netlib/NativeRefLAPACK.class
com/github/fommil/netlib/NativeSystemARPACK.class
com/github/fommil/netlib/NativeSystemBLAS.class
com/github/fommil/netlib/NativeSystemLAPACK.class

Also, I checked the gfortran version on the cluster nodes; it is available
and is 5.1:

$ gfortran --version
GNU Fortran (GCC) 5.1.0
Copyright (C) 2015 Free Software Foundation, Inc.

and still see:

15/07/17 13:20:53 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemBLAS
15/07/17 13:20:53 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeRefBLAS
15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemLAPACK
15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from:
com.github.fommil.netlib.NativeRefLAPACK

​

Does anything need to be adjusted in my application POM?

Thanks,
Arun

On Thu, Jul 16, 2015 at 5:26 PM, Sean Owen so...@cloudera.com wrote:

 Yes, that's most of the work, just getting the native libs into the
 assembly. netlib can find them from there even if you don't have BLAS
 libs on your OS, since it includes a reference implementation as a
 fallback.

 One common reason it won't load is not having libgfortran installed on
 your OSes though. It has to be 4.6+ too. That can't be shipped even in
 netlib and has to exist on your hosts.

 The other thing I'd double-check is whether you are really using this
 assembly you built for your job -- like, whether it's actually the
 assembly the executors are using.


 On Tue, Jul 7, 2015 at 8:47 PM, Arun Ahuja aahuj...@gmail.com wrote:
  Is there more documentation on what is needed to set up BLAS/LAPACK native
  support with Spark?

  I’ve built Spark with the -Pnetlib-lgpl flag and see that the netlib
  classes are in the assembly jar.
 
   jar tvf spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib | grep Native
     6625 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefARPACK.class
    21123 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefBLAS.class
   178334 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefLAPACK.class
     6640 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemARPACK.class
    21138 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemBLAS.class
   178349 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemLAPACK.class
 
  Also I see the following in /usr/lib64
 
   ls /usr/lib64/libblas.
   libblas.a  libblas.so  libblas.so.3  libblas.so.3.2  libblas.so.3.2.1

   ls /usr/lib64/liblapack
   liblapack.a  liblapack_pic.a  liblapack.so  liblapack.so.3  liblapack.so.3.2  liblapack.so.3.2.1
 
   But I still see the following in the Spark logs:
 
  15/07/07 15:36:25 WARN BLAS: Failed to load implementation from:
  com.github.fommil.netlib.NativeSystemBLAS
  15/07/07 15:36:25 WARN BLAS: Failed to load implementation from:
  com.github.fommil.netlib.NativeRefBLAS
  15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from:
  com.github.fommil.netlib.NativeSystemLAPACK
  15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from:
  com.github.fommil.netlib.NativeRefLAPACK
 
  Anything in this process I missed?
 
  Thanks,
  Arun



Re: What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-17 Thread Arun Ahuja
Yes, it's a YARN cluster, and I'm using spark-submit to run.  I have SPARK_HOME
set to the directory above and am using the spark-submit script from there.

bin/spark-submit --master yarn-client --executor-memory 10g
--driver-memory 8g --num-executors 400 --executor-cores 1 --class
org.hammerlab.guacamole.Guacamole --conf
spark.default.parallelism=4000 --conf
spark.storage.memoryFraction=0.15

​

libgfortran.so.3 is also there

ls  /usr/lib64/libgfortran.so.3
/usr/lib64/libgfortran.so.3

These are jniloader files in the jar

jar tf 
/hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
| grep jniloader
META-INF/maven/com.github.fommil/jniloader/
META-INF/maven/com.github.fommil/jniloader/pom.xml
META-INF/maven/com.github.fommil/jniloader/pom.properties
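
A further hedged check, since the jniloader hits above are only POM metadata:
the native libraries themselves ship as .so resources inside the netlib jars,
so something like the following should list them if they really made it into
the assembly (exact resource names vary by platform):

jar tf /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep -E 'netlib-native|\.so'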

​

Thanks,
Arun

On Fri, Jul 17, 2015 at 1:30 PM, Sean Owen so...@cloudera.com wrote:

 Make sure /usr/lib64 contains libgfortran.so.3; that's really the issue.

 I'm pretty sure the answer is 'yes', but, make sure the assembly has
 jniloader too. I don't see why it wouldn't, but, that's needed.

 What is your env like -- local, standalone, YARN? how are you running?
 Just want to make sure you are using this assembly across your cluster.

 On Fri, Jul 17, 2015 at 6:26 PM, Arun Ahuja aahuj...@gmail.com wrote:

 Hi Sean,

 Thanks for the reply! I did double-check that the jar is one I think I am
 running:

 [image: Inline image 2]

 jar tf 
 /hpc/users/ahujaa01/src/spark/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar
  | grep netlib | grep Native
 com/github/fommil/netlib/NativeRefARPACK.class
 com/github/fommil/netlib/NativeRefBLAS.class
 com/github/fommil/netlib/NativeRefLAPACK.class
 com/github/fommil/netlib/NativeSystemARPACK.class
 com/github/fommil/netlib/NativeSystemBLAS.class
 com/github/fommil/netlib/NativeSystemLAPACK.class

 Also, I checked the gfortran version on the cluster nodes and it is
 available and is 5.1

 $ gfortran --version
 GNU Fortran (GCC) 5.1.0
 Copyright (C) 2015 Free Software Foundation, Inc.

 and still see:

 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: 
 com.github.fommil.netlib.NativeSystemBLAS
 15/07/17 13:20:53 WARN BLAS: Failed to load implementation from: 
 com.github.fommil.netlib.NativeRefBLAS
 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: 
 com.github.fommil.netlib.NativeSystemLAPACK
 15/07/17 13:20:53 WARN LAPACK: Failed to load implementation from: 
 com.github.fommil.netlib.NativeRefLAPACK

 ​

 Does anything need to be adjusted in my application POM?

 Thanks,
 Arun

 On Thu, Jul 16, 2015 at 5:26 PM, Sean Owen so...@cloudera.com wrote:

 Yes, that's most of the work, just getting the native libs into the
 assembly. netlib can find them from there even if you don't have BLAS
 libs on your OS, since it includes a reference implementation as a
 fallback.

 One common reason it won't load is not having libgfortran installed on
 your OSes though. It has to be 4.6+ too. That can't be shipped even in
 netlib and has to exist on your hosts.

 The other thing I'd double-check is whether you are really using this
 assembly you built for your job -- like, whether it's actually the
 assembly the executors are using.


 On Tue, Jul 7, 2015 at 8:47 PM, Arun Ahuja aahuj...@gmail.com wrote:
  Is there more documentation on what is needed to set up BLAS/LAPACK native
  support with Spark?

  I’ve built Spark with the -Pnetlib-lgpl flag and see that the netlib
  classes are in the assembly jar.
 
  jar tvf spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib | grep Native
    6625 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefARPACK.class
   21123 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefBLAS.class
  178334 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefLAPACK.class
    6640 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemARPACK.class
   21138 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemBLAS.class
  178349 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemLAPACK.class
 
  Also I see the following in /usr/lib64
 
  ls /usr/lib64/libblas.
  libblas.a  libblas.so  libblas.so.3  libblas.so.3.2  libblas.so.3.2.1

  ls /usr/lib64/liblapack
  liblapack.a  liblapack_pic.a  liblapack.so  liblapack.so.3  liblapack.so.3.2  liblapack.so.3.2.1
 
  But I still see the following in the Spark logs:
 
  15/07/07 15:36:25 WARN BLAS: Failed to load implementation from:
  com.github.fommil.netlib.NativeSystemBLAS
  15/07/07 15:36:25 WARN BLAS: Failed to load implementation from:
  com.github.fommil.netlib.NativeRefBLAS
  15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from:
  com.github.fommil.netlib.NativeSystemLAPACK
  15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from:
  com.github.fommil.netlib.NativeRefLAPACK
 
  Anything in this process I missed

What else is needed to set up native support of BLAS/LAPACK with Spark?

2015-07-07 Thread Arun Ahuja
Is there more documentation on what is needed to set up BLAS/LAPACK native
support with Spark?

I’ve built Spark with the -Pnetlib-lgpl flag and see that the netlib
classes are in the assembly jar.
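
For reference, the build command looks something like this (hedged; the exact
profiles depend on the Hadoop version being targeted):

mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pnetlib-lgpl -DskipTests clean package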

jar tvf spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib | grep Native
  6625 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefARPACK.class
 21123 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefBLAS.class
178334 Tue Jul 07 15:22:08 EDT 2015 com/github/fommil/netlib/NativeRefLAPACK.class
  6640 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemARPACK.class
 21138 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemBLAS.class
178349 Tue Jul 07 15:22:10 EDT 2015 com/github/fommil/netlib/NativeSystemLAPACK.class

Also I see the following in /usr/lib64

 ls /usr/lib64/libblas.
libblas.a  libblas.so  libblas.so.3  libblas.so.3.2  libblas.so.3.2.1

 ls /usr/lib64/liblapack
liblapack.a  liblapack_pic.a  liblapack.so  liblapack.so.3  liblapack.so.3.2  liblapack.so.3.2.1

But I still see the following in the Spark logs:

15/07/07 15:36:25 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemBLAS
15/07/07 15:36:25 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeRefBLAS
15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemLAPACK
15/07/07 15:36:26 WARN LAPACK: Failed to load implementation from:
com.github.fommil.netlib.NativeRefLAPACK

​
Anything in this process I missed?

Thanks,
Arun


Re: unable to bring up cluster with ec2 script

2015-07-07 Thread Arun Ahuja
Sorry, I can't help with this issue, but if you are interested in a simple
way to launch a Spark cluster on Amazon, Spark is now offered as an
application in Amazon EMR.  With this you can have a full cluster with a
few clicks:

https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/

- Arun

On Tue, Jul 7, 2015 at 4:34 PM, Pagliari, Roberto rpagli...@appcomsci.com
wrote:





 I'm following the tutorial about Apache Spark on EC2. The output is the
 following:





 $ ./spark-ec2 -i ../spark.pem -k spark --copy launch spark-training

 Setting up security groups...

 Searching for existing cluster spark-training...

 Latest Spark AMI: ami-19474270

 Launching instances...

 Launched 5 slaves in us-east-1d, regid = r-59a0d4b6

 Launched master in us-east-1d, regid = r-9ba2d674

 Waiting for instances to start up...

 Waiting 120 more seconds...

 Copying SSH key ../spark.pem to master...

 ssh: connect to host ec2-54-152-15-165.compute-1.amazonaws.com port
 22: Connection refused

 Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no
 -i ../spark.pem r...@ec2-54-152-15-165.compute-1.amazonaws.com 'mkdir -p
 ~/.ssh'' returned non-zero exit status 255, sleeping 30

 ssh: connect to host ec2-54-152-15-165.compute-1.amazonaws.com port
 22: Connection refused

 Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no
 -i ../spark.pem r...@ec2-54-152-15-165.compute-1.amazonaws.com 'mkdir -p
 ~/.ssh'' returned non-zero exit status 255, sleeping 30

 ssh: Could not resolve hostname
 ec2-54-152-15-165.compute-1.amazonaws.com: Name or service not known

 Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no
 -i ../spark.pem r...@ec2-54-152-15-165.compute-1.amazonaws.com 'mkdir -p
 ~/.ssh'' returned non-zero exit status 255, sleeping 30

 ssh: connect to host ec2-54-152-15-165.compute-1.amazonaws.com port
 22: Connection refused

 Traceback (most recent call last):
   File "./spark_ec2.py", line 925, in <module>
     main()
   File "./spark_ec2.py", line 766, in main
     setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, True)
   File "./spark_ec2.py", line 406, in setup_cluster
     ssh(master, opts, 'mkdir -p ~/.ssh')
   File "./spark_ec2.py", line 712, in ssh
     raise e
 subprocess.CalledProcessError: Command 'ssh -t -o StrictHostKeyChecking=no
 -i ../spark.pem r...@ec2-54-152-15-165.compute-1.amazonaws.com 'mkdir -p ~/.ssh''
 returned non-zero exit status 255





 However, I can see the six instances created on my EC2 console, and I
 could even get the name of the master. I'm not sure how to fix the ssh
 issue (my region is US EST).





Re: Spark on YARN memory utilization

2014-12-06 Thread Arun Ahuja
Hi Denny,

This is due to the spark.yarn.memoryOverhead parameter; depending on which
version of Spark you are on, the default may differ, but it should be the
larger of 1024mb per executor or .07 * executorMemory.

When you set executor memory, the yarn resource request is executorMemory +
yarnOverhead.

- Arun
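
For reference, a rough sketch of the arithmetic (hedged; the exact property
name and default depend on the Spark version, and YARN additionally rounds
each container up to its minimum allocation increment):

per-executor container = executorMemory + memoryOverhead   (e.g. 4g + ~0.5g)
total                  = numExecutors x container + 1 application master container
so 10 x 4g executors  ~= 10 x ~4.5g + ~1g ~= 46g, matching the numbers below

# the overhead can also be set explicitly, e.g.:
spark-submit --master yarn-client \
  --executor-memory 4g --num-executors 10 \
  --conf spark.yarn.executor.memoryOverhead=512 \
  ...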

On Sat, Dec 6, 2014 at 4:27 PM, Denny Lee denny.g@gmail.com wrote:

 This is perhaps more of a YARN question than a Spark question, but I was
 just curious about how memory is allocated in YARN via the various
 configurations.  For example, if I spin up my cluster with 4GB with a
 different number of executors as noted below

  4GB executor-memory x 10 executors = 46GB  (4GB x 10 = 40 + 6)
  4GB executor-memory x 4 executors = 19GB (4GB x 4 = 16 + 3)
  4GB executor-memory x 2 executors = 10GB (4GB x 2 = 8 + 2)

 The pattern when observing the RM is that there is a container for each
 executor and one additional container.  In terms of memory, it looks
 like there is an additional (1GB + (0.5GB x # executors)) allocated
 in YARN.

 Just wondering why this is - or is it just an artifact of YARN itself?

 Thanks!




Re: spark-submit on YARN is slow

2014-12-05 Thread Arun Ahuja
Hey Sandy,

What are those sleeps for, and do they still exist?  We have seen executor
startup times of about 1:00 to 1:30, which is a large chunk for jobs that
run in ~10 minutes.

Thanks,
Arun
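
For reference, the resource flags Sandy mentions below look roughly like this
(the values, class, and jar name here are placeholders, not from this thread):

spark-submit --master yarn-client \
  --num-executors 50 --executor-cores 4 --executor-memory 8g \
  --class com.example.MyApp my-app.jar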

On Fri, Dec 5, 2014 at 3:20 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Hi Denny,

 Those sleeps were only at startup, so if jobs are taking significantly
 longer on YARN, that should be a different problem.  When you ran on YARN,
 did you use the --executor-cores, --executor-memory, and --num-executors
 arguments?  When running against a standalone cluster, by default Spark
 will make use of all the cluster resources, but when running against YARN,
 Spark defaults to a couple tiny executors.

 -Sandy

 On Fri, Dec 5, 2014 at 11:32 AM, Denny Lee denny.g@gmail.com wrote:

 My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand
 steps. If I was running this on standalone cluster mode the query finished
 in 55s but on YARN, the query was still running 30min later. Would the hard
 coded sleeps potentially be in play here?
 On Fri, Dec 5, 2014 at 11:23 Sandy Ryza sandy.r...@cloudera.com wrote:

 Hi Tobias,

 What version are you using?  In some recent versions, we had a couple of
 large hardcoded sleeps on the Spark side.

 -Sandy

 On Fri, Dec 5, 2014 at 11:15 AM, Andrew Or and...@databricks.com
 wrote:

 Hey Tobias,

 As you suspect, the reason why it's slow is because the resource
 manager in YARN takes a while to grant resources. This is because YARN
 needs to first set up the application master container, and then this AM
 needs to request more containers for Spark executors. I think this accounts
 for most of the overhead. The remaining source probably comes from how our
 own YARN integration code polls application (every second) and cluster
 resource states (every 5 seconds IIRC). I haven't explored in detail
 whether there are optimizations there that can speed this up, but I believe
 most of the overhead comes from YARN itself.

 In other words, no I don't know of any quick fix on your end that you
 can do to speed this up.

 -Andrew


 2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer t...@preferred.jp:

 Hi,

 I am using spark-submit to submit my application to YARN in
 yarn-cluster mode. I have both the Spark assembly jar file as well as my
 application jar file put in HDFS and can see from the logging output that
 both files are used from there. However, it still takes about 10 seconds
 for my application's yarnAppState to switch from ACCEPTED to RUNNING.

 I am aware that this is probably not a Spark issue, but some YARN
 configuration setting (or YARN-inherent slowness), I was just wondering if
 anyone has an advice for how to speed this up.

 Thanks
 Tobias







Re: Nightly releases

2014-11-21 Thread Arun Ahuja
Great - what can we do to make this happen?  So should I file a JIRA to
track?

Thanks,

Arun

On Tue, Nov 18, 2014 at 11:46 AM, Andrew Ash and...@andrewash.com wrote:

 I can see this being valuable for users wanting to live on the cutting
 edge without building CI infrastructure themselves, myself included.  I
 think Patrick's recent work on the build scripts for 1.2.0 will make
 delivering nightly builds to a public maven repo easier.

 On Tue, Nov 18, 2014 at 10:22 AM, Arun Ahuja aahuj...@gmail.com wrote:

 Of course we can run this as well to get the latest, but the build is
 fairly long and this seems like a resource many would need.

 On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja aahuj...@gmail.com wrote:

 Are nightly releases posted anywhere?  There are quite a few vital
 bugfixes and performance improvements being committed to Spark, and using the
 latest commits is useful (or even necessary for some jobs).

 Is there a place to post them? It doesn't seem like it would be difficult to
 run make-dist nightly and place it somewhere.

 Is it possible to extract this from Jenkins builds?

 Thanks,
 Arun
  ​






Re: Nightly releases

2014-11-21 Thread Arun Ahuja
Great - posted here https://issues.apache.org/jira/browse/SPARK-4542

On Fri, Nov 21, 2014 at 1:03 PM, Andrew Ash and...@andrewash.com wrote:

 Yes you should file a Jira and echo it out here so others can follow and
 comment on it.  Thanks Arun!

 On Fri, Nov 21, 2014 at 12:02 PM, Arun Ahuja aahuj...@gmail.com wrote:

 Great - what can we do to make this happen?  So should I file a JIRA to
 track?

 Thanks,

 Arun

 On Tue, Nov 18, 2014 at 11:46 AM, Andrew Ash and...@andrewash.com
 wrote:

 I can see this being valuable for users wanting to live on the cutting
 edge without building CI infrastructure themselves, myself included.  I
 think Patrick's recent work on the build scripts for 1.2.0 will make
 delivering nightly builds to a public maven repo easier.

 On Tue, Nov 18, 2014 at 10:22 AM, Arun Ahuja aahuj...@gmail.com wrote:

 Of course we can run this as well to get the latest, but the build is
 fairly long and this seems like a resource many would need.

 On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja aahuj...@gmail.com
 wrote:

 Are nightly releases posted anywhere?  There are quite a few vital
 bugfixes and performance improvements being committed to Spark, and using the
 latest commits is useful (or even necessary for some jobs).

 Is there a place to post them? It doesn't seem like it would be difficult to
 run make-dist nightly and place it somewhere.

 Is it possible to extract this from Jenkins builds?

 Thanks,
 Arun
  ​








Nightly releases

2014-11-18 Thread Arun Ahuja
Are nightly releases posted anywhere?  There are quite a few vital bugfixes
and performance improvements being committed to Spark, and using the latest
commits is useful (or even necessary for some jobs).

Is there a place to post them? It doesn't seem like it would be difficult to
run make-dist nightly and place it somewhere.

Is it possible to extract this from Jenkins builds?

Thanks,
Arun
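
In the meantime, a rough sketch of what such a nightly job could run (hedged;
the flags depend on the Spark version and on the options make-distribution.sh
supports at the time):

git pull
./make-distribution.sh --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0
# then publish the resulting .tgz somewhere shared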
 ​


Re: Nightly releases

2014-11-18 Thread Arun Ahuja
Of course we can run this as well to get the latest, but the build is
fairly long and this seems like a resource many would need.

On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja aahuj...@gmail.com wrote:

 Are nightly releases posted anywhere?  There are quite a few vital
 bugfixes and performance improvements being committed to Spark, and using the
 latest commits is useful (or even necessary for some jobs).

 Is there a place to post them? It doesn't seem like it would be difficult to
 run make-dist nightly and place it somewhere.

 Is it possible to extract this from Jenkins builds?

 Thanks,
 Arun
  ​



Re: Increase Executor Memory on YARN

2014-11-10 Thread Arun Ahuja
If you are using spark-submit with --master yarn, you can also pass it as a
flag: --executor-memory <amount>
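
For example (hedged values; note this applies per application at submit time,
it does not change executors that are already running):

spark-submit --master yarn --executor-memory 8g --num-executors 20 \
  --class com.example.MyApp my-app.jar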
​

On Mon, Nov 10, 2014 at 8:58 AM, Mudassar Sarwar 
mudassar.sar...@northbaysolutions.net wrote:

 Hi,

 How can we increase the executor memory of a running spark cluster on YARN?
 We want to increase the executor memory on the addition of new nodes in the
 cluster. We are running spark version 1.0.2.

 Thanks
 Mudassar



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Increase-Executor-Memory-on-YARN-tp18489.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Viewing web UI after the fact

2014-11-07 Thread Arun Ahuja
We are running our applications through YARN and are only sometimes seeing
them in the History Server.  Most do not seem to have the
APPLICATION_COMPLETE file.  Specifically, any job that ends because of yarn
application -kill does not show up.  For the others, what would be a reason
for them not to appear in the Spark UI?  Is there any update on this?

Thanks,
Arun
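
For anyone hitting the same thing, a minimal sketch of the pattern that fixes
it for normal exits (jobs killed via yarn application -kill will still skip
sc.stop(); the event-log properties are standard, the HDFS path is only an
example):

val conf = new SparkConf()
  .setAppName("MyApp")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///tmp/spark-events") // adjust to your log dir
val sc = new SparkContext(conf)
try {
  // ... job ...
} finally {
  sc.stop() // without this, the ApplicationEnd event / APPLICATION_COMPLETE marker may never be written
}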

On Mon, Sep 15, 2014 at 4:10 AM, Grzegorz Białek 
grzegorz.bia...@codilime.com wrote:

 Hi Andrew,

 sorry for late response. Thank you very much for solving my problem. There
 was no APPLICATION_COMPLETE file in log directory due to not calling
 sc.stop() at the end of program. With stopping spark context everything
 works correctly, so thank you again.

 Best regards,
 Grzegorz


 On Fri, Sep 5, 2014 at 8:06 PM, Andrew Or and...@databricks.com wrote:

 Hi Grzegorz,

 Can you verify that there are APPLICATION_COMPLETE files in the event
 log directories? E.g. Does
 file:/tmp/spark-events/app-name-1234567890/APPLICATION_COMPLETE exist? If
 not, it could be that your application didn't call sc.stop(), so the
 ApplicationEnd event is not actually logged. The HistoryServer looks for
 this special file to identify applications to display. You could also try
 manually adding the APPLICATION_COMPLETE file to this directory; the
 HistoryServer should pick this up and display the application, though the
 information displayed will be incomplete because the log did not capture
 all the events (sc.stop() does a final close() on the file written).

 Andrew


 2014-09-05 1:50 GMT-07:00 Grzegorz Białek grzegorz.bia...@codilime.com:

 Hi Andrew,

 thank you very much for your answer. Unfortunately it still doesn't
 work. I'm using Spark 1.0.0, and I start history server running
 sbin/start-history-server.sh dir, although I also set
  SPARK_HISTORY_OPTS=-Dspark.history.fs.logDirectory in
 conf/spark-env.sh. I tried also other dir than /tmp/spark-events which
 have all possible permissions enabled. Also adding file: (and file://)
 didn't help - history server still shows:
 History Server
 Event Log Location: file:/tmp/spark-events/
 No Completed Applications Found.

 Best regards,
 Grzegorz


 On Thu, Sep 4, 2014 at 8:20 PM, Andrew Or and...@databricks.com wrote:

 Hi Grzegorz,

 Sorry for the late response. Unfortunately, if the Master UI doesn't
 know about your applications (they are completed with respect to a
 different Master), then it can't regenerate the UIs even if the logs exist.
 You will have to use the history server for that.

 How did you start the history server? If you are using Spark >= 1.0, you
 can pass the directory as an argument to the sbin/start-history-server.sh
 script. Otherwise, you may need to set the following in your
 conf/spark-env.sh to specify the log directory:

 export
 SPARK_HISTORY_OPTS=-Dspark.history.fs.logDirectory=/tmp/spark-events

 It could also be a permissions thing. Make sure your logs in
 /tmp/spark-events are accessible by the JVM that runs the history server.
 Also, there's a chance that /tmp/spark-events is interpreted as an HDFS
 path depending on which Spark version you're running. To resolve any
 ambiguity, you may set the log path to file:/tmp/spark-events instead.
 But first verify whether they actually exist.

 Let me know if you get it working,
 -Andrew



 2014-08-19 8:23 GMT-07:00 Grzegorz Białek grzegorz.bia...@codilime.com
 :

 Hi,
 Is there any way view history of applications statistics in master ui
 after restarting master server? I have all logs ing /tmp/spark-events/ but
 when I start history server in this directory it says No Completed
 Applications Found. Maybe I could copy this logs to dir used by master
 server but I couldn't find any. Or maybe I'm doing something wrong
 launching history server.
 Do you have any idea how to solve it?

 Thanks,
 Grzegorz


 On Thu, Aug 14, 2014 at 10:53 AM, Grzegorz Białek 
 grzegorz.bia...@codilime.com wrote:

 Hi,

 Thank you both for your answers. Browsing using Master UI works fine.
 Unfortunately History Server shows No Completed Applications Found even
 if logs exists under given directory, but using Master UI is enough for 
 me.

 Best regards,
 Grzegorz



 On Wed, Aug 13, 2014 at 8:09 PM, Andrew Or and...@databricks.com
 wrote:

 The Spark UI isn't available through the same address; otherwise new
 applications won't be able to bind to it. Once the old application
 finishes, the standalone Master renders the after-the-fact application 
 UI
 and exposes it under a different URL. To see this, go to the Master UI
 (master-url:8080) and click on your application in the Completed
 Applications table.


 2014-08-13 10:56 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com:

 Take a look at http://spark.apache.org/docs/latest/monitoring.html
 -- you need to launch a history server to serve the logs.

 Matei

 On August 13, 2014 at 2:03:08 AM, grzegorz-bialek (
 grzegorz.bia...@codilime.com) wrote:

 Hi,
 I wanted to access Spark web UI after application 

Re: Larger heap leads to perf degradation due to GC

2014-10-06 Thread Arun Ahuja
We have used the strategy that you suggested, Andrew - using many workers
per machine and keeping the heaps small (< 20GB).

Using a large heap resulted in workers hanging or not responding (leading
to timeouts).  The same dataset/job for us will fail (most often due to
akka disassociated or fetch failure errors) with 10 cores / 100 executors /
60GB per executor, while it succeeds with 1 core / 1000 executors / 6GB per
executor.

When the job does succeed with more cores per executor and a larger heap, it
is usually much slower than with the smaller executors (the same 8-10 min job
taking 15-20 min to complete).

The unfortunate downside of this has been that we have some large broadcast
variables which may not fit into memory (and are unnecessarily duplicated)
when using the smaller executors.

Most of this is anecdotal, but for the most part we have had more success
and consistency with more executors with smaller memory requirements.
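
For reference, the many-small-executors shape described above is submitted on
YARN roughly like this (a sketch using the numbers from this thread, not a
general recommendation):

spark-submit --master yarn-client \
  --num-executors 1000 --executor-cores 1 --executor-memory 6g \
  ...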

On Sun, Oct 5, 2014 at 7:20 PM, Andrew Ash and...@andrewash.com wrote:

 Hi Mingyu,

 Maybe we should be limiting our heaps to 32GB max and running multiple
 workers per machine to avoid large GC issues.

 For a 128GB memory, 32 core machine, this could look like:

 SPARK_WORKER_INSTANCES=4
 SPARK_WORKER_MEMORY=32
 SPARK_WORKER_CORES=8

 Are people running with large (32GB+) executor heaps in production?  I'd
 be curious to hear if so.

 Cheers!
 Andrew

 On Thu, Oct 2, 2014 at 1:30 PM, Mingyu Kim m...@palantir.com wrote:

 This issue definitely needs more investigation, but I just wanted to
 quickly check if anyone has run into this problem or has general guidance
 around it. We’ve seen a performance degradation with a large heap on a
 simple map task (i.e. no shuffle). We’ve seen the slowness starting at around
 a 50GB heap (i.e. spark.executor.memory=50g). And, when we checked the
 CPU usage, there were just a lot of GCs going on.

 Has anyone seen a similar problem?

 Thanks,
 Mingyu





Re: IOException running streaming job

2014-09-29 Thread Arun Ahuja
We are also seeing this PARSING_ERROR(2) error due to

Caused by: java.io.IOException: failed to uncompress the chunk:
PARSING_ERROR(2)
at
org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:362)
at
org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:159)
at
org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:142)
at com.esotericsoftware.kryo.io.Input.fill(Input.java:140)

It usually comes from here

at
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)

There seems to be an open issue to investigate this:
https://issues.apache.org/jira/browse/SPARK-3630
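
One hedged workaround (not a fix) while that ticket is open is to move the
compression codec off Snappy, e.g.:

spark-submit ... --conf spark.io.compression.codec=lzf ...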



On Tue, Sep 23, 2014 at 8:18 PM, Emil Gustafsson e...@cellfish.se wrote:

 I'm trying out some streaming with Spark and I'm getting an error that
 puzzles me since I'm new to Spark. I get this error all the time, but 1-2
 batches in the stream are processed before the job stops; never the
 complete job, and often no batch is processed at all. I use Spark 1.1.0.

 The job is started with --master local[4].
 The job is doing this:
 val conf = new SparkConf()
   .setAppName("My Application")

 val sc = new SparkContext(conf)
 val ssc = new StreamingContext(conf, Seconds(2))
 val queue = new SynchronizedQueue[RDD[(Int, String)]]()
 val input = ssc.queueStream(queue)
 //val mapped = input.map(_._2)

 input.print()
 ssc.start()

 var last = 0
 for (i <- 1 to 5) {
   Thread.sleep(1000)
   if (i != 2) {
     //val casRdd = cr.where("id = 42 and t > %d and t <= %d".format(last, i))
     var l = List[(Int, String)]()
     for (j <- last + 1 to i) {
       l = l :+ (j, "foo%d".format(i))
     }
     l.foreach(x => println("*** %s".format(x)))
     val casRdd = sc.parallelize(l)
     //casRdd.foreach(println)
     last = i
     queue += casRdd
   }
 }

 Thread.sleep(1000)
 ssc.stop()


 The error stack I get is:
 14/09/24 00:08:56 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID
 0)
 java.io.IOException: PARSING_ERROR(2)
 at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
 at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
 at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
 at
 org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
 at
 org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
 at
 org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
 at
 org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
 at
 org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
 at
 org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:170)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
 at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 at
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 at
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 at
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
 at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:163)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 14/09/24 00:08:56 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
 localhost): java.io.IOException: PARSING_ERROR(2)
 org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
 org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
 

java.io.IOException Error in task deserialization

2014-09-26 Thread Arun Ahuja
Has anyone else seen this error in task deserialization?  The task is
processing a small amount of data and doesn't seem to have much data
hanging off the closure.  I've only seen this with Spark 1.1.

Job aborted due to stage failure: Task 975 in stage 8.0 failed 4
times, most recent failure: Lost task 975.3 in stage 8.0 (TID 24777,
host.com): java.io.IOException: unexpected exception type

java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)

java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)

java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)

org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)

org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)


Re: java.io.IOException Error in task deserialization

2014-09-26 Thread Arun Ahuja
No, for me as well it is non-deterministic.  It happens in a piece of code
that does many filters and counts on a small set of records (~1k-10k).  The
original set is persisted in memory and we have a Kryo serializer set for
it.  The task itself takes in just a few filtering parameters.  With the
same settings this has sometimes completed successfully and sometimes failed
during this step.

Arun

On Fri, Sep 26, 2014 at 1:32 PM, Brad Miller bmill...@eecs.berkeley.edu
wrote:

 I've had multiple jobs crash due to java.io.IOException: unexpected
 exception type; I've been running the 1.1 branch for some time and am now
 running the 1.1 release binaries. Note that I only use PySpark. I haven't
 kept detailed notes or the tracebacks around since there are other problems
 that have caused my greater grief (namely key not found errors).

 For me the exception seems to occur non-deterministically, which is a bit
 interesting since the error message shows that the same stage has failed
 multiple times.  Are you able to consistently reproduce the bug across
 multiple invocations at the same place?

 On Fri, Sep 26, 2014 at 6:11 AM, Arun Ahuja aahuj...@gmail.com wrote:

 Has anyone else seen this error in task deserialization?  The task is
 processing a small amount of data and doesn't seem to have much data
 hanging off the closure.  I've only seen this with Spark 1.1.

 Job aborted due to stage failure: Task 975 in stage 8.0 failed 4 times, most 
 recent failure: Lost task 975.3 in stage 8.0 (TID 24777, host.com): 
 java.io.IOException: unexpected exception type
 
 java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
 
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)





Specifying Spark Executor Java options using Spark Submit

2014-09-24 Thread Arun Ahuja
What is the proper way to specify Java options for the Spark executors
using spark-submit?  We had previously done this using

export SPARK_JAVA_OPTS='..

for example to attach a debugger to each executor, or to add -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps.

On spark-submit I see --driver-java-options, but is there an equivalent for
individual executors?

Thanks,
Arun
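
For what it's worth, a hedged sketch of the spark-submit-era equivalent
(spark.executor.extraJavaOptions, with spark.driver.extraJavaOptions for the
driver side):

spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  ...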


TorrentBroadcast causes java.io.IOException: unexpected exception type

2014-09-23 Thread Arun Ahuja
Since upgrading to Spark 1.1 we have been seeing the following error in the
logs:

14/09/23 02:14:42 ERROR executor.Executor: Exception in task 1087.0 in
stage 0.0 (TID 607)
java.io.IOException: unexpected exception type
at
java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Failed to get
broadcast_3_piece0 of broadcast_3
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:124)
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBlocks$1.apply(TorrentBroadcast.scala:104)

Does anyone have some background on what change could have caused this?  Is
TorrentBroadcast now the default broadcast method?

Thanks

Arun
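
For what it's worth, one hedged way to test whether the torrent path is
implicated is to switch the broadcast factory back to HTTP broadcast
(property and class name as of Spark 1.1; please double-check against your
build):

spark-submit ... --conf spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory ...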


General question on persist

2014-09-23 Thread Arun Ahuja
I have a general question on when persisting will be beneficial and when it
won't:

I have a task that runs as follow

keyedRecordPieces = records.flatMap(record => Seq(key, recordPieces))
partitoned = keyedRecordPieces.partitionBy(KeyPartitioner)

partitoned.mapPartitions(doComputation).save()

Is there value in having a persist somewhere here?  For example if the
flatMap step is particularly expensive, will it ever be computed twice when
there are no failures?

Thanks

Arun


Re: General question on persist

2014-09-23 Thread Arun Ahuja
Thanks Liquan, that makes sense, but if I am only doing the computation
once, there will essentially be no difference, correct?

I had a second question related to mapPartitions:
1) All of the records of the Iterator[T] that a single function call in
mapPartitions processes must fit into memory, correct?
2) Is there some way to process that iterator in sorted order?

Thanks!

Arun
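
For question 2, a minimal sketch (assuming doComputation takes an iterator,
as in the earlier snippet): mapPartitions itself streams the iterator, but
sorting it first requires materializing the partition in memory:

partitoned.mapPartitions { iter =>
  val sorted = iter.toArray.sortBy(_._1).iterator  // pulls the whole partition into memory
  doComputation(sorted)
}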

On Tue, Sep 23, 2014 at 5:21 PM, Liquan Pei liquan...@gmail.com wrote:

 Hi Arun,

 The intermediate results like keyedRecordPieces will not be
 materialized.  This indicates that if you run

 partitoned = keyedRecordPieces.partitionBy(KeyPartitioner)

 partitoned.mapPartitions(doComputation).save()

 again, keyedRecordPieces will be re-computed. In this case, caching or
 persisting keyedRecordPieces is a good idea to eliminate unnecessary
 expensive computation. What you can probably do is

 keyedRecordPieces = records.flatMap(record => Seq(key, recordPieces)).cache()

 This will cache the RDD referenced by keyedRecordPieces in memory. For
 more options on cache and persist, take a look at
 http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD.
 There are two APIs you can use to persist RDDs, and one allows you to
 specify the storage level.
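 A short sketch of those two options (keyedRecordPieces is the RDD from the code above; StorageLevel comes from org.apache.spark.storage):

 import org.apache.spark.storage.StorageLevel

 // Either: cache() is shorthand for persist(StorageLevel.MEMORY_ONLY)
 keyedRecordPieces.cache()

 // Or: persist() takes an explicit storage level, e.g. spill partitions that
 // do not fit in memory to disk instead of recomputing them. Only one storage
 // level can be set per RDD, so use this instead of cache(), not in addition
 // to it.
 // keyedRecordPieces.persist(StorageLevel.MEMORY_AND_DISK)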

 Thanks,
 Liquan



 On Tue, Sep 23, 2014 at 2:08 PM, Arun Ahuja aahuj...@gmail.com wrote:

 I have a general question on when persisting will be beneficial and when
 it won't:

 I have a task that runs as follows

 keyedRecordPieces = records.flatMap(record => Seq(key, recordPieces))
 partitioned = keyedRecordPieces.partitionBy(KeyPartitioner)

 partitioned.mapPartitions(doComputation).save()

 Is there value in having a persist somewhere here?  For example if the
 flatMap step is particularly expensive, will it ever be computed twice when
 there are no failures?

 Thanks

 Arun




 --
 Liquan Pei
 Department of Physics
 University of Massachusetts Amherst



Input Field in Spark 1.1 Web UI

2014-09-08 Thread Arun Ahuja
Is there more information on what the Input column on the Spark UI means?
 How is this computed?  I am processing a fairly small (but zipped) file
and see the value as

[image: Inline image 1]

This does not seem correct?

Thanks,
Arun


Re: Failed jobs show up as succeeded in YARN?

2014-08-19 Thread Arun Ahuja
We see this all the time as well; I don't believe there is much of a
relationship between the Spark job status and what YARN shows as the
status.


On Mon, Aug 11, 2014 at 3:17 PM, Shay Rojansky r...@roji.org wrote:

 Spark 1.0.2, Python, Cloudera 5.1 (Hadoop 2.3.0)

 It seems that Python jobs I'm sending to YARN show up as succeeded even if
 they failed... Am I doing something wrong, is this a known issue?

 Thanks,

 Shay



spark-submit with Yarn

2014-08-19 Thread Arun Ahuja
Is there more documentation on using spark-submit with Yarn?  Trying to
launch a simple job does not seem to work.

My run command is as follows:

/opt/cloudera/parcels/CDH/bin/spark-submit \
--master yarn \
--deploy-mode client \
--executor-memory 10g \
--driver-memory 10g \
--num-executors 50 \
--class $MAIN_CLASS \
--verbose \
$JAR \
$@

The verbose logging correctly parses the arguments:

System properties:
spark.executor.memory - 10g
spark.executor.instances - 50
SPARK_SUBMIT - true
spark.master - yarn-client


But when I view the job's port 4040 page (the Spark UI), there is a single
executor (just the driver node) and I see the following in the environment:

spark.master - local[24]

Also, when I run with yarn-cluster, how can I access the SparkUI page?

Thanks,
Arun


Re: spark-submit with Yarn

2014-08-19 Thread Arun Ahuja
Yes, the application is overwriting it: I need to pass the master as an
argument to the application, otherwise it will be set to local.
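A minimal sketch of the pattern that avoids hard-coding the master (illustrative app code, not taken from this thread; the app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// Leave the master unset so the --master flag passed to spark-submit
// (e.g. yarn-client) is honored instead of being overridden to local.
val conf = new SparkConf().setAppName("MyApp")
val sc = new SparkContext(conf)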

Thanks for the quick reply! Also, the appTrackingUrl is now set properly as
well; before, it just said unassigned.

Thanks!
Arun


On Tue, Aug 19, 2014 at 5:47 PM, Marcelo Vanzin van...@cloudera.com wrote:

 On Tue, Aug 19, 2014 at 2:34 PM, Arun Ahuja aahuj...@gmail.com wrote:
  /opt/cloudera/parcels/CDH/bin/spark-submit \
  --master yarn \
  --deploy-mode client \

 This should be enough.

  But when I view the job 4040 page, SparkUI, there is a single executor
 (just
  the driver node) and I see the following in enviroment
 
  spark.master - local[24]

 Hmmm. Are you sure the app itself is not overwriting spark.master
 before creating the SparkContext? That's the only explanation I can
 think of.

  Also, when I run with yarn-cluster, how can I access the SparkUI page?

 You can click on the link in the RM application list. The address is
 also printed to the AM logs, which are also available through the RM
 web ui. Finally, the link is printed to the output of the launcher
 process (look for appTrackingUrl).


 --
 Marcelo



java.net.SocketTimeoutException: Read timed out and java.io.IOException: Filesystem closed on Spark 1.0

2014-06-20 Thread Arun Ahuja
Hi all,

I'm running a job that seems to continually fail with the following
exception:

java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
...
org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:330)

This is running spark-assembly-1.0.0-hadoop2.3.0 through yarn.

The only additional error I see is
14/06/20 10:44:15 WARN NewHadoopRDD: Exception in RecordReader.close()
net.sf.samtools.util.RuntimeIOException: java.io.IOException: Filesystem
closed

I had thought this "Filesystem closed" issue was resolved in
https://issues.apache.org/jira/browse/SPARK-1676.  I've also attempted to
run with a single core to avoid the issue (which sometimes seems to help,
as this failure is intermittent).

I saw a previous mail thread,
http://apache-spark-user-list.1001560.n3.nabble.com/Filesystem-closed-while-running-spark-job-td4596.html,
with a suggestion to disable caching.
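If that suggestion refers to the Hadoop FileSystem object cache, a hedged sketch of how it is usually disabled (fs.hdfs.impl.disable.cache is a standard Hadoop property; whether it actually fixes this particular failure is not confirmed here):

// sc is the application's SparkContext. Give each consumer its own
// FileSystem instance for hdfs:// URIs instead of the shared, cached one;
// set this before creating the RDDs that read from HDFS.
sc.hadoopConfiguration.set("fs.hdfs.impl.disable.cache", "true")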

Has anyone seen this before, or does anyone know of a resolution? As I
mentioned, this is intermittent: sometimes the job runs to completion and
sometimes it fails in this way.

Thanks,
Arun


Re: Yarn configuration file doesn't work when run with yarn-client mode

2014-05-20 Thread Arun Ahuja
I was actually able to get this to work.  I was NOT setting the classpath
properly originally.

Simply running
java -cp /etc/hadoop/conf/:<yarn, hadoop jars> com.domain.JobClass

and setting yarn-client as the spark master worked for me. Originally I had
not put the configuration on the classpath. Also, I used
$SPARK_HOME/bin/compute-classpath.sh to get all of the relevant jars. The
job properly connects to the AM at the correct port.

Is there any intuition on how Spark executors map to YARN workers, or how the
different memory settings interact (SPARK_MEM vs. YARN_WORKER_MEM)?

Thanks,
Arun


On Tue, May 20, 2014 at 2:25 PM, Andrew Or and...@databricks.com wrote:

 Hi Gaurav and Arun,

 Your settings seem reasonable; as long as YARN_CONF_DIR or HADOOP_CONF_DIR
 is properly set, the application should be able to find the correct RM
 port. Have you tried running the examples in yarn-client mode, and your
 custom application in yarn-standalone (now yarn-cluster) mode?



 2014-05-20 5:17 GMT-07:00 gaurav.dasgupta gaurav.d...@gmail.com:

 A few more details I would like to provide (sorry, I should have provided
 these with the previous post):

  *- Spark Version = 0.9.1 (using pre-built spark-0.9.1-bin-hadoop2)
  - Hadoop Version = 2.4.0 (Hortonworks)
  - I am trying to execute a Spark Streaming program*

 Because I am using Hortonworks Hadoop (HDP), YARN is configured with
 different port numbers than Apache's default configurations. For example,
 *resourcemanager.address* is IP:8050 in HDP, whereas it defaults to IP:8032.

 When I run the Spark examples using bin/run-example, I can see in the
 console logs that it is connecting to the right port configured by HDP,
 i.e., 8050. Please refer to the console log below:

 */[root@host spark-0.9.1-bin-hadoop2]# SPARK_YARN_MODE=true

 SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar

 SPARK_YARN_APP_JAR=examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar
 bin/run-example org.apache.spark.examples.HdfsTest yarn-client
 /user/root/test
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in

 [jar:file:/usr/local/spark-0.9.1-bin-hadoop2/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in

 [jar:file:/usr/local/spark-0.9.1-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 14/05/20 06:55:29 INFO slf4j.Slf4jLogger: Slf4jLogger started
 14/05/20 06:55:29 INFO Remoting: Starting remoting
 14/05/20 06:55:29 INFO Remoting: Remoting started; listening on addresses
 :[akka.tcp://spark@IP:60988]
 14/05/20 06:55:29 INFO Remoting: Remoting now listens on addresses:
 [akka.tcp://spark@lt;IP:60988]
 14/05/20 06:55:29 INFO spark.SparkEnv: Registering BlockManagerMaster
 14/05/20 06:55:29 INFO storage.DiskBlockManager: Created local directory
 at
 /tmp/spark-local-20140520065529-924f
 14/05/20 06:55:29 INFO storage.MemoryStore: MemoryStore started with
 capacity 4.2 GB.
 14/05/20 06:55:29 INFO network.ConnectionManager: Bound socket to port
 35359
 with id = ConnectionManagerId(IP,35359)
 14/05/20 06:55:29 INFO storage.BlockManagerMaster: Trying to register
 BlockManager
 14/05/20 06:55:29 INFO storage.BlockManagerMasterActor$BlockManagerInfo:
 Registering block manager IP:35359 with 4.2 GB RAM
 14/05/20 06:55:29 INFO storage.BlockManagerMaster: Registered BlockManager
 14/05/20 06:55:29 INFO spark.HttpServer: Starting HTTP Server
 14/05/20 06:55:29 INFO server.Server: jetty-7.x.y-SNAPSHOT
 14/05/20 06:55:29 INFO server.AbstractConnector: Started
 SocketConnector@0.0.0.0:59418
 14/05/20 06:55:29 INFO broadcast.HttpBroadcast: Broadcast server started
 at
 http://IP:59418
 14/05/20 06:55:29 INFO spark.SparkEnv: Registering MapOutputTracker
 14/05/20 06:55:29 INFO spark.HttpFileServer: HTTP File server directory is
 /tmp/spark-fc34fdc8-d940-420b-b184-fc7a8a65501a
 14/05/20 06:55:29 INFO spark.HttpServer: Starting HTTP Server
 14/05/20 06:55:29 INFO server.Server: jetty-7.x.y-SNAPSHOT
 14/05/20 06:55:29 INFO server.AbstractConnector: Started
 SocketConnector@0.0.0.0:53425
 14/05/20 06:55:29 INFO server.Server: jetty-7.x.y-SNAPSHOT
 14/05/20 06:55:29 INFO handler.ContextHandler: started
 o.e.j.s.h.ContextHandler{/storage/rdd,null}
 14/05/20 06:55:29 INFO handler.ContextHandler: started
 o.e.j.s.h.ContextHandler{/storage,null}
 14/05/20 06:55:29 INFO handler.ContextHandler: started
 o.e.j.s.h.ContextHandler{/stages/stage,null}
 14/05/20 06:55:29 INFO handler.ContextHandler: started
 o.e.j.s.h.ContextHandler{/stages/pool,null}
 14/05/20 06:55:29 INFO handler.ContextHandler: started
 o.e.j.s.h.ContextHandler{/stages,null}
 14/05/20 06:55:29 INFO handler.ContextHandler: started
 

Re: advice on maintaining a production spark cluster?

2014-05-20 Thread Arun Ahuja
Hi Matei,

Unfortunately, I don't have more detailed information, but we have seen the
loss of workers in standalone mode as well.  If a job is killed through
CTRL-C, we will often see the number of workers and cores decrease on the
Spark master page.  The workers are still alive and well in the Cloudera
Manager page, but not visible on the Spark master.  Simply restarting the
workers usually resolves this, but we often see workers disappear after a
failed or killed job.

If we see this occur again, I'll try and provide some logs.




On Mon, May 19, 2014 at 10:51 PM, Matei Zaharia matei.zaha...@gmail.comwrote:

 Which version is this with? I haven’t seen standalone masters lose
 workers. Is there other stuff on the machines that’s killing them, or what
 errors do you see?

 Matei

 On May 16, 2014, at 9:53 AM, Josh Marcus jmar...@meetup.com wrote:

  Hey folks,
 
  I'm wondering what strategies other folks are using for maintaining and
 monitoring the stability of stand-alone spark clusters.
 
  Our master very regularly loses workers, and they (as expected) never
 rejoin the cluster.  This is the same behavior I've seen
  using akka cluster (if that's what spark is using in stand-alone mode)
 -- are there configuration options we could be setting
  to make the cluster more robust?
 
  We have a custom script which monitors the number of workers (through
 the web interface) and restarts the cluster when
  necessary, as well as resolving other issues we face (like spark shells
 left open permanently claiming resources), and it
  works, but it's nowhere close to a great solution.
 
  What are other folks doing?  Is this something that other folks observe
 as well?  I suspect that the loss of workers is tied to
  jobs that run out of memory on the client side or our use of very large
 broadcast variables, but I don't have an isolated test case.
  I'm open to general answers here: for example, perhaps we should simply
 be using mesos or yarn instead of stand-alone mode.
 
  --j
 




Re: Yarn configuration file doesn't work when run with yarn-client mode

2014-05-20 Thread Arun Ahuja
Yes, we are on Spark 0.9.0 so that explains the first piece, thanks!

Also, yes, I meant SPARK_WORKER_MEMORY.  Thanks for the hierarchy.
Similarly, is there some best practice for setting SPARK_WORKER_INSTANCES and
spark.default.parallelism?

Thanks,
Arun
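On spark.default.parallelism, a hedged sketch of the usual programmatic way to set it; the value is only an example (a common rule of thumb is roughly 2-3 tasks per CPU core in the cluster):

import org.apache.spark.{SparkConf, SparkContext}

// Default number of partitions used by shuffles and by operations such as
// reduceByKey when no partition count is given explicitly.
val conf = new SparkConf()
  .setAppName("ParallelismExample")         // placeholder app name
  .set("spark.default.parallelism", "400")  // example value only
val sc = new SparkContext(conf)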


On Tue, May 20, 2014 at 3:04 PM, Andrew Or and...@databricks.com wrote:

 I'm assuming you're running Spark 0.9.x, because in the latest version of
 Spark you shouldn't have to add the HADOOP_CONF_DIR to the java class path
 manually. I tested this out on my own YARN cluster and was able to confirm
 that.

 In Spark 1.0, SPARK_MEM is deprecated and should not be used. Instead, you
 should set the per-executor memory through spark.executor.memory, which has
 the same effect but takes higher priority. By YARN_WORKER_MEM, do you mean
 SPARK_EXECUTOR_MEMORY? It also does the same thing. In Spark 1.0, the
 priority hierarchy is as follows:

 spark.executor.memory (set through spark-defaults.conf) >
 SPARK_EXECUTOR_MEMORY > SPARK_MEM (deprecated)

 In Spark 0.9, the hierarchy is very similar:

 spark.executor.memory (set through SPARK_JAVA_OPTS in spark-env) > SPARK_MEM

 For more information:

 http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/configuration.html
 http://spark.apache.org/docs/0.9.1/configuration.html
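 A short sketch of the highest-priority option above (values and app name are examples only):

 // Per-executor memory set programmatically; equivalent to setting
 // spark.executor.memory in spark-defaults.conf or passing
 // --executor-memory to spark-submit.
 val conf = new org.apache.spark.SparkConf()
   .setAppName("ExecutorMemoryExample")
   .set("spark.executor.memory", "10g")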



 2014-05-20 11:30 GMT-07:00 Arun Ahuja aahuj...@gmail.com:

 I was actually able to get this to work.  I was NOT setting the classpath
 properly originally.

 Simply running
 java -cp /etc/hadoop/conf/:<yarn, hadoop jars> com.domain.JobClass

 and setting yarn-client as the spark master worked for me. Originally I
 had not put the configuration on the classpath. Also, I used
 $SPARK_HOME/bin/compute-classpath.sh to get all of the relevant jars. The
 job properly connects to the AM at the correct port.

 Is there any intuition on how Spark executors map to YARN workers, or how
 the different memory settings interact (SPARK_MEM vs. YARN_WORKER_MEM)?

 Thanks,
 Arun


 On Tue, May 20, 2014 at 2:25 PM, Andrew Or and...@databricks.com wrote:

 Hi Gaurav and Arun,

 Your settings seem reasonable; as long as YARN_CONF_DIR or
 HADOOP_CONF_DIR is properly set, the application should be able to find the
 correct RM port. Have you tried running the examples in yarn-client mode,
 and your custom application in yarn-standalone (now yarn-cluster) mode?




Re: Yarn configuration file doesn't work when run with yarn-client mode

2014-05-19 Thread Arun Ahuja
I am encountering the same thing.  Basic YARN apps work, as does the SparkPi
example, but my custom application gives this result.  I am using
compute-classpath.sh to create the proper classpath for my application, the
same as with SparkPi. Was there a resolution to this issue?

Thanks,
Arun


On Wed, Feb 12, 2014 at 1:28 AM, Nan Zhu zhunanmcg...@gmail.com wrote:

  Hi, all

 When I run my application in yarn-client mode, it seems that the system
 didn’t load my configuration file correctly, because the local app master
 always tries to register with the RM via a default IP:

 14/02/12 05:00:23 INFO SparkContext: Added JAR
 target/scala-2.10/rec_system_2.10-1.0.jar at
 http://172.31.37.160:51750/jars/rec_system_2.10-1.0.jar with timestamp
 1392181223818

 14/02/12 05:00:24 INFO RMProxy: Connecting to ResourceManager at /
 0.0.0.0:8032

 14/02/12 05:00:25 INFO Client: Retrying connect to server:
 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

 14/02/12 05:00:26 INFO Client: Retrying connect to server:
 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

 14/02/12 05:00:27 INFO Client: Retrying connect to server:
 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)


 However, if I run in standalone mode, everything works fine
  (YARN_CONF_DIR, SPARK_APP, SPARK_YARN_APP_JAR are all set correctly)

 is it a bug?

 Best,

 --
 Nan Zhu