Re: Spark core maven error

2014-12-27 Thread Sean Owen
What is the full error? This doesn't give much detail; it could be due
to network problems, for example.
On Dec 28, 2014 4:16 AM, "lalitagarw"  wrote:

> Hi All,
>
> I am using Spark in a Grails app and have added the Maven dependency below.
>
> compile group: 'org.apache.spark', name: 'spark-core_2.10', version:
> '1.2.0'
>
> It fails with the error below:
>
> Resolve error obtaining dependencies: Failed to read artifact descriptor
> for
> org.apache.spark:spark-core_2.10:jar:1.2.0 (Use --stacktrace to see the
> full
> trace)
> Error |
> Resolve error obtaining dependencies: Failed to read artifact descriptor
> for
> org.apache.spark:spark-core_2.10:jar:1.2.0 (Use --stacktrace to see the
> full
> trace)
> Error |
> Resolve
>
> However,
> compile group: 'org.apache.spark', name: 'spark-core_2.10', version:
> '1.1.1'
> -- Works fine. Any idea what might be going wrong?
>
> Thanks
> Lalit
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-core-maven-error-tp20871.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Dynamic Allocation in Spark 1.2.0

2014-12-27 Thread Shixiong Zhu
I encountered the following issue when enabling dynamicAllocation. You may
want to take a look at it.

https://issues.apache.org/jira/browse/SPARK-4951

Best Regards,
Shixiong Zhu

2014-12-28 2:07 GMT+08:00 Tsuyoshi OZAWA :

> Hi Anders,
>
> I faced the same issue you mentioned. Yes, you need to install the
> Spark shuffle plugin for YARN. Please check the following PRs, which add
> docs on enabling dynamicAllocation:
>
> https://github.com/apache/spark/pull/3731
> https://github.com/apache/spark/pull/3757
>
> I could run Spark on YARN with dynamicAllocation by following the
> instructions described in the docs.
>
> Thanks,
> - Tsuyoshi
>
> On Sat, Dec 27, 2014 at 11:06 PM, Anders Arpteg 
> wrote:
> > Hey,
> >
> > Tried to get the new spark.dynamicAllocation.enabled feature working on
> Yarn
> > (Hadoop 2.2), but am unsuccessful so far. I've tested with the following
> > settings:
> >
> >   conf
> > .set("spark.dynamicAllocation.enabled", "true")
> > .set("spark.shuffle.service.enabled", "true")
> > .set("spark.dynamicAllocation.minExecutors", "10")
> > .set("spark.dynamicAllocation.maxExecutors", "700")
> >
> > The app works fine on Spark 1.2 if dynamicAllocation is not enabled, but
> > with the settings above, it will start the app and the first job is
> listed
> > in the web ui. However, no tasks are started and it seems to be stuck
> > waiting for a container to be allocated forever.
> >
> > Any help would be appreciated. Do I need to do something specific to get the
> > external YARN shuffle service running in the node manager?
> >
> > TIA,
> > Anders
>
>
>
> --
> - Tsuyoshi
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Compile error since Spark 1.2.0

2014-12-27 Thread Ted Yu
Please see:
[SPARK-3930] [SPARK-3933] Support fixed-precision decimal in SQL, and some
optimizations
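
In 1.2 the SQL decimal type gained an optional fixed precision and scale, which
is why the old unparameterized DataType.DecimalType is gone. A minimal Scala
sketch of the 1.2-style usage follows; the import path and names are recalled
from memory rather than taken from this thread, so treat them as assumptions
and check the 1.2.0 scaladoc:

    // Assumed 1.2.x location of the SQL data types (it moved again in later releases).
    import org.apache.spark.sql.catalyst.types.DecimalType

    // Roughly the old 1.1.x behaviour: a decimal with no fixed precision.
    val unlimited = DecimalType.Unlimited

    // New in 1.2: a decimal with fixed precision and scale, e.g. DECIMAL(10, 2).
    val fixed = DecimalType(10, 2)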

Cheers

On Sat, Dec 27, 2014 at 7:20 PM, zigen  wrote:

> Compile error from Spark 1.2.0
>
>
> Hello, I am zigen.
>
> I am using Spark SQL 1.1.0.
>
> I want to use Spark SQL 1.2.0.
>
>
> But my Spark application gets a compile error.
>
> Spark 1.1.0 had DataType.DecimalType,
>
> but Spark 1.2.0 does not have DataType.DecimalType.
>
> Why?
>
>
> JavaDoc (Spark 1.1.0)
>
> http://people.apache.org/~pwendell/spark-1.1.0-rc1-docs/api/java/org/apache/spark/sql/api/java/DataType.html
>
>
> JavaDoc (Spark 1.2.0)
>
> http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/api/java/org/apache/spark/sql/api/java/DataType.html
>
>
> programming guide (Spark 1.2.0)
>
> https://spark.apache.org/docs/latest/sql-programming-guide.html#spark-sql-datatype-reference
>
>
>


Spark core maven error

2014-12-27 Thread lalitagarw
Hi All,

I am using Spark in a Grails app and have added the Maven dependency below.

compile group: 'org.apache.spark', name: 'spark-core_2.10', version: '1.2.0'

It fails with the error below:

Resolve error obtaining dependencies: Failed to read artifact descriptor for
org.apache.spark:spark-core_2.10:jar:1.2.0 (Use --stacktrace to see the full
trace)
Error |
Resolve error obtaining dependencies: Failed to read artifact descriptor for
org.apache.spark:spark-core_2.10:jar:1.2.0 (Use --stacktrace to see the full
trace)
Error |
Resolve

However,
compile group: 'org.apache.spark', name: 'spark-core_2.10', version: '1.1.1'
-- Works fine. Any idea what might be going wrong?

Thanks
Lalit



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-core-maven-error-tp20871.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Compile error since Spark 1.2.0

2014-12-27 Thread zigen
Compile error from Spark 1.2.0

Hello, I am zigen.
I am using Spark SQL 1.1.0.
I want to use Spark SQL 1.2.0.

But my Spark application gets a compile error.
Spark 1.1.0 had DataType.DecimalType,
but Spark 1.2.0 does not have DataType.DecimalType.
Why?

JavaDoc (Spark 1.1.0)
http://people.apache.org/~pwendell/spark-1.1.0-rc1-docs/api/java/org/apache/spark/sql/api/java/DataType.html

JavaDoc (Spark 1.2.0)
http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/api/java/org/apache/spark/sql/api/java/DataType.html

programming guide (Spark 1.2.0)
https://spark.apache.org/docs/latest/sql-programming-guide.html#spark-sql-datatype-reference




Re: Problem with StreamingContext - getting SPARK-2243

2014-12-27 Thread Thomas Frisk
Yes you are right - thanks for that :)

On 27 December 2014 at 23:18, Ilya Ganelin  wrote:

> Are you trying to do this in the shell? The shell is instantiated with a
> SparkContext named sc.
>
> -Ilya Ganelin
>
> On Sat, Dec 27, 2014 at 5:24 PM, tfrisk  wrote:
>
>>
>> Hi,
>>
>> Doing:
>>val ssc = new StreamingContext(conf, Seconds(1))
>>
>> and getting:
>>Only one SparkContext may be running in this JVM (see SPARK-2243). To
>> ignore this error, set spark.driver.allowMultipleContexts = true.
>>
>>
>> But I don't think I have another SparkContext running. Is there any way
>> I can check this or force-kill it? I've tried restarting the server as I'm
>> desperate, but I still get the same issue. I was not getting this earlier
>> today.
>>
>> Any help much appreciated.
>>
>> Thanks,
>>
>> Thomas
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-StreamingContext-getting-SPARK-2243-tp20869.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: Using YARN on a cluster created with spark-ec2

2014-12-27 Thread firemonk9
Currently only a standalone cluster is supported by the spark-ec2 script. You
can use Cloudera/Ambari/SequenceIQ to create a YARN cluster.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Using-YARN-on-a-cluster-created-with-spark-ec2-tp20816p20870.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Problem with StreamingContext - getting SPARK-2243

2014-12-27 Thread Ilya Ganelin
Are you trying to do this in the shell? The shell is instantiated with a
SparkContext named sc.
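
If that is the case, a minimal sketch of reusing the shell's context instead of
constructing a new one (this assumes spark-shell, where sc already exists):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Pass the SparkContext the shell already created (sc) to StreamingContext
    // instead of building a second context from a SparkConf.
    val ssc = new StreamingContext(sc, Seconds(1))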

-Ilya Ganelin

On Sat, Dec 27, 2014 at 5:24 PM, tfrisk  wrote:

>
> Hi,
>
> Doing:
>val ssc = new StreamingContext(conf, Seconds(1))
>
> and getting:
>Only one SparkContext may be running in this JVM (see SPARK-2243). To
> ignore this error, set spark.driver.allowMultipleContexts = true.
>
>
> But I don't think I have another SparkContext running. Is there any way
> I can check this or force-kill it? I've tried restarting the server as I'm
> desperate, but I still get the same issue. I was not getting this earlier
> today.
>
> Any help much appreciated.
>
> Thanks,
>
> Thomas
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-StreamingContext-getting-SPARK-2243-tp20869.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Problem with StreamingContext - getting SPARK-2243

2014-12-27 Thread tfrisk

Hi,

Doing:
   val ssc = new StreamingContext(conf, Seconds(1))

and getting:
   Only one SparkContext may be running in this JVM (see SPARK-2243). To
ignore this error, set spark.driver.allowMultipleContexts = true.


But I don't think I have another SparkContext running. Is there any way
I can check this or force-kill it? I've tried restarting the server as I'm
desperate, but I still get the same issue. I was not getting this earlier
today.

Any help much appreciated.

Thanks,

Thomas




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-StreamingContext-getting-SPARK-2243-tp20869.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: unable to check whether an item is present in RDD

2014-12-27 Thread Nicholas Chammas
Is the item you're looking up an Int? So you want to find which of the
Iterable[Int] elements in your RDD contains the Int you're looking for?
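
If so, a rough sketch of two ways to do it (shown in spark-shell, where sc is
already defined; the sample data is a hypothetical stand-in):

    // Hypothetical data standing in for the real RDD[Iterable[Int]].
    val rdd = sc.parallelize(Seq[Iterable[Int]](Seq(1, 2, 3), Seq(4, 5), Seq(6)))
    val item = 5

    // Per-element answer: for each Iterable, does it contain the item?
    val flags = rdd.map(_.exists(_ == item))                    // RDD[Boolean]
    flags.collect()                                             // Array(false, true, false)

    // Single yes/no answer: does any Iterable in the RDD contain the item?
    val isPresent = rdd.filter(_.exists(_ == item)).count() > 0 // true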

On Sat Dec 27 2014 at 3:26:41 PM Amit Behera  wrote:

> Hi All,
>
> I want to check whether an item is present or not in an RDD of Iterable[Int]
> using Scala.
>
> something like what we do in Java:
>
> *list.contains(item)*
>
> and the statement returns true if the item is present, otherwise false.
>
> Please help me to find the solution.
>
> Thanks
> Amit
>
>


Re: action progress in ipython notebook?

2014-12-27 Thread Josh Rosen
The console progress bars are implemented on top of a new stable "status
API" that was added in Spark 1.2.  It's possible to query job progress
using this interface (in older versions of Spark, you could implement a
custom SparkListener and maintain the counts of completed / running /
failed tasks / stages yourself).

There are actually several subtleties involved in implementing "job-level"
progress bars which behave in an intuitive way; there's a pretty extensive
discussion of the challenges at https://github.com/apache/spark/pull/3009.
Also, check out the pull request for the console progress bars for an
interesting design discussion around how they handle parallel stages:
https://github.com/apache/spark/pull/3029.

I'm not sure about the plumbing that would be necessary to display live
progress updates in the IPython notebook UI, though.  The general pattern
would probably involve a mapping to relate notebook cells to Spark jobs
(you can do this with job groups, I think), plus some periodic timer that
polls the driver for the status of the current job in order to update the
progress bar.
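
For reference, a rough Scala sketch of that kind of polling against the 1.2
status API; the job-group name is made up, and the method names are recalled
from the SparkStatusTracker / SparkStageInfo interfaces, so double-check them
against the 1.2.0 scaladoc:

    // Tag the notebook cell's work so its jobs can be found later.
    sc.setJobGroup("cell-42", "work submitted from notebook cell 42")

    // ... kick off an action asynchronously (e.g. from another thread) ...

    // Periodically poll the driver for progress of jobs in that group.
    for (jobId <- sc.statusTracker.getJobIdsForGroup("cell-42");
         jobInfo <- sc.statusTracker.getJobInfo(jobId);
         stageId <- jobInfo.stageIds();
         stageInfo <- sc.statusTracker.getStageInfo(stageId)) {
      println(s"stage ${stageInfo.stageId()}: " +
        s"${stageInfo.numCompletedTasks()}/${stageInfo.numTasks()} tasks complete")
    }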

For Spark 1.3, I'm working on designing a REST interface to access this
type of job / stage / task progress information, as well as expanding the
types of information exposed through the stable status API interface.

- Josh

On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman 
wrote:

> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
> to the team for this release.
>
> A question about progress of actions.  I can see how things are
> progressing using the Spark UI.  I can also see the nice ASCII art
> animation on the spark driver console.
>
> Has anyone come up with a way to accomplish something similar in an
> iPython notebook using pyspark?
>
> Thanks
> Eric
>


unable to check whether an item is present in RDD

2014-12-27 Thread Amit Behera
Hi All,

I want to check whether an item is present or not in an RDD of Iterable[Int]
using Scala.

something like what we do in Java:

*list.contains(item)*

and the statement returns true if the item is present, otherwise false.

Please help me to find the solution.

Thanks
Amit


init / shutdown for complex map job?

2014-12-27 Thread Kevin Burton
I have a job where I want to map over all data in a Cassandra database.

I’m then selectively sending things to my own external system (ActiveMQ) if
the item matches certain criteria.

The problem is that I need to do some init and shutdown. Basically, on init
I need to create ActiveMQ connections, and on shutdown I need to close them
or daemon threads will be left running.

What’s the best way to accomplish this? I couldn’t find it after I RTFMd… (but
perhaps I missed it)
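
A common way to get this kind of per-partition init/shutdown is to push the
work into foreachPartition, so each partition opens its own connection and
closes it in a finally block. A rough sketch; the helper names
(createActiveMqConnection, matchesCriteria, sendToActiveMq) are hypothetical
placeholders, not real API:

    rdd.foreachPartition { items =>
      // init: open the connection once per partition, on the executor
      val connection = createActiveMqConnection()   // hypothetical helper
      try {
        items.foreach { item =>
          if (matchesCriteria(item))                // hypothetical predicate
            sendToActiveMq(connection, item)        // hypothetical send
        }
      } finally {
        // shutdown: close the connection so no daemon threads are left running
        connection.close()
      }
    }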

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0

2014-12-27 Thread Sean Owen
The problem is a conflict between the version of Jackson used in your cluster
and the one you run with. I would start by removing things like the assembly
jar from your classpath. Try the userClassPathFirst option as well, to avoid
using the Jackson in your Hadoop distribution.
Hi,
I built Spark 1.2.0 against a single-node Hadoop 2.6.0 installed by Ambari
1.7.0. The ./bin/run-example SparkPi 10 command runs on my local Mac 10.9.5
and on the CentOS virtual machine that hosts Hadoop, but I can't run the
SparkPi example inside YARN; it seems there's something wrong with the
classpaths:

export HADOOP_CONF_DIR=/etc/hadoop/conf


./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g
--executor-cores 1 --queue thequeue --jars
spark-assembly-1.2.0-hadoop2.6.0.jar,spark-1.2.0-yarn-shuffle.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar,datanucleus-api-jdo-3.2.6.jar
lib/spark-examples-1.2.0-hadoop2.6.0.jar 10

Spark assembly has been built with Hive, including Datanucleus jars on
classpath

14/12/10 15:38:59 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable

14/12/10 15:39:00 INFO impl.TimelineClientImpl: Timeline service address:
http://lix1.bh.com:8188/ws/v1/timeline/

Exception in thread "main" java.lang.NoClassDefFoundError:
org/codehaus/jackson/map/deser/std/StdDeserializer

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at
org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:57)

at
org.apache.hadoop.yarn.util.timeline.TimelineUtils.(TimelineUtils.java:47)

at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:166)

at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)

at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:65)

at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:501)

at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)

at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)

at org.apache.spark.deploy.yarn.Client.main(Client.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.lang.ClassNotFoundException:
org.codehaus.jackson.map.deser.std.StdDeserializer

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

... 28 more

[xiaobogu@lix1 spark-1.2.0-bin-2.6.0]$ ./bin/spark-submit --class
org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3
--driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue
lib/spark-examples-1.2.0-hadoop2.6.0.jar 10

Spark assembly has been built with Hive, including Datanucleus jars on
classpath

14/12/10 15:39:49 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable

14/12/10 15:39:51 INFO impl.TimelineClientImpl: Timeline service address:
http://lix1.bh.com:8188/ws/v1/timeline/

Exception in thread "main" java.lang.NoClassDefFoundError:
org/codehaus/jackson/map/deser/std/StdDeserializer

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at ja

Re: Dynamic Allocation in Spark 1.2.0

2014-12-27 Thread Tsuyoshi OZAWA
Hi Anders,

I faced the same issue you mentioned. Yes, you need to install the
Spark shuffle plugin for YARN. Please check the following PRs, which add
docs on enabling dynamicAllocation:

https://github.com/apache/spark/pull/3731
https://github.com/apache/spark/pull/3757

I could run Spark on YARN with dynamicAllocation by following the
instructions described in the docs.

Thanks,
- Tsuyoshi

On Sat, Dec 27, 2014 at 11:06 PM, Anders Arpteg  wrote:
> Hey,
>
> Tried to get the new spark.dynamicAllocation.enabled feature working on Yarn
> (Hadoop 2.2), but am unsuccessful so far. I've tested with the following
> settings:
>
>   conf
> .set("spark.dynamicAllocation.enabled", "true")
> .set("spark.shuffle.service.enabled", "true")
> .set("spark.dynamicAllocation.minExecutors", "10")
> .set("spark.dynamicAllocation.maxExecutors", "700")
>
> The app works fine on Spark 1.2 if dynamicAllocation is not enabled, but
> with the settings above, it will start the app and the first job is listed
> in the web ui. However, no tasks are started and it seems to be stuck
> waiting for a container to be allocated forever.
>
> Any help would be appreciated. Do I need to do something specific to get the
> external YARN shuffle service running in the node manager?
>
> TIA,
> Anders



-- 
- Tsuyoshi

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Installation Maven PermGen OutOfMemoryException

2014-12-27 Thread varun sharma
This works for me:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M
-XX:ReservedCodeCacheSize=512m" && mvn -DskipTests clean package



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Installation-Maven-PermGen-OutOfMemoryException-tp20831p20868.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Playing along at home: recommendations as to system requirements?

2014-12-27 Thread Cody Koeninger
There are hardware recommendations at
http://spark.apache.org/docs/latest/hardware-provisioning.html but they're
overkill for just testing things out. You should be able to get meaningful
work done with two m3.large instances, for example.

On Sat, Dec 27, 2014 at 8:27 AM, Amy Brown 
wrote:

> Hi all,
>
> Brand new to Spark and to big data technologies in general. Eventually I'd
> like to contribute to the testing effort on Spark.
>
> I have an ARM Chromebook at my disposal: that's it for the moment. I can
> vouch that it's OK for sending Hive queries to an AWS EMR cluster via SQL
> Workbench.
>
> I ran the SparkPi example using the prebuilt Hadoop 2.4 package and got a
> fatal error. I can post that error log if anyone wants to see it, but I
> want to rule out the obvious cause.
>
> Can anyone make recommendations as to minimum system requirements for
> using Spark - for example, with an AWS EMR cluster? I didn't see any on the
> Spark site.
>
> Thanks,
>
> Amy Brown
>


Re: How to build Spark against the latest

2014-12-27 Thread Sean Owen
Yes, it is just a warning; it can be ignored unless you are also running old
Java 6 at runtime.
On Dec 27, 2014 3:11 PM, "Ted Yu"  wrote:

> In make-distribution.sh, there is following check of Java version:
>
> if [[ ! "$JAVA_VERSION" =~ "1.6" && -z "$SKIP_JAVA_TEST" ]]; then
>   echo "***NOTE***: JAVA_HOME is not set to a JDK 6 installation. The
> resulting"
>
> FYI
>
> On Sat, Dec 27, 2014 at 1:31 AM, Sean Owen  wrote:
>
>> Why do you need to skip java tests? I build the distro just fine with
>> Java 8.
>> On Dec 27, 2014 4:21 AM, "Ted Yu"  wrote:
>>
>>> In case JDK 1.7 or higher is used to build, --skip-java-test needs to be
>>> specified.
>>>
>>> FYI
>>>
>>> On Thu, Dec 25, 2014 at 5:03 PM, guxiaobo1982 
>>> wrote:
>>>
 The following command works

 ./make-distribution.sh --tgz  -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
 -Dhadoop.version=2.6.0 -Phive -DskipTests

 -- Original --
 *From: * "guxiaobo1982";;
 *Send time:* Thursday, Dec 25, 2014 3:58 PM
 *To:* ""; "Ted Yu";
 *Cc:* "user@spark.apache.org";
 *Subject: * Re: How to build Spark against the latest


 What options should I use when running the make-distribution.sh script?

 I tried ./make-distribution.sh --hadoop.version 2.6.0 --with-yarn
 -with-hive --with-tachyon --tgz
 but nothing came out.

 Regards

 -- Original --
 *From: * "guxiaobo1982";;
 *Send time:* Wednesday, Dec 24, 2014 6:52 PM
 *To:* "Ted Yu";
 *Cc:* "user@spark.apache.org";
 *Subject: * Re: How to build Spark against the latest

 Hi Ted,
  The reference command works, but where can I get the deployable
 binaries?

 Xiaobo Gu




 -- Original --
 *From: * "Ted Yu";;
 *Send time:* Wednesday, Dec 24, 2014 12:09 PM
 *To:* "";
 *Cc:* "user@spark.apache.org";
 *Subject: * Re: How to build Spark against the latest

 See http://search-hadoop.com/m/JW1q5Cew0j

 On Tue, Dec 23, 2014 at 8:00 PM, guxiaobo1982 
 wrote:

> Hi,
> The official pom.xml file only has a profile for Hadoop 2.4 as
> the latest version, but I installed Hadoop 2.6.0 with Ambari. How
> can I build Spark against it? Just by using mvn -Dhadoop.version=2.6.0,
> or do I need to make a corresponding profile for it?
>
> Regards,
>
> Xiaobo
>


>>>
>


Re: How to build Spark against the latest

2014-12-27 Thread Ted Yu
In make-distribution.sh, there is following check of Java version:

if [[ ! "$JAVA_VERSION" =~ "1.6" && -z "$SKIP_JAVA_TEST" ]]; then
  echo "***NOTE***: JAVA_HOME is not set to a JDK 6 installation. The
resulting"

FYI

On Sat, Dec 27, 2014 at 1:31 AM, Sean Owen  wrote:

> Why do you need to skip java tests? I build the distro just fine with Java
> 8.
> On Dec 27, 2014 4:21 AM, "Ted Yu"  wrote:
>
>> In case JDK 1.7 or higher is used to build, --skip-java-test needs to be
>> specified.
>>
>> FYI
>>
>> On Thu, Dec 25, 2014 at 5:03 PM, guxiaobo1982 
>> wrote:
>>
>>> The following command works
>>>
>>> ./make-distribution.sh --tgz  -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
>>> -Dhadoop.version=2.6.0 -Phive -DskipTests
>>>
>>> -- Original --
>>> *From: * "guxiaobo1982";;
>>> *Send time:* Thursday, Dec 25, 2014 3:58 PM
>>> *To:* ""; "Ted Yu";
>>> *Cc:* "user@spark.apache.org";
>>> *Subject: * Re: How to build Spark against the latest
>>>
>>>
>>> What options should I use when running the make-distribution.sh script?
>>>
>>> I tried ./make-distribution.sh --hadoop.version 2.6.0 --with-yarn
>>> -with-hive --with-tachyon --tgz
>>> but nothing came out.
>>>
>>> Regards
>>>
>>> -- Original --
>>> *From: * "guxiaobo1982";;
>>> *Send time:* Wednesday, Dec 24, 2014 6:52 PM
>>> *To:* "Ted Yu";
>>> *Cc:* "user@spark.apache.org";
>>> *Subject: * Re: How to build Spark against the latest
>>>
>>> Hi Ted,
>>>  The reference command works, but where can I get the deployable
>>> binaries?
>>>
>>> Xiaobo Gu
>>>
>>>
>>>
>>>
>>> -- Original --
>>> *From: * "Ted Yu";;
>>> *Send time:* Wednesday, Dec 24, 2014 12:09 PM
>>> *To:* "";
>>> *Cc:* "user@spark.apache.org";
>>> *Subject: * Re: How to build Spark against the latest
>>>
>>> See http://search-hadoop.com/m/JW1q5Cew0j
>>>
>>> On Tue, Dec 23, 2014 at 8:00 PM, guxiaobo1982 
>>> wrote:
>>>
 Hi,
 The official pom.xml file only has a profile for Hadoop 2.4 as
 the latest version, but I installed Hadoop 2.6.0 with Ambari. How
 can I build Spark against it? Just by using mvn -Dhadoop.version=2.6.0,
 or do I need to make a corresponding profile for it?

 Regards,

 Xiaobo

>>>
>>>
>>


Dynamic Allocation in Spark 1.2.0

2014-12-27 Thread Anders Arpteg
Hey,

Tried to get the new spark.dynamicAllocation.enabled feature working on
Yarn (Hadoop 2.2), but am unsuccessful so far. I've tested with the
following settings:

  conf
.set("spark.dynamicAllocation.enabled", "true")
.set("spark.shuffle.service.enabled", "true")
.set("spark.dynamicAllocation.minExecutors", "10")
.set("spark.dynamicAllocation.maxExecutors", "700")

The app works fine on Spark 1.2 if dynamicAllocation is not enabled, but
with the settings above, it will start the app and the first job is listed
in the web ui. However, no tasks are started and it seems to be stuck
waiting for a container to be allocated forever.

Any help would be appreciated. Do I need to do something specific to get the
external YARN shuffle service running in the node manager?

TIA,
Anders


Playing along at home: recommendations as to system requirements?

2014-12-27 Thread Amy Brown
Hi all,

Brand new to Spark and to big data technologies in general. Eventually I'd
like to contribute to the testing effort on Spark.

I have an ARM Chromebook at my disposal: that's it for the moment. I can
vouch that it's OK for sending Hive queries to an AWS EMR cluster via SQL
Workbench.

I ran the SparkPi example using the prebuilt Hadoop 2.4 package and got a
fatal error. I can post that error log if anyone wants to see it, but I
want to rule out the obvious cause.

Can anyone make recommendations as to minimum system requirements for using
Spark - for example, with an AWS EMR cluster? I didn't see any on the Spark
site.

Thanks,

Amy Brown


Re: How to build Spark against the latest

2014-12-27 Thread Sean Owen
Why do you need to skip java tests? I build the distro just fine with Java
8.
On Dec 27, 2014 4:21 AM, "Ted Yu"  wrote:

> In case JDK 1.7 or higher is used to build, --skip-java-test needs to be
> specified.
>
> FYI
>
> On Thu, Dec 25, 2014 at 5:03 PM, guxiaobo1982  wrote:
>
>> The following command works
>>
>> ./make-distribution.sh --tgz  -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
>> -Dhadoop.version=2.6.0 -Phive -DskipTests
>>
>> -- Original --
>> *From: * "guxiaobo1982";;
>> *Send time:* Thursday, Dec 25, 2014 3:58 PM
>> *To:* ""; "Ted Yu";
>> *Cc:* "user@spark.apache.org";
>> *Subject: * Re: How to build Spark against the latest
>>
>>
>> What options should I use when running the make-distribution.sh script?
>>
>> I tried ./make-distribution.sh --hadoop.version 2.6.0 --with-yarn
>> -with-hive --with-tachyon --tgz
>> but nothing came out.
>>
>> Regards
>>
>> -- Original --
>> *From: * "guxiaobo1982";;
>> *Send time:* Wednesday, Dec 24, 2014 6:52 PM
>> *To:* "Ted Yu";
>> *Cc:* "user@spark.apache.org";
>> *Subject: * Re: How to build Spark against the latest
>>
>> Hi Ted,
>>  The reference command works, but where can I get the deployable
>> binaries?
>>
>> Xiaobo Gu
>>
>>
>>
>>
>> -- Original --
>> *From: * "Ted Yu";;
>> *Send time:* Wednesday, Dec 24, 2014 12:09 PM
>> *To:* "";
>> *Cc:* "user@spark.apache.org";
>> *Subject: * Re: How to build Spark against the latest
>>
>> See http://search-hadoop.com/m/JW1q5Cew0j
>>
>> On Tue, Dec 23, 2014 at 8:00 PM, guxiaobo1982 
>> wrote:
>>
>>> Hi,
>>> The official pom.xml file only has a profile for Hadoop 2.4 as
>>> the latest version, but I installed Hadoop 2.6.0 with Ambari. How
>>> can I build Spark against it? Just by using mvn -Dhadoop.version=2.6.0, or
>>> do I need to make a corresponding profile for it?
>>>
>>> Regards,
>>>
>>> Xiaobo
>>>
>>
>>
>