Re: Error with --files

2016-04-14 Thread Benjamin Zaitlen
That fixed it! Thank you!

--Ben

On Thu, Apr 14, 2016 at 5:53 PM, Marcelo Vanzin wrote:
> On Thu, Apr 14, 2016 at 2:14 PM, Benjamin Zaitlen wrote:
>> spark-submit --master yarn-cluster /home/ubuntu/test_spark.py --files
>> /home/ubuntu/localtest.txt#appSees.txt
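
The fix, judging from the quoted exchange, is argument ordering: spark-submit treats everything after the primary .py file as arguments to the application itself, so in the command above --files never reached spark-submit at all. A sketch of the corrected invocation, reusing the paths from the thread:

# All spark-submit flags must come before the application file; the
# #appSees.txt suffix renames the file inside the YARN containers.
spark-submit --master yarn-cluster \
  --files /home/ubuntu/localtest.txt#appSees.txt \
  /home/ubuntu/test_spark.py

With that ordering the file shows up in each container's working directory under the alias appSees.txt.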

Error with --files

2016-04-14 Thread Benjamin Zaitlen
Hi All,

I'm trying to use the --files option with yarn:

spark-submit --master yarn-cluster /home/ubuntu/test_spark.py --files /home/ubuntu/localtest.txt#appSees.txt

I never see the file in HDFS or in the yarn containers. Am I doing something incorrect? I'm running Spark 1.6.0.

Thanks,
--B

Re: 1.5 Build Errors

2015-10-06 Thread Benjamin Zaitlen
Hi All,

Sean patiently worked with me in solving this issue. The problem was entirely my fault: the MAVEN_OPTS env variable was set and was overriding everything.

--Ben

On Tue, Sep 8, 2015 at 1:37 PM, Benjamin Zaitlen wrote:
> Yes, just reran with the following
>
> (spark_b

Re: 1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
> ation. You can run "zinc -J-Xmx4g..." in general, but in the provided
> script, ZINC_OPTS seems to be the equivalent, yes. It kind of looks like
> your mvn process isn't getting any special memory args there. Is
> MAVEN_OPTS really exported?
>
> FWIW I use my

Re: 1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
>>> + return 1
>>> + exit 1
>>
>> On Tue, Sep 8, 2015 at 10:03 AM, Sean Owen wrote:
>>
>>> It might need more memory in certain situations / running certain
>>> tests. If 3gb works for your relatively full build, yes you can open a

Re: 1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
> nge any occurrences of lower recommendations to 3gb.
>
> On Tue, Sep 8, 2015 at 3:02 PM, Benjamin Zaitlen wrote:
> > Ah, right. Should've caught that.
> >
> > The docs seem to recommend 2gb. Should that be increased as well?
> >
> > --Ben

Re: 1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
Ah, right. Should've caught that.

The docs seem to recommend 2gb. Should that be increased as well?

--Ben

On Tue, Sep 8, 2015 at 9:33 AM, Sean Owen wrote:
> It shows you there that Maven is out of memory. Give it more heap. I use
> 3gb.
>
> On Tue, Sep 8, 2015 at 1:53 PM,
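
Pulling the thread's resolution together: the heap needs to go up to 3gb (per Sean), and MAVEN_OPTS must actually be exported so the mvn process sees it. A sketch of working settings; the ReservedCodeCacheSize value is an assumption (the original post below is truncated), and the build profiles are illustrative:

# -Xmx3g per Sean's recommendation; make sure the variable is exported,
# not just set, or the mvn subprocess won't inherit it.
export MAVEN_OPTS="-Xmx3g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
./make-distribution.sh --tgz -Pyarn -Phadoop-2.4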

1.5 Build Errors

2015-09-08 Thread Benjamin Zaitlen
Hi All,

I'm trying to build a distribution off of the latest in master and I keep getting errors on MQTT and the build fails. I'm running the build on an m1.large, which has 7.5 GB of RAM, and no other major processes are running.

MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=5

Submitting Python Applications from Remote to Master

2014-11-14 Thread Benjamin Zaitlen
Hi All,

I'm not quite clear on whether submitting a Python application to Spark standalone on EC2 is possible. Am I reading this correctly:

*A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. Master node
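
For context: standalone mode in this era does not support cluster deploy mode for Python applications, so the gateway pattern the docs describe is effectively the answer: run spark-submit in (default) client mode from a machine with network access to the master and workers. A sketch, with a placeholder master address:

# Client mode: the driver runs on the submitting machine, so the workers
# must be able to reach it; <master-host> is a placeholder.
spark-submit --master spark://<master-host>:7077 my_app.py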

Re: iPython notebook ec2 cluster matlabplot not found?

2014-09-29 Thread Benjamin Zaitlen
Hi Andy,

I built an Anaconda/Spark AMI a few months ago. I'm still iterating on it, so if things break please report them. If you want to give it a whirl:

./spark-ec2 -k my_key -i ~/.ssh/mykey.rsa -a ami-3ecd0c56

The nice thing about Anaconda is that it comes pre-baked with ipython-notebook, matp
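
The command in the message is cut off; spark-ec2 also expects an action and a cluster name. A sketch of a full launch invocation, keeping the key and AMI from the thread; the slave count and cluster name are made up:

# -s sets the number of slaves; "my-cluster" is a hypothetical name.
./spark-ec2 -k my_key -i ~/.ssh/mykey.rsa -a ami-3ecd0c56 -s 2 launch my-cluster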

TimeStamp selection with SparkSQL

2014-09-04 Thread Benjamin Zaitlen
I may have missed this, but is it possible to select on datetime in a SparkSQL query?

jan1 = sqlContext.sql("SELECT * FROM Stocks WHERE datetime = '2014-01-01'")

Additionally, is there a guide as to what SQL is valid? The guide says, "Note that Spark SQL currently uses a very basic SQL parser" It
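
One approach that works even with the basic parser is to store timestamps as ISO-8601 strings, which sort lexicographically in date order; note that equality against '2014-01-01' only matches rows whose value is exactly that string, so a range predicate is safer when values carry a time part. A self-contained sketch against the Spark 1.1-era PySpark API, with made-up table contents:

cat > ts_query.py <<'EOF'
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="ts-query")
sqlContext = SQLContext(sc)

# Made-up data: ISO-8601 strings compare correctly as plain strings.
stocks = sc.parallelize([
    Row(symbol="AAA", datetime="2014-01-01 09:30:00", price=10.0),
    Row(symbol="BBB", datetime="2014-02-03 09:30:00", price=20.0),
])
sqlContext.inferSchema(stocks).registerTempTable("Stocks")

# Range predicate instead of equality, since the values include a time.
jan1 = sqlContext.sql("SELECT * FROM Stocks WHERE "
                      "datetime >= '2014-01-01' AND datetime < '2014-01-02'")
print(jan1.collect())
EOF
spark-submit ts_query.py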

Re: Anaconda Spark AMI

2014-07-12 Thread Benjamin Zaitlen
.html
>
> Hope that helps,
> -Jey
>
> On Thu, Jul 3, 2014 at 11:54 AM, Benjamin Zaitlen wrote:
> > Hi All,
> >
> > I'm a dev at Continuum and we are developing a fair amount of tooling
> > around Spark. A few days ago someone expressed int

Anaconda Spark AMI

2014-07-03 Thread Benjamin Zaitlen
Hi All,

I'm a dev at Continuum and we are developing a fair amount of tooling around Spark. A few days ago someone expressed interest in numpy+pyspark, and Anaconda came up as a reasonable solution. I spent a number of hours yesterday trying to rework the base Spark AMI on EC2 but sadly was defeat