Re: hive on spark query error

2015-09-25 Thread Jimmy Xiang
> Error: Master must start with yarn, spark, mesos, or local

What's your setting for spark.master?
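
For reference, spark.master must be one of the master URL forms Spark itself accepts
(the error above is Spark's own validation). A minimal sketch of those forms, shown with
SparkConf for illustration -- for Hive on Spark the same value goes into the spark.master
property (e.g. in hive-site.xml, or "set spark.master=...;" in beeline):

import org.apache.spark.SparkConf

// Any of these satisfies "must start with yarn, spark, mesos, or local":
val conf = new SparkConf()
  .setAppName("master-url-sketch")
  .setMaster("yarn-client")            // or "yarn-cluster"
// .setMaster("spark://host:7077")     // standalone master; host is a placeholder
// .setMaster("mesos://host:5050")     // Mesos master; host is a placeholder
// .setMaster("local[*]")              // local mode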

On Fri, Sep 25, 2015 at 9:56 AM, Garry Chen  wrote:

> Hi All,
>
> I am following
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started?
> to set up Hive on Spark.  After setup/configuration everything starts up and I am
> able to show tables, but when executing a SQL statement within beeline I get an
> error.  Please help and thank you very much.
>
>
>
> Cluster Environment (3 nodes) as following
>
> hadoop-2.7.1
>
> spark-1.4.1-bin-hadoop2.6
>
> zookeeper-3.4.6
>
> apache-hive-1.2.1-bin
>
>
>
> Error from hive log:
>
> 2015-09-25 11:51:03,123 INFO  [HiveServer2-Handler-Pool: Thread-50]:
> client.SparkClientImpl (SparkClientImpl.java:startDriver(375)) - Attempting
> impersonation of oracle
>
> 2015-09-25 11:51:03,133 INFO  [HiveServer2-Handler-Pool: Thread-50]:
> client.SparkClientImpl (SparkClientImpl.java:startDriver(409)) - Running
> client driver with argv:
> /u01/app/spark-1.4.1-bin-hadoop2.6/bin/spark-submit --proxy-user oracle
> --properties-file /tmp/spark-submit.840692098393819749.properties --class
> org.apache.hive.spark.client.RemoteDriver
> /u01/app/apache-hive-1.2.1-bin/lib/hive-exec-1.2.1.jar --remote-host
> ip-10-92-82-229.ec2.internal --remote-port 40476 --conf
> hive.spark.client.connect.timeout=1000 --conf
> hive.spark.client.server.connect.timeout=9 --conf
> hive.spark.client.channel.log.level=null --conf
> hive.spark.client.rpc.max.size=52428800 --conf
> hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
>
> 2015-09-25 11:51:03,867 INFO  [stderr-redir-1]: client.SparkClientImpl
> (SparkClientImpl.java:run(569)) - Warning: Ignoring non-spark config
> property: hive.spark.client.server.connect.timeout=9
>
> 2015-09-25 11:51:03,868 INFO  [stderr-redir-1]: client.SparkClientImpl
> (SparkClientImpl.java:run(569)) - Warning: Ignoring non-spark config
> property: hive.spark.client.rpc.threads=8
>
> 2015-09-25 11:51:03,868 INFO  [stderr-redir-1]: client.SparkClientImpl
> (SparkClientImpl.java:run(569)) - Warning: Ignoring non-spark config
> property: hive.spark.client.connect.timeout=1000
>
> 2015-09-25 11:51:03,868 INFO  [stderr-redir-1]: client.SparkClientImpl
> (SparkClientImpl.java:run(569)) - Warning: Ignoring non-spark config
> property: hive.spark.client.secret.bits=256
>
> 2015-09-25 11:51:03,868 INFO  [stderr-redir-1]: client.SparkClientImpl
> (SparkClientImpl.java:run(569)) - Warning: Ignoring non-spark config
> property: hive.spark.client.rpc.max.size=52428800
>
> 2015-09-25 11:51:03,876 INFO  [stderr-redir-1]: client.SparkClientImpl
> (SparkClientImpl.java:run(569)) - Error: Master must start with yarn,
> spark, mesos, or local
>
> 2015-09-25 11:51:03,876 INFO  [stderr-redir-1]: client.SparkClientImpl
> (SparkClientImpl.java:run(569)) - Run with --help for usage help or
> --verbose for debug output
>
> 2015-09-25 11:51:03,885 INFO  [stderr-redir-1]: client.SparkClientImpl
> (SparkClientImpl.java:run(569)) - 15/09/25 11:51:03 INFO util.Utils:
> Shutdown hook called
>
> 2015-09-25 11:51:03,889 WARN  [Driver]: client.SparkClientImpl
> (SparkClientImpl.java:run(427)) - Child process exited with code 1.
>
>
>
>
>


Re: Is there any Spark implementation for Item-based Collaborative Filtering?

2014-11-30 Thread Jimmy
The latest version of MLlib has it built in, no?
J

Sent from my iPhone
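
For context, what MLlib ships is ALS-based (model-based) collaborative filtering rather
than classic item-item similarity, so an item-based approach would still need to be built
on top of it. A minimal ALS sketch in the spark-shell (sc comes from the shell; the
ratings path and format are hypothetical):

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// lines of "userId,productId,rating"
val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
  val Array(user, product, rating) = line.split(',')
  Rating(user.toInt, product.toInt, rating.toDouble)
}
val model = ALS.train(ratings, 10, 10, 0.01)   // rank, iterations, lambda
val prediction = model.predict(1, 42)          // predicted rating of product 42 for user 1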

> On Nov 30, 2014, at 9:36 AM, shahab  wrote:
> 
> Hi,
> 
> I just wonder if there is any implementation for Item-based Collaborative 
> Filtering in Spark?
> 
> best,
> /Shahab

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: how to convert System.currentTimeMillis to calendar time

2014-11-13 Thread Jimmy McErlain
You could also use the Joda-Time library, which has a ton of other great
options in it.
J
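
A minimal Joda-Time sketch of the conversion (the format pattern is just an example;
output uses the JVM's default time zone):

import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

val epoch = System.currentTimeMillis
val formatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
println(new DateTime(epoch).toString(formatter))   // e.g. 2014-11-13 10:40:12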




*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*. . . . . . . . . . . . . . . . . .*


*IF WE CAN’T DOUBLE YOUR SALES,*



*ONE OF US IS IN THE WRONG BUSINESS.*

*E*: ji...@sellpoints.com

*M*: *510.303.7751*

On Thu, Nov 13, 2014 at 10:40 AM, Akhil Das 
wrote:

> This way?
>
> scala> import java.util.Date
>
> scala> val epoch = System.currentTimeMillis
> epoch: Long = 1415903974545
>
> scala> val date = new Date(epoch)
> date: java.util.Date = Fri Nov 14 00:09:34 IST 2014
>
>
>
> Thanks
> Best Regards
>
> On Thu, Nov 13, 2014 at 10:17 PM, spr  wrote:
>
>> Apologies for what seems an egregiously simple question, but I can't find
>> the
>> answer anywhere.
>>
>> I have timestamps from the Spark Streaming Time() interface, in
>> milliseconds
>> since an epoch, and I want to print out a human-readable calendar date and
>> time.  How does one do that?
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-convert-System-currentTimeMillis-to-calendar-time-tp18856.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: which is the recommended workflow engine for Apache Spark jobs?

2014-11-10 Thread Jimmy McErlain
I have used Oozie for all our workflows with Spark apps, but you will have
to use a java action as the workflow element.  I am interested in anyone's
experience with Luigi and/or any other tools.


On Mon, Nov 10, 2014 at 10:34 AM, Adamantios Corais <
adamantios.cor...@gmail.com> wrote:

> I have some previous experience with Apache Oozie while I was developing
> in Apache Pig. Now, I am working explicitly with Apache Spark and I am
> looking for a tool with similar functionality. Is Oozie recommended? What
> about Luigi? What do you use \ recommend?
>



-- 


"Nothing under the sun is greater than education. By educating one person
and sending him/her into the society of his/her generation, we make a
contribution extending a hundred generations to come."
-Jigoro Kano, Founder of Judo-


Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Jimmy McErlain
Can you be more specific: what version of Spark, Hive, Hadoop, etc.?  What
are you trying to do?  What are the issues you are seeing?
J
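
In the meantime, a minimal sketch of getting a HiveContext in the spark-shell, assuming a
Spark build compiled with Hive support (-Phive) and a hive-site.xml on the classpath:

// sc is provided by the spark-shell
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql("SHOW TABLES").collect().foreach(println)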




*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*. . . . . . . . . . . . . . . . . .*


*IF WE CAN’T DOUBLE YOUR SALES,*



*ONE OF US IS IN THE WRONG BUSINESS.*

*E*: ji...@sellpoints.com

*M*: *510.303.7751*

On Thu, Nov 6, 2014 at 9:22 AM, tridib  wrote:

> Help please!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18280.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Spark v Redshift

2014-11-04 Thread Jimmy McErlain
This is pretty spot on... though I would also add that the Spark features
it touts around speed all depend on caching the data into memory...
reading off the disk still takes time, i.e. pulling the data into an RDD.
This is the reason Spark is great for ML: the data is used over and over
again to fit models, so it is pulled into memory once and then analyzed
repeatedly by the algorithms... other DB systems read from and write to
disk repeatedly and are thus slower, e.g. Mahout (though it is being
ported over to Spark as well to compete with MLlib)...

J
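
To make the caching point concrete, a tiny sketch (file path hypothetical): the first
action pays the disk-read and parse cost, and every later pass over the same RDD -- e.g.
each iteration of an ML algorithm -- hits memory instead:

val events = sc.textFile("hdfs:///data/events.csv").map(_.split(',')).cache()
events.count()   // first action: reads from disk and materializes the cache
events.count()   // subsequent actions reuse the in-memory partitions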




*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*. . . . . . . . . . . . . . . . . .*


*IF WE CAN’T DOUBLE YOUR SALES,*



*ONE OF US IS IN THE WRONG BUSINESS.*

*E*: ji...@sellpoints.com

*M*: *510.303.7751*

On Tue, Nov 4, 2014 at 3:51 PM, Matei Zaharia 
wrote:

> Is this about Spark SQL vs Redshift, or Spark in general? Spark in general
> provides a broader set of capabilities than Redshift because it has APIs in
> general-purpose languages (Java, Scala, Python) and libraries for things
> like machine learning and graph processing. For example, you might use
> Spark to do the ETL that will put data into a database such as Redshift, or
> you might pull data out of Redshift into Spark for machine learning. On the
> other hand, if *all* you want to do is SQL and you are okay with the set of
> data formats and features in Redshift (i.e. you can express everything
> using its UDFs and you have a way to get data in), then Redshift is a
> complete service which will do more management out of the box.
>
> Matei
>
> > On Nov 4, 2014, at 3:11 PM, agfung  wrote:
> >
> > I'm in the midst of a heated debate about the use of Redshift v Spark
> with a
> > colleague.  We keep trading anecdotes and links back and forth (eg airbnb
> > post from 2013 or amplab benchmarks), and we don't seem to be getting
> > anywhere.
> >
> > So before we start down the prototype /benchmark road, and in desperation
> > of finding *some* kind of objective third party perspective,  was
> wondering
> > if anyone who has used both in 2014 would care to provide commentary
> about
> > the sweet spot use cases / gotchas for non trivial use (eg a simple
> filter
> > scan isn't really interesting).  Soft issues like operational maintenance
> > and time spent developing v out of the box are interesting too...
> >
> >
> >
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-v-Redshift-tp18112.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread Jimmy
sampleRDD.cache()

Sent from my iPhone
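
A sketch of where the cache() goes in the code quoted below (sc and args come from the
quoted program): cache the parsed LabeledPoint RDD before the iterative SVMWithSGD.train
call, so each of the 20 iterations reuses the in-memory data instead of re-reading and
re-parsing the input file.

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

val parsedData = sc.textFile(args(0)).map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}.cache()                                 // keep the training set in memory across iterations

val model = SVMWithSGD.train(parsedData, 20)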

> On Oct 30, 2014, at 5:01 PM, peng xia  wrote:
> 
> Hi Xiangrui, 
> 
> Can you give me some code example about caching, as I am new to Spark.
> 
> Thanks,
> Best,
> Peng
> 
>> On Thu, Oct 30, 2014 at 6:57 PM, Xiangrui Meng  wrote:
>> Then caching should solve the problem. Otherwise, it is just loading
>> and parsing data from disk for each iteration. -Xiangrui
>> 
>> On Thu, Oct 30, 2014 at 11:44 AM, peng xia  wrote:
>> > Thanks for all your help.
>> > I think I didn't cache the data. My previous cluster was expired and I 
>> > don't
>> > have a chance to check the load balance or app manager.
>> > Below is my code.
>> > There are 18 features for each record and I am using the Scala API.
>> >
>> > import org.apache.spark.SparkConf
>> > import org.apache.spark.SparkContext
>> > import org.apache.spark.SparkContext._
>> > import org.apache.spark.rdd._
>> > import org.apache.spark.mllib.classification.SVMWithSGD
>> > import org.apache.spark.mllib.regression.LabeledPoint
>> > import org.apache.spark.mllib.linalg.Vectors
>> > import java.util.Calendar
>> >
>> > object BenchmarkClassification {
>> > def main(args: Array[String]) {
>> > // Load and parse the data file
>> > val conf = new SparkConf()
>> >  .setAppName("SVM")
>> >  .set("spark.executor.memory", "8g")
>> >  // .set("spark.executor.extraJavaOptions", "-Xms8g -Xmx8g")
>> >val sc = new SparkContext(conf)
>> > val data = sc.textFile(args(0))
>> > val parsedData = data.map { line =>
>> >  val parts = line.split(',')
>> >  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(x =>
>> > x.toDouble)))
>> > }
>> > val testData = sc.textFile(args(1))
>> > val testParsedData = testData .map { line =>
>> >  val parts = line.split(',')
>> >  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(x =>
>> > x.toDouble)))
>> > }
>> >
>> > // Run training algorithm to build the model
>> > val numIterations = 20
>> > val model = SVMWithSGD.train(parsedData, numIterations)
>> >
>> > // Evaluate model on training examples and compute training error
>> > // val labelAndPreds = testParsedData.map { point =>
>> > //   val prediction = model.predict(point.features)
>> > //   (point.label, prediction)
>> > // }
>> > // val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble /
>> > testParsedData.count
>> > // println("Training Error = " + trainErr)
>> > println(Calendar.getInstance().getTime())
>> > }
>> > }
>> >
>> >
>> >
>> >
>> > Thanks,
>> > Best,
>> > Peng
>> >
>> > On Thu, Oct 30, 2014 at 1:23 PM, Xiangrui Meng  wrote:
>> >>
>> >> DId you cache the data and check the load balancing? How many
>> >> features? Which API are you using, Scala, Java, or Python? -Xiangrui
>> >>
>> >> On Thu, Oct 30, 2014 at 9:13 AM, Jimmy  wrote:
>> >> > Watch the app manager it should tell you what's running and taking
>> >> > awhile...
>> >> > My guess it's a "distinct" function on the data.
>> >> > J
>> >> >
>> >> > Sent from my iPhone
>> >> >
>> >> > On Oct 30, 2014, at 8:22 AM, peng xia  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> >
>> >> >
>> >> > Previous we have applied SVM algorithm in MLlib to 5 million records
>> >> > (600
>> >> > mb), it takes more than 25 minutes to finish.
>> >> > The spark version we are using is 1.0 and we were running this program
>> >> > on a
>> >> > 4 nodes cluster. Each node has 4 cpu cores and 11 GB RAM.
>> >> >
>> >> > The 5 million records only have two distinct records (One positive and
>> >> > one
>> >> > negative), others are all duplications.
>> >> >
>> >> > Any one has any idea on why it takes so long on this small data?
>> >> >
>> >> >
>> >> >
>> >> > Thanks,
>> >> > Best,
>> >> >
>> >> > Peng
>> >
>> >
> 


Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread Jimmy
Watch the app manager; it should tell you what's running and taking a while... My
guess is it's a "distinct" function on the data.
J

Sent from my iPhone

> On Oct 30, 2014, at 8:22 AM, peng xia  wrote:
> 
> Hi,
> 
>  
> 
> Previous we have applied SVM algorithm in MLlib to 5 million records (600 
> mb), it takes more than 25 minutes to finish.
> The spark version we are using is 1.0 and we were running this program on a 4 
> nodes cluster. Each node has 4 cpu cores and 11 GB RAM.
> 
> The 5 million records only have two distinct records (One positive and one 
> negative), others are all duplications.
> 
> Any one has any idea on why it takes so long on this small data?
> 
>  
> 
> Thanks,
> Best,
> 
> Peng


Re: Spark + Tableau

2014-10-30 Thread Jimmy
What ODBC driver are you using? We recently got the Hortonworks ODBC drivers
working on a Windows box but were having issues on a Mac.



Sent from my iPhone

> On Oct 30, 2014, at 4:23 AM, Bojan Kostic  wrote:
> 
> I'm testing the beta driver from Databricks for Tableau.
> And unfortunately I have encountered some issues.
> While beeline connection works without problems, Tableau can't connect to
> spark thrift server.
> 
> Error from driver(Tableau):
> Unable to connect to the ODBC Data Source. Check that the necessary drivers
> are installed and that the connection properties are valid.
> [Simba][SparkODBC] (34) Error from Spark: ETIMEDOUT.
> 
> Unable to connect to the server "test.server.com". Check that the server is
> running and that you have access privileges to the requested database.
> Unable to connect to the server. Check that the server is running and that
> you have access privileges to the requested database.
> 
> Exception on Thrift server:
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
>at
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
>at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.thrift.transport.TTransportException
>at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>at
> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>at
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
>at
> org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>at
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
>at
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>at
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>... 4 more
> 
> Is there anyone else who's testing this driver, or did anyone saw this
> message?
> 
> Best regards
> Bojan Kostić
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Tableau-tp17720.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Exception while reading SendingConnection to ConnectionManagerId

2014-10-16 Thread Jimmy Li
Does anyone know anything re: this error? Thank you!

On Wed, Oct 15, 2014 at 3:38 PM, Jimmy Li  wrote:

> Hi there, I'm running spark on ec2, and am running into an error there
> that I don't get locally. Here's the error:
>
> 11335 [handle-read-write-executor-3] ERROR
> org.apache.spark.network.SendingConnection  - Exception while reading
> SendingConnection to ConnectionManagerId([IP HERE])
> java.nio.channels.ClosedChannelException
>
> Does anyone know what might be causing this? Spark is running on my ec2
> instances.
>
> Thanks,
> Jimmy
>


Re: TaskNotSerializableException when running through Spark shell

2014-10-16 Thread Jimmy McErlain
I actually only ran into this issue recently, after we upgraded to Spark
1.1.  Within the REPL for Spark 1.0 everything works fine, but within the
REPL for 1.1 it does not.  FYI, I am also only doing simple regex matching
functions within an RDD... When I run the same code as an app, everything
works fine... which leads me to believe that it is a bug within the REPL
for 1.1.

Can anyone else confirm this?
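
One pattern that often sidesteps this in the REPL, whatever the root cause: copy what the
closure needs into a local val so Spark serializes only that value, not the enclosing REPL
line object. A hedged sketch with made-up names (sc comes from the spark-shell):

class Settings { val pattern = "s.*" }        // stand-in for state defined earlier in the session
val settings = new Settings

val rdd = sc.parallelize(Seq("spark", "shell", "closure"))
val pattern = settings.pattern                // local copy: the closure captures only this String
val matched = rdd.filter(_.matches(pattern))  // instead of rdd.filter(_.matches(settings.pattern))
matched.collect().foreach(println)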





*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*. . . . . . . . . . . . . . . . . .*


*IF WE CAN’T DOUBLE YOUR SALES,*



*ONE OF US IS IN THE WRONG BUSINESS.*

*E*: ji...@sellpoints.com

*M*: *510.303.7751*

On Thu, Oct 16, 2014 at 7:56 AM, Akshat Aranya  wrote:

> Hi,
>
> Can anyone explain how things get captured in a closure when runing
> through the REPL.  For example:
>
> def foo(..) = { .. }
>
> rdd.map(foo)
>
> sometimes complains about classes not being serializable that are
> completely unrelated to foo.  This happens even when I write it such:
>
> object Foo {
>   def foo(..) = { .. }
> }
>
> rdd.map(Foo.foo)
>
> It also doesn't happen all the time.
>


Exception while reading SendingConnection to ConnectionManagerId

2014-10-15 Thread Jimmy Li
Hi there, I'm running spark on ec2, and am running into an error there that
I don't get locally. Here's the error:

11335 [handle-read-write-executor-3] ERROR
org.apache.spark.network.SendingConnection  - Exception while reading
SendingConnection to ConnectionManagerId([IP HERE])
java.nio.channels.ClosedChannelException

Does anyone know what might be causing this? Spark is running on my ec2
instances.

Thanks,
Jimmy


Re: Spark can't find jars

2014-10-14 Thread Jimmy McErlain
So the only way that I could make this work was to build a fat jar file, as
suggested earlier.  To me (and I am no expert) it seems like this is a
bug.  Everything was working for me prior to our upgrade to Spark 1.1 on
Hadoop 2.2, but now it does not... i.e. packaging my jars locally, then
pushing them out to the cluster and pointing them at the corresponding
dependent jars.

Sorry I cannot be more help!
J
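
For reference, the fat-jar route above is typically done with sbt-assembly; a minimal
sketch (plugin version and exact wiring vary by sbt-assembly release, so treat these
lines as illustrative):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt -- mark Spark itself as provided so it is not bundled into the assembly
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

// then: sbt assembly, and spark-submit the single assembled jar from target/scala-2.10/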




*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*. . . . . . . . . . . . . . . . . .*


*IF WE CAN’T DOUBLE YOUR SALES,*



*ONE OF US IS IN THE WRONG BUSINESS.*

*E*: ji...@sellpoints.com

*M*: *510.303.7751*

On Tue, Oct 14, 2014 at 4:59 AM, Christophe Préaud <
christophe.pre...@kelkoo.com> wrote:

>  Hello,
>
> I have already posted a message with the exact same problem, and proposed
> a patch (the subject is "Application failure in yarn-cluster mode").
> Can you test it, and see if it works for you?
> I would be glad too if someone can confirm that it is a bug in Spark 1.1.0.
>
> Regards,
> Christophe.
>
>
> On 14/10/2014 03:15, Jimmy McErlain wrote:
>
> BTW this has always worked for me before until we upgraded the cluster to
> Spark 1.1.1...
> J
>
>
>
>
>  *JIMMY MCERLAIN*
>
> DATA SCIENTIST (NERD)
>
> *. . . . . . . . . . . . . . . . . .*
>
>
>   *IF WE CAN’T DOUBLE YOUR SALES,*
>
>
>
> *ONE OF US IS IN THE WRONG BUSINESS. *
>
> *E*: ji...@sellpoints.com
>
> *M*: *510.303.7751*
>
> On Mon, Oct 13, 2014 at 5:39 PM, HARIPRIYA AYYALASOMAYAJULA <
> aharipriy...@gmail.com> wrote:
>
>> Helo,
>>
>>  Can you check if  the jar file is available in the target->scala-2.10
>> folder?
>>
>>  When you use sbt package to make the jar file, that is where the jar
>> file would be located.
>>
>>  The following command works well for me:
>>
>>  spark-submit --class “Classname"   --master yarn-cluster
>> jarfile(withcomplete path)
>>
>> Can you try checking  with this initially and later add other options?
>>
>> On Mon, Oct 13, 2014 at 7:36 PM, Jimmy  wrote:
>>
>>>  Having the exact same error with the exact same jar Do you work
>>> for Altiscale? :)
>>> J
>>>
>>> Sent from my iPhone
>>>
>>> On Oct 13, 2014, at 5:33 PM, Andy Srine  wrote:
>>>
>>>   Hi Guys,
>>>
>>>
>>>  Spark rookie here. I am getting a file not found exception on the
>>> --jars. This is on the yarn cluster mode and I am running the following
>>> command on our recently upgraded Spark 1.1.1 environment.
>>>
>>>
>>>  ./bin/spark-submit --verbose --master yarn --deploy-mode cluster
>>> --class myEngine --driver-memory 1g --driver-library-path
>>> /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar
>>> --executor-memory 5g --executor-cores 5 --jars
>>> /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4
>>> /home/andy/spark/lib/my-spark-lib_1.0.jar
>>>
>>>
>>>  This is the error I am hitting. Any tips would be much appreciated.
>>> The file permissions looks fine on my local disk.
>>>
>>>
>>>  14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering
>>> ApplicationMaster with FAILED
>>>
>>> 14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to
>>> be successfully unregistered.
>>>
>>> Exception in thread "Driver" java.lang.reflect.InvocationTargetException
>>>
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
>>>
>>> Caused by: org.apache.spark.SparkException: Job aborted due to stage
>>> failure: Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task
>>> 3.3 in stage 1.0 (TID 12, 122-67.vb2.company.com):
>>> java.io.FileNotFoundException: ./joda-convert-1.2.jar (Permission denied)
>>>
>>> java.io.FileOutputStream.open(Native Method)
>>>
>>> java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>>>
>>>
>>> com.google.common.io.Files$FileByteSink.openStream(Files.java:223)
>>>
>>>
>>> com.google.common.io.Files$FileByteSink.openStream(Files.java:211)
>>>
>>>
>>> Thanks,
>>> Andy
>>>
>>>
>>
>>
>>   --
>> Regards,
>> Haripriya Ayyalasomayajula
>>
>>
>
>
> --
> Kelkoo SAS
> Société par Actions Simplifiée
> Au capital de € 4.168.964,30
> Siège social : 8, rue du Sentier 75002 Paris
> 425 093 069 RCS Paris
>
> Ce message et les pièces jointes sont confidentiels et établis à
> l'attention exclusive de leurs destinataires. Si vous n'êtes pas le
> destinataire de ce message, merci de le détruire et d'en avertir
> l'expéditeur.
>


Re: Spark can't find jars

2014-10-13 Thread Jimmy McErlain
BTW this has always worked for me before until we upgraded the cluster to
Spark 1.1.1...
J




*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*. . . . . . . . . . . . . . . . . .*


*IF WE CAN’T DOUBLE YOUR SALES,*



*ONE OF US IS IN THE WRONG BUSINESS.*

*E*: ji...@sellpoints.com

*M*: *510.303.7751*

On Mon, Oct 13, 2014 at 5:39 PM, HARIPRIYA AYYALASOMAYAJULA <
aharipriy...@gmail.com> wrote:

> Helo,
>
> Can you check if  the jar file is available in the target->scala-2.10
> folder?
>
> When you use sbt package to make the jar file, that is where the jar file
> would be located.
>
> The following command works well for me:
>
> spark-submit --class “Classname"   --master yarn-cluster
> jarfile(withcomplete path)
>
> Can you try checking  with this initially and later add other options?
>
> On Mon, Oct 13, 2014 at 7:36 PM, Jimmy  wrote:
>
>> Having the exact same error with the exact same jar Do you work for
>> Altiscale? :)
>> J
>>
>> Sent from my iPhone
>>
>> On Oct 13, 2014, at 5:33 PM, Andy Srine  wrote:
>>
>> Hi Guys,
>>
>>
>> Spark rookie here. I am getting a file not found exception on the --jars.
>> This is on the yarn cluster mode and I am running the following command on
>> our recently upgraded Spark 1.1.1 environment.
>>
>>
>> ./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class
>> myEngine --driver-memory 1g --driver-library-path
>> /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar
>> --executor-memory 5g --executor-cores 5 --jars
>> /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4
>> /home/andy/spark/lib/my-spark-lib_1.0.jar
>>
>>
>> This is the error I am hitting. Any tips would be much appreciated. The
>> file permissions looks fine on my local disk.
>>
>>
>> 14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering
>> ApplicationMaster with FAILED
>>
>> 14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to be
>> successfully unregistered.
>>
>> Exception in thread "Driver" java.lang.reflect.InvocationTargetException
>>
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>> at java.lang.reflect.Method.invoke(Method.java:606)
>>
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
>>
>> Caused by: org.apache.spark.SparkException: Job aborted due to stage
>> failure: Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task
>> 3.3 in stage 1.0 (TID 12, 122-67.vb2.company.com):
>> java.io.FileNotFoundException: ./joda-convert-1.2.jar (Permission denied)
>>
>> java.io.FileOutputStream.open(Native Method)
>>
>> java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>>
>> com.google.common.io.Files$FileByteSink.openStream(Files.java:223)
>>
>> com.google.common.io.Files$FileByteSink.openStream(Files.java:211)
>>
>>
>> Thanks,
>> Andy
>>
>>
>
>
> --
> Regards,
> Haripriya Ayyalasomayajula
>
>


Re: Spark can't find jars

2014-10-13 Thread Jimmy McErlain
That didn't seem to work... the jar file is in the target/scala-2.10
folder when I package; I then move the jar to the cluster and launch the
app... still the same error.  Thoughts?
J




*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*. . . . . . . . . . . . . . . . . .*


*IF WE CAN’T DOUBLE YOUR SALES,*



*ONE OF US IS IN THE WRONG BUSINESS.*

*E*: ji...@sellpoints.com

*M*: *510.303.7751*

On Mon, Oct 13, 2014 at 5:39 PM, HARIPRIYA AYYALASOMAYAJULA <
aharipriy...@gmail.com> wrote:

> Helo,
>
> Can you check if  the jar file is available in the target->scala-2.10
> folder?
>
> When you use sbt package to make the jar file, that is where the jar file
> would be located.
>
> The following command works well for me:
>
> spark-submit --class “Classname"   --master yarn-cluster
> jarfile(withcomplete path)
>
> Can you try checking  with this initially and later add other options?
>
> On Mon, Oct 13, 2014 at 7:36 PM, Jimmy  wrote:
>
>> Having the exact same error with the exact same jar Do you work for
>> Altiscale? :)
>> J
>>
>> Sent from my iPhone
>>
>> On Oct 13, 2014, at 5:33 PM, Andy Srine  wrote:
>>
>> Hi Guys,
>>
>>
>> Spark rookie here. I am getting a file not found exception on the --jars.
>> This is on the yarn cluster mode and I am running the following command on
>> our recently upgraded Spark 1.1.1 environment.
>>
>>
>> ./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class
>> myEngine --driver-memory 1g --driver-library-path
>> /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar
>> --executor-memory 5g --executor-cores 5 --jars
>> /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4
>> /home/andy/spark/lib/my-spark-lib_1.0.jar
>>
>>
>> This is the error I am hitting. Any tips would be much appreciated. The
>> file permissions looks fine on my local disk.
>>
>>
>> 14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering
>> ApplicationMaster with FAILED
>>
>> 14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to be
>> successfully unregistered.
>>
>> Exception in thread "Driver" java.lang.reflect.InvocationTargetException
>>
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>> at java.lang.reflect.Method.invoke(Method.java:606)
>>
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
>>
>> Caused by: org.apache.spark.SparkException: Job aborted due to stage
>> failure: Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task
>> 3.3 in stage 1.0 (TID 12, 122-67.vb2.company.com):
>> java.io.FileNotFoundException: ./joda-convert-1.2.jar (Permission denied)
>>
>> java.io.FileOutputStream.open(Native Method)
>>
>> java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>>
>> com.google.common.io.Files$FileByteSink.openStream(Files.java:223)
>>
>> com.google.common.io.Files$FileByteSink.openStream(Files.java:211)
>>
>>
>> Thanks,
>> Andy
>>
>>
>
>
> --
> Regards,
> Haripriya Ayyalasomayajula
>
>


Re: Spark can't find jars

2014-10-13 Thread Jimmy
Having the exact same error with the exact same jar... Do you work for
Altiscale? :)
J

Sent from my iPhone

> On Oct 13, 2014, at 5:33 PM, Andy Srine  wrote:
> 
> Hi Guys,
> 
> Spark rookie here. I am getting a file not found exception on the --jars. 
> This is on the yarn cluster mode and I am running the following command on 
> our recently upgraded Spark 1.1.1 environment.
> 
> ./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class 
> myEngine --driver-memory 1g --driver-library-path 
> /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar 
> --executor-memory 5g --executor-cores 5 --jars 
> /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4 
> /home/andy/spark/lib/my-spark-lib_1.0.jar
> 
> This is the error I am hitting. Any tips would be much appreciated. The file 
> permissions looks fine on my local disk.
> 
> 14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering 
> ApplicationMaster with FAILED
> 14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> Exception in thread "Driver" java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task 3.3 in 
> stage 1.0 (TID 12, 122-67.vb2.company.com): java.io.FileNotFoundException: 
> ./joda-convert-1.2.jar (Permission denied)
> java.io.FileOutputStream.open(Native Method)
> java.io.FileOutputStream.<init>(FileOutputStream.java:221)
> com.google.common.io.Files$FileByteSink.openStream(Files.java:223)
> com.google.common.io.Files$FileByteSink.openStream(Files.java:211)
> 
> Thanks,
> Andy
> 


Re: Print Decision Tree Models

2014-10-01 Thread Jimmy
Yeah I'm using 1.0.0 and thanks for taking the time to check! 

Sent from my iPhone
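
For anyone finding this later: in 1.0.0 println(model) only shows the object reference, as
in the output quoted below; per the reply above it prints usefully from 1.1.0 on, and
later 1.x releases also expose the full tree (hedged -- check your version's API):

println(model)                  // object reference in 1.0.0, a model summary in newer releases
// println(model.toDebugString) // full tree structure, available in later MLlib releases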

> On Oct 1, 2014, at 8:48 PM, Xiangrui Meng  wrote:
> 
> Which Spark version are you using? It works in 1.1.0 but not in 1.0.0. 
> -Xiangrui
> 
>> On Wed, Oct 1, 2014 at 2:13 PM, Jimmy McErlain  wrote:
>> So I am trying to print the model output from MLlib however I am only 
>> getting things like the following:
>> org.apache.spark.mllib.tree.model.DecisionTreeModel@1120c600
>> 0.17171527904439082
>> 0.8282847209556092
>> 5273125.0
>> 2.5435412E7
>> 
>> from the following code:
>>   val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble 
>> / cleanedData2.count
>>   val trainSucc = labelAndPreds.filter(r => r._1 == r._2).count.toDouble 
>> / cleanedData2.count
>>   val trainErrCount = labelAndPreds.filter(r => r._1 != 
>> r._2).count.toDouble
>>   val trainSuccCount = labelAndPreds.filter(r => r._1 == 
>> r._2).count.toDouble
>>   
>>   print(model)
>>   println(trainErr)
>>   println(trainSucc)
>>   println(trainErrCount)
>>   println(trainSuccCount)
>> 
>> I have also tried the following:
>>   val model_string = model.toString()
>>   print(model_string)
>> 
>> And I still do not get the model to print but where it resides in memory.
>> 
>> Thanks,
>> J
>> 
>> 
>> 
>> 
>> 
>> 
>> JIMMY MCERLAIN
>> 
>> DATA SCIENTIST (NERD)
>> 
>> . . . . . . . . . . . . . . . . . . 
>> 
>> 
>> IF WE CAN’T DOUBLE YOUR SALES,
>> 
>> ONE OF US IS IN THE WRONG BUSINESS.
>> 
>> 
>> E: ji...@sellpoints.com   
>> 
>> M: 510.303.7751
>> 
> 


Print Decision Tree Models

2014-10-01 Thread Jimmy McErlain
So I am trying to print the model output from MLlib however I am only
getting things like the following:

org.apache.spark.mllib.tree.model.DecisionTreeModel@1120c600

0.17171527904439082
0.8282847209556092
5273125.0
2.5435412E7


from the following code:

  val trainErr = labelAndPreds.filter(r => r._1 !=
r._2).count.toDouble / cleanedData2.count
  val trainSucc = labelAndPreds.filter(r => r._1 ==
r._2).count.toDouble / cleanedData2.count
  val trainErrCount = labelAndPreds.filter(r => r._1 != r._2).count.toDouble
  val trainSuccCount = labelAndPreds.filter(r => r._1 ==
r._2).count.toDouble

  print(model)
  println(trainErr)
  println(trainSucc)
  println(trainErrCount)
  println(trainSuccCount)


I have also tried the following:

  val model_string = model.toString()
  print(model_string)


And I still do not get the model to print, just where it resides in memory.


Thanks,

J







*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*. . . . . . . . . . . . . . . . . .*


*IF WE CAN’T DOUBLE YOUR SALES,*



*ONE OF US IS IN THE WRONG BUSINESS.*

*E*: ji...@sellpoints.com

*M*: *510.303.7751*


Re: Window comparison matching using the sliding window functionality: feasibility

2014-09-30 Thread Jimmy McErlain
Not sure if this is what you are after, but it's based on a moving average
within Spark...  I was building an ARIMA model on top of Spark, and this
helped me out a lot:

http://stackoverflow.com/questions/23402303/apache-spark-moving-average
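
A minimal sketch of the sliding-window idea in Spark itself, using MLlib's RDD sliding
helper (a developer API, so hedged as version-dependent; sc comes from the spark-shell):

import org.apache.spark.mllib.rdd.RDDFunctions._

val series = sc.parallelize((1 to 10).map(_.toDouble))
val movingAvg = series.sliding(3).map(w => w.sum / w.size)   // average of each 3-element window
movingAvg.collect().foreach(println)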




*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*. . . . . . . . . . . . . . . . . .*


*IF WE CAN’T DOUBLE YOUR SALES,*



*ONE OF US IS IN THE WRONG BUSINESS.*

*E*: ji...@sellpoints.com

*M*: *510.303.7751*

On Tue, Sep 30, 2014 at 8:19 AM, nitinkak001  wrote:

> Any ideas guys?
>
> Trying to find some information online. Not much luck so far.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Window-comparison-matching-using-the-sliding-window-functionality-feasibility-tp15352p15404.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>