Re: Change for submitting to yarn in 1.3.1

2015-05-10 Thread Manku Timma
sc.applicationId gives the YARN application ID.
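
For example, a minimal sketch of reading it from the driver (the app name and the yarn-client master below are just illustrative; in yarn-client mode the driver runs in the submitting JVM, so the ID is available as soon as the context is up):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: ask the running SparkContext for the YARN application ID.
val sc = new SparkContext(new SparkConf().setAppName("MyApp").setMaster("yarn-client"))
val appId: String = sc.applicationId  // e.g. "application_<clusterTimestamp>_<id>"
println(s"Running as $appId")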

On 11 May 2015 at 08:13, Mridul Muralidharan  wrote:

> We had a similar requirement, and as a stopgap I currently use a
> suboptimal, implementation-specific workaround - parsing it out of the
> stdout/stderr (based on the log config).
> A better means of getting at this is indeed required!
>
> Regards,
> Mridul
>
> On Sun, May 10, 2015 at 7:33 PM, Ron's Yahoo!
>  wrote:
> > Hi,
> >   I used to submit my Spark YARN applications using the
> > org.apache.spark.deploy.yarn.Client API so that I could get the application ID
> > after submitting. The following is the code that I have, but after upgrading
> > to 1.3.1, the YARN Client class was made private. Is there a particular
> > reason why this Client class was made private?
> >   I know that there's a new SparkSubmit object that can be used, but it's not
> > clear to me how I can use it to get the application ID after submitting to
> > the cluster.
> >   Thoughts?
> >
> > Thanks,
> > Ron
> >
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Change for submitting to yarn in 1.3.1

2015-05-10 Thread Mridul Muralidharan
We had a similar requirement, and as a stopgap I currently use a
suboptimal, implementation-specific workaround - parsing it out of the
stdout/stderr (based on the log config).
A better means of getting at this is indeed required!
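
Not the actual workaround code, just a rough sketch of that kind of stopgap, assuming the YARN client log lines (which contain "application_<ts>_<n>") end up on spark-submit's stdout/stderr; the helper name and arguments are illustrative:

import scala.sys.process._
import scala.util.matching.Regex

// Rough sketch: run spark-submit as a child process and scrape the first
// YARN application ID that shows up in its combined output.
val appIdPattern: Regex = """application_\d+_\d+""".r

def submitAndScrapeAppId(submitArgs: Seq[String]): Option[String] = {
  val captured = new StringBuilder
  val logger = ProcessLogger(
    out => captured.append(out).append('\n'),
    err => captured.append(err).append('\n'))
  ("spark-submit" +: submitArgs).!(logger)  // blocks until spark-submit exits
  appIdPattern.findFirstIn(captured.toString)
}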

Regards,
Mridul

On Sun, May 10, 2015 at 7:33 PM, Ron's Yahoo!
 wrote:
> Hi,
>   I used to submit my Spark YARN applications using the
> org.apache.spark.deploy.yarn.Client API so that I could get the application ID
> after submitting. The following is the code that I have, but after upgrading to
> 1.3.1, the YARN Client class was made private. Is there a particular reason why
> this Client class was made private?
>   I know that there's a new SparkSubmit object that can be used, but it's not
> clear to me how I can use it to get the application ID after submitting to the
> cluster.
>   Thoughts?
>
> Thanks,
> Ron
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Change for submitting to yarn in 1.3.1

2015-05-10 Thread Ron's Yahoo!
Hi,
  I used to submit my Spark YARN applications using the
org.apache.spark.deploy.yarn.Client API so that I could get the application ID
after submitting. The following is the code that I have, but after upgrading to
1.3.1, the YARN Client class was made private. Is there a particular reason why
this Client class was made private?
  I know that there's a new SparkSubmit object that can be used, but it's not
clear to me how I can use it to get the application ID after submitting to the
cluster.
  Thoughts?

Thanks,
Ron

class SparkLauncherServiceImpl extends SparkLauncherService {

  override def runApp(conf: Configuration, appName: String, queue: String): ApplicationId = {
    val ws = SparkLauncherServiceImpl.getWorkspace()
    val params = Array(
      "--class", "com.xyz.sparkdb.service.impl.AssemblyServiceImpl",
      "--name", appName,
      "--queue", queue,
      "--driver-memory", "1024m",
      "--addJars", getListOfDependencyJars(s"$ws/ledp/le-sparkdb/target/dependency"),
      "--jar", s"file:$ws/ledp/le-sparkdb/target/le-sparkdb-1.0.3-SNAPSHOT.jar")
    System.setProperty("SPARK_YARN_MODE", "true")
    System.setProperty("spark.driver.extraJavaOptions",
      "-XX:PermSize=128m -XX:MaxPermSize=128m -Dsun.io.serialization.extendedDebugInfo=true")
    val sparkConf = new SparkConf()
    val args = new ClientArguments(params, sparkConf)
    new Client(args, conf, sparkConf).runApp()
  }

  private def getListOfDependencyJars(baseDir: String): String = {
    val files = new File(baseDir).listFiles().filter(!_.getName().startsWith("spark-assembly"))
    val prependedFiles = files.map(x => "file:" + x.getAbsolutePath())
    prependedFiles.tail
      .foldLeft(new StringBuilder(prependedFiles.head)) { (acc, e) => acc.append(", ").append(e) }
      .toString()
  }
}



RE: [SparkSQL] cannot filter by a DateType column

2015-05-10 Thread Haopu Wang
Sorry, I was using Spark 1.3.x.

 

I cannot reproduce it in master.

 

But should I still open a JIRA so that I can request it to be back-ported
to the 1.3.x branch? Thanks again!

 



From: Michael Armbrust [mailto:mich...@databricks.com] 
Sent: Saturday, May 09, 2015 2:41 AM
To: Haopu Wang
Cc: user; dev@spark.apache.org
Subject: Re: [SparkSQL] cannot filter by a DateType column

 

What version of Spark are you using? It appears that at least in master
we are doing the conversion correctly, but it's possible that older versions
of applySchema do not. If you can reproduce the same bug in master, can
you open a JIRA?

 

On Fri, May 8, 2015 at 1:36 AM, Haopu Wang  wrote:

I want to filter a DataFrame based on a Date column. 

 

If the DataFrame object is constructed from a Scala case class, it works
(comparing either as String or as Date). But if the DataFrame is
generated by applying a schema to an RDD, it doesn't work. Below are
the exception and the test code.

 

Do you have any idea about the error? Thank you very much!

 

exception=

java.lang.ClassCastException: java.sql.Date cannot be cast to java.lang.Integer
        at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
        at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToString$2$$anonfun$apply$6.apply(Cast.scala:116)
        at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$buildCast(Cast.scala:111)
        at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToString$2.apply(Cast.scala:116)
        at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:426)
        at org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual.eval(predicates.scala:305)
        at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
        at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
        at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)

 

code=

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{DateType, StructField, StructType}

val conf = new SparkConf().setAppName("DFTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlCtx = new HiveContext(sc)
import sqlCtx.implicits._

case class Test(dt: java.sql.Date)

// DataFrame built from a case class: the filter works.
val df = sc.makeRDD(Seq(Test(new java.sql.Date(115, 4, 7)))).toDF()

var r = df.filter("dt >= '2015-05-06'")
r.explain(true)
r.show
println("==")

var r2 = df.filter("dt >= cast('2015-05-06' as DATE)")
r2.explain(true)
r2.show
println("==")

// "df2" doesn't do the filter correctly!!
// DataFrame built by applying a schema to an RDD[Row]: the filter fails.
val rdd2 = sc.makeRDD(Seq(Row(new java.sql.Date(115, 4, 7))))

val schema = StructType(Array(StructField("dt", DateType, false)))

val df2 = sqlCtx.applySchema(rdd2, schema)

r = df2.filter("dt >= '2015-05-06'")
r.explain(true)
r.show
println("==")

r2 = df2.filter("dt >= cast('2015-05-06' as DATE)")
r2.explain(true)
r2.show