Re: Change for submitting to yarn in 1.3.1
sc.applicationId gives the yarn appid.
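A minimal sketch of how that might look in yarn-client mode (the app name and master setting here are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val sparkConf = new SparkConf().setAppName("MyApp").setMaster("yarn-client")
    val sc = new SparkContext(sparkConf)
    // Once the context has registered with the ResourceManager, applicationId
    // returns the YARN id, e.g. "application_1431445566778_0042".
    val appId: String = sc.applicationId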
Re: Change for submitting to yarn in 1.3.1
We had a similar requirement, and as a stopgap, I currently use a suboptimal impl specific workaround - parsing it out of the stdout/stderr (based on log config). A better means to get to this is indeed required!

Regards,
Mridul
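One way such log scraping might look, assuming spark-submit is launched as a child process and the default log4j configuration prints the submitted application id (the jar and class names are placeholders):

    import scala.sys.process._

    val output = scala.collection.mutable.ArrayBuffer[String]()
    val exit = Seq("spark-submit", "--master", "yarn-cluster",
                   "--class", "com.example.Main", "/path/to/app.jar")
      .!(ProcessLogger(output += _, output += _))
    // YARN ids follow the pattern application_<clusterTimestamp>_<sequence>,
    // so a regex over the captured stdout/stderr lines is enough to pull one out.
    val appId: Option[String] =
      output.flatMap("""application_\d+_\d+""".r.findFirstIn(_)).headOption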
Change for submitting to yarn in 1.3.1
Hi,

I used to submit my Spark YARN applications by using the org.apache.spark.deploy.yarn.Client API so that I could get the application id after submitting. The following is the code that I have, but after upgrading to 1.3.1, the yarn Client class was made into a private class. Is there a particular reason why this Client class was made private?

I know that there's a new SparkSubmit object that can be used, but it's not clear to me how I can use it to get the application id after submitting to the cluster.

Thoughts?

Thanks,
Ron

class SparkLauncherServiceImpl extends SparkLauncherService {

  override def runApp(conf: Configuration, appName: String, queue: String): ApplicationId = {
    val ws = SparkLauncherServiceImpl.getWorkspace()
    val params = Array(
      "--class", "com.xyz.sparkdb.service.impl.AssemblyServiceImpl",
      "--name", appName,
      "--queue", queue,
      "--driver-memory", "1024m",
      "--addJars", getListOfDependencyJars(s"$ws/ledp/le-sparkdb/target/dependency"),
      "--jar", s"file:$ws/ledp/le-sparkdb/target/le-sparkdb-1.0.3-SNAPSHOT.jar")
    System.setProperty("SPARK_YARN_MODE", "true")
    System.setProperty("spark.driver.extraJavaOptions",
      "-XX:PermSize=128m -XX:MaxPermSize=128m -Dsun.io.serialization.extendedDebugInfo=true")
    val sparkConf = new SparkConf()
    val args = new ClientArguments(params, sparkConf)
    new Client(args, conf, sparkConf).runApp()
  }

  private def getListOfDependencyJars(baseDir: String): String = {
    val files = new File(baseDir).listFiles().filter(!_.getName().startsWith("spark-assembly"))
    val prependedFiles = files.map(x => "file:" + x.getAbsolutePath())
    prependedFiles.tail
      .foldLeft(new StringBuilder(prependedFiles.head)) { (acc, e) => acc.append(", ").append(e) }
      .toString()
  }
}
RE: [SparkSQL] cannot filter by a DateType column
Sorry, I was using Spark 1.3.x. I cannot reproduce it in master. Should I still open a JIRA so that I can request a backport to the 1.3.x branch?

Thanks again!

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Saturday, May 09, 2015 2:41 AM
To: Haopu Wang
Cc: user; dev@spark.apache.org
Subject: Re: [SparkSQL] cannot filter by a DateType column

What version of Spark are you using? It appears that at least in master we are doing the conversion correctly, but it's possible older versions of applySchema do not. If you can reproduce the same bug in master, can you open a JIRA?

On Fri, May 8, 2015 at 1:36 AM, Haopu Wang wrote:

I want to filter a DataFrame based on a Date column. If the DataFrame object is constructed from a Scala case class, it works (comparing either as String or as Date). But if the DataFrame is generated by applying a schema to an RDD, it doesn't work. Below are the exception and the test code. Do you have any idea about the error? Thank you very much!

exception=
java.lang.ClassCastException: java.sql.Date cannot be cast to java.lang.Integer
    at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
    at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToString$2$$anonfun$apply$6.apply(Cast.scala:116)
    at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$buildCast(Cast.scala:111)
    at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToString$2.apply(Cast.scala:116)
    at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:426)
    at org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual.eval(predicates.scala:305)
    at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
    at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$apply$1.apply(predicates.scala:30)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)

code=
val conf = new SparkConf().setAppName("DFTest").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlCtx = new HiveContext(sc)
import sqlCtx.implicits._

case class Test(dt: java.sql.Date)
val df = sc.makeRDD(Seq(Test(new java.sql.Date(115, 4, 7)))).toDF

var r = df.filter("dt >= '2015-05-06'")
r.explain(true)
r.show
println("==")

var r2 = df.filter("dt >= cast('2015-05-06' as DATE)")
r2.explain(true)
r2.show
println("==")

// "df2" doesn't filter correctly!!
val rdd2 = sc.makeRDD(Seq(Row(new java.sql.Date(115, 4, 7))))
val schema = StructType(Array(StructField("dt", DateType, false)))
val df2 = sqlCtx.applySchema(rdd2, schema)

r = df2.filter("dt >= '2015-05-06'")
r.explain(true)
r.show
println("==")

r2 = df2.filter("dt >= cast('2015-05-06' as DATE)")
r2.explain(true)
r2.show
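For anyone hitting this on an affected 1.3.x build, a minimal workaround sketch based on the observation above that the case-class path does the Date conversion correctly (it reuses sc and the sqlCtx.implicits._ import from the test code; the case class name is illustrative):

    case class DateRow(dt: java.sql.Date)

    // Build the DataFrame from a case-class RDD rather than applySchema,
    // since that path converts java.sql.Date correctly per the report above.
    val dfOk = sc.makeRDD(Seq(DateRow(java.sql.Date.valueOf("2015-05-07")))).toDF()
    dfOk.filter("dt >= '2015-05-06'").show()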