Hi, I am trying to get the application id after I use SparkSubmit.main for a
yarn submission. I am able to make it asynchronous using
spark.yarn.submit.waitAppCompletion=false configuration option, but I can't seem to
figure out how I can get the application id for this job. I read both
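One option that may help here, if switching the entry point is acceptable, is the SparkLauncher API, which exposes the application id through a SparkAppHandle. A rough sketch (the jar path and main class are placeholders):

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    val handle = new SparkLauncher()
      .setAppResource("/path/to/your-app.jar")   // placeholder
      .setMainClass("com.example.YourApp")       // placeholder
      .setMaster("yarn")
      .setDeployMode("cluster")
      .startApplication()

    // getAppId() returns null until YARN assigns an id, so poll or register a listener
    while (handle.getAppId == null && !handle.getState.isFinal) {
      Thread.sleep(500)
    }
    println(s"application id: ${handle.getAppId}")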
, 2017 at 7:57 PM, Keith Chapman <keithgchap...@gmail.com> wrote:
Hi Ron,
You can try the toDebugString method on the RDD; it will print the RDD lineage.
Regards,
Keith.
http://keith-chapman.com
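For example, a minimal illustration of what toDebugString prints, assuming sc is an existing SparkContext:

    val rdd = sc.parallelize(1 to 100).map(_ * 2).filter(_ % 3 == 0)
    println(rdd.toDebugString)   // prints the lineage as an indented tree of RDDs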
On Fri, Jul 21, 2017 at 11:24 AM, Ron Gonzalez <zlgonza...@yahoo.com.invalid>
Hi, Can someone point me to a test case or share sample code that can extract the RDD graph from a Spark job at any point during its lifecycle? I understand that Spark has a UI that can show the execution graph, so I'm hoping it uses some API somewhere that I could use. I know
Hi,
After I create a table in Spark SQL and load an HDFS file into it, the file is no longer visible when I do hadoop fs -ls.
Is this expected?
Thanks,
Ron
Hi,
Question on using spark sql.
Can someone give an example of creating a table from a directory containing parquet files in HDFS, instead of from a single parquet file?
Thanks,
Ron
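A minimal sketch of one way to do this, assuming a SQLContext and a directory of parquet files (the path below is a placeholder): point the reader at the directory rather than at a single file.

    val df = sqlContext.read.parquet("hdfs:///data/events/")   // directory, not a single .parquet file
    df.registerTempTable("events")
    sqlContext.sql("SELECT COUNT(*) FROM events").show()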
On 07/21/2015 01:59 PM, Brandon White wrote:
A few questions about caching a table in Spark SQL.
1) Is there
I'd use Random Forest. It will give you better generalizability. There are also a number of things you can do with RF that allow you to train on samples of the massive data set and then just average over the resulting models...
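For reference, a minimal MLlib sketch of training a random forest (trainingData is assumed to be an existing RDD[LabeledPoint], and the parameter values are placeholders):

    import org.apache.spark.mllib.tree.RandomForest

    // trainingData: RDD[LabeledPoint] is assumed to exist already
    val model = RandomForest.trainClassifier(
      trainingData,
      numClasses = 2,
      categoricalFeaturesInfo = Map[Int, Int](),
      numTrees = 100,
      featureSubsetStrategy = "auto",
      impurity = "gini",
      maxDepth = 5,
      maxBins = 32)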
Thanks,
Ron
On 07/21/2015 02:17 PM, Olivier Girardot wrote:
depends
-the-thrift-jdbcodbc-server
On Mon, Jul 13, 2015 at 6:31 PM, Jerrick Hoang jerrickho...@gmail.com wrote:
Well, for ad hoc queries you can use the CLI
On Mon, Jul 13, 2015 at 5:34 PM, Ron Gonzalez zlgonza...@yahoo.com.invalid wrote:
Hi,
I have a question about Spark SQL. Is there a way to use Spark SQL on YARN without having to submit a job?
The bottom line is that I want to reduce the latency of running queries as a job. I know that the Spark SQL default submission is like a job, but was wondering if
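One common way to avoid a fresh job submission per query is to keep the Thrift JDBC/ODBC server running and talk to it over JDBC. A rough sketch, assuming the server is already started and the Hive JDBC driver is on the classpath (host, port, and table name are placeholders):

    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
    val rs = conn.createStatement().executeQuery("SELECT COUNT(*) FROM some_table")
    while (rs.next()) println(rs.getLong(1))
    conn.close()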
If you're running on Ubuntu, do ulimit -n, which gives the max number of allowed open files. You will have to change the value in /etc/security/limits.conf to a higher limit, then log out and log back in.
Thanks,
Ron
Sent from my iPad
On Aug 10, 2014, at 10:19 PM, Davies Liu
Hi Vida,
It's possible to save an RDD as a Hadoop file using Hadoop output formats. It might be worthwhile to investigate using DBOutputFormat and see if this will work for you.
I haven't personally written to a db, but I'd imagine this would be one way to do it.
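A very rough sketch of that idea, assuming rdd is a pair RDD whose key class implements both DBWritable and Writable (the driver, URL, credentials, table and column names are all placeholders):

    import org.apache.hadoop.mapred.JobConf
    import org.apache.hadoop.mapred.lib.db.{DBConfiguration, DBOutputFormat}

    val jobConf = new JobConf()
    DBConfiguration.configureDB(jobConf, "com.mysql.jdbc.Driver",
      "jdbc:mysql://dbhost/mydb", "user", "password")
    // setOutput also sets DBOutputFormat as the job's output format
    DBOutputFormat.setOutput(jobConf, "my_table", "col1", "col2")
    rdd.saveAsHadoopDataset(jobConf)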
Thanks,
Ron
Sent from my
Cool thanks!
On Monday, August 4, 2014 8:58 AM, kriskalish k...@kalish.net wrote:
Hey Ron,
It was pretty much exactly as Sean had depicted. I just needed to provide count with an anonymous function to tell it which elements to count. Since I wanted to count them all, the function is simply
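For illustration, this is how count with a predicate looks on a plain Scala collection (note that an RDD's count() takes no arguments, so on an RDD the predicate usually goes into a filter before count):

    val xs = Seq(1, 2, 3, 4, 5)
    xs.count(_ => true)    // count everything: 5
    xs.count(_ % 2 == 0)   // count only the even elements: 2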
One key thing I forgot to mention is that I changed the avro version to 1.7.7
to get AVRO-1476.
I took a closer look at the jars, and what I noticed is that the assembly jars
that work do not have the org.apache.avro.mapreduce package packaged into the
assembly. For spark-1.0.1,
You have to import org.apache.spark.rdd._, which will automatically make this method available.
Thanks,
Ron
Sent from my iPhone
On Aug 1, 2014, at 3:26 PM, touchdown yut...@gmail.com wrote:
Hi, I am facing a similar dilemma. I am trying to aggregate a bunch of small
avro files into one
Can you share the mapValues approach you did?
Thanks,
Ron
Sent from my iPhone
On Aug 1, 2014, at 3:00 PM, kriskalish k...@kalish.net wrote:
Thanks for the help everyone. I got the mapValues approach working. I will
experiment with the reduceByKey approach later.
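For reference, a toy sketch of the two approaches on a small pair RDD (sc is assumed to be an existing SparkContext):

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

    // groupByKey followed by mapValues
    val viaMapValues = pairs.groupByKey().mapValues(_.sum)

    // reduceByKey, which combines map-side and usually shuffles less data
    val viaReduceByKey = pairs.reduceByKey(_ + _)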
-Kris
--
Hi,
I took avro 1.7.7 and recompiled my distribution to be able to fix the issue
when dealing with avro GenericRecord. The issue I got was resolved. I'm
referring to AVRO-1476.
I also enabled kryo registration in SparkConf.
That said, I am still seeing a NotSerializableException for
Can you try cloning the records in the map call? Also look at the contents and see if they're actually changed, or if the resulting RDD after a cache is just the last record smeared across all the others.
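A sketch of that cloning suggestion, assuming the records come in via AvroKeyInputFormat (the input path is a placeholder). Avro input formats reuse the same record object, so each record is deep-copied before caching:

    import org.apache.avro.generic.{GenericData, GenericRecord}
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable

    val records = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
        AvroKeyInputFormat[GenericRecord]]("hdfs:///tmp/input.avro")
      .map { case (k, _) => GenericData.get().deepCopy(k.datum().getSchema, k.datum()) }
      .cache()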
Cheers,
Andrew
On Thu, Jul 24, 2014 at 2:41 PM, Ron Gonzalez zlgonza
Folks,
I've been able to submit simple jobs to YARN thus far. However, when I did something more complicated that added 194 dependency jars using --addJars, the job failed in YARN with no logs. What ends up happening is that no container logs get created (app master or executor). If I add just
Hi,
I'm doing the following:
def main(args: Array[String]) = {
val sparkConf = new SparkConf().setAppName("AvroTest").setMaster("local[2]")
val sc = new SparkContext(sparkConf)
val conf = new Configuration()
val job = new Job(conf)
val path = new Path("/tmp/a.avro")
val
Hi,
I was doing programmatic submission of Spark yarn jobs and I saw code in
ClientBase.getDefaultYarnApplicationClasspath():
val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
MRJobConfig doesn't have this field so the created launch env is incomplete.
Workaround
The idea behind YARN is that you can run different application types, like MapReduce, Storm and Spark.
I would recommend that you build your Spark jobs in the main method without specifying how you deploy them. Then you can use spark-submit to tell Spark how you would want to deploy it using
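In other words, something along these lines, where the code never hard-codes a master and spark-submit (for example with --master yarn-cluster) decides how it runs. MyJob is a placeholder name:

    import org.apache.spark.{SparkConf, SparkContext}

    object MyJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("MyJob")   // no setMaster here
        val sc = new SparkContext(conf)
        // ... job logic ...
        sc.stop()
      }
    }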
Koert,
Yeah I had the same problems trying to do programmatic submission of spark jobs
to my Yarn cluster. I was ultimately able to resolve it by reviewing the
classpath and debugging through all the different things that the Spark Yarn
client (Client.scala) did for submitting to Yarn (like env
I am able to use Client.scala or LauncherExecutor.scala as my programmatic
entry point for Yarn.
Thanks,
Ron
Sent from my iPad
On Jul 9, 2014, at 7:14 AM, Jerry Lam chiling...@gmail.com wrote:
+1 as well for being able to submit jobs programmatically without using a shell script.
we
Btw, I'm on 0.9.1. Will setting a queue programmatically be available in 1.0?
Thanks,
Ron
Sent from my iPad
On May 20, 2014, at 6:27 PM, Ron Gonzalez zlgonza...@yahoo.com wrote:
Hi Sandy,
Is there a programmatic way? We're building a platform as a service and
need to assign
Hi,
How does one submit a Spark job to YARN and specify a queue?
The code that successfully submits to yarn is:
val conf = new SparkConf()
val sc = new SparkContext("yarn-client", "Simple App", conf)
Where do I need to specify the queue?
Thanks in advance for any help on this...
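One possible sketch, using the spark.yarn.queue property (it exists in later Spark versions; whether 0.9.1 honors it is worth verifying, and the queue name is a placeholder):

    val conf = new SparkConf()
      .setAppName("Simple App")
      .set("spark.yarn.queue", "myqueue")
    val sc = new SparkContext("yarn-client", "Simple App", conf)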
Hi,
Can you explain a little more what's going on? Which one submits a job to the
yarn cluster that creates an application master and spawns containers for the
local jobs? I tried yarn-client and submitted to our yarn cluster and it seems
to work that way. Shouldn't Client.scala be running
https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/*ProtoBuf*
or just whatever you're using at the moment to open them in an MR job could probably be re-purposed
On Thu, Apr 3, 2014 at 7:11 AM, Ron Gonzalez zlgonza...@yahoo.com wrote
Hi,
I have a small program but I cannot seem to make it connect to the right properties of the cluster.
I have SPARK_YARN_APP_JAR, SPARK_JAR and SPARK_HOME set properly.
If I run this Scala file, I see that it never uses the yarn.resourcemanager.address property that I set
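For what it's worth, one way to force Hadoop/YARN settings from code in later Spark versions is the spark.hadoop.* prefix, which gets copied into the Hadoop Configuration (the address below is a placeholder; normally pointing HADOOP_CONF_DIR or YARN_CONF_DIR at the cluster's config directory is the cleaner fix):

    val conf = new SparkConf()
      .setAppName("Simple App")
      .set("spark.hadoop.yarn.resourcemanager.address", "rm-host:8032")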