showed us.
Yohann Jardin
On 7/8/2018 at 6:11 PM, kant kodali wrote:
@yohann Sorry, I am assuming you meant the application master; if so, I believe Spark
is the one that provides the application master. Is there any way to see how
many resources are being requested and how much YARN is allowed
-am-resource-percent.
Regards,
Yohann Jardin
On 7/8/2018 at 4:40 PM, kant kodali wrote:
Hi,
It's on a local MacBook Pro that has 16GB RAM, a 512GB disk, and 8 vCPUs! I
am not running any code, since I can't even spawn spark-shell with YARN as
master, as described in my previous email. I just
that you provide the jar correctly based on its
location. I have found it tricky in some cases.
As a debugging step, if the jar is not on HDFS, you can copy it there and then
specify the full path in the extraClassPath property.
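A minimal sketch of that suggestion, assuming the jar was copied to a placeholder HDFS path:

spark-submit \
  --conf spark.driver.extraClassPath=hdfs:///user/my_user/libs/extra-lib.jar \
  --conf spark.executor.extraClassPath=hdfs:///user/my_user/libs/extra-lib.jar \
  ...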
Regards,
Yohann Jardin
On 4/13/2018 at 5:38 PM, Jason Boorn wrote:
I do
Plenty of documentation is available on the Spark website itself:
http://spark.apache.org/docs/latest/#where-to-go-from-here
You’ll find deployment guides, tuning, etc.
Yohann Jardin
On 05-Dec-17 at 1:38 AM, Somasundaram Sekar wrote:
Learning Spark, an O'Reilly publication, as a starter, and the official
t(1)), sum('amount), max('amount), min('create_time),
max('created_time)).show
Yohann Jardin
On 10/7/2017 at 7:12 PM, Somasundaram Sekar wrote:
Hi,
I have a GroupedData object on which I perform aggregation of a few columns.
Since GroupedData takes in a map, I cannot perform multiple aggregates on the
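For reference, the Column-based agg overload avoids the one-aggregate-per-column limitation of the map variant; a minimal sketch, where df and the grouping column "user" are assumptions:

import org.apache.spark.sql.functions._

df.groupBy("user")
  .agg(count(lit(1)), sum("amount"), max("amount"),
       min("create_time"), max("created_time"))
  .show()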
Hello Asmath,
Your list exists inside the driver, but you try to add elements to it from
the executors. They are in different processes, on different nodes; they
do not communicate just like that.
https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions
There exists an action
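The action being alluded to is presumably collect(); a minimal sketch, assuming an RDD[String] named rdd:

// Anti-pattern: executors mutate their own serialized copies,
// so the driver-side buffer stays empty
val buffer = scala.collection.mutable.ListBuffer[String]()
rdd.foreach(x => buffer += x)

// Instead, use an action that returns the data to the driver
val collected: Array[String] = rdd.collect()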
For YARN, I'm speaking about the file fairscheduler.xml (if you kept the
default scheduling of YARN):
https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format
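A minimal allocation file sketch, where the queue name and resource limits are assumptions:

<?xml version="1.0"?>
<allocations>
  <queue name="spark">
    <minResources>1024 mb,1 vcores</minResources>
    <maxResources>8192 mb,8 vcores</maxResources>
    <weight>1.0</weight>
  </queue>
</allocations>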
Yohann Jardin
On 7/28/2017 at 8:00 PM, jeff saremi wrote:
The only relevant
-10-4-gb-of-10-4-gb-physic
Regards,
Yohann Jardin
On 7/28/2017 at 6:05 PM, jeff saremi wrote:
Thanks so much, Yohann.
I checked the Storage/Memory column on the Executors status page. Well below where
I wanted to be.
I will try the suggestion on smaller data sets.
I am also well within the YARN
Yohann Jardin
On 7/28/2017 at 8:03 AM, jeff saremi wrote:
I have the simplest job, which I'm running against 100TB of data. The job keeps
failing with ExecutorLostFailures on containers killed by YARN for exceeding
memory limits.
I have varied the executor-memory from 32GB to 96GB.
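For what it's worth, the usual knob for that particular failure is the off-heap overhead allowance rather than the executor heap itself; a hedged sketch for Spark 2.x, with placeholder sizes:

spark-submit \
  --executor-memory 32g \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  ...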
Seen directly in the code:
/**
* Aggregate function: returns the average of the values in a group.
* Alias for avg.
*
* @group agg_funcs
* @since 1.4.0
*/
def mean(e: Column): Column = avg(e)
That's the same when the argument is the column name.
So no difference between mean and avg.
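A quick illustration; df and the column name "amount" are assumptions:

import org.apache.spark.sql.functions.{avg, mean}

// both calls produce the same result, since mean is an alias for avg
df.agg(mean("amount")).show()
df.agg(avg("amount")).show()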
Hello Lukasz,
You can just:
val pairRdd = javapairrdd.rdd();
Then pairRdd will be of type RDD[(K, V)], with K being
com.vividsolutions.jts.geom.Polygon, and V being
java.util.HashSet[com.vividsolutions.jts.geom.Polygon].
If you really want to continue with Java objects:
val
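A fuller sketch of the first suggestion, with the types spelled out; the variable name is an assumption:

import com.vividsolutions.jts.geom.Polygon
import org.apache.spark.rdd.RDD

// assuming javaPairRdd: JavaPairRDD[Polygon, java.util.HashSet[Polygon]]
val pairRdd: RDD[(Polygon, java.util.HashSet[Polygon])] = javaPairRdd.rdd()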
https://spark.apache.org/docs/2.1.0/building-spark.html#specifying-the-hadoop-version
Hadoop v2.2.0 is only the default build version; other versions can
still be built. The package you downloaded is prebuilt for Hadoop 2.7, as stated
on the download page, so don't worry.
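For reference, that page shows building against a chosen Hadoop version like so:

./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package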
Yohann Jardin
Which version of Hadoop are you running on?
Yohann Jardin
On 6/21/2017 at 1:06 AM, N B wrote:
OK, some more info about this issue, to see if someone can shed light on what
could be going on. I turned on debug logging for
org.apache.spark.streaming.scheduler in the driver process
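Presumably via a log4j.properties entry along these lines:

# conf/log4j.properties on the driver
log4j.logger.org.apache.spark.streaming.scheduler=DEBUG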
to weigh in on this topic.
Yohann Jardin
On 6/11/2017 at 7:08 PM, vaquar khan wrote:
Hi Kant,
Kafka is the message broker, used via Producers and Consumers, and Spark
Streaming is used for the real-time processing. Kafka and Spark Streaming work
together; they are not competitors.
Spark Streaming
Hello everyone,
I'm having a hard time with time zones.
I have a Long representing a timestamp, 1496361600, and I want the output to be
2017-06-02 00:00:00
Based on
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html
The only function that helps with formatting a
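The usual approach is from_unixtime; a minimal sketch, where df and the column name "ts" are assumptions. Note that the string is rendered in the session's default time zone, so getting 2017-06-02 00:00:00 from 1496361600 assumes UTC:

import org.apache.spark.sql.functions.{col, from_unixtime}

df.select(from_unixtime(col("ts"), "yyyy-MM-dd HH:mm:ss").as("formatted")).show()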
but the whole bunch available
that you (can) have.
Then, using this information and the indications on the Spark website
(http://spark.apache.org/docs/latest/hardware-provisioning.html), you will be
able to specify the hardware of one node and how many nodes you need (at least
3).
Yohann Jardin
On 4
Hello,
I’m using Spark 2.1.
Once a job completes, I want to write a Parquet file to, let’s say, the folder
/user/my_user/final_path/
However, I have other jobs reading files in that specific folder, so I need
those files to be completely written when they are in that folder.
So while the
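A common workaround, sketched under the assumption that readers only scan final_path and that both paths are placeholders: write to a temporary directory, then move the result in one step.

import org.apache.hadoop.fs.{FileSystem, Path}

// write somewhere readers do not look
df.write.parquet("/user/my_user/tmp_path/batch_1")

// rename is atomic on HDFS, so readers never see partial files
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.rename(new Path("/user/my_user/tmp_path/batch_1"),
          new Path("/user/my_user/final_path/batch_1"))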
Hello everyone,
I'm also really interested in the answers, as I will be facing the same issue
soon.
Muthu, if you evaluate Apache Ignite again, can you share your results? I also
noticed Alluxio for storing Spark results in memory, which you might want to
investigate.
In my case I want to use them
/spark-examples*.jar
After --class you specify the fully qualified name, within your provided jar, of
the main class you want to run. You finish by specifying the jar that contains
your main class.
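For instance, with the bundled examples jar (the jar location varies by distribution; this path is a Hortonworks-style assumption):

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  /usr/hdp/current/spark-client/lib/spark-examples*.jar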
Yohann Jardin
On 2/25/2017 at 9:50 PM, Raymond Xie wrote:
I am doing Spark Streaming on a Hortonworks sandbox and am stuck
Hello,
I'm using Spark 2.1.0 and Hadoop 2.2.0.
When I launch jobs on YARN, I can retrieve their information on the Spark History
Server, except that the links to the stdout/stderr of executors are wrong: they
lead to the URL they had while the job was running.
We have the flag
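The flag in question is presumably related to YARN log aggregation; for reference, the usual yarn-site.xml settings, with a placeholder hostname:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://historyserver:19888/jobhistory/logs</value>
</property>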
Hello everyone,
I'm trying to develop a web service that launches jobs. The web service is based on
Tomcat, and I'm working with Spark 2.1.0.
The SparkLauncher provides two methods to launch the job: first
SparkLauncher.launch(), and second
SparkLauncher.startApplication(SparkAppHandle.Listener...
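A minimal sketch of the startApplication variant; the jar path, main class, and master are placeholders:

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

val handle = new SparkLauncher()
  .setAppResource("/path/to/my-app.jar")
  .setMainClass("com.example.MyJob")
  .setMaster("yarn")
  .startApplication(new SparkAppHandle.Listener {
    override def stateChanged(h: SparkAppHandle): Unit =
      println(s"State changed: ${h.getState}")
    override def infoChanged(h: SparkAppHandle): Unit = ()
  })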