Re: spark-shell gets stuck in ACCEPTED state forever when run in YARN client mode.

2018-07-08 Thread yohann jardin
showed us. Yohann Jardin On 7/8/2018 at 6:11 PM, kant kodali wrote: @yohann sorry, I am assuming you meant the application master; if so, I believe Spark is the one that provides the application master. Is there any way to look at how many resources are being requested and how much YARN is allowed
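One way to inspect how many resources an application requested and was granted is the YARN CLI; the per-queue totals are also visible in the ResourceManager web UI (port 8088 by default). The application id below is hypothetical:

    yarn application -status application_1530000000000_0001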

Re: spark-shell gets stuck in ACCEPTED state forever when run in YARN client mode.

2018-07-08 Thread yohann jardin
-am-resource-percent. Regards, Yohann Jardin On 7/8/2018 at 4:40 PM, kant kodali wrote: Hi, it's on a local MacBook Pro machine that has 16 GB RAM, a 512 GB disk, and 8 vCPUs! I am not running any code since I can't even spawn spark-shell with YARN as master, as described in my previous email. I just
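The truncated property name above presumably refers to yarn.scheduler.capacity.maximum-am-resource-percent (an inference from the fragment); it caps the cluster share that application masters may occupy, and a value that is too low leaves applications stuck in ACCEPTED. A sketch of raising it in capacity-scheduler.xml:

    <property>
      <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
      <value>0.5</value>
    </property>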

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread yohann jardin
that you correctly provide the jar based on its location. I have found it tricky in some cases. As a debugging step, if the jar is not on HDFS, you can copy it there and then specify the full path in the extraClassPath property. Regards, Yohann Jardin On 4/13/2018 at 5:38 PM, Jason Boorn wrote: I do
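For reference, a sketch of the relevant spark-defaults.conf entries; the jar path is hypothetical, and in local mode both settings must point at a path readable on the local filesystem:

    spark.driver.extraClassPath    /opt/libs/my-lib.jar
    spark.executor.extraClassPath  /opt/libs/my-lib.jar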

Re: learning Spark

2017-12-04 Thread yohann jardin
Plenty of documentation is available on the Spark website itself: http://spark.apache.org/docs/latest/#where-to-go-from-here You'll find deployment guides, tuning guides, etc. Yohann Jardin On 05-Dec-17 at 1:38 AM, Somasundaram Sekar wrote: Learning Spark - O'Reilly publication as a starter and official

Re: DataFrame multiple agg on the same column

2017-10-07 Thread yohann jardin
t(1)), sum('amount), max('amount), min('create_time), max('created_time)).show Yohann Jardin On 10/7/2017 at 7:12 PM, Somasundaram Sekar wrote: Hi, I have a GroupedData object on which I perform aggregation of a few columns; since GroupedData takes in a map, I cannot perform multiple aggregates on the
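A fuller sketch of the varargs agg(...) form hinted at above; the DataFrame df and its column names are illustrative. Passing Column expressions instead of a Map is what allows several aggregates over the same column:

    import org.apache.spark.sql.functions._
    import spark.implicits._

    df.groupBy($"user_id")
      .agg(
        count($"amount"),
        sum($"amount"),
        max($"amount"),
        min($"create_time"),
        max($"create_time"))
      .show()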

Re: add arraylist to dataframe

2017-08-29 Thread yohann jardin
Hello Asmath, Your list exists inside the driver, but you are trying to add elements to it from the executors. They are in different processes, on different nodes; they do not communicate just like that. https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions There exists an action
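A minimal sketch of the pitfall and the fix, assuming a SparkSession named spark:

    import scala.collection.mutable.ArrayBuffer

    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3))

    // Wrong: the buffer lives in the driver process; foreach runs on
    // executors, so their additions never reach this copy.
    val buffer = ArrayBuffer[Int]()
    rdd.foreach(x => buffer += x)   // buffer stays empty on the driver

    // Right: use an action such as collect() to ship the data back.
    val collected: Array[Int] = rdd.collect()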

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
For YARN, I'm speaking about the file fairscheduler.xml (if you kept the default scheduling of YARN): https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format Yohann Jardin On 7/28/2017 at 8:00 PM, jeff saremi wrote: The only relevant
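A sketch of such an allocation file, following the format documented at the link above; the queue name and limits are illustrative:

    <?xml version="1.0"?>
    <allocations>
      <queue name="spark">
        <minResources>10000 mb,10 vcores</minResources>
        <maxResources>90000 mb,100 vcores</maxResources>
        <weight>2.0</weight>
      </queue>
    </allocations>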

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
-10-4-gb-of-10-4-gb-physic Regards, Yohann Jardin On 7/28/2017 at 6:05 PM, jeff saremi wrote: Thanks so much, Yohann. I checked the Storage/Memory column on the Executors status page. Well below where I wanted to be. I will try the suggestion on smaller data sets. I am also well within the YARN

Re: How to configure spark on Yarn cluster

2017-07-28 Thread yohann jardin
. Yohann Jardin On 7/28/2017 at 8:03 AM, jeff saremi wrote: I have the simplest job, which I'm running against 100 TB of data. The job keeps failing with ExecutorLostFailures on containers killed by YARN for exceeding memory limits. I have varied the executor-memory from 32 GB to 96 GB
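When YARN kills containers for exceeding limits, the usual first lever is the off-heap overhead rather than the heap itself. A hedged sketch for Spark 2.x on YARN; the class name, jar, and values are illustrative:

    spark-submit \
      --master yarn \
      --executor-memory 32g \
      --conf spark.yarn.executor.memoryOverhead=4096 \
      --class com.example.MyJob my-job.jar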

RE: Is there a difference between these aggregations

2017-07-24 Thread yohann jardin
Seen directly in the code:

    /**
     * Aggregate function: returns the average of the values in a group.
     * Alias for avg.
     *
     * @group agg_funcs
     * @since 1.4.0
     */
    def mean(e: Column): Column = avg(e)

That's the same when the argument is the column name. So there is no difference between
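To see the equivalence in action, a small sketch; df is a hypothetical DataFrame with a numeric column named value:

    import org.apache.spark.sql.functions.{avg, mean}
    import spark.implicits._

    df.agg(mean($"value")).show()
    df.agg(avg($"value")).show()   // same plan and result: mean is an alias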

RE: [Spark] Working with JavaPairRDD from Scala

2017-07-22 Thread yohann jardin
Hello Lukasz, You can just: val pairRdd = javapairrdd.rdd() Then pairRdd will be of type RDD[(K, V)], with K being com.vividsolutions.jts.geom.Polygon, and V being java.util.HashSet[com.vividsolutions.jts.geom.Polygon]. If you really want to continue with Java objects: val
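As a generic sketch of that conversion (names are illustrative):

    import org.apache.spark.api.java.JavaPairRDD
    import org.apache.spark.rdd.RDD

    // .rdd() unwraps a JavaPairRDD into the underlying Scala RDD of tuples.
    def toScalaRdd[K, V](javaPairRdd: JavaPairRDD[K, V]): RDD[(K, V)] =
      javaPairRdd.rdd()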

Re: Spark 2.1.1 and Hadoop version 2.2 or 2.7?

2017-06-21 Thread yohann jardin
https://spark.apache.org/docs/2.1.0/building-spark.html#specifying-the-hadoop-version Hadoop 2.2.0 is only the default build version; other versions can still be built. The package you downloaded is prebuilt for Hadoop 2.7, as stated on the download page, so don't worry. Yohann Jardin
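From the linked build documentation, the general shape of a build against Hadoop 2.7 (the point version is adjustable):

    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package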

Re: Flume DStream produces 0 records after HDFS node killed

2017-06-21 Thread yohann jardin
Which version of Hadoop are you running on? Yohann Jardin On 6/21/2017 at 1:06 AM, N B wrote: OK, some more info about this issue to see if someone can shed light on what could be going on. I turned on debug logging for org.apache.spark.streaming.scheduler in the driver process
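For anyone wanting to reproduce that logging setup, a sketch of the corresponding log4j.properties line, assuming the stock log4j 1.x configuration Spark ships with:

    log4j.logger.org.apache.spark.streaming.scheduler=DEBUG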

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-11 Thread yohann jardin
to argue on this topic. Yohann Jardin On 6/11/2017 at 7:08 PM, vaquar khan wrote: Hi Kant, Kafka is a message broker used with producers and consumers, and Spark Streaming is used for real-time processing; Kafka and Spark Streaming work together, they are not competitors. Spark Streaming

Spark SQL, formatting timezone in UTC

2017-06-02 Thread yohann jardin
Hello everyone, I'm having a hard time with time zones. I have a Long representing a timestamp: 149636160, and I want the output to be 2017-06-02 00:00:00. Based on https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html the only function that helps formatting a
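One way to force a UTC rendering, assuming the intended epoch value is 1496361600 seconds (the figure above appears to have lost a digit) and Spark 2.2+ for the session time zone setting:

    import org.apache.spark.sql.functions.from_unixtime
    import spark.implicits._

    // from_unixtime renders in the session time zone, so pin it to UTC.
    spark.conf.set("spark.sql.session.timeZone", "UTC")

    Seq(1496361600L).toDF("ts")
      .select(from_unixtime($"ts", "yyyy-MM-dd HH:mm:ss").as("utc"))
      .show()   // expect 2017-06-02 00:00:00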

Re: Recommended cluster parameters

2017-04-30 Thread yohann jardin
but the whole set of resources that you (can) have. Then, using this information and the indications on the Spark website (http://spark.apache.org/docs/latest/hardware-provisioning.html), you will be able to specify the hardware of one node and how many nodes you need (at least 3). Yohann Jardin On 4

Writing dataframe to a final path using another temporary path

2017-03-28 Thread yohann jardin
Hello, I'm using Spark 2.1. Once a job completes, I want to write a Parquet file to, let's say, the folder /user/my_user/final_path/ However, I have other jobs reading files in that specific folder, so I need those files to be completely written when they are in that folder. So while the
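A sketch of the write-then-move pattern the question is after, assuming a SparkSession named spark and a DataFrame df; the run_1 subfolder is illustrative:

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Write to a temporary location first, then rename into the final
    // folder once the write has fully completed; on HDFS the rename is
    // a cheap metadata operation, so readers never see partial files.
    val tmp = new Path("/user/my_user/tmp_path/run_1")
    val dst = new Path("/user/my_user/final_path/run_1")

    df.write.parquet(tmp.toString)

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    fs.rename(tmp, dst)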

RE: RE: Fast write datastore...

2017-03-16 Thread yohann jardin
Hello everyone, I'm also really interested in the answers, as I will be facing the same issue soon. Muthu, if you evaluate Apache Ignite again, can you share your results? I also noticed Alluxio, which stores Spark results in memory, and you might want to investigate it. In my case I want to use them

Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread yohann jardin
/spark-examples*.jar After --class you specify the path, in your provided jar, to the main class you want to run. You finish by specifying the jar that contains your main class. Yohann Jardin On 2/25/2017 at 9:50 PM, Raymond Xie wrote: I am doing Spark Streaming on a Hortonworks sandbox and am stuck
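A sketch of the full invocation shape being described, using the stock SparkPi example class; the jar path is abbreviated as in the fragment above:

    spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master local[2] \
      /path/to/spark-examples*.jar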

Executor links in Job History

2017-02-22 Thread yohann jardin
Hello, I'm using Spark 2.1.0 and Hadoop 2.2.0. When I launch jobs on YARN, I can retrieve their information on the Spark History Server, except that the links to the executors' stdout/stderr are wrong -> they point to the URLs the executors had while the job was running. We have the flag

Issues launching job dynamically in Java

2017-02-08 Thread yohann jardin
Hello everyone, I'm trying to develop a web service that launches jobs. The web service is based on Tomcat, and I'm working with Spark 2.1.0. The SparkLauncher provides two methods to launch the job: first SparkLauncher.launch(), and SparkLauncher.startApplication(SparkAppHandle.Listener...
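A minimal sketch of the second variant, using Spark 2.1's launcher API; the jar path and main class are hypothetical:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    // startApplication returns a handle and reports state transitions
    // to the supplied listener.
    val handle: SparkAppHandle = new SparkLauncher()
      .setAppResource("/path/to/my-app.jar")
      .setMainClass("com.example.MyJob")
      .setMaster("yarn")
      .startApplication(new SparkAppHandle.Listener {
        override def stateChanged(h: SparkAppHandle): Unit =
          println(s"State changed: ${h.getState}")
        override def infoChanged(h: SparkAppHandle): Unit = ()
      })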