Re: Sqoop on Spark

2016-04-05 Thread ayan guha
Thanks guys for the feedback. On Wed, Apr 6, 2016 at 3:44 PM, Jörn Franke wrote: > I do not think you can be more resource efficient. In the end you have to > store the data anyway on HDFS. You have a lot of development effort for > doing something like sqoop. Especially with

Sqoop on Spark

2016-04-05 Thread ayan guha
Hi All Asking opinion: is it possible/advisable to use spark to replace what sqoop does? Any existing project done in similar lines? -- Best Regards, Ayan Guha

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Nick Pentreath
+1 for this proposal - as you mention, I think it's the de facto current situation anyway. Note that from a developer view it's just the user-facing API that will be only "ml" - the majority of the actual algorithms still operate on RDDs under the hood currently. On Wed, 6 Apr 2016 at 05:03, Chris

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Chris Fregly
perhaps renaming to Spark ML would actually clear up code and documentation confusion? +1 for rename > On Apr 5, 2016, at 7:00 PM, Reynold Xin wrote: > > +1 > > This is a no brainer IMO. > > >> On Tue, Apr 5, 2016 at 7:32 PM, Joseph Bradley

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Reynold Xin
+1 This is a no brainer IMO. On Tue, Apr 5, 2016 at 7:32 PM, Joseph Bradley wrote: > +1 By the way, the JIRA for tracking (Scala) API parity is: > https://issues.apache.org/jira/browse/SPARK-4591 > > On Tue, Apr 5, 2016 at 4:58 PM, Matei Zaharia

Re: lost executor due to large shuffle spill memory

2016-04-05 Thread Michael Slavitch
Do you have enough disk space for the spill? It seems it has lots of memory reserved but not enough for the spill. You will need a disk that can handle the entire data partition for each host. Compression of the spilled data saves about 50% in most if not all cases. Given the large data set I

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Holden Karau
I'm very much in favor of this, the less porting work there is the better :) On Tue, Apr 5, 2016 at 5:32 PM, Joseph Bradley wrote: > +1 By the way, the JIRA for tracking (Scala) API parity is: > https://issues.apache.org/jira/browse/SPARK-4591 > > On Tue, Apr 5, 2016 at

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Joseph Bradley
+1 By the way, the JIRA for tracking (Scala) API parity is: https://issues.apache.org/jira/browse/SPARK-4591 On Tue, Apr 5, 2016 at 4:58 PM, Matei Zaharia wrote: > This sounds good to me as well. The one thing we should pay attention to > is how we update the docs so

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Matei Zaharia
This sounds good to me as well. The one thing we should pay attention to is how we update the docs so that people know to start with the spark.ml classes. Right now the docs list spark.mllib first and also seem more comprehensive in that area than in spark.ml, so maybe people naturally move

Re: I want to unsubscribe

2016-04-05 Thread Jakob Odersky
To unsubscribe, send an email to user-unsubscr...@spark.apache.org On Tue, Apr 5, 2016 at 4:50 PM, Ranjana Rajendran wrote: > I get to see the threads in the public mailing list. I don't want so many > messages in my inbox. I want to unsubscribe.

I want to unsubscribe

2016-04-05 Thread Ranjana Rajendran
I get to see the threads in the public mailing list. I don't want so many messages in my inbox. I want to unsubscribe.

Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Andy Davidson
Based on my experience, if you use “simple” streaming (i.e. you do not use windows), after every mini batch you will “save”. This will cause a dir in HDFS with the timestamp as part of the path. Within the dir, a separate part file will be created for each partition. If you used windowing you

Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Mich Talebzadeh
I agree, every time an OS file is created it requires a context switch plus a file descriptor. It is probably more time consuming to open and close these files than actually doing the work. I always wondered about the performance implications of Spark streaming and although there are some early days

Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Andy Davidson
In my streaming experience I was getting tens of thousands of empty files created in HDFS. This was crushing my system's performance when my batch jobs ran over the data sets. There is a lot of overhead in opening and closing empty files. I think creating empty files or keeping empty partitions

Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Mich Talebzadeh
Thanks Andy. Do we know if this is a known bug, or simply expected behaviour that, on the face of it, Spark cannot save RDD output to a text file? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

lost executor due to large shuffle spill memory

2016-04-05 Thread lllll
I have a task to remap the index to the actual uuid in ALS prediction results, but it consistently fails due to lost executors. I noticed there's a large shuffle spill memory but I don't know how to improve it. I've tried to

Re: Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Andy Davidson
Hi Mich, Yup, I was surprised to find empty files. It's easy to work around. Note I should probably use coalesce() and not repartition(). In general I found I almost always need to repartition. I was getting thousands of empty partitions. It was really slowing my system down. private static void
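A minimal sketch of the workaround described here, assuming lines is a DStream[String] (paths and names are illustrative, not the actual code from this thread):

    // Collapse each micro-batch to one partition and skip empty batches,
    // so no empty part files are written to HDFS.
    lines.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.coalesce(1).saveAsTextFile(s"/tmp/streaming_out/batch-${time.milliseconds}")
      }
    }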

Saving Spark streaming RDD with saveAsTextFiles ends up creating empty files on HDFS

2016-04-05 Thread Mich Talebzadeh
Spark 1.6.1 The following creates empty files. It prints lines OK with println val result = lines.filter(_.contains("ASE 15")).flatMap(line => line.split("\n,")).map(word => (word, 1)).reduceByKey(_ + _) result.saveAsTextFiles("/tmp/rdd_stuff") I am getting zero length files drwxr-xr-x -

Re: Spark error with checkpointing

2016-04-05 Thread Cody Koeninger
http://spark.apache.org/docs/latest/streaming-programming-guide.html#accumulators-and-broadcast-variables On Tue, Apr 5, 2016 at 3:51 PM, Akhilesh Pathodia wrote: > Hi, > > I am running spark jobs on yarn in cluster mode. The job reads the messages > from kafka

Re: multiple splits fails

2016-04-05 Thread Mich Talebzadeh
If I go through each RDD I get val result = lines.filter(_.contains("Sending messages")).flatMap(line => line.split("\n,")).map(word => (word, 1)).reduceByKey(_ + _) scala> result.foreachRDD( rdd => { | for(item <- rdd.collect().toArray) { | println(item); | } | }) Rather

RE: Partition pruning in spark 1.5.2

2016-04-05 Thread Yong Zhang
Hi Michael: I would like to ask the same question. If the DF is hash partitioned and then cached, and we now query/filter by the column used for the hash partitioning, will Spark be smart enough to do the partition pruning in this case, instead of depending on Parquet's partition pruning? I think that is the

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Xiangrui Meng
Yes, DB (cc'ed) is working on porting the local linear algebra library over (SPARK-13944). There are also frequent pattern mining algorithms we need to port over in order to reach feature parity. -Xiangrui On Tue, Apr 5, 2016 at 12:08 PM Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote:

Re: Partition pruning in spark 1.5.2

2016-04-05 Thread Darshan Singh
Thanks a lot. I will try this one as well. On Tue, Apr 5, 2016 at 9:28 PM, Michael Armbrust wrote: > The following should ensure partition pruning happens: > > df.write.partitionBy("country").save("/path/to/data") > sqlContext.read.load("/path/to/data").where("country =

Re: Stress testing hdfs with Spark

2016-04-05 Thread Jan Holmberg
Yes, I realize that there's a standard way, and then there's the way where the client asks 'how fast can it write the data'. That is what I'm trying to figure out. At the moment I'm far from the disks' theoretical write speed when combining all the disks together. On 05 Apr 2016, at 23:21, Mich

Re: Partition pruning in spark 1.5.2

2016-04-05 Thread Michael Armbrust
The following should ensure partition pruning happens: df.write.partitionBy("country").save("/path/to/data") sqlContext.read.load("/path/to/data").where("country = 'UK'") On Tue, Apr 5, 2016 at 1:13 PM, Darshan Singh wrote: > Thanks for the reply. > > Now I saved the
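A minimal sketch combining this suggestion with an explain() check to verify pruning (paths are illustrative):

    // Write the data partitioned by country, then filter on the partition column
    df.write.partitionBy("country").parquet("/path/to/data")

    val uk = sqlContext.read.parquet("/path/to/data").where("country = 'UK'")
    uk.explain()  // inspect the physical plan to confirm only the country=UK partition is scanned
    uk.count()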

Re: Stress testing hdfs with Spark

2016-04-05 Thread Jan Holmberg
Yep, I used DFSIO and also Teragen, but I would like to experiment with an ad-hoc Spark program. -jan On 05 Apr 2016, at 23:13, Sebastian Piu wrote: You could try using TestDFSIO for raw HDFS performance, but we found it not very relevant

Re: Stress testing hdfs with Spark

2016-04-05 Thread Mich Talebzadeh
So that is throughput per second. You can try Spark streaming saving it to HDFS and increase the throttle. The generally accepted form is to measure service time, which is the average service time for IO requests in ms. Dr Mich Talebzadeh LinkedIn *

Re: Partition pruning in spark 1.5.2

2016-04-05 Thread Darshan Singh
Thanks for the reply. Now I saved part_movies as a parquet file, then created a new dataframe from the saved parquet file, and I did not persist it. Then I ran the same query. It still read all 20 partitions, this time from HDFS. So what is the exact scenario in which it will prune partitions? I

Re: Stress testing hdfs with Spark

2016-04-05 Thread Sebastian Piu
You could try using TestDFSIO for raw HDFS performance, but we found it not very relevant. Another way could be to generate a file and then read it and write it back. For some of our use cases we populated a Kafka queue on the cluster (on different disks) and used Spark streaming to do

Re: Partition pruning in spark 1.5.2

2016-04-05 Thread Michael Armbrust
For the in-memory cache, we still launch tasks, we just skip blocks when possible using statistics about those blocks. On Tue, Apr 5, 2016 at 12:14 PM, Darshan Singh wrote: > Thanks. It is not my exact scenario but I have tried to reproduce it. I > have used 1.5.2. > > I

Re: Stress testing hdfs with Spark

2016-04-05 Thread Jan Holmberg
I'm trying to get a rough estimate of how much data I can write within a certain time period (GB/sec). -jan On 05 Apr 2016, at 22:49, Mich Talebzadeh wrote: Hi Jan, What is the definition of stress test here? What are the metrics?

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-05 Thread Mich Talebzadeh
Hi Ashok. I forgot to mention that that test of mine was from Sybase ASE. So I would say if the hash join starts spilling to disk then the performance will start degrading. My hunch is either the Optimizer does not cater for it, or the Optimizer decides that it would be cheaper to use a Nested Loop Join as

Re: Stress testing hdfs with Spark

2016-04-05 Thread Mich Talebzadeh
Hi Jan, What is the definition of stress test here? What are the metrics? Throughput of data, latency, velocity, volume? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Stress testing hdfs with Spark

2016-04-05 Thread Jan Holmberg
Hi, I'm trying to figure out how to write lots of data from each worker. I tried rdd.saveAsTextFile but got an OOM when generating a 1024 MB string for a worker. Increasing worker memory would mean that I should drop the number of workers. So, any idea how to write e.g. a 1 GB file from each worker?
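A minimal sketch of one way to write a large file per worker without materializing a 1 GB string in memory, using a lazy iterator per partition (sizes, row format and path are illustrative):

    val partitions = 16               // roughly one per worker
    val rowsPerPartition = 10000000   // tune so each part file reaches the target size

    sc.parallelize(0 until partitions, partitions)
      .flatMap { p =>
        // An Iterator is consumed lazily by saveAsTextFile, so rows are
        // written as they are generated instead of being held in memory.
        Iterator.range(0, rowsPerPartition).map(i => s"partition-$p,row-$i,${"x" * 80}")
      }
      .saveAsTextFile("hdfs:///tmp/hdfs_stress_test")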

Re: how to use custom properties in spark app

2016-04-05 Thread Haroon Rasheed
Hi, You can have a custom properties file with Map-like entries (key, value pairs), e.g. "URL" -> "IPaddress:port/user/", and put this file on HDFS or any location that Spark can access. Read the file as an RDD, convert it to a Map and read the values in the program. You can also broadcast it in the program if you need
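A minimal sketch of the idea (the path and keys are illustrative):

    // Load a key=value properties file from HDFS into a Map and broadcast it
    val props: Map[String, String] = sc.textFile("hdfs:///config/app.properties")
      .filter(line => line.contains("=") && !line.trim.startsWith("#"))
      .map { line =>
        val Array(k, v) = line.split("=", 2)
        (k.trim, v.trim)
      }
      .collect()
      .toMap

    val propsBc = sc.broadcast(props)
    // On the executors: propsBc.value.getOrElse("URL", "defaultValue")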

Re: Partition pruning in spark 1.5.2

2016-04-05 Thread Darshan Singh
Thanks. It is not my exact scenario but I have tried to reproduce it. I have used 1.5.2. I have a part_movies data-frame which has 20 partitions, one per movie. I created the following query: val part_sql = sqlContext.sql("select * from part_movies where movie = 10") part_sql.count() I expect

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Shivaram Venkataraman
Overall this sounds good to me. One question I have is that in addition to the ML algorithms we have a number of linear algebra (various distributed matrices) and statistical methods in the spark.mllib package. Is the plan to port or move these to the spark.ml namespace in the 2.x series ? Thanks

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Sean Owen
FWIW, all of that sounds like a good plan to me. Developing one API is certainly better than two. On Tue, Apr 5, 2016 at 7:01 PM, Xiangrui Meng wrote: > Hi all, > > More than a year ago, in Spark 1.2 we introduced the ML pipeline API built > on top of Spark SQL’s DataFrames.

Re: Can spark somehow help with this usecase?

2016-04-05 Thread Marco Mistroni
Many thanks for the suggestion Andy! Kr Marco On 5 Apr 2016 7:25 pm, "Andy Davidson" wrote: > Hi Marco > > You might consider setting up some sort of ELT pipeline. One of your > stages might be to create a file of all the FTP URLs. You could then write > a spark app

Re: dataframe sorting and find the index of the maximum element

2016-04-05 Thread Michael Armbrust
You should generally think of a DataFrame as unordered, unless you are explicitly asking for an order. One way to order and assign an index is with window functions. On Tue, Apr 5, 2016 at 4:17 AM, Angel
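A minimal sketch of ordering with a window function and attaching a row index (Spark 1.5/1.6 requires a HiveContext for window functions; column names are illustrative):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.row_number
    import sqlContext.implicits._

    // Note: a window with no partitionBy pulls all rows into a single partition
    val w = Window.orderBy($"count".desc)
    val indexed = df.withColumn("idx", row_number().over(w))
    indexed.filter($"idx" === 1).show()   // the row holding the maximum "count"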

Re: Partition pruning in spark 1.5.2

2016-04-05 Thread Michael Armbrust
Can you show your full code? How are you partitioning the data? How are you reading it? What is the resulting query plan (run explain() or EXPLAIN)? On Tue, Apr 5, 2016 at 10:02 AM, dsing001 wrote: > Hi, > > I am using 1.5.2. I have a dataframe which is partitioned

Re: Can spark somehow help with this usecase?

2016-04-05 Thread Andy Davidson
Hi Marco, You might consider setting up some sort of ELT pipeline. One of your stages might be to create a file of all the FTP URLs. You could then write a spark app that just fetches the URLs and stores the data in some sort of database or on the file system (HDFS?). My guess would be to maybe

Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Xiangrui Meng
Hi all, More than a year ago, in Spark 1.2 we introduced the ML pipeline API built on top of Spark SQL’s DataFrames. Since then the new DataFrame-based API has been developed under the spark.ml package, while the old RDD-based API has been developed in parallel under the spark.mllib package.

RE: Plan issue with spark 1.5.2

2016-04-05 Thread Yong Zhang
You need to show us the execution plan, so we can understand what your issue is. Use the spark shell code to show how your DF is built and how you partition it, then use explain(true) on your join DF and show the output here, so we can better help you. Yong > Date: Tue, 5 Apr 2016 09:46:59
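A minimal sketch of the kind of output being asked for (DataFrame names are illustrative):

    val joined = dfA.join(dfB, dfA("id") === dfB("id"))
    joined.explain(true)  // prints the parsed, analyzed and optimized logical plans plus the physical plan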

Re: dataframe sorting and find the index of the maximum element

2016-04-05 Thread Ted Yu
The error was due to the REPL expecting an integer (an index into the Array) whereas "MAX(count)" was a String. What do you want to achieve? On Tue, Apr 5, 2016 at 4:17 AM, Angel Angel wrote: > Hello, > > I am writing a spark application in which I need the index of the

Partition pruning in spark 1.5.2

2016-04-05 Thread dsing001
Hi, I am using 1.5.2. I have a dataframe which is partitioned based on the country, so I have around 150 partitions in the dataframe. When I run Spark SQL and use country = 'UK' it still reads all partitions and is not able to prune the other partitions. Thus all the queries run for similar times

Fwd: Facing Unusual Behavior with the executors in spark streaming

2016-04-05 Thread Abhishek Anand
Hi, I need inputs for a couple of issues that I am facing in my production environment. I am using Spark streaming, version 1.4.0. 1) It so happens that the worker is lost on a machine and the executor still shows up in the executors tab in the UI. Even when I kill a worker using kill -9

Plan issue with spark 1.5.2

2016-04-05 Thread dsing001
I am using Spark 1.5.2. I have a question regarding the plan generated by Spark. I have 3 data-frames which have the data for different countries. I have around 150 countries and the data is skewed. 95% of my queries will have country as a criterion. However, I have seen issues with the plans generated for

Re: --packages configuration equivalent item name?

2016-04-05 Thread Russell Jurney
Thanks! These aren't in the docs; I will make a JIRA to add them. On Monday, April 4, 2016, Saisai Shao wrote: > spark.jars.ivy, spark.jars.packages, spark.jars.excludes are the > configurations you can use. > > Thanks > Saisai > > On Sun, Apr 3, 2016 at 1:59 AM, Russell

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-05 Thread ashokkumar rajendran
Hi Mich, Yes, the Optimizer just chooses this on its own. The point of concern here is that this optimization does not work well in the large-set-vs-small-set case. A nested loop join is almost 10 times costlier than a hash join or a union of the 3 conditions. So is this a bug in the optimizer, or is this a request to

Can spark somehow help with this usecase?

2016-04-05 Thread Marco Mistroni
Hi, I'm currently using Spark to process a file containing a million rows (EDGAR quarterly filings files). Each row contains some info plus the location of a remote file which I need to retrieve using FTP and then process its content. I want to do all 3 operations (process the filing file, fetch

Re: RDD Partitions not distributed evenly to executors

2016-04-05 Thread Khaled Ammar
I have a similar experience. Using 32 machines, I can see that the number of tasks (partitions) assigned to executors (machines) is not even. Moreover, the distribution changes every stage (iteration). I wonder why Spark needs to move partitions around anyway; should not the scheduler reduce network

Question around spark on EMR

2016-04-05 Thread Natu Lauchande
Hi, I am setting up a Scala Spark streaming app in EMR. I wonder if anyone on the list can help me with the following questions: 1. What's the approach that you have been using to pass, in an EMR job step, environment variables that will be needed by the Spark application? 2. Can I have

GraphX replication factor

2016-04-05 Thread Khaled Ammar
Hi, I wonder if it is possible to figure out the replication factor used in GraphX partitioning from its log files. -- Thanks, -Khaled

Re: dataframe sorting and find the index of the maximum element

2016-04-05 Thread Ted Yu
Did you define the idxmax() method yourself? Thanks On Tue, Apr 5, 2016 at 4:17 AM, Angel Angel wrote: > Hello, > > I am writing a spark application in which I need the index of the maximum > element. > > My table has one column only and I want the index of the maximum

HiveContext unable to recognize the delimiter of Hive table in textfile partitioned by date

2016-04-05 Thread Shiva Achari
Hi, I have created a Hive external table stored as textfile, partitioned by event_date Date. How do we specify a specific CSV format when reading from the Hive table in Spark? The environment is: 1. Spark 1.5.0 - cdh5.5.1, using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit

Re: Spark Streaming - NotSerializableException: Methods & Closures:

2016-04-05 Thread Mayur Pawashe
Hi. I am using 2.10.4 for Scala. 1.6.0 for Spark related dependencies. I am also using spark-streaming-kafka and including kafka (0.8.1.1) which apparently is needed for deserializers. > On Apr 4, 2016, at 6:18 PM, Ted Yu wrote: > > bq. I'm on version 2.10 for spark >

how to use custom properties in spark app

2016-04-05 Thread yaoxiaohua
Hi, I am new to Spark application development, and I need to develop two apps running on a Spark cluster. Now I have some arguments for the applications. I can pass them as program arguments with spark-submit, but I want to find another way. I

Re: Could not load shims in class org.apache.hadoop.hive.schshim.FairSchedulerShim

2016-04-05 Thread ram kumar
I am facing this same issue. Can anyone help me with this? Thanks On Mon, Dec 7, 2015 at 9:14 AM, Shige Song wrote: > Hard to tell. > > On Mon, Dec 7, 2015 at 11:35 AM, zhangjp <592426...@qq.com> wrote: > >> Hi all, >> >> I'm using the Spark prebuilt version 1.5.2+hadoop2.6 and

Re: multiple splits fails

2016-04-05 Thread Mich Talebzadeh
This is the idea I have in mind. I want to go through every line with result.foreachRDD(rdd => rdd.foreach(println)), but rather than print each line I want to save them temporarily and then add/append the result set (the lines in the RDD) to a table for further analysis. It could be a Parquet or Hive table. So
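A minimal sketch of one way to append each micro-batch to a Parquet-backed table, assuming result is a DStream of (word, count) pairs (the path is illustrative, not the thread's actual code):

    import org.apache.spark.sql.SQLContext

    result.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        import sqlContext.implicits._
        rdd.toDF("word", "count")
           .write.mode("append")
           .parquet("/tmp/word_counts_parquet")
      }
    }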

Re: Can't able to access temp table via jdbc client

2016-04-05 Thread ram kumar
Thanks for your input. But the JDBC client should be something like this: {{{ $ ./bin/beeline Beeline version 1.5.2 by Apache Hive beeline> !connect jdbc:hive2://ip:1 show tables; ++--+--+ | tableName | isTemporary | ++--+--+ | check

dataframe sorting and find the index of the maximum element

2016-04-05 Thread Angel Angel
Hello, I am writing a spark application in which I need the index of the maximum element. My table has one column only and I want the index of the maximum element. MAX(count) 23 32 3 Here is my code; the data type of the array is org.apache.spark.sql.DataFrame. Thanks in advance. Also please

RE: [MARKETING] Timeout in mapWithState

2016-04-05 Thread Iain Cundy
Hi Abhi, The concept is what you want – if you set the StateSpec timeout to a Duration of 10 minutes then any keys not seen for more than 10 minutes will be deleted. However you did say “exactly” and specifically mentioned “removed from memory”, in which case you may be interested in the much more
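A minimal sketch of setting that timeout with mapWithState (Spark 1.6; pairs is assumed to be a DStream[(String, Int)] and the state function is illustrative):

    import org.apache.spark.streaming.{Minutes, State, StateSpec}

    val mappingFunc = (key: String, value: Option[Int], state: State[Int]) => {
      val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
      if (!state.isTimingOut()) state.update(sum)  // a state that is timing out cannot be updated
      (key, sum)
    }

    // Keys not seen for 10 minutes are timed out and their state is removed
    val stateStream = pairs.mapWithState(
      StateSpec.function(mappingFunc).timeout(Minutes(10)))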

Profiling a spark job

2016-04-05 Thread Dmitry Olshansky
Hi list, I'm curious as to what the best practices are for profiling Spark apps. So far I tried following this guide with hprof and/or YourKit but the profile looks strange: https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit 55% of time spent in

Re: Registering Metrics Source in Spark

2016-04-05 Thread Gideon
Hi, I don't have a specific solution to your problem, but I was having some problems writing my own metrics with Spark a few months back. I don't know if it helps, but you can try and look at this thread

Re: multiple splits fails

2016-04-05 Thread Sachin Aggarwal
Sure, this will be helpful, try this: https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter3/save_an_rdd_to_a_database.html On Tue, Apr 5, 2016 at 1:56 PM, Mich Talebzadeh wrote: > Thanks Sachin. Will test it > > I

Re: multiple splits fails

2016-04-05 Thread Mich Talebzadeh
Thanks Sachin. Will test it. I guess I can modify it to save the output to a Hive table as opposed to the terminal. Regards Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: multiple splits fails

2016-04-05 Thread Sachin Aggarwal
Hey , I have changed your example itself try this , it should work in terminal val result = lines.filter(_.contains("ASE 15")).filter(_ contains("UPDATE INDEX STATISTICS")).flatMap(line => line.split("\n,")).map(word => (word, 1)).reduceByKey(_ + _) result.foreachRDD(rdd => rdd.foreach(println))

DataSet with Array member

2016-04-05 Thread JH P
Hi everyone. I have a class like this: case class DistinctValues(statType: Int, dataType: Int, _id: Int, values: Array[(String, Long)], category: String) extends Serializable { I think this class won't work in the case where DistinctValues.values.length > Int.MaxValue. Moreover, I instantiate this class by
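A minimal sketch of building a Dataset of such a case class (Spark 1.6, assuming an SQLContext named sqlContext; the sample values are illustrative):

    import sqlContext.implicits._

    case class DistinctValues(statType: Int, dataType: Int, _id: Int,
                              values: Array[(String, Long)], category: String)

    val ds = Seq(
      DistinctValues(1, 2, 10, Array(("a", 3L), ("b", 7L)), "cat1")
    ).toDS()

    // A single Scala Array is bounded by Int.MaxValue elements, so values larger
    // than that would need to be split across several rows or exploded into a column.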

Re: Detecting application restart when running in supervised cluster mode

2016-04-05 Thread Saisai Shao
Hi Deepak, I don't think supervise works with YARN; it is a standalone- and Mesos-specific feature. Thanks Saisai On Tue, Apr 5, 2016 at 3:23 PM, Deepak Sharma wrote: > Hi Rafael > If you are using yarn as the engine, you can always use the RM UI to see the >

Re: Can't able to access temp table via jdbc client

2016-04-05 Thread Mich Talebzadeh
Hi, temp tables are session specific and private to the session. You will not be able to see temp tables created by another session in your HiveContext. Likewise, creating a table in Hive using a syntax similar to the below: CREATE TEMPORARY TABLE tmp AS SELECT t.calendar_month_desc, c.channel_desc,
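A minimal sketch of the scoping difference (Spark 1.6 HiveContext; table names are illustrative):

    // Session-scoped: visible only to the HiveContext/session that registered it
    df.registerTempTable("tmp_results")
    hiveContext.sql("SELECT COUNT(*) FROM tmp_results").show()

    // Persistent: recorded in the Hive metastore, visible to other sessions (e.g. beeline)
    df.write.mode("overwrite").saveAsTable("results_tbl")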

Re: multiple splits fails

2016-04-05 Thread Mich Talebzadeh
Thanks. Currently this is what I am doing // Get the lines // val lines = messages.map(_._2) // Check for message val showResults = lines.filter(_.contains("Sending messages")).flatMap(line => line.split("\n,")).map(word => (word, 1)).reduceByKey(_ + _).print(1000) So it prints max of 1000

Re: Detecting application restart when running in supervised cluster mode

2016-04-05 Thread Deepak Sharma
Hi Rafael, If you are using YARN as the engine, you can always use the RM UI to see the application progress. Thanks Deepak On Tue, Apr 5, 2016 at 12:18 PM, Rafael Barreto wrote: > Hello, > > I have a driver deployed using `spark-submit` in supervised cluster mode. >