Python - SQL (geonames dataset)

2015-05-11 Thread Tyler Mitchell
I'm using Python to set up a dataframe, but for some reason it is not being made available to SQL. Code (from Zeppelin) below. I don't get any error when loading/prepping the data or dataframe. Any tips? (Originally I was not hardcoding the Row() structure, as my other tutorial added it by

Re: Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-11 Thread Akhil Das
Try SparkConf.set("spark.akka.extensions", "Whatever"); underneath I think Spark won't ship properties which don't start with spark.* to the executors. Thanks Best Regards On Mon, May 11, 2015 at 8:33 AM, Terry Hole hujie.ea...@gmail.com wrote: Hi all, I'd like to monitor the akka using kamon,
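A minimal sketch of what this suggests (property value hypothetical); the point is that only spark.-prefixed keys are shipped to the executors:

    import org.apache.spark.{SparkConf, SparkContext}
    val conf = new SparkConf()
      .setAppName("kamon-monitoring")                               // hypothetical app name
      .set("spark.akka.extensions", "kamon.system.SystemMetrics")   // spark.-prefixed, so it reaches executors
      .set("akka.extensions", "kamon.system.SystemMetrics")         // no spark. prefix: likely never shipped
    val sc = new SparkContext(conf)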

Re: Cassandra number of Tasks

2015-05-11 Thread Akhil Das
Did you try repartitioning? You might end up with a lot of time spent on GC though. Thanks Best Regards On Fri, May 8, 2015 at 11:59 PM, Vijay Pawnarkar vijaypawnar...@gmail.com wrote: I am using the Spark Cassandra connector to work with a table with 3 million records. Using .where() API
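For reference, a hedged sketch of that repartitioning with the spark-cassandra-connector API (keyspace, table, predicate, and partition count are all hypothetical):

    import com.datastax.spark.connector._
    val rows = sc.cassandraTable("ks", "big_table")
      .where("bucket = ?", 42)
      .repartition(64)   // spreads the 3M records over more tasks, at the cost of a shuffle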

Re: Python - SQL (geonames dataset)

2015-05-11 Thread ayan guha
Try this: res = ssc.sql(your SQL without limit); print res.first(). Note: your SQL looks wrong, as count will need a group by clause. Best Ayan On 11 May 2015 16:22, Tyler Mitchell tyler.mitch...@actian.com wrote: I'm using Python to set up a dataframe, but for some reason it is not being made
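To illustrate the GROUP BY point, a hedged example (table and column names hypothetical): a per-group count needs a grouping clause, e.g.

    SELECT country, COUNT(*) AS cnt
    FROM geonames
    GROUP BY country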

Re: Event generation

2015-05-11 Thread Akhil Das
Have a look over here https://storm.apache.org/community.html Thanks Best Regards On Sun, May 10, 2015 at 3:21 PM, anshu shukla anshushuk...@gmail.com wrote: http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps -- Thanks Regards, Anshu Shukla

Re: Spark can not access jar from HDFS !!

2015-05-11 Thread Ravindra
Hi All, Thanks for suggestions. What I tried is - hiveContext.sql (add jar ) and that helps to complete the create temporary function, but while using this function I get ClassNotFound for the class handling this function. The same class is present in the jar added. Please note that the same

[SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-11 Thread Haopu Wang
I try to get the result schema of aggregate functions using the DataFrame API. However, I find the result fields of the groupBy columns are always nullable even when the source field is not nullable. I want to know if this is by design, thank you! Below is the simple code to show the issue. == import
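A minimal sketch that reproduces the report (column names hypothetical): the key starts out non-nullable but comes back nullable after the aggregate.

    val df = sqlContext.createDataFrame(Seq((1, 10L), (2, 20L))).toDF("key", "value")
    df.printSchema()                              // key: integer (nullable = false)
    df.groupBy("key").sum("value").printSchema()  // key: integer (nullable = true), per this report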

Re: Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-11 Thread Terry Hole
Hi, Akhil, I tried this. It did not work. I also tried SparkConf.set("akka.extensions", "[\"kamon.system.SystemMetrics\", \"kamon.statsd.StatsD\"]"); it also did not work. Thanks On Mon, May 11, 2015 at 2:56 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Try

Re: large volume spark job spends most of the time in AppendOnlyMap.changeValue

2015-05-11 Thread Michal Haris
This is the stack trace of the worker thread:
org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:150)
org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)

how to load some of the files in a dir and monitor new file in that dir in spark streaming without missing?

2015-05-11 Thread lisendong
I have one hdfs dir, which contains many files: /user/root/1.txt /user/root/2.txt /user/root/3.txt /user/root/4.txt and there is a daemon process which add one file per minute to this dir. (e.g., 5.txt, 6.txt, 7.txt...) I want to start a spark streaming job which load 3.txt, 4.txt and then
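A hedged sketch of one way to pick up files already in the directory as well as new ones, via fileStream's newFilesOnly flag (path hypothetical; files older than the stream's remember window may still be skipped):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "hdfs:///user/root", (p: Path) => true, newFilesOnly = false
    ).map(_._2.toString)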

spark mllib kmeans

2015-05-11 Thread Pa Rö
Hi, is it possible to use a custom distance measure and a data type other than Vector? I want to cluster temporal geo data. Best regards, Paul

RE: Spark SQL and java.lang.RuntimeException

2015-05-11 Thread Wang, Daoyuan
Hi, Are you creating the table from hive? Which version of hive are you using? Thanks, Daoyuan -Original Message- From: Nick Travers [mailto:n.e.trav...@gmail.com] Sent: Sunday, May 10, 2015 10:34 AM To: user@spark.apache.org Subject: Spark SQL and java.lang.RuntimeException I'm

Re: spark mllib kmeans

2015-05-11 Thread Driesprong, Fokko
Hi Paul, I would say that it should be possible, but you'll need a different distance measure which conforms to your coordinate system. 2015-05-11 14:59 GMT+02:00 Pa Rö paul.roewer1...@googlemail.com: Hi, is it possible to use a custom distance measure and a data type other than Vector? I
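MLlib's KMeans at this point is hard-wired to squared Euclidean distance, so a custom measure means rolling your own assignment step. A sketch of a great-circle distance one might use for geo points (pure math, independent of MLlib):

    // haversine distance in km between two (lat, lon) points given in degrees
    def haversine(a: (Double, Double), b: (Double, Double)): Double = {
      val dLat = math.toRadians(b._1 - a._1)
      val dLon = math.toRadians(b._2 - a._2)
      val h = math.pow(math.sin(dLat / 2), 2) +
        math.cos(math.toRadians(a._1)) * math.cos(math.toRadians(b._1)) *
          math.pow(math.sin(dLon / 2), 2)
      2 * 6371.0 * math.asin(math.sqrt(h))   // 6371 km ~ Earth's mean radius
    }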

Reading Nested Fields in DataFrames

2015-05-11 Thread Ashish Kumar Singh
Hi, I am trying to read Nested Avro data in Spark 1.3 using DataFrames. I need help to retrieve the inner element data in the structure below. Below is the schema when I enter df.printSchema:
|-- THROTTLING_PERCENTAGE: double (nullable = false)
|-- IMPRESSION_TYPE: string (nullable = false)

It takes too long (30 seconds) to create Spark Context with SPARK/YARN

2015-05-11 Thread stanley
I am running Spark jobs on YARN cluster. It took ~30 seconds to create a spark context, while it takes only 1-2 seconds running Spark in local mode. The master is set as yarn-client, and both the machine that submits the Spark job and the YARN cluster are in the same domain. Originally I

Re: Python - SQL can't find table (Zeppelin)

2015-05-11 Thread Tyler Mitchell
Thanks for the suggestion Ayan, it has not solved my problem but I did get sqlContext to execute the SQL and return a dataframe object. SQL is running fine in the pyspark interpreter but not passing to the SQL note (though it works fine for a different dataset) - guess I'll take this question to the

Re: It takes too long (30 seconds) to create Spark Context with SPARK/YARN

2015-05-11 Thread Silvio Fiorito
Could you upload the spark assembly to HDFS and then set spark.yarn.jar to the path where you uploaded it? That can help minimize start-up time. How long does it take if you start just a spark shell? On 5/11/15, 11:15 AM, stanley wangshua...@yahoo.com wrote: I am running Spark jobs on YARN cluster. It
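A sketch of the spark.yarn.jar idea, assuming the assembly has already been copied to HDFS (path and file name hypothetical):

    import org.apache.spark.SparkConf
    val conf = new SparkConf()
      .set("spark.yarn.jar", "hdfs:///user/spark/share/spark-assembly-1.3.1-hadoop2.2.0.jar")
    // YARN containers then reference the HDFS copy instead of re-uploading the assembly on every submit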

Does long-lived SparkContext hold on to executor resources?

2015-05-11 Thread stanley
I am building an analytics app with Spark. I plan to use long-lived SparkContexts to minimize the overhead for creating Spark contexts, which in turn reduces the analytics query response time. The number of queries that are run in the system is relatively small each day. Would long lived contexts

can we start a new thread in foreachRDD in spark streaming?

2015-05-11 Thread hotdog
I want to start a child thread in foreachRDD. My situation is: the job is reading from an hdfs dir continuously, and every 100 batches, I want to launch a model training task (I will make a snapshot of the rdds at that time and start the training task; the training task takes a very long time (2

Re: Reading Nested Fields in DataFrames

2015-05-11 Thread ayan guha
Typically you would use '.' notation to access it, the same way you would access a map. On 12 May 2015 00:06, Ashish Kumar Singh ashish23...@gmail.com wrote: Hi, I am trying to read Nested Avro data in Spark 1.3 using DataFrames. I need help to retrieve the inner element data in the structure below.
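For instance, a hedged one-liner assuming a struct column named record with a field name:

    val names = df.select("record.name")   // dot notation walks into the struct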

Re: SQL UserDefinedType can't be saved in parquet file when using assembly jar

2015-05-11 Thread Jaonary Rabarisoa
In this example, everything works except saving to the parquet file. On Mon, May 11, 2015 at 4:39 PM, Jaonary Rabarisoa jaon...@gmail.com wrote: MyDenseVectorUDT does exist in the assembly jar and in this example all the code is in a single file to make sure everything is included. On Tue, Apr 21,

Re: SQL UserDefinedType can't be saved in parquet file when using assembly jar

2015-05-11 Thread Jaonary Rabarisoa
MyDenseVectorUDT does exist in the assembly jar, and in this example all the code is in a single file to make sure everything is included. On Tue, Apr 21, 2015 at 1:17 AM, Xiangrui Meng men...@gmail.com wrote: You should check where MyDenseVectorUDT is defined and whether it was on the classpath

Re: can we start a new thread in foreachRDD in spark streaming?

2015-05-11 Thread ayan guha
It depends on how you want to run your application. You can always save the 100 batches as data files and run another app to read those files. In that case you have separate contexts and you will find both applications running simultaneously in the cluster but on different JVMs. But if you do not want
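A hedged sketch of the single-application variant (trainModel is a hypothetical stand-in for the long-running training); foreachRDD runs on the driver, so spawning a thread there keeps the streaming batches flowing:

    var batchCount = 0L
    stream.foreachRDD { rdd =>
      batchCount += 1
      if (batchCount % 100 == 0) {
        val snapshot = rdd.cache()            // pin this batch so the trainer can reuse it
        new Thread("model-trainer") {
          override def run(): Unit = trainModel(snapshot)   // hypothetical training entry point
        }.start()
      }
    }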

Re: Event generation

2015-05-11 Thread Tyler Mitchell
I've had good success with the Splunk event generator. https://github.com/coccyx/eventgen/blob/master/README.md On May 11, 2015, at 00:05, Akhil Das ak...@sigmoidanalytics.com wrote: Have a look over here https://storm.apache.org/community.html Thanks Best Regards On

Stratified sampling with DataFrames

2015-05-11 Thread Karthikeyan Muthukumar
Hi, I'm in Spark 1.3.0 and my data is in DataFrames. I need operations like sampleByKey(), sampleByKeyExact(). I saw the JIRA Add approximate stratified sampling to DataFrame ( https://issues.apache.org/jira/browse/SPARK-7157). That's targeted for Spark 1.5; till that comes through, what's the
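Until SPARK-7157 lands, one workaround is to drop to the underlying RDD[Row], where sampleByKey() already exists, and rebuild the DataFrame. A sketch, assuming the stratum is the first (string) column and the fractions are hypothetical:

    val fractions = Map("US" -> 0.1, "CA" -> 0.5)
    val sampledRows = df.rdd
      .map(r => (r.getString(0), r))
      .sampleByKey(withReplacement = false, fractions)
      .values
    val sampledDf = sqlContext.createDataFrame(sampledRows, df.schema)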

Met a problem when using spark to load parquet files with different version schemas

2015-05-11 Thread Wei Yan
Hi, devs, I met a problem when using spark to read two parquet files with two different versions of schemas. For example, the first file has one field with int type, while the same field in the second file is a long. I thought spark would automatically generate a merged schema (long), and use that

Looking inside the 'mapPartitions' transformation, some confused observations

2015-05-11 Thread myasuka
As we all know, a partition in Spark is actually an Iterator[T]. For some purpose, I want to treat each partition not as an Iterator but as one whole object. For example, converting an Iterator[Int] to a breeze.linalg.DenseVector[Int]. Thus I use the 'mapPartitions' API to achieve this; however, during the

Re: Getting error running MLlib example with new cluster

2015-05-11 Thread Su She
Got it to work on the cluster by changing the master to yarn-cluster instead of local! I do have a couple of follow-up questions... This is the example I was trying to

Re: JAVA for SPARK certification

2015-05-11 Thread Paco Nathan
Note that O'Reilly Media has test prep materials in development. The exam does include questions in Scala, Python, Java, and SQL -- and frankly a number of the questions are about comparing or identifying equivalent Spark techniques between two of those different languages. The questions do not

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Ted Yu
You can use the '--jars' option of spark-submit to ship the metrics-core jar. Cheers On Mon, May 11, 2015 at 2:04 PM, Lee McFadden splee...@gmail.com wrote: Thanks Ted, The issue is that I'm using packages (see spark-submit definition) and I do not know how to add com.yammer.metrics:metrics-core

Re: JAVA for SPARK certification

2015-05-11 Thread Kartik Mehta
Awesome!! Thank you Mr. Nathan, Great to have a guide like you helping us all, Regards, Kartik On May 11, 2015 5:07 PM, Paco Nathan cet...@gmail.com wrote: Note that O'Reilly Media has test prep materials in development. The exam does include questions in Scala, Python, Java, and SQL

Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Dmitry Goldenberg
Sean, How does this model actually work? Let's say we want to run one job as N threads executing one particular task, e.g. streaming data out of Kafka into a search engine. How do we configure our Spark job execution? Right now, I'm seeing this job running as a single thread. And it's quite a

Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Sean Owen
BTW I think my comment was wrong, as Marcelo demonstrated. In standalone mode you'd have one worker, and you do have one executor, but his explanation is right. But, you certainly have execution slots for each core. Are you talking about your own user code? You can make threads, but that's nothing

Re: Is the AMP lab done next February?

2015-05-11 Thread Reynold Xin
Relaying an answer from AMP director Mike Franklin: One year into the lab we got a 5 yr Expeditions in Computing Award as part of the White House Big Data initiative in 2012, so we extended the lab for a year. We intend to start winding it down at the end of 2016, while supporting existing

Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Dmitry Goldenberg
Thanks, Sean. This was not yet digested data for me :) The number of partitions in a streaming RDD is determined by the block interval and the batch interval. I have seen the bit on spark.streaming.blockInterval in the doc but I didn't connect it with the batch interval and the number of
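To make the arithmetic concrete: partitions per receiver per batch ≈ batch interval / block interval, so a 2-second batch at the default 200 ms block interval yields about 10 partitions (intervals here are illustrative). A sketch of tuning it:

    import org.apache.spark.SparkConf
    // halve the block interval to roughly double the partitions per batch (value in ms in this era)
    val conf = new SparkConf().set("spark.streaming.blockInterval", "100")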

Re: Reading Nested Fields in DataFrames

2015-05-11 Thread Michael Armbrust
Since there is an array here you are probably looking for HiveQL's LATERAL VIEW explode https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView . On Mon, May 11, 2015 at 7:12 AM, ayan guha guha.a...@gmail.com wrote: Typically you would use . notation to access, same way you
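A hedged HiveQL sketch of that suggestion (table and column names hypothetical; events has an array column items):

    SELECT id, item
    FROM events
    LATERAL VIEW explode(items) itemTable AS item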

Re: Met a problem when using spark to load parquet files with different version schemas

2015-05-11 Thread Michael Armbrust
BTW, I use spark 1.3.1, and already set spark.sql.parquet.useDataSourceApi to false. Schema merging is only supported when this flag is set to true (setting it to false uses old code that will be removed once the new code is proven).

Re: Get a list of temporary RDD tables via Thrift

2015-05-11 Thread Michael Armbrust
Temporary tables are not displayed by SHOW TABLES until Spark 1.3. On Mon, May 11, 2015 at 12:54 PM, Judy Nash judyn...@exchange.microsoft.com wrote: Hi, How can I get a list of temporary tables via Thrift? Have used thrift’s startWithContext and registered a temp table, but not
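For reference, a sketch of the startWithContext route that makes a temp table visible over Thrift (table name and DataFrame source hypothetical; per the above this needs Spark 1.3+):

    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
    df.registerTempTable("my_temp")
    HiveThriftServer2.startWithContext(hiveContext)
    // a beeline session against this server should now list my_temp under SHOW TABLES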

Re: Looking inside the 'mapPartitions' transformation, some confused observations

2015-05-11 Thread Richard Marscher
I believe the issue in b and c is that you call iter.size, which actually exhausts the iterator, so the subsequent attempt to put it into a vector will yield 0 items. You could use an ArrayBuilder, for example, and not need to rely on knowing the size of the iterator. On Mon, May 11, 2015 at
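A sketch of the ArrayBuilder variant described here, which consumes the iterator exactly once:

    import scala.collection.mutable.ArrayBuilder
    val vectors = rdd.mapPartitions { iter =>
      val builder = ArrayBuilder.make[Int]()
      iter.foreach(builder += _)                  // single pass; no iter.size call
      Iterator.single(breeze.linalg.DenseVector(builder.result()))
    }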

Re: Getting error running MLlib example with new cluster

2015-05-11 Thread Sean Owen
That is mostly the YARN overhead. You're starting up a container for the AM and executors, at least. That still sounds pretty slow, but the defaults aren't tuned for fast startup. On May 11, 2015 7:00 PM, Su She suhsheka...@gmail.com wrote: Got it to work on the cluster by changing the master to

Re: large volume spark job spends most of the time in AppendOnlyMap.changeValue

2015-05-11 Thread Reynold Xin
Looks like it is spending a lot of time doing hash probing. It could be a number of the following:
1. hash probing itself is inherently expensive compared with the rest of your workload
2. murmur3 doesn't work well with this key distribution
3. quadratic probing (triangular sequence) with a

Re: [SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-11 Thread Olivier Girardot
Hi Haopu, actually here `key` is nullable because this is your input's schema:
scala> result.printSchema
root
 |-- key: string (nullable = true)
 |-- SUM(value): long (nullable = true)
scala> df.printSchema
root
 |-- key: string (nullable = true)
 |-- value: long (nullable = false)
I tried it with a

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Ted Yu
com.yammer.metrics.core.Gauge is in the metrics-core jar, e.g., in master branch:
[INFO] | \- org.apache.kafka:kafka_2.10:jar:0.8.1.1:compile
[INFO] | +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
Please make sure the metrics-core jar is on the classpath. On Mon, May 11, 2015 at 1:32 PM, Lee

Spark and RabbitMQ

2015-05-11 Thread dgoldenberg
Are there existing or under development versions/modules for streaming messages out of RabbitMQ with SparkStreaming, or perhaps a RabbitMQ RDD?
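There was no RabbitMQ connector in Spark itself at this point; a minimal custom Receiver skeleton one could adapt is sketched below (all actual RabbitMQ client calls elided; the queue name is hypothetical):

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class RabbitMQReceiver(queue: String)
        extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
      def onStart(): Unit = {
        new Thread("rabbitmq-consumer") {
          override def run(): Unit = {
            // connect to the broker, consume `queue`, and call store(message)
            // for each delivery (client code elided)
          }
        }.start()
      }
      def onStop(): Unit = { /* close the channel/connection here */ }
    }

    val messages = ssc.receiverStream(new RabbitMQReceiver("events"))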

Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Sean Owen
You have one worker with one executor with 32 execution slots. On Mon, May 11, 2015 at 9:52 PM, dgoldenberg dgoldenberg...@gmail.com wrote: Hi, Is there anything special one must do, running locally and submitting a job like so: spark-submit \ --class com.myco.Driver \

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Lee McFadden
Thanks Ted, The issue is that I'm using packages (see spark-submit definition) and I do not know how to add com.yammer.metrics:metrics-core to my classpath so Spark can see it. Should metrics-core not be part of the org.apache.spark:spark-streaming-kafka_2.10:1.3.1 package so it can work

Get a list of temporary RDD tables via Thrift

2015-05-11 Thread Judy Nash
Hi, How can I get a list of temporary tables via Thrift? Have used thrift's startWithContext and registered a temp table, but not seeing the temp table/rdd when running show tables. Thanks, Judy

Re: [SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-11 Thread Reynold Xin
Thanks for catching this. I didn't read carefully enough. It'd make sense to have the udaf result be non-nullable, if the exprs are indeed non-nullable. On Mon, May 11, 2015 at 1:32 PM, Olivier Girardot ssab...@gmail.com wrote: Hi Haopu, actually here `key` is nullable because this is your

Re: [SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-11 Thread Reynold Xin
Not by design. Would you be interested in submitting a pull request? On Mon, May 11, 2015 at 1:48 AM, Haopu Wang hw...@qilinsoft.com wrote: I try to get the result schema of aggregate functions using DataFrame API. However, I find the result field of groupBy columns are always nullable even

Specify Python interpreter

2015-05-11 Thread Bin Wang
Hey there, I have installed a python interpreter in a certain location, say /opt/local/anaconda. Is there any way I can specify the Python interpreter while developing in an IPython notebook? Maybe a property to set while creating the SparkContext? I know that I can put #!/opt/local/anaconda
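For what it's worth, the usual 1.x-era answer is environment variables rather than a SparkContext property; a sketch using the path from the question:

    export PYSPARK_PYTHON=/opt/local/anaconda/bin/python   # interpreter the workers run
    export IPYTHON=1                                       # use IPython for the driver shell
    export IPYTHON_OPTS="notebook"                         # or launch the notebook UI
    ./bin/pyspark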

Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Lee McFadden
Hi, We've been having some issues getting spark streaming running correctly using a Kafka stream, and we've been going around in circles trying to resolve this dependency. Details of our environment and the error below, if anyone can help resolve this it would be much appreciated. Submit

Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Marcelo Vanzin
Are you actually running anything that requires all those slots? e.g., locally, I get this with local[16], but only after I run something that actually uses those 16 slots: Executor task launch worker-15 daemon prio=10 tid=0x7f4c80029800 nid=0x8ce waiting on condition [0x7f4c62493000]

Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Dmitry Goldenberg
Understood. We'll use the multi-threaded code we already have.. How are these execution slots filled up? I assume each slot is dedicated to one submitted task. If that's the case, how is each task distributed then, i.e. how is that task run in a multi-node fashion? Say 1000 batches/RDD's are

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Sean Owen
Ah yes, the Kafka + streaming code isn't in the assembly, is it? You'd have to provide it and all its dependencies with your app. You could also build this into your own app jar. Tools like Maven will add in the transitive dependencies. On Mon, May 11, 2015 at 10:04 PM, Lee McFadden

Re: Met a problem when using spark to load parquet files with different version schemas

2015-05-11 Thread Wei Yan
Creating dataframes and union them looks reasonable. thanks, Wei On Mon, May 11, 2015 at 6:39 PM, Michael Armbrust mich...@databricks.com wrote: Ah, yeah sorry. I should have read closer and realized that what you are asking for is not supported. It might be possible to add simple

Re: Spark SQL: STDDEV working in Spark Shell but not in a standalone app

2015-05-11 Thread Michael Armbrust
I doubt that will make it as we are pretty slammed with other things and the author needs to address the comments / merge conflict still. I'll add that in general I recommend users use the HiveContext, even if they aren't using Hive at all. It's a strict superset of the functionality provided by
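A sketch of that recommendation (table name hypothetical); HiveContext is a drop-in replacement that also understands Hive's built-in UDAFs such as stddev:

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val result = sqlContext.sql("SELECT stddev(value) FROM my_table")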

Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Dmitry Goldenberg
Seems to be running OK with 4 threads, 16 threads... While running with 32 threads I started getting the below. 15/05/11 19:48:46 WARN executor.Executor: Issue communicating with driver in heartbeater org.apache.spark.SparkException: Error sending message [message =

Re: spark : use the global config variables in executors

2015-05-11 Thread Marcelo Vanzin
Note that `object` is equivalent to a class full of static fields / methods (in Java), so the data it holds will not be serialized, ever. What you want is a config class instead, so you can instantiate it, and that instance can be serialized. Then you can easily do (1) or (3). On Mon, May 11,
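A minimal sketch of the instantiable-config idea (fields hypothetical); a case class instance is serialized into each closure that captures it, unlike the members of a static object:

    case class JobConfig(threshold: Double, labels: Map[String, Int])
    val cfg = JobConfig(0.5, Map("en" -> 1))
    val kept = rdd.filter(x => x > cfg.threshold)   // cfg ships to the executors with the task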

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Lee McFadden
Ted, many thanks. I'm not used to Java dependencies so this was a real head-scratcher for me. Downloading the two metrics packages from the maven repository (metrics-core, metrics-annotation) and supplying them on the spark-submit command line worked. My final spark-submit for a python project
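The shape of the resulting command (jar names and versions here are illustrative, not the exact invocation):

    spark-submit \
      --packages org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \
      --jars metrics-core-2.2.0.jar,metrics-annotation-2.2.0.jar \
      my_streaming_job.py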

Can standalone cluster manager provide I/O information on worker nodes?

2015-05-11 Thread Shiyao Ma
Hi, Can the standalone cluster manager provide I/O information on worker nodes? If not, could you point out the proper file to modify to achieve that functionality? Besides, does Mesos support that? Regards.

DStream Union vs. StreamingContext Union

2015-05-11 Thread Vadim Bichutskiy
Can someone explain to me the difference between DStream union and StreamingContext union? When do you use one vs the other? Thanks, Vadim
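A sketch of the difference (sources hypothetical): DStream.union joins exactly two streams at a time, while StreamingContext.union takes a whole collection in one call:

    val s1 = ssc.socketTextStream("host1", 9999)
    val s2 = ssc.socketTextStream("host2", 9999)
    val s3 = ssc.socketTextStream("host3", 9999)
    val pairwise  = s1.union(s2).union(s3)      // DStream union, chained pairwise
    val allAtOnce = ssc.union(Seq(s1, s2, s3))  // StreamingContext union over a Seq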

RE: Spark SQL: STDDEV working in Spark Shell but not in a standalone app

2015-05-11 Thread Oleg Shirokikh
Michael – Thanks for the response – that's right, I hadn't noticed that Spark Shell instantiates sqlContext as a HiveContext, not an actual Spark SQL Context… I've seen the PR to add STDDEV to data frames. Can I expect this to be added to Spark SQL in Spark 1.4, or is it still uncertain? It would

Re: Reading Nested Fields in DataFrames

2015-05-11 Thread Ruslan Dautkhanov
Had the same question on stackoverflow recently: http://stackoverflow.com/questions/30008127/how-to-read-a-nested-collection-in-spark Lomig Mégard had a detailed answer on how to do this without using LATERAL VIEW. On Mon, May 11, 2015 at 8:05 AM, Ashish Kumar Singh ashish23...@gmail.com wrote:

How to get Master UI with ZooKeeper HA setup?

2015-05-11 Thread Rex Xiong
Hi, We have a 3-node master setup with ZooKeeper HA. The driver can find the master with spark://xxx:xxx,xxx:xxx,xxx:xxx. But how can I find out the valid Master UI without looping through all 3 nodes? Thanks

Re: Met a problem when using spark to load parquet files with different version schemas

2015-05-11 Thread Michael Armbrust
Ah, yeah sorry. I should have read closer and realized that what you are asking for is not supported. It might be possible to add simple coercions such as this one, but today, compatible schemas must only add/remove columns and cannot change types. You could try creating different dataframes
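A sketch of that per-version-dataframe workaround (paths and column names hypothetical; assume the field is int in the old files and long in the new ones):

    import org.apache.spark.sql.functions.col
    val oldDf = sqlContext.parquetFile("/data/v1")
      .select(col("id").cast("long").as("id"), col("payload"))
    val newDf = sqlContext.parquetFile("/data/v2")
    val all = oldDf.unionAll(newDf)   // schemas now line up column-for-column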

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Lee McFadden
I opened a ticket on this (without posting here first - bad etiquette, apologies) which was closed as 'fixed'. https://issues.apache.org/jira/browse/SPARK-7538 I don't believe that because I have my script running means this is fixed, I think it is still an issue. I downloaded the spark source,

Re: Spark can not access jar from HDFS !!

2015-05-11 Thread Ravindra
After upgrading to Spark 1.3, these statements on HiveContext are working fine. Thanks On Mon, May 11, 2015, 12:15 Ravindra ravindra.baj...@gmail.com wrote: Hi All, Thanks for suggestions. What I tried is - hiveContext.sql (add jar ) and that helps to complete the create temporary

RE: Does long-lived SparkContext hold on to executor resources?

2015-05-11 Thread Ganelin, Ilya
Also check out the spark.cleaner.ttl property. Otherwise, you will accumulate shuffle metadata in the memory of the driver. -Original Message- From: Silvio Fiorito [silvio.fior...@granturing.com] Sent: Monday, May 11,

Unread block data

2015-05-11 Thread Guy Needham
Hi, I'm trying to compile and use Spark 1.3.1 with Hadoop 2.2.0. I compiled from source with the Maven command: mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -Phive -Phive-0.12.0 -Phive-thriftserver -DskipTests clean package When I run this with a local master (bin/spark-shell --master local[2]) I

Re: Does long-lived SparkContext hold on to executor resources?

2015-05-11 Thread Silvio Fiorito
You want to look at dynamic resource allocation, here: http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation On 5/11/15, 11:23 AM, stanley wangshua...@yahoo.com wrote: I am building an analytics app with Spark. I plan to use long-lived SparkContexts to minimize
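A sketch of the relevant settings (executor bounds hypothetical; on YARN this also requires the external shuffle service on the NodeManagers):

    import org.apache.spark.SparkConf
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "20")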

TwitterPopularTags Long Processing Delay

2015-05-11 Thread Seyed Majid Zahedi
Hi, I'm running TwitterPopularTags.scala on a single node. Everything works fine for a while (about 30 min), but after a while I see a long processing delay for tasks, and it keeps increasing. Has anyone experienced the same issue? Here are my configurations: spark.driver.memory