Understanding life cycle of RpcEndpoint: CoarseGrainedExecutorBackend

2019-12-18 Thread S
requests? Is it referring to the "set of tasks assigned to this particular RPCEndpoint" from a stage of a spark RDD on its individual partitions?* *Q3: If the receive method is indeed called multiple times through the course of a spark job where each request refers to the set of task

how can i write spark addListener metric to kafka

2020-06-09 Thread a s
Hi guys, I am building a structured streaming app for Google Analytics data. I want to capture the number of rows read and processed. I am able to see it in the log; how can I send it to Kafka? Thanks, Alis
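One way to do this, as a minimal sketch: Structured Streaming reports per-batch metrics to a StreamingQueryListener on the driver, which can forward them with a plain kafka-clients producer. The topic name "query-metrics", the bootstrap server, and the assumption that spark is the active SparkSession are all placeholders.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Driver-side Kafka producer used only for publishing metrics, not for the query itself.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")   // placeholder
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
val producer = new KafkaProducer[String, String](props)

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = {}
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // The progress report carries the rows-read figures that show up in the logs.
    val p = event.progress
    val msg = s"""{"batchId":${p.batchId},"numInputRows":${p.numInputRows}}"""
    producer.send(new ProducerRecord[String, String]("query-metrics", msg))
  }
})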

Re: Spark Structured Streaming

2021-05-31 Thread S

Spark Structured Streaming

2021-05-31 Thread S
Hi, I am using Structured Streaming on Azure HdInsight. The version is 2.4.6. I am trying to understand the microbatch mode - default and fixed intervals. Does the fixed interval microbatch follow something similar to receiver based model where records keep getting pulled and stored into blocks
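For reference, a minimal sketch of the two trigger modes, assuming df is a streaming DataFrame; broadly speaking, the micro-batch engine does not buffer records into blocks in the background the way receivers did, it asks the source for the next offset range at each trigger.

import org.apache.spark.sql.streaming.Trigger

// Default trigger: the next micro-batch is planned as soon as the previous one finishes.
val q1 = df.writeStream.format("console").outputMode("append").start()

// Fixed-interval micro-batches: a batch is planned every 30 seconds; if a batch overruns,
// the next one starts as soon as the late batch completes.
val q2 = df.writeStream
  .format("console")
  .outputMode("append")
  .trigger(Trigger.ProcessingTime("30 seconds"))
  .start()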

Spark Structured Streaming Continuous Trigger on multiple sinks

2021-08-25 Thread S
Hello, I have a structured streaming job that needs to be able to write to multiple sinks. We are using *Continuous* Trigger *and not* *Microbatch* Trigger. 1. When we use the foreach method using: *dataset1.writeStream.foreach(kafka ForEachWriter
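A minimal sketch of the foreach sink with a continuous trigger, using the dataset1 name from the thread; the writer body and checkpoint path are placeholders. Note that each start() creates an independent query, so two sinks means the source is read by two queries.

import org.apache.spark.sql.{ForeachWriter, Row}
import org.apache.spark.sql.streaming.Trigger

// Placeholder ForeachWriter; the real one would open a Kafka producer per partition.
class KafkaForEachWriter extends ForeachWriter[Row] {
  override def open(partitionId: Long, epochId: Long): Boolean = true
  override def process(row: Row): Unit = { /* producer.send(...) */ }
  override def close(errorOrNull: Throwable): Unit = {}
}

val query1 = dataset1.writeStream
  .foreach(new KafkaForEachWriter)
  .trigger(Trigger.Continuous("1 second"))          // checkpoint interval, not a batch size
  .option("checkpointLocation", "/tmp/chk/sink1")   // placeholder path
  .start()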

Implementing circuit breaker pattern in Spark

2022-02-16 Thread S
Hi, We have a spark job that calls a microservice in the lambda function of the flatmap transformation -> passes to this microservice, the inbound element in the lambda function and returns the transformed value or "None" from the microservice as an output of this flatMap transform. Of course
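As a rough sketch of the pattern described above; callService, threshold and inputRdd are placeholders, and the accumulator is one way to let the driver see how many calls failed and decide whether to trip the breaker.

// Stand-in for the real microservice client call.
def callService(element: String): Option[String] =
  try Some(element.toUpperCase)          // imagine an HTTP call here
  catch { case _: Exception => None }

val failures = sc.longAccumulator("microserviceFailures")
val threshold = 10L                      // placeholder circuit-breaker limit

val transformed = inputRdd.flatMap { e =>
  val out = callService(e)
  if (out.isEmpty) failures.add(1)       // count failures on the executors
  out                                    // None elements are dropped by flatMap
}

transformed.count()                      // an action, so the accumulator gets populated
if (failures.value > threshold) {
  // trip the breaker on the driver, e.g. stop the streaming context / fail the job
}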

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread S
e processing? > > On Wed, Feb 16, 2022 at 7:58 AM S wrote: > >> Retries have been already implemented. The question is how to stop the >> spark job by having an executor JVM send a signal to the driver JVM. e.g. I >> have a microbatch of 30 messages; 10 in each of the 3 partition

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread S
nt? you actually want > to retry the failed attempts, not just avoid calling the microservice. > > On Wed, Feb 16, 2022 at 3:18 AM S wrote: > >> Hi, >> >> We have a spark job that calls a microservice in the lambda function of >> the flatmap transformation -> passe

Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
Hi All, I am new to Spark and I am trying to run LogisticRegression (with SGD) using MLLib on a beefy single machine with about 128GB RAM. The dataset has about 80M rows with only 4 features so it barely occupies 2GB on disk. I am running the code using all 8 cores with 20G memory using
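For context, a minimal MLlib (RDD API) sketch of that setup in Scala; the file path, parsing, and iteration count are placeholders.

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

// Label in the first column, the 4 features after it.
val trainingData = sc.textFile("hdfs:///data/train.csv").map { line =>
  val cols = line.split(',').map(_.toDouble)
  LabeledPoint(cols.head, Vectors.dense(cols.tail))
}.cache()   // caching matters: every SGD iteration re-scans this RDD

val model = LogisticRegressionWithSGD.train(trainingData, 100 /* numIterations */)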

Re: Logistic Regression MLLib Slow

2014-06-04 Thread Srikrishna S
it somehow becomes larger, though it seems unlikely that it would exceed 20 GB) and 2) how many parallel tasks run in each iteration. Matei On Jun 4, 2014, at 6:56 PM, Srikrishna S srikrishna...@gmail.com wrote: I am using the MLLib one (LogisticRegressionWithSGD) with PySpark. I am

Spark Installation

2014-07-07 Thread Srikrishna S
Hi All, Does anyone know what the command line arguments to mvn are to generate the pre-built binary for spark on Hadoop 2-CDH5? I would like to pull in a recent bug fix in spark-master and rebuild the binaries in the exact same way that was used for that provided on the website. I have tried

Re: Spark Installation

2014-07-08 Thread Srikrishna S
/ On Mon, Jul 7, 2014 at 8:07 PM, Srikrishna S srikrishna...@gmail.com wrote: Hi All, Does anyone know what the command line arguments to mvn are to generate the pre-built binary for spark on Hadoop 2-CDH5? I would like to pull in a recent bug fix in spark-master and rebuild the binaries

Job getting killed

2014-07-11 Thread Srikrishna S
I am trying to run Logistic Regression on the url dataset (from libsvm) using the exact same code as the example on a 5 node Yarn-Cluster. I get a pretty cryptic error that just says "Killed". Nothing more. Settings: --master yarn-client --verbose --driver-memory 24G --executor-memory 24G

Akka Client disconnected

2014-07-12 Thread Srikrishna S
I am running logistic regression with SGD on a problem with about 19M parameters (the kdda dataset from the libsvm library). I consistently see that the nodes on my computer get disconnected and soon the whole job grinds to a halt. 14/07/12 03:05:16 ERROR cluster.YarnClientClusterScheduler:

Re: Akka Client disconnected

2014-07-12 Thread Srikrishna S
I am using the master that I compiled 2 days ago. Can you point me to the JIRA? On Sat, Jul 12, 2014 at 9:13 AM, DB Tsai dbt...@dbtsai.com wrote: Are you using 1.0 or current master? A bug related to this is fixed in master. On Jul 12, 2014 8:50 AM, Srikrishna S srikrishna...@gmail.com wrote

Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
If you use Scala, you can do: val conf = new SparkConf().setMaster("yarn-client").setAppName("Logistic regression SGD fixed").set("spark.akka.frameSize", "100").setExecutorEnv("SPARK_JAVA_OPTS", "-Dspark.akka.frameSize=100") var sc = new

Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
That is exactly the same error that I got. I am still having no success. Regards, Krishna On Mon, Jul 14, 2014 at 11:50 AM, crater cq...@ucmerced.edu wrote: Hi Krishna, Thanks for your help. Are you able to get your 29M data running yet? I fix the previous problem by setting larger

Re: Error when testing with large sparse svm

2014-07-14 Thread Srikrishna S
) number of partitions, which should match the number of cores 2) driver memory (you can see it from the executor tab of the Spark WebUI and set it with --driver-memory 10g 3) the version of Spark you were running Best, Xiangrui On Mon, Jul 14, 2014 at 12:14 PM, Srikrishna S srikrishna...@gmail.com

Spark Performance Bench mark

2014-07-15 Thread Malligarjunan S
an decision to move the project to use Spark. Hence help me. I highly appreciate your help. I am planning to use --instance-type m1.xlarge --instance-count 3 Thanks and Regards, Malligarjunan S.

Spark Performance issue

2014-07-15 Thread Malligarjunan S
? Thanks and Regards, Malligarjunan S.

Need help on Spark UDF (Join) Performance tuning .

2014-07-17 Thread S Malligarjunan
Time taken: 3718.23 seconds The above UDF query takes more time to run. Where testCompare is a UDF function; the function just does the pvc1.col1 = pvc2.col1 OR pvc1.col1 = pvc2.col2. Please let me know what is the issue here?   Thanks and Regards, Sankar S.

Re: Need help on Spark UDF (Join) Performance tuning .

2014-07-18 Thread S Malligarjunan
Hello Experts, I would highly appreciate your input. Please suggest or give me a hint: what would be the issue here?   Thanks and Regards, Malligarjunan S.   On Thursday, 17 July 2014, 22:47, S Malligarjunan smalligarju...@yahoo.com wrote: Hello Experts, I am facing a performance problem when I use

Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread rafeeq s
, Rafeeq S *(“What you do is what matters, not what you think or say or plan.” )*

Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread rafeeq s
*as parquet file on HDFS ?* *Please give your suggestion.* Regards, Rafeeq S *(“What you do is what matters, not what you think or say or plan.” )* On Tue, Aug 5, 2014 at 11:55 AM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: You can try this Kafka Spark Consumer which I

Spark RuntimeException due to Unsupported datatype NullType

2014-08-11 Thread rafeeq s
$ParquetOperations$.apply(SparkStrategies.scala:156) Please provide your valuable solution for above issue. Thanks in Advance!. Regards, Rafeeq S *(“What you do is what matters, not what you think or say or plan.” )*

how to access workers from spark context

2014-08-12 Thread S. Zhou
I tried to access worker info from the spark context but it seems the spark context does not expose such an API. The reason for doing that is: it seems the spark context itself does not have logic to detect if its workers are in a dead status, so I would like to add such logic myself.  BTW, it seems spark web UI
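One thing that is exposed, as a small sketch (whether it is enough to detect dead workers is another matter), is the block-manager view of the cluster:

// One entry per block manager (driver + each executor), keyed by "host:port";
// watching this map shrink over time is a crude signal that executors have dropped out.
val executors: Map[String, (Long, Long)] = sc.getExecutorMemoryStatus
executors.foreach { case (blockManager, (maxMem, remainingMem)) =>
  println(s"$blockManager maxMem=$maxMem remaining=$remainingMem")
}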

Re: how to access workers from spark context

2014-08-12 Thread S. Zhou
Sometimes workers are dead but the spark context does not know it and still sends jobs. On Tuesday, August 12, 2014 7:14 PM, Stanley Shi s...@pivotal.io wrote: Why do you need to detect the worker status in the application? Your application generally doesn't need to know where it is executed

Re: how to access workers from spark context

2014-08-12 Thread S. Zhou
actually if you search the spark mail archives you will find many similar topics. At this time, I just want to manage it by myself. On Tuesday, August 12, 2014 8:46 PM, Stanley Shi s...@pivotal.io wrote: This seems a bug, right? It's not the user's responsibility to manage the workers

Spark SQL Parser error

2014-08-21 Thread S Malligarjunan
spark from github. Using Hadoop 1.0.3    Thanks and Regards, Sankar S.  

Re: Spark SQL Parser error

2014-08-22 Thread S Malligarjunan
Hello Yin, I have tried  the create external table command as well. I get the same error. Please help me to find the root cause.   Thanks and Regards, Sankar S.   On Friday, 22 August 2014, 22:43, Yin Huai huaiyin@gmail.com wrote: Hi Sankar, You need to create an external table

Re: Spark SQL Parser error

2014-08-22 Thread S Malligarjunan
Hello Yin, Forgot to mention one thing, the same query works fine in Hive and Shark..   Thanks and Regards, Sankar S.   On , S Malligarjunan smalligarju...@yahoo.com wrote: Hello Yin, I have tried  the create external table command as well. I get the same error. Please help me to find

How to make Spark Streaming write its output so that Impala can read it?

2014-08-24 Thread rafeeq s
directory or otherwise in a form that is instantly readable by Impala? Regards, Rafeeq S *(“What you do is what matters, not what you think or say or plan.” )*

Re: Spark SQL Parser error

2014-08-24 Thread S Malligarjunan
Hello Yin, Additional note: In ./bin/spark-shell --jars s3n:/mybucket/myudf.jar  I got the following message in the console: Warning: skipped external jar..   Thanks and Regards, Sankar S.   On , S Malligarjunan smalligarju...@yahoo.com wrote: Hello Yin, I have tried using sc.addJar

SPARK Hive Context UDF Class Not Found Exception,

2014-08-25 Thread S Malligarjunan
trying to create a temporary function. What would be the issue here?   Thanks and Regards, Sankar S.  

Re: SPARK Hive Context UDF Class Not Found Exception,

2014-08-26 Thread S Malligarjunan
Hello Michael, I have executed git pull now. As per the pom version entry, it is 1.1.0-SNAPSHOT.   Thanks and Regards, Sankar S.   On Tuesday, 26 August 2014, 1:00, Michael Armbrust mich...@databricks.com wrote: Which version of Spark SQL are you using?  Several issues with custom hive UDFs

Spark 1.1. doesn't work with hive context

2014-08-26 Thread S Malligarjunan
) at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45) ... 72 more Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found   Thanks and Regards, Sankar S.  

Is there a way to insert data into existing parquet file using spark ?

2014-08-27 Thread rafeeq s
it to impala. I want to insert data into an existing parquet file instead of creating a new parquet file. I have tried the INSERT statement but it makes performance too slow. Please suggest whether there is any way to insert or append data into an existing parquet file. Regards, Rafeeq S *(“What you do is what

Re: Spark 1.1. doesn't work with hive context

2014-08-27 Thread S Malligarjunan
It is my mistake; somehow I had added the io.compression.codec property value as the above-mentioned class. The problem is resolved now.   Thanks and Regards, Sankar S.   On Wednesday, 27 August 2014, 1:23, S Malligarjunan smalligarju...@yahoo.com wrote: Hello all, I have just checked out

Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-18 Thread Rafeeq S
) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Please suggest your answer Regards, Rafeeq S *(“What you do is what matters, not what you think or say or plan.” )*

Re: Spark Streaming: Sentiment Analysis of Twitter streams

2014-10-15 Thread S Krishna
Hi, I am using 1.1.0. I did set my twitter credentials and I am using the full path. I did not paste this in the public post. I am running on a cluster and getting the exception. Are you running in local or standalone mode? Thanks On Oct 15, 2014 3:20 AM, Akhil Das ak...@sigmoidanalytics.com

Spark executor lost

2014-12-03 Thread S. Zhou
We are using Spark job server to submit spark jobs (our spark version is 0.9.1). After running the spark job server for a while, we often see the following errors (executor lost) in the spark job server log. As a consequence, the spark driver (allocated inside spark job server) gradually loses

Issue when upgrading from Spark 1.1.0 to 1.1.1: Exception of java.lang.NoClassDefFoundError: io/netty/util/TimerTask

2014-12-10 Thread S. Zhou
Everything worked fine on Spark 1.1.0 until we upgrade to 1.1.1. For some of our unit tests we saw the following exceptions. Any idea how to solve it? Thanks! java.lang.NoClassDefFoundError: io/netty/util/TimerTask        at org.apache.spark.storage.BlockManager.init(BlockManager.scala:72)     

Re: column expression in left outer join for DataFrame

2015-03-25 Thread S Krishna
Hi, Thanks for your response. I modified my code as per your suggestion, but now I am getting a runtime error. Here's my code: val df_1 = df.filter(df("event") === 0).select("country", "cnt") val df_2 = df.filter(df("event") === 3).select("country", "cnt")

Re: column expression in left outer join for DataFrame

2015-03-25 Thread S Krishna
= df.filter(df("event") === 0).select("country", "cnt").as("a") val df_2 = df.filter(df("event") === 3).select("country", "cnt").as("b") val both = df_2.join(df_1, $"a.country" === $"b.country", "left_outer") On Tue, Mar 24, 2015 at 11:57 PM, S Krishna skrishna
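A sketch of the alias-qualified version of that join; the column names are taken from the quoted code, and the key points are that the aliases are referenced inside a single column expression and the join type is passed as the third argument.

import org.apache.spark.sql.functions.col

val a = df.filter(col("event") === 0).select("country", "cnt").as("a")
val b = df.filter(col("event") === 3).select("country", "cnt").as("b")

// Qualify the columns with the aliases and name the join type explicitly.
val both = b.join(a, col("b.country") === col("a.country"), "left_outer")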

Does sc.newAPIHadoopFile support multiple directories (or nested directories)?

2015-03-03 Thread S. Zhou
I did some experiments and it seems not. But I would like to get confirmation (or perhaps I missed something). If it does support this, could you let me know how to specify multiple folders? Thanks. Senqiang

Re: Does sc.newAPIHadoopFile support multiple directories (or nested directories)?

2015-03-03 Thread S. Zhou
2:40 PM, Ted Yu yuzhih...@gmail.com wrote: Looking at scaladoc:  /** Get an RDD for a Hadoop file with an arbitrary new API InputFormat. */  def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]] Your conclusion is confirmed. On Tue, Mar 3, 2015 at 1:59 PM, S. Zhou myx...@yahoo.com.invalid

Re: Does sc.newAPIHadoopFile support multiple directories (or nested directories)?

2015-03-03 Thread S. Zhou
syntax that works with Hadoop's FileInputFormat should work. I thought you could specify a comma-separated list of paths? maybe I am imagining that. On Tue, Mar 3, 2015 at 10:57 PM, S. Zhou myx...@yahoo.com.invalid wrote: Thanks Ted. Actually a follow up question. I need to read multiple HDFS files
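For reference, a sketch of the comma-separated form (paths are placeholders). textFile accepts it because it goes through FileInputFormat; whether newAPIHadoopFile itself does depends on the Spark version, which is exactly what this thread was probing, so treat that part as an assumption to verify on your release.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val paths = "hdfs:///data/dir1,hdfs:///data/dir2,hdfs:///data/2015/*/part-*"

// New-API Hadoop file: key/value classes for plain text input.
val kvRdd = sc.newAPIHadoopFile(paths, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
val lines1 = kvRdd.map(_._2.toString)

// The simpler API takes the same comma-separated string.
val lines2 = sc.textFile(paths)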

MLlib: save models to HDFS?

2015-04-03 Thread S. Zhou
I am new to MLlib so I have a basic question: is it possible to save MLlib models (particularly CF models) to HDFS and then reload them later? If yes, could you share some sample code (I could not find it in the MLlib tutorial). Thanks!
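A minimal sketch, assuming Spark 1.3+ where MLlib models gained save/load support; the ALS parameters and HDFS path are placeholders, and ratings is assumed to be an RDD[Rating] built elsewhere.

import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

val model = ALS.train(ratings, 10 /* rank */, 10 /* iterations */, 0.01 /* lambda */)

model.save(sc, "hdfs:///models/cf-model")                          // write to HDFS
val reloaded = MatrixFactorizationModel.load(sc, "hdfs:///models/cf-model")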

SparkR and Spark Mlib

2015-07-03 Thread praveen S
Hi, Is sparkR and spark Mlib same?

Meaning of local[2]

2015-08-17 Thread praveen S
What does this mean in .setMaster("local[2]")? Is this applicable only for standalone mode? Can I do this in a cluster setup, e.g. .setMaster("hostname:port[2]")? Is it the number of threads per worker node?
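A short sketch of the distinction (host name and values are placeholders): the [2] only means two local worker threads inside the single driver JVM; a cluster master URL does not take a thread count, and parallelism there comes from executor/core settings instead.

import org.apache.spark.SparkConf

// Local mode: driver and "executors" share this JVM, using 2 threads.
val localConf = new SparkConf().setAppName("demo").setMaster("local[2]")

// Standalone cluster: no thread count in the URL; cap cores with a property instead.
val clusterConf = new SparkConf()
  .setAppName("demo")
  .setMaster("spark://master-host:7077")
  .set("spark.cores.max", "8")            // total cores the app may use on the cluster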

Regarding rdd.collect()

2015-08-18 Thread praveen S
When I do an rdd.collect(), does the data move back to the driver, or is it still held in memory across the executors?

Difference between RandomForestModel and RandomForestClassificationModel

2015-07-29 Thread praveen S
Hi, I wanted to know what the difference is between RandomForestModel and RandomForestClassificationModel in MLlib. Will they yield the same results for a given dataset?

Lambda serialization

2015-07-29 Thread Subshiri S
Hi, I have tried to use a lambda expression in a spark task, and it throws a java.lang.IllegalArgumentException: Invalid lambda deserialization exception. The exception is thrown when I use code like transform(pRDD -> pRDD.map(t -> t._2)). The code snippet is below. JavaPairDStream<String,Integer>

Spark MLib v/s SparkR

2015-08-05 Thread praveen S
I was wondering when one should go for MLlib or SparkR. What are the criteria, or what should be considered, before choosing either of the solutions for data analysis? What are the advantages of Spark MLlib over SparkR, or of SparkR over MLlib?

Combine code for RDD and DStream

2015-08-03 Thread Sidd S
Hello! I am developing a Spark program that uses both batch and streaming (separately). They are both pretty much the exact same programs, except the inputs come from different sources. Unfortunately, RDDs and DStreams define all of their transformations in their own files, and so I have two

Re: Combine code for RDD and DStream

2015-08-03 Thread Sidd S
DStreams transform function helps me solve this issue elegantly. Thanks! On Mon, Aug 3, 2015 at 1:42 PM, Sidd S ssinga...@gmail.com wrote: Hello! I am developing a Spark program that uses both batch and streaming (separately). They are both pretty much the exact same programs, except

Is it Spark Serialization bug ?

2015-07-30 Thread Subshiri S
= JavaStreamingFactory.getInstance(); JavaReceiverInputDStream<String> lines = ssc.socketTextStream("localhost", ); JavaDStream<String> words = lines.flatMap(s -> {return Arrays.asList(s.split(" "));}); JavaPairDStream<String,Integer> pairRDD = words.mapToPair(x -> new Tuple2<String,Integer>(x,1

Re: Spark MLib v/s SparkR

2015-08-06 Thread praveen S
you are trying to solve, and then the selection may be evident. On Wednesday, August 5, 2015, praveen S mylogi...@gmail.com wrote: I was wondering when one should go for MLib or SparkR. What is the criteria or what should be considered before choosing either of the solutions for data

StringIndexer + VectorAssembler equivalent to HashingTF?

2015-08-07 Thread praveen S
Is StringIndexer + VectorAssembler equivalent to HashingTF while converting the document for analysis?
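They are not equivalent; a sketch of the two pipelines may make the difference clearer (column names are placeholders). StringIndexer + VectorAssembler encodes known categorical/numeric columns, while Tokenizer + HashingTF hashes free text into a fixed-size term-frequency vector.

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer, VectorAssembler}

// Categorical route: one index per distinct string, then columns stitched into a vector.
val indexer   = new StringIndexer().setInputCol("category").setOutputCol("categoryIdx")
val assembler = new VectorAssembler()
  .setInputCols(Array("categoryIdx", "someNumericCol"))
  .setOutputCol("features")
val categoricalPipeline = new Pipeline().setStages(Array(indexer, assembler))

// Text route: tokens hashed into a term-frequency vector of fixed size.
val tokenizer = new Tokenizer().setInputCol("document").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("tf").setNumFeatures(1 << 18)
val textPipeline = new Pipeline().setStages(Array(tokenizer, hashingTF))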

Topic Modelling- LDA

2015-09-23 Thread Subshiri S
Hi, I am experimenting with Spark LDA. How do I create a topic model for prediction in Spark? How do I evaluate the topics modelled in Spark? Could you point me to some examples? Regards, Subshiri

Create a n x n graph given only the vertices

2016-01-08 Thread praveen S
Is it possible in GraphX to create/generate an n x n graph given n vertices?

Re: Create a n x n graph given only the vertices no

2016-01-10 Thread praveen S
Is it possible in graphx to create/generate graph of n x n given only the vertices. On 8 Jan 2016 23:57, "praveen S" <mylogi...@gmail.com> wrote: > Is it possible in graphx to create/generate a graph n x n given n > vertices? >

choice of RDD function

2016-06-14 Thread Sivakumaran S
e may be more fields added to the json at a later stage. There will be a lot of “id”s at a later stage. Q2. If it can be done using either, which one would render to be more efficient and fast? As of now, the entire set up is in a single laptop. Thanks in advance. Regards, Siva

Re: choice of RDD function

2016-06-15 Thread Sivakumaran S
com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Wed, Jun 15, 2016 at 5:03 PM, Sivakumaran S <siva.kuma...@me.com> wrote: >> Thanks Jacek, >> >> Job completed!! :) Ju

Re: choice of RDD function

2016-06-15 Thread Sivakumaran S
Thanks Jacek, Job completed!! :) Just used data frames and sql query. Very clean and functional code. Siva > On 15-Jun-2016, at 3:10 PM, Jacek Laskowski wrote: > > mapWithState

Re: choice of RDD function

2016-06-16 Thread Sivakumaran S
ki > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Wed, Jun 15, 2016 at 11:55 PM, Sivakumaran S <siva.kuma...@me.com> wrote: >> Cody, >> >&g

Re: Python to Scala

2016-06-18 Thread Sivakumaran S
If you can identify a suitable java example in the spark directory, you can use that as a template and convert it to scala code using http://javatoscala.com/ Siva > On 18-Jun-2016, at 6:27 AM, Aakash Basu wrote: > > I don't have a sound

Re: choice of RDD function

2016-06-15 Thread Sivakumaran S
Jun 15, 2016 at 11:19 AM, Sivakumaran S <siva.kuma...@me.com> wrote: >> Of course :) >> >> object sparkStreaming { >> def main(args: Array[String]) { >>StreamingExamples.setStreamingLogLevels() //Set reasonable logging >> levels for streaming if the user has

ERROR TaskResultGetter: Exception while getting task result java.io.IOException: java.lang.ClassNotFoundException: scala.Some

2016-06-15 Thread S Sarkar
.4.0" ) resolvers += "Akka Repository" at "http://repo.akka.io/releases/; I am getting TaskResultGetter error with ClassNotFoundException for scala.Some . Can I please get some help how to fix it? Thanks, S. Sarkar -- View this message in context: http://apache-spark-use

Re: choice of RDD function

2016-06-16 Thread Sivakumaran S
direction":8.50031} In my Spark app, I have set the batch duration as 60 seconds. Now, as per the 1.6.1 documentation, "Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame. This conversion can be done using SQLContext.read.json() on either an R

Re: Create a n x n graph given only the vertices no

2016-01-11 Thread praveen S
in East > *Spark GraphX in Action* Michael Malak and Robin East > Manning Publications Co. > http://www.manning.com/books/spark-graphx-in-action > > > > > > On 11 Jan 2016, at 03:19, praveen S <mylogi...@gmail.com> wrote: > > Is it possible in graphx to creat

Usage of SparkContext within a Web container

2016-01-13 Thread praveen S
Is use of SparkContext from a Web container the right way to process spark jobs, or should we use spark-submit in a ProcessBuilder? Are there any pros or cons of using SparkContext from a Web container? How does Zeppelin trigger spark jobs from the Web context?

AM creation in yarn client mode

2016-02-09 Thread praveen S
Hi, I have 2 questions when running the spark jobs on yarn in client mode: 1) Where is the AM (application master) created: A) is it created on the client where the job was submitted, i.e. driver and AM on the same client? Or B) yarn decides where the AM should be created? 2) Driver and AM

Re: AM creation in yarn-client mode

2016-02-09 Thread praveen S
Can you explain what happens in yarn client mode? Regards, Praveen On 10 Feb 2016 10:55, "ayan guha" <guha.a...@gmail.com> wrote: > It depends on yarn-cluster and yarn-client mode. > > On Wed, Feb 10, 2016 at 3:42 PM, praveen S <mylogi...@gmail.com> wrote: > &g

Re: Create a n x n graph given only the vertices no

2016-01-20 Thread praveen S
. --- Robin East *Spark GraphX in Action* Michael Malak and Robin East Manning Publications Co. http://www.manning.com/books/spark-graphx-in-action On 11 Jan 2016, at 12:30, praveen S <mylogi...@gmail.com> wrote: Yes I was looking som

Re: Create a n x n graph given only the vertices no

2016-01-20 Thread praveen S
Sorry.. Found the api.. On 21 Jan 2016 10:17, "praveen S" <mylogi...@gmail.com> wrote: > Hi Robin, > > I am using Spark 1.3 and I am not able to find the api > Graph.fromEdgeTuples(edge RDD, 1) > > Regards, > Praveen > Well you can use a similar tech
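For completeness, a sketch of that API for the n x n case; vertexIds is assumed to be an RDD[VertexId], and note a complete directed graph has n*(n-1) edges, so this explodes quickly for large n.

import org.apache.spark.graphx.{Graph, VertexId}

// All ordered pairs except self-loops form the edge list of the complete graph.
val edgeTuples = vertexIds.cartesian(vertexIds).filter { case (src, dst) => src != dst }

// Every vertex gets the default attribute 1, as in the call quoted above.
val graph: Graph[Int, Int] = Graph.fromEdgeTuples(edgeTuples, 1)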

Re: Reuse Executor JVM across different JobContext

2016-01-19 Thread praveen S
Can you give me more details on Spark's jobserver. Regards, Praveen On 18 Jan 2016 03:30, "Jia" wrote: > I guess all jobs submitted through JobServer are executed in the same JVM, > so RDDs cached by one job can be visible to all other jobs executed later. > On Jan 17,

Re: Best practises of share Spark cluster over few applications

2016-02-14 Thread praveen S
Even I was trying to launch spark jobs from a webservice, but I thought you could run spark jobs in yarn mode only through spark-submit. Is my understanding not correct? Regards, Praveen On 15 Feb 2016 08:29, "Sabarish Sasidharan" wrote: > Yes you can look at

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Have a look at spark.streaming.backpressure.enabled Property Regards, Praveen On 18 Feb 2016 00:13, "Abhishek Anand" wrote: > I have a spark streaming application running in production. I am trying to > find a solution for a particular use case when my application has
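For reference, a sketch of where those knobs go (values are placeholders):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kafka-stream")
  // Let the rate controller adapt ingestion to the observed processing speed.
  .set("spark.streaming.backpressure.enabled", "true")
  // Cap the very first batches, before the controller has any feedback (direct stream).
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")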

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
On Thu, Feb 18, 2016 at 9:40 AM, praveen S <mylogi...@gmail.com> wrote: > >> Have a look at >> >> spark.streaming.backpressure.enabled >> Property >> >> Regards, >> Praveen >> On 18 Feb 2016 00:13, "Abhishek Anand" <abhis.ana

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Sorry.. Rephrasing : Can this issue be resolved by having a smaller block interval? Regards, Praveen On 18 Feb 2016 21:30, "praveen S" <mylogi...@gmail.com> wrote: > Can having a smaller block interval only resolve this? > > Regards, > Praveen > On 18 Feb

Use only latest values

2016-04-09 Thread Daniela S
Hi, I would like to cache values and to use only the latest "valid" values to build a sum. In more detail, I receive values from devices periodically. I would like to add up all the valid values each minute. But not every device sends a new value every minute. And as long as there is no new
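One way to sketch this with the DStream API, assuming ssc is the StreamingContext, deviceStream is a DStream of (deviceId, value) pairs, and the batch interval is set to 1 minute; updateStateByKey keeps the last reported value per device so the sum always uses the latest known reading.

ssc.checkpoint("/tmp/checkpoint")   // updateStateByKey needs a checkpoint directory

// State per device = its latest value; keep the old one when no new value arrived.
val latest = deviceStream.updateStateByKey[Double] { (newValues: Seq[Double], old: Option[Double]) =>
  newValues.lastOption.orElse(old)
}

// Sum over the latest value of every device, emitted once per (1-minute) batch.
val totals = latest.map(_._2).reduce(_ + _)
totals.print()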

Grouping in Spark Streaming / batch size = time window?

2016-04-11 Thread Daniela S
Hi,   I am a newbie in Spark Streaming and have some questions.   1) Is it possible to group a stream in Spark Streaming like in Storm (field grouping)?   2) Could the batch size be used instead of a time window?   Thank you in advance.   Regards, Daniela    

Re: spark-ec2 hitting yum install issues

2016-04-18 Thread Anusha S
Yes, it does not work manually. I am not able to really do 'yum search' to find exact package names to try others, but I tried python-pip and it gave same error. I will post this in the link you pointed out. Thanks! On Thu, Apr 14, 2016 at 6:11 PM, Nicholas Chammas < nicholas.cham...@gmail.com>

No of Spark context per jvm

2016-05-09 Thread praveen S
Hi, As far as I know you can create one SparkContext per jvm, but I wanted to confirm if it's one per jvm or one per classloader. As in, one SparkContext created per *.war, with all deployments under one tomcat instance. Regards, Praveen

Is spark-submit a single point of failure?

2016-07-22 Thread Sivakumaran S
fails and has to be restarted. Is there any way to obviate this? Is my understanding correct that the spark-submit in its current form is a Single Point of Vulnerability, much akin to the NameNode in HDFS? regards Sivakumaran S

Re: Visualization of data analysed using spark

2016-07-31 Thread Sivakumaran S
Hi Tony, If your requirement is browser based plotting (real time or other wise), you can load the data and display it in a browser using D3. Since D3 has very low level plotting routines, you can look at C3 ( provided by www.pubnub.com) or Rickshaw (https://github.com/shutterstock/rickshaw

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Sivakumaran S
> wrote: > > Hi Siva, > > Does topic has partitions? which version of Spark you are using? > > On Wed, Aug 10, 2016 at 2:38 AM, Sivakumaran S <siva.kuma...@me.com > <mailto:siva.kuma...@me.com>> wrote: > Hi, > > Here is a working example I did. >

Re: Question on Spark shell

2016-07-11 Thread Sivakumaran S
ves you a Spark Context to play with straight > away. The output is printed to the console. > > On Mon, 11 Jul 2016 at 11:47 Sivakumaran S <siva.kuma...@me.com > <mailto:siva.kuma...@me.com>> wrote: > Hello, > > Is there a way to start the spark server with t

Re: Question on Spark shell

2016-07-11 Thread Sivakumaran S
t; You should have the same output starting the application on the console. You > are not seeing any output? > > On Mon, 11 Jul 2016 at 11:55 Sivakumaran S <siva.kuma...@me.com > <mailto:siva.kuma...@me.com>> wrote: > I am running a spark streaming application using Scala

Question on Spark shell

2016-07-11 Thread Sivakumaran S
Hello, Is there a way to start the spark server with the log output piped to screen? I am currently running spark in the standalone mode on a single machine. Regards, Sivakumaran - To unsubscribe e-mail:

Re: Multiple aggregations over streaming dataframes

2016-07-07 Thread Sivakumaran S
Hi Arnauld, Sorry for the doubt, but what exactly is multiple aggregation? What is the use case? Regards, Sivakumaran > On 07-Jul-2016, at 11:18 AM, Arnaud Bailly wrote: > > Hello, > > I understand multiple aggregations over streaming dataframes is not currently >

Re: Multiple aggregations over streaming dataframes

2016-07-07 Thread Sivakumaran S
ond aggregation. I could > probably rewrite the query in such a way that it does aggregation in one pass > but that would obfuscate the purpose of the various stages. > > Le 7 juil. 2016 12:55, "Sivakumaran S" <siva.kuma...@me.com > <mailto:siva.kuma...@me.com&g

Re: problem extracting map from json

2016-07-07 Thread Sivakumaran S
Hi Michal, Will an example help? import scala.util.parsing.json._ //Requires scala-parser-combinators because it is no longer part of core scala val wbJSON = JSON.parseFull(weatherBox) //wbJSON is a JSON object now //Depending on the structure, now traverse through the object val

Re: Have I done everything correctly when subscribing to Spark User List

2016-08-08 Thread Sivakumaran S
Does it have anything to do with the fact that the mail address is displayed as user @spark.apache.org ? There is a space before ‘@‘. This is as received in my mail client. Sivakumaran > On 08-Aug-2016, at 7:42 PM, Chris Mattmann wrote: > >

Re: Help testing the Spark Extensions for the Apache Bahir 2.0.0 release

2016-08-07 Thread Sivakumaran S
Hi, How can I help? regards, Sivakumaran S > On 06-Aug-2016, at 6:18 PM, Luciano Resende <luckbr1...@gmail.com> wrote: > > Apache Bahir is voting it's 2.0.0 release based on Apache Spark 2.0.0. > > https://www.mail-archive.com/dev@bahir.apache.org/msg00312.html

Re: Machine learning question (using spark) - removing redundant factors while doing clustering

2016-08-08 Thread Sivakumaran S
Not an expert here, but the first step would be to devote some time and identify which of these 112 factors are actually causative. Some domain knowledge of the data may be required. Then, you can start off with PCA. HTH, Regards, Sivakumaran S > On 08-Aug-2016, at 3:01 PM, Tony Lane <to

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Sivakumaran S
Hi, Here is a working example I did. HTH Regards, Sivakumaran S val topics = "test" val brokers = "localhost:9092" val topicsSet = topics.split(",").toSet val sparkConf = new SparkConf().setAppName("KafkaWeatherCalc").setMaster("local")

How to tune number of tasks

2017-01-26 Thread Soheila S.
Hi all, Please tell me how I can tune the number of output partitions. I run my spark job on my local machine with 8 cores and the input data is 6.5GB. It creates 193 tasks and puts the output into 193 partitions. How can I change the number of tasks and, consequently, the number of output files? Best,
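A minimal sketch (processed and the output path are placeholders): the number of part-files equals the number of partitions of the RDD being saved, so change that just before the save.

// 193 partitions -> 8 output files. repartition shuffles; coalesce(8) avoids a full shuffle.
val result = processed.repartition(8)
result.saveAsTextFile("hdfs:///out/job1")

// For shuffle stages the default can also be lowered up front:
// new SparkConf().set("spark.default.parallelism", "8")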

Text

2017-01-27 Thread Soheila S.
Hi All, I read a text file using sparkContext.textFile(filename), assign it to an RDD, process the RDD (replace some words) and finally write it to a text file using rdd.saveAsTextFile(output). Is there any way to be sure the order of the sentences will not be changed? I need to have the
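A sketch of one way to keep the global order (replaceWords stands in for the actual transformation): saveAsTextFile writes one file per partition and only preserves order within a partition, so carrying the original line index through and sorting on it before the save restores the order across part-files.

val indexed = sc.textFile(filename)
  .zipWithIndex()                                        // (line, original position)
  .map { case (line, idx) => (idx, replaceWords(line)) } // transform, keep the index as key
  .sortByKey()
  .values

indexed.saveAsTextFile(output)                           // or .coalesce(1) first for a single file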
