Re: How to update structured streaming apps gracefully

2018-12-18 Thread Priya Matpadi
< vincent.gromakow...@gmail.com> wrote: > Checkpointing is only used for failure recovery, not for app upgrades. You > need to manually code the unload/load and save it to a persistent store > > On Tue, Dec 18, 2018 at 17:29, Priya Matpadi wrote: > >> Using checkpointing

Re: How to update structured streaming apps gracefully

2018-12-18 Thread Priya Matpadi
Using checkpointing for graceful updates is my understanding as well, based on the writeup in https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing, and some prototyping. Have you faced any missed events? On Mon, Dec 17, 2018
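
A minimal sketch of the checkpoint-based recovery the linked guide describes (broker, topic, and paths below are placeholder assumptions):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("ss-checkpoint-demo").getOrCreate()

    // Reuse the same checkpointLocation across restarts so the query resumes
    // from the recorded offsets instead of reprocessing or missing events.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092")  // placeholder brokers
      .option("subscribe", "events")                       // placeholder topic
      .load()

    events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/events")               // placeholder sink path
      .option("checkpointLocation", "hdfs:///chk/events")  // placeholder checkpoint dir
      .start()
      .awaitTermination()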

Re: How to track batch jobs in spark ?

2018-12-05 Thread Priya Matpadi
if you are deploying your spark application on a YARN cluster, 1. ssh into the master node 2. List the currently running applications and retrieve the application_id: yarn application --list 3. Kill the application using the application_id of the form application_x_ from the output of the list command

Custom SparkListener

2018-09-20 Thread Priya Ch
Hello All, I am trying to extend SparkListener so that, after a job ends, I can retrieve the job name, check whether it succeeded or failed, and write the result to a log file. I couldn't find a way to fetch the job name in the onJobEnd method. Thanks, Padma CH
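
There is no job name on SparkListenerJobEnd itself; a common workaround (a sketch, not a confirmed answer from this thread) is to remember the job description seen in onJobStart and look it up by jobId when the job ends:

    import org.apache.spark.scheduler._
    import scala.collection.concurrent.TrieMap

    class JobNameListener extends SparkListener {
      private val names = TrieMap.empty[Int, String]

      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        // spark.job.description is set via sc.setJobDescription; fall back to the job id.
        val desc = Option(jobStart.properties)
          .flatMap(p => Option(p.getProperty("spark.job.description")))
          .getOrElse(s"job-${jobStart.jobId}")
        names.put(jobStart.jobId, desc)
      }

      override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
        val name = names.remove(jobEnd.jobId).getOrElse("unknown")
        jobEnd.jobResult match {
          case JobSucceeded => println(s"Job '$name' succeeded")   // write to your log here
          case _            => println(s"Job '$name' failed")
        }
      }
    }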

DirectFileOutputCommitter in Spark 2.3.1

2018-09-19 Thread Priya Ch
Hello Team, I am trying to write a DataSet as a parquet file in Append mode, partitioned by a few columns. However, since the job is time-consuming, I would like to enable DirectFileOutputCommitter (i.e., bypassing the writes to the temporary folder). The Spark version I am using is 2.3.1. Can someone
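
A hedged sketch of a commonly used alternative (not a confirmed answer from this thread): the v2 FileOutputCommitter algorithm commits task output directly into the destination directory, avoiding the slow job-level rename out of the temporary folder. Paths and partition columns below are assumptions.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("parquet-append")
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()
    import spark.implicits._

    // Toy Dataset standing in for the real one; "bucket" is a made-up partition column.
    val ds = spark.range(0, 100).withColumn("bucket", $"id" % 10)

    ds.write
      .mode(SaveMode.Append)
      .partitionBy("bucket")
      .parquet("hdfs:///warehouse/output")   // placeholder output path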

Fwd: Spark checkpoint - nonstreaming

2017-05-26 Thread Priya PM
-- Forwarded message -- From: Priya PM <pmpr...@gmail.com> Date: Fri, May 26, 2017 at 8:54 PM Subject: Re: Spark checkpoint - nonstreaming To: Jörn Franke <jornfra...@gmail.com> Oh, how do I do it? I don't see it mentioned anywhere in the documentation. I have follow

Spark checkpoint - nonstreaming

2017-05-26 Thread Priya
root 133 May 26 16:26 rdd-28 [root@priya-vm 9dd1acf0-bef8-4a4f-bf0e-f7624334abc5]# cd rdd-28/ [root@priya-vm rdd-28]# ls part-0 part-1 _partitioner Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-checkpoint-nonstreaming-tp28712.html
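
For reference, a minimal sketch of non-streaming RDD checkpointing, which produces the rdd-<id>/part-* layout shown above (the checkpoint directory is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo"))
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")   // placeholder directory

    val rdd = sc.parallelize(1 to 1000, 2).map(_ * 2)
    rdd.checkpoint()   // only marks the RDD; data is written when an action runs
    rdd.count()        // materializes the RDD and persists the checkpoint files
    println(rdd.isCheckpointed)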

Video analytics on Spark

2016-09-09 Thread Priya Ch
Hi All, I have video surveillance data and it needs to be processed in Spark. I am looking into Spark + OpenCV. How do I load .mp4 files into an RDD? Can we do this directly, or does the video need to be converted to a SequenceFile? Thanks, Padma CH
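
One possible starting point (a sketch, not a confirmed answer from this thread): SparkContext.binaryFiles gives an RDD of (path, PortableDataStream) pairs holding the raw bytes of each file, and OpenCV frame decoding would then run inside map/mapPartitions. The input path is a placeholder.

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("video-ingest"))

    val videos = sc.binaryFiles("hdfs:///surveillance/*.mp4")   // placeholder path
    val sizes = videos.map { case (path, stream) =>
      val bytes = stream.toArray()   // whole file as Array[Byte]; decode frames here
      (path, bytes.length)
    }
    sizes.collect().foreach(println)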

Re: Send real-time alert using Spark

2016-07-12 Thread Priya Ch
I mean model training on incoming data is taken care of by Spark. For detected anomalies, we need to send an alert. Could we do this with Spark, or would another framework like Akka or a REST API do it? Thanks, Padma CH On Tue, Jul 12, 2016 at 7:30 PM, Marcin Tustin <mtus...@handybook.com> wrote: &

Send real-time alert using Spark

2016-07-12 Thread Priya Ch
Hi All, I am building a real-time anomaly detection system where I am using k-means to detect anomalies. Now, in order to send an alert to a mobile device or an email alert, how do I send it using Spark itself? Thanks, Padma CH
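
One way to do it from within Spark itself (a sketch under the assumption that an external notification/REST endpoint exists; the URL and payload are placeholders) is to call out from the stream's output operation:

    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets

    def sendAlert(message: String): Unit = {
      val conn = new URL("http://alert-service/notify")   // placeholder endpoint
        .openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/json")
      conn.getOutputStream.write(s"""{"alert":"$message"}""".getBytes(StandardCharsets.UTF_8))
      conn.getResponseCode   // forces the request; check/log non-2xx codes in real code
      conn.disconnect()
    }

    // Typical use on a DStream of detected anomalies:
    // anomalies.foreachRDD(_.foreach(a => sendAlert(a.toString)))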

Re: Spark Task failure with File segment length as negative

2016-07-06 Thread Priya Ch
Has anyone resolved this? Thanks, Padma CH On Wed, Jun 22, 2016 at 4:39 PM, Priya Ch <learnings.chitt...@gmail.com> wrote: > Hi All, > > I am running Spark Application with 1.8TB of data (which is stored in Hive > tables format). I am reading the data using HiveCont

Spark Task failure with File segment length as negative

2016-06-22 Thread Priya Ch
Hi All, I am running a Spark application with 1.8 TB of data (which is stored in Hive table format). I am reading the data using HiveContext and processing it. The cluster has 5 nodes total, 25 cores per machine and 250 GB per node. I am launching the application with 25 executors with 5 cores each

Re: Spark Job Execution halts during shuffle...

2016-06-01 Thread Priya Ch
Hi, can someone throw light on this? The issue is not happening frequently; sometimes the job halts with the above messages. Regards, Padma Ch On Fri, May 27, 2016 at 8:47 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Priya: > Have you checked the executor logs on hostname1 and hostname2

Spark Job Execution halts during shuffle...

2016-05-26 Thread Priya Ch
Hello Team, I am trying to join 2 RDDs, where one is of size 800 MB and the other is 190 MB. During the join step, my job halts and I don't see progress in the execution. This is the message I see on the console - INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
r dataset to all nodes and do the matching with the > bigger dataset there. > This highly depends on the data in your data set. How they compare in size > etc. > > > > On 25 May 2016, at 13:27, Priya Ch <learnings.chitt...@gmail.com> wrote: > > Why do i need to deplo

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
Ch On Wed, May 25, 2016 at 4:49 PM, Jörn Franke <jornfra...@gmail.com> wrote: > > Alternatively depending on the exact use case you may employ solr on > Hadoop for text analytics > > > On 25 May 2016, at 12:57, Priya Ch <learnings.chitt...@gmail.com> wrote: > > > &

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
Let's say I have rdd A of strings as {"hi","bye","ch"} and another RDD B of strings as {"padma","hihi","chch","priya"}. For every string in rdd A I need to check the matches found in rdd B, as such for string "hi" I ha
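
A minimal sketch of that matching, assuming "match" means substring containment and an existing SparkContext sc: broadcast the smaller RDD so the comparison happens locally on each executor.

    val rddA = sc.parallelize(Seq("hi", "bye", "ch"))
    val rddB = sc.parallelize(Seq("padma", "hihi", "chch", "priya"))

    val bList = sc.broadcast(rddB.collect())   // small enough to ship to every executor
    val matches = rddA.map(a => (a, bList.value.filter(_.contains(a)).toSeq))
    matches.collect().foreach { case (a, ms) => println(s"$a -> ${ms.mkString(",")}") }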

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
ame#save writing as > parquet, orc, ...? > > // maropu > > On Wed, May 25, 2016 at 7:10 PM, Priya Ch <learnings.chitt...@gmail.com> > wrote: > >> Hi , Yes I have joined using DataFrame join. Now to save this into hdfs >> .I am converting the joined datafram

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
ollowed by all columns from the >> FinancialData table. >> >> >> Not very useful >> >> >> Dr Mich Talebzadeh >> >> >> >> LinkedIn * >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> <http

Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
Hi All, I have two RDDs A and B, wherein A is of size 30 MB and B is of size 7 MB. A.cartesian(B) is taking too much time. Is there any bottleneck in the cartesian operation? I am using Spark version 1.6.0. Regards, Padma Ch
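
A hedged sketch of the alternative discussed in the replies (broadcast the smaller dataset rather than calling cartesian); paths are placeholders and an existing SparkContext sc is assumed.

    val a = sc.textFile("hdfs:///data/A")   // ~30 MB, placeholder path
    val b = sc.textFile("hdfs:///data/B")   // ~7 MB, placeholder path

    val bSmall = sc.broadcast(b.collect())  // the smaller side fits comfortably in memory
    val pairs = a.flatMap(x => bSmall.value.map(y => (x, y)))   // same pairs as a.cartesian(b)
    pairs.saveAsTextFile("hdfs:///data/A_cross_B")              // placeholder output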

Need Streaming output to single HDFS File

2016-04-12 Thread Priya Ch
Hi All, I am working with Kafka and Spark Streaming and I want to write the streaming output to a single file. dstream.saveAsTextFiles() is creating files in different folders. Is there a way to write to a single folder? Or, if written to different folders, how do I merge them? Thanks, Padma
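
A hedged sketch of one common workaround (not necessarily this thread's answer), assuming the dstream from the message above: coalesce each batch to a single part file; merging batches into one file would still need a separate step such as hadoop fs -getmerge.

    dstream.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.coalesce(1)   // one part file per batch
          .saveAsTextFile(s"hdfs:///streaming/out/batch-${time.milliseconds}")   // placeholder path
      }
    }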

Streaming k-means visualization

2016-04-08 Thread Priya Ch
Hi All, I am using Streaming k-means to train my model on streaming data. Now I want to visualize the clusters. Which reporting tool would be used for this? Could Zeppelin be used to visualize the clusters? Regards, Padma Ch

Re: Why KMeans with mllib is so slow ?

2016-03-14 Thread Priya Ch
Hi Xi Shen, Changing the initialization step from "k-means||" to "random" decreased the execution time from 2 hrs to 6 min. However, by default the number of runs is 1. If I try to set the number of runs to 10, I again see an increase in job execution time. How should I proceed on this? By the way, how
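
For reference, a minimal sketch of those settings on the Spark 1.x spark.mllib API (an existing SparkContext sc is assumed; the vectors are toy data):

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val data = sc.parallelize(Seq(Vectors.dense(0.0, 0.0), Vectors.dense(9.0, 9.0)))

    val model = new KMeans()
      .setK(2)
      .setInitializationMode("random")   // the default "k-means||" was the slow step here
      .setRuns(10)                       // defaults to 1; each extra run re-clusters the data
      .setMaxIterations(20)
      .run(data)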

BlockFetchFailed Exception

2016-03-11 Thread Priya Ch
Hi All, I am trying to run Spark k-means on a data set which is close to 1 GB. Most often I see a BlockFetchFailed exception, which I suspect is because of out of memory. Here are the configuration details - Total cores: 12, Total workers: 3, Memory per node: 6 GB. When running the job, I am giving

Spark mllib k-means taking too much time

2016-03-03 Thread Priya Ch
Hi Team, I am running the k-means algorithm on the KDD 1999 data set ( http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html). I am running the algorithm for different values of k, such as 5, 10, 15, 40. The data set is 709 MB. I have placed the file in HDFS with a block size of 128MB (6 blocks).

Spark Mllib kmeans execution

2016-03-02 Thread Priya Ch
Hi All, I am running the k-means clustering algorithm. Now, when I run the algorithm as - val conf = new SparkConf val sc = new SparkContext(conf) . . val kmeans = new KMeans() val model = kmeans.run(RDD[Vector]) . . . The 'kmeans' object gets created on the driver. Now does *kmeans.run() *get

Real time anomaly system

2016-03-01 Thread Priya Ch
Hi, I am trying to build a real-time anomaly detection system using Spark, Kafka, Cassandra and Akka. I have a network intrusion dataset (KDD 1999 cup). How can I build the system using this? I understand that a certain part of the data I am considering as historical data for my model training and

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
d use "lsof" on > one of the spark executors (perhaps run it in a for loop, writing the > output to separate files) until it fails and see which files are being > opened, if there's anything that seems to be taking up a clear majority > that might key you in on the culprit. >

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
which would convey the same. On Wed, Jan 6, 2016 at 8:19 PM, Annabel Melongo <melongo_anna...@yahoo.com> wrote: > Priya, > > It would be helpful if you put the entire trace log along with your code > to help determine the root cause of the error. > > Thanks > > > O

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-05 Thread Priya Ch
the "too many open files" > exception. > > > On Tuesday, January 5, 2016 8:03 AM, Priya Ch < > learnings.chitt...@gmail.com> wrote: > > > Can some one throw light on this ? > > Regards, > Padma Ch > > On Mon, Dec 28, 2015 at 3:59 PM, Priya Ch &l

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-05 Thread Priya Ch
Can someone throw light on this? Regards, Padma Ch On Mon, Dec 28, 2015 at 3:59 PM, Priya Ch <learnings.chitt...@gmail.com> wrote: > Chris, we are using spark 1.3.0 version. we have not set > spark.streaming.concurrentJobs > this parameter. It takes the default value. > &

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2015-12-28 Thread Priya Ch
ing execution time - check total number of open files using lsof >> command. Need root permissions. If it is cluster not sure much ! >> 2) which exact line in the code is triggering this error ? Can you paste >> that snippet ? >> >> >> On Wednesday 23 December 2015

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2015-12-23 Thread Priya Ch
ulimit -n 65000 fs.file-max = 65000 (in /etc/sysctl.conf file) Thanks, Padma Ch On Tue, Dec 22, 2015 at 6:47 PM, Yash Sharma <yash...@gmail.com> wrote: > Could you share the ulimit for your setup please ? > > - Thanks, via mobile, excuse brevity. > On Dec 22, 2015

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2015-12-22 Thread Priya Ch
Jakob, I increased the settings like fs.file-max in /etc/sysctl.conf and also increased the user limit in /etc/security/limits.conf, but I still see the same issue. On Fri, Dec 18, 2015 at 12:54 AM, Jakob Odersky wrote: > It might be a good idea to see how many files are open

java.io.FileNotFoundException(Too many open files) in Spark streaming

2015-12-17 Thread Priya Ch
Hi All, When running streaming application, I am seeing the below error: java.io.FileNotFoundException: /data1/yarn/nm/usercache/root/appcache/application_1450172646510_0004/blockmgr-a81f42cd-6b52-4704-83f3-2cfc12a11b86/02/temp_shuffle_589ddccf-d436-4d2c-9935-e5f8c137b54b (Too many open

Inconsistent data in Cassandra

2015-12-06 Thread Priya Ch
Hi All, I have the following scenario in writing rows to Cassandra from Spark Streaming - in a 1 sec batch, I have 3 tickets with the same ticket number (primary key) but with different envelope numbers (i.e., envelope 1, envelope 2, envelope 3). I am writing these messages to Cassandra using

Spark Streaming Use Cases

2015-12-02 Thread Priya Ch
Hi All, I have the following use case for Spark Streaming - There are 2 streams of data, say FlightBookings and Ticket. For each ticket, I need to associate it with the relevant Booking info. There are distinct applications for Booking and Ticket. The Booking streaming application processes the

Subtract on rdd2 is throwing below exception

2015-11-05 Thread Priya Ch
Hi All, I am seeing an exception when trying to subtract 2 RDDs. Let's say rdd1 has messages like - * pnr, bookingId, BookingObject* 101, 1, BookingObject1 // - event number is 0 102, 1, BookingObject2// - event number is 0 103, 2,

Re: Allow multiple SparkContexts in Unit Testing

2015-11-04 Thread Priya Ch
across all test suites ??? On Thu, Nov 5, 2015 at 12:36 AM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > Priya, > > If you're trying to get unit tests running local spark contexts, you can > just set up your spark context with 'spark.driver.allowMultipleContexts' > set
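
A minimal sketch of the setting quoted above for test code (local-mode contexts; the property is intended for tests rather than production use):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("suite-local-context")
      .set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(conf)
    // ... run the suite's assertions against sc ...
    sc.stop()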

Allow multiple SparkContexts in Unit Testing

2015-11-04 Thread Priya Ch
Hello All, How do I use multiple Spark Contexts when executing multiple test suites of Spark code? Can someone throw light on this?

Re: Mock Cassandra DB Connection in Unit Testing

2015-10-29 Thread Priya Ch
One more question: if I have a function which takes an RDD as a parameter, how do we mock an RDD? On Thu, Oct 29, 2015 at 5:20 PM, Priya Ch <learnings.chitt...@gmail.com> wrote: > How do we do it for Cassandra..can we use the same Mocking ? > EmbeddedCassandra Server
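
For the RDD part, the usual answer (a sketch, not necessarily this thread's) is not to mock at all but to build a real RDD in a local-mode SparkContext; countValid below is a hypothetical function under test.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    def countValid(rdd: RDD[String]): Long = rdd.filter(_.nonEmpty).count()   // hypothetical

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("rdd-test"))
    val input = sc.parallelize(Seq("a", "", "b"))
    assert(countValid(input) == 2)
    sc.stop()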

Re: Mock Cassandra DB Connection in Unit Testing

2015-10-29 Thread Priya Ch
ck.executeUpdate _).expects() > > > On Thu, Oct 29, 2015 at 12:27 PM, Priya Ch <learnings.chitt...@gmail.com> > wrote: > >> Hi All, >> >> For my Spark Streaming code, which writes the results to Cassandra DB, >> I need to write Unit test cases. what are the available test frameworks to >> mock the connection to Cassandra DB ? >> > >

Mock Cassandra DB Connection in Unit Testing

2015-10-29 Thread Priya Ch
Hi All, For my Spark Streaming code, which writes the results to a Cassandra DB, I need to write unit test cases. What are the available test frameworks to mock the connection to the Cassandra DB?

Inner Joins on Cassandra RDDs

2015-10-21 Thread Priya Ch
Hello All, I have two Cassandra RDDs. I am using joinWithCassandraTable, which is doing a cartesian join, because of which we are getting unwanted rows. How do I perform an inner join on Cassandra RDDs? If I intend to use a normal join, I have to read the entire table, which is costly. Is there any

Concurrency issue in Streams of data

2015-10-20 Thread Priya Ch
Hi All, When processing streams of data (with a batch interval of 1 sec), there is a possible concurrency issue, i.e., two messages M1 and M2 (an updated version of M1) with the same key are processed by 2 threads in parallel. To resolve this concurrency issue, I am applying a HashPartitioner on the RDD.

Re: CassandraSQLContext throwing NullPointer Exception

2015-09-28 Thread Priya Ch
2015 at 6:54 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Which Spark release are you using ? > > Can you show the snippet of your code around CassandraSQLContext#sql() ? > > Thanks > > On Mon, Sep 28, 2015 at 6:21 AM, Priya Ch <learnings.chitt...@gmail.com> > wrote: > >

CassandraSQLContext throwing NullPointer Exception

2015-09-28 Thread Priya Ch
Hi All, I am trying to use dataframes (which contain data from Cassandra) in rdd.foreach. This is throwing the following exception. Is CassandraSQLContext accessible within an executor? 15/09/28 17:22:40 ERROR JobScheduler: Error running job streaming job 144344116 ms.0

Re: SparkContext declared as object variable

2015-09-24 Thread Priya Ch
ode and the spark version is 1.3.0 This has thrown java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5 Regards, Padma Ch On Thu, Sep 24, 2015 at 11:25 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote: > It should, i don't see any reason for it t

Re: passing SparkContext as parameter

2015-09-22 Thread Priya Ch
cord from cassandra, you >>> need to do cassandraRdd.map(func), which will run distributed on the spark >>> workers >>> >>> *Romi Kuntsman*, *Big Data Engineer* >>> http://www.totango.com >>> >>> On Mon, Sep 21, 2015 at 3:29 PM, Priya

Re: passing SparkContext as parameter

2015-09-22 Thread Priya Ch
; stream.filter(...).map(x=>(key, > )).joinWithCassandra().map(...).saveToCassandra() > > I'm not sure about exactly 10 messages, spark streaming focus on time not > count.. > > > On Tue, Sep 22, 2015 at 2:14 AM, Priya Ch <learnings.chitt...@gmail.com> > wrote: > >> I h

Re: passing SparkContext as parameter

2015-09-21 Thread Priya Ch
, 2015 at 3:06 PM, Petr Novak <oss.mli...@gmail.com> wrote: > add @transient? > > On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch <learnings.chitt...@gmail.com> > wrote: > >> Hello All, >> >> How can i pass sparkContext as a parameter to a method in an obje

Re: passing SparkContext as parameter

2015-09-21 Thread Priya Ch
orkers ??? Regards, Padma Ch On Mon, Sep 21, 2015 at 5:10 PM, Ted Yu <yuzhih...@gmail.com> wrote: > You can use broadcast variable for passing connection information. > > Cheers > > On Sep 21, 2015, at 4:27 AM, Priya Ch <learnings.chitt...@gmail.com> > wrote: > >

passing SparkContext as parameter

2015-09-21 Thread Priya Ch
Hello All, How can I pass the sparkContext as a parameter to a method in an object? Passing the sparkContext is giving me a TaskNotSerializable exception. How can I achieve this? Thanks, Padma Ch
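
A hedged sketch combining the two suggestions from the replies (@transient and a broadcast variable): keep the SparkContext out of anything that is shipped to executors, and broadcast only the plain connection details. The host/port values are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    class Job(@transient val sc: SparkContext) extends Serializable {
      def run(): Unit = {
        // Broadcast plain, serializable connection info instead of passing sc around.
        val connInfo = sc.broadcast(Map("host" -> "db-host", "port" -> "9042"))   // placeholders
        sc.parallelize(1 to 10).foreach { i =>
          val host = connInfo.value("host")   // read on executors; sc itself never leaves the driver
          println(s"would write record $i via $host")   // open the real connection here
        }
      }
    }

    val sc = new SparkContext(new SparkConf().setAppName("pass-context-demo"))
    new Job(sc).run()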

SparkContext declared as object variable

2015-09-18 Thread Priya Ch
Hello All, Instead of declaring the sparkContext in main, I declared it as an object variable, as - object sparkDemo { val conf = new SparkConf val sc = new SparkContext(conf) def main(args:Array[String]) { val baseRdd = sc.parallelize() . . . } } But this piece of code is giving
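
A minimal sketch of the usual rework for this pattern (not a confirmed answer from this thread): construct the SparkContext inside main so it lives only on the driver and is never captured during object initialization.

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("spark-demo"))
        val baseRdd = sc.parallelize(1 to 100, 4)
        println(baseRdd.map(_ * 2).count())
        sc.stop()
      }
    }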

Re: Spark Streaming..Exception

2015-09-14 Thread Priya Ch
; true. What is the possible solution for this? Is this a bug in Spark 1.3.0? Would changing the scheduling mode to standalone or Mesos mode work fine? Please, someone, share your views on this. On Sat, Sep 12, 2015 at 11:04 PM, Priya Ch <learnings.chitt...@gmail.com> wrote: > Hello A

Spark Streaming..Exception

2015-09-12 Thread Priya Ch
Hello All, When I push messages into Kafka and read them into the streaming application, I see the following exception - I am running the application on YARN and am nowhere broadcasting the message within the application. I am simply reading the message, parsing it, populating fields in a class, and then

foreachRDD causing executor lost failure

2015-09-08 Thread Priya Ch
Hello All, I am using foreachRDD in my code as - dstream.foreachRDD { rdd => rdd.foreach { record => // look up with cassandra table // save updated rows to cassandra table. } } This foreachRDD is causing an executor lost failure. What is the behavior of this foreachRDD? Thanks, Padma Ch

Spark - launching job for each action

2015-09-06 Thread Priya Ch
Hi All, In Spark, each action results in launching a job. Let's say my Spark app looks like - val baseRDD =sc.parallelize(Array(1,2,3,4,5),2) val rdd1 = baseRdd.map(x => x+2) val rdd2 = rdd1.filter(x => x%2 ==0) val count = rdd2.count val firstElement = rdd2.first println("Count is"+count)

Re: Spark - launching job for each action

2015-09-06 Thread Priya Ch
euse the data, you need to call rdd2.cache >> >> >> >> On Sun, Sep 6, 2015 at 2:33 PM, Priya Ch <learnings.chitt...@gmail.com> >> wrote: >> >>> Hi All, >>> >>> In Spark, each action results in launching a job. Lets say my spark
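
A minimal sketch of the advice quoted above applied to the code from the original message: cache rdd2 so the second action reuses the computed partitions instead of re-running the whole lineage (an existing SparkContext sc is assumed).

    val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
    val rdd2 = baseRDD.map(_ + 2).filter(_ % 2 == 0).cache()

    val count = rdd2.count()          // first action: computes and caches rdd2
    val firstElement = rdd2.first()   // second action: reads the cached partitions
    println(s"Count is $count, first element is $firstElement")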

Executor lost failure

2015-09-01 Thread Priya Ch
Hi All, I have a Spark Streaming application which writes the processed results to Cassandra. In local mode, the code seems to work fine. The moment I start running in distributed mode using YARN, I see executor lost failure. I increased the executor memory to occupy the entire node's memory, which is

Spark RDD join with CassandraRDD

2015-08-25 Thread Priya Ch
Hi All, I have the following scenario: There exists a booking table in Cassandra, which holds fields like bookingid, passengerName, contact, etc. Now in my Spark Streaming application, there is one class Booking which acts as a container and holds all the field details - class

rdd count is throwing null pointer exception

2015-08-17 Thread Priya Ch
Hi All, Thank you very much for the detailed explanation. I have a scenario like this - I have an RDD of ticket records and another RDD of booking records. For each ticket record, I need to check whether any link exists in the booking table. val ticketCachedRdd = ticketRdd.cache ticketRdd.foreach{
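
A hedged sketch of the standard rework (an RDD cannot be referenced inside another RDD's foreach, per SPARK-5063): express the "link exists" check as a join. The record shapes below are assumptions and an existing SparkContext sc is assumed.

    case class Ticket(ticketId: Int, bookingId: Int)   // hypothetical record shapes
    case class Booking(bookingId: Int, pnr: String)

    val ticketRdd  = sc.parallelize(Seq(Ticket(1, 10), Ticket(2, 99)))
    val bookingRdd = sc.parallelize(Seq(Booking(10, "PNR-10")))

    val linked = ticketRdd.map(t => (t.bookingId, t))
      .leftOuterJoin(bookingRdd.map(b => (b.bookingId, b)))
      .map { case (_, (ticket, booking)) => (ticket, booking.isDefined) }   // true if a link exists

    linked.collect().foreach(println)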

Re: rdd count is throwing null pointer exception

2015-08-17 Thread Priya Ch
transformation. For more information, see SPARK-5063. On Mon, Aug 17, 2015 at 8:13 PM, Preetam preetam...@gmail.com wrote: The error could be because of the missing brackets after the word cache - .ticketRdd.cache() On Aug 17, 2015, at 7:26 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi

Write to cassandra...each individual statement

2015-08-13 Thread Priya Ch
Hi All, I have a question about writing an RDD to Cassandra. Instead of writing the entire RDD to Cassandra, I want to write individual statements into Cassandra because there is a need to perform ETL on each message (which requires checking with the DB). How could I insert statements individually?

Re: Write to cassandra...each individual statement

2015-08-13 Thread Priya Ch
to do is *transform* the rdd before writing it, e.g. using the .map function. On Thu, Aug 13, 2015 at 11:30 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi All, I have a question in writing rdd to cassandra. Instead of writing entire rdd to cassandra, i want to write individual

Fwd: Writing streaming data to cassandra creates duplicates

2015-08-04 Thread Priya Ch
key. Hope that helps. Greetings, Juan 2015-07-30 10:50 GMT+02:00 Priya Ch learnings.chitt...@gmail.com: Hi All, Can someone throw insights on this ? On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi TD, Thanks for the info. I have the scenario

Re: Writing streaming data to cassandra creates duplicates

2015-07-30 Thread Priya Ch
Hi All, can someone offer insights on this? On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi TD, Thanks for the info. I have the scenario like this. I am reading the data from kafka topic. Let's say kafka has 3 partitions for the topic. In my

Fwd: Writing streaming data to cassandra creates duplicates

2015-07-28 Thread Priya Ch
. This will guard against multiple attempts to run the task that inserts into Cassandra. See http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations TD On Sun, Jul 26, 2015 at 11:19 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi All, I have

Writing streaming data to cassandra creates duplicates

2015-07-26 Thread Priya Ch
Hi All, I have a problem when writing streaming data to Cassandra. Our existing product is on an Oracle DB, in which, while writing data, locks are maintained such that duplicates in the DB are avoided. But as Spark has a parallel processing architecture, if more than 1 thread is trying to write the same

Spark streaming with Kafka - couldn't find KafkaUtils

2015-04-05 Thread Priya Ch
Hi All, I configured a Kafka cluster on a single node, and I have a streaming application which reads data from a Kafka topic using KafkaUtils. When I execute the code in local mode from the IDE, the application runs fine. But when I submit the same to the Spark cluster in standalone mode, I end up

Re: TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-09 Thread Priya Ch
Please find the attached worker log. I could see stream closed exception On Wed, Jan 7, 2015 at 10:51 AM, Xiangrui Meng men...@gmail.com wrote: Could you attach the executor log? That may help identify the root cause. -Xiangrui On Mon, Jan 5, 2015 at 11:12 PM, Priya Ch learnings.chitt

TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-05 Thread Priya Ch
Hi All, The Word2Vec and TF-IDF algorithms in spark mllib-1.1.0 are working only in local mode and not in distributed mode. A null pointer exception is thrown. Is this a bug in spark-1.1.0? *Following is the code:* def main(args:Array[String]) { val conf=new SparkConf val sc=new

Spark exception when sending message to akka actor

2014-12-22 Thread Priya Ch
Hi All, I have Akka remote actors running on 2 nodes. I submitted the Spark application from node1. In the Spark code, in one of the RDDs, I am sending a message to the actor running on node1. My Spark code is as follows: class ActorClient extends Actor with Serializable { import context._ val

Fwd: 1gb file processing...task doesn't launch on all the nodes...Unseen exception

2014-11-24 Thread Priya Ch
. If yes, then try setting the level of parallelism and number of partitions while creating/transforming the RDD. Thanks Best Regards On Fri, Nov 14, 2014 at 5:17 PM, Priya Ch [hidden email] wrote: Hi All, We have set up 2 node cluster (NODE
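
A minimal sketch of that suggestion (the path and partition count are placeholders, and an existing SparkContext sc is assumed): ask for more partitions when reading the file so tasks can spread across both nodes.

    val data = sc.textFile("hdfs:///input/one-gb-file", minPartitions = 16)   // placeholder path
    println(data.partitions.length)
    // or, for an already-created RDD: data.repartition(16)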

Default spark.deploy.recoveryMode

2014-10-14 Thread Priya Ch
Hi Spark users/experts, In the Spark source code (Master.scala, Worker.scala), when registering the worker with the master, I see the usage of *persistenceEngine*. When we don't specify spark.deploy.recoveryMode explicitly, what is the default value used? This recovery mode is used to persist and

Breeze Library usage in Spark

2014-10-03 Thread Priya Ch
Hi Team, When I am trying to use DenseMatrix from the Breeze library in Spark, it throws the following error: java.lang.NoClassDefFoundError: breeze/storage/Zero Can someone help me with this? Thanks, Padma Ch
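
A hedged guess at the usual cause (not a confirmed answer from this thread): breeze/storage/Zero missing at runtime typically means the Breeze jars are not on the executor classpath, so they need to be declared explicitly and shipped with the application jar. The version below is an assumption for the Spark 1.1 era.

    // build.sbt
    libraryDependencies ++= Seq(
      "org.scalanlp" %% "breeze" % "0.9",
      "org.scalanlp" %% "breeze-natives" % "0.9"
    )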

spark.local.dir and spark.worker.dir not used

2014-09-23 Thread Priya Ch
Hi, I am using Spark 1.0.0. In my Spark code I am trying to persist an RDD to disk as rdd.persist(DISK_ONLY), but unfortunately I couldn't find the location where the RDD has been written to disk. I specified SPARK_LOCAL_DIRS and SPARK_WORKER_DIR to some other location rather than using the default
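
For reference, a hedged sketch of setting the scratch location through spark.local.dir as well (the directory is a placeholder); DISK_ONLY blocks end up under that directory's blockmgr-* subfolders on the workers.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf()
      .setAppName("disk-persist-demo")
      .set("spark.local.dir", "/data/spark-scratch")   // placeholder directory

    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 1000000, 4).persist(StorageLevel.DISK_ONLY)
    rdd.count()   // materializes the blocks on disk under the configured local dir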

Spark build error

2014-08-06 Thread Priya Ch
Hi, I am trying to build jars using the command : mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package Execution of the above command is throwing the following error: [INFO] Spark Project Core . FAILURE [ 0.295 s] [INFO] Spark Project Bagel