DirectFileOutputCommitter in Spark 2.3.1

2018-09-19 Thread Priya Ch
Hello Team, I am trying to write a DataSet as a parquet file in Append mode, partitioned by a few columns. However, since the job is time-consuming, I would like to enable DirectFileOutputCommitter (i.e. bypassing the writes to the temporary folder). The version of Spark I am using is 2.3.1. Can someone
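
A minimal sketch of the write itself, assuming hypothetical input/output paths and partition columns; note that the direct committers were removed in Spark 2.x, so the closest built-in knob is the v2 file output commit algorithm, which moves each task's files to the destination at task commit rather than at job commit:

import org.apache.spark.sql.{SaveMode, SparkSession}

object AppendParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("append-parquet")
      // v2 commit algorithm: task output is moved to the destination at task
      // commit instead of waiting for the job-level rename (not the removed
      // DirectFileOutputCommitter, but the nearest built-in setting)
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()

    val ds = spark.read.parquet("/data/input")   // hypothetical input path

    ds.write
      .mode(SaveMode.Append)
      .partitionBy("year", "month")              // hypothetical partition columns
      .parquet("/data/output")                   // hypothetical output path
  }
}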

Video analytics on Spark

2016-09-09 Thread Priya Ch
Hi All, I have video surveillance data and this needs to be processed in Spark. I am going through Spark + OpenCV. How do I load .mp4 files into an RDD? Can we do this directly, or does the video need to be converted to a SequenceFile? Thanks, Padma CH
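
One possible starting point, sketched with a hypothetical HDFS path: sc.binaryFiles gives an RDD of (path, byte stream) pairs, and the raw bytes can then be decoded on the executors with OpenCV/JavaCV or a similar library:

import org.apache.spark.{SparkConf, SparkContext}

object VideoBytes {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("video-bytes"))

    // Each element is (path, PortableDataStream); the raw bytes can then be
    // handed to a video decoder on the executors.
    val videos = sc.binaryFiles("hdfs:///surveillance/*.mp4")

    val sizes = videos.map { case (path, stream) =>
      val bytes = stream.toArray()   // loads the whole clip; fine for short clips
      (path, bytes.length)
    }
    sizes.collect().foreach(println)
  }
}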

Re: Send real-time alert using Spark

2016-07-12 Thread Priya Ch
> Priya, > > You wouldn't necessarily "use spark" to send the alert. Spark is in an > important sense one library among many. You can have your application use > any other library available for your language to send the alert. > > Marcin > > On Tue, Jul 12,
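
A rough illustration of that point, with a hypothetical socket source and a stubbed-out alert sender standing in for whatever mail/HTTP library the application already uses; Spark only decides when the send happens:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Reading(sensor: String, value: Double)   // hypothetical record shape

object AlertJob {
  // Hypothetical alert sender: in a real job this would wrap javax.mail, an
  // HTTP client, PagerDuty, etc. -- any plain JVM library, independent of Spark.
  def sendAlert(r: Reading): Unit = println(s"ALERT ${r.sensor}=${r.value}")

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("alerts"), Seconds(10))

    val readings = ssc.socketTextStream("localhost", 9999)   // hypothetical source
      .map(_.split(","))
      .map(a => Reading(a(0), a(1).toDouble))

    // Spark decides which records cross the threshold; the send itself is
    // ordinary library code running on the executors.
    readings.filter(_.value > 100.0).foreachRDD { rdd =>
      rdd.foreachPartition(_.foreach(sendAlert))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}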

Re: Spark Task failure with File segment length as negative

2016-07-06 Thread Priya Ch
Has anyone resolved this? Thanks, Padma CH On Wed, Jun 22, 2016 at 4:39 PM, Priya Ch <learnings.chitt...@gmail.com> wrote: > Hi All, > > I am running a Spark application with 1.8 TB of data (stored in Hive > tables). I am reading the data using HiveCont

Spark Task failure with File segment length as negative

2016-06-22 Thread Priya Ch
Hi All, I am running a Spark application with 1.8 TB of data (stored in Hive tables). I am reading the data using HiveContext and processing it. The cluster has 5 nodes in total, with 25 cores and 250 GB of memory per node. I am launching the application with 25 executors with 5 cores each
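
For reference, a minimal sketch of the described setup, with a hypothetical table name, using the Spark 1.x HiveContext API and the stated executor layout:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveRead {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("hive-read")
      .set("spark.executor.instances", "25")   // stated layout; applies on YARN
      .set("spark.executor.cores", "5")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // "events" is a hypothetical table standing in for the 1.8 TB Hive data
    val df = hiveContext.sql("SELECT * FROM events")
    df.rdd.map(_.mkString("|")).take(5).foreach(println)
  }
}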

Spark Job Execution halts during shuffle...

2016-05-26 Thread Priya Ch
Hello Team, I am trying to join 2 RDDs, where one is 800 MB and the other is 190 MB. During the join step, my job halts and I don't see any progress in the execution. This is the message I see on the console - INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
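
If the smaller (~190 MB) side is keyed and fits in executor memory, one way to sidestep the shuffle (and the map-output fetch the job appears to stall on) is a map-side join via a broadcast variable; a sketch with stand-in data:

import org.apache.spark.{SparkConf, SparkContext}

object MapSideJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("map-side-join"))

    // Stand-ins for the ~800 MB and ~190 MB keyed RDDs described above
    val big   = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))
    val small = sc.parallelize(Seq(1 -> "x", 3 -> "y"))

    // Collect the small side to the driver and broadcast it, so the join
    // happens map-side and no shuffle is needed.
    val smallMap = sc.broadcast(small.collectAsMap())

    val joined = big.flatMap { case (k, v) =>
      smallMap.value.get(k).map(w => (k, (v, w)))
    }
    joined.collect().foreach(println)
  }
}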

Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
Hi All, I have two RDDs A and B, where A is 30 MB and B is 7 MB. A.cartesian(B) is taking too much time. Is there any bottleneck in the cartesian operation? I am using Spark version 1.6.0. Regards, Padma Ch
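
When one side is as small as B (~7 MB), a common workaround is to broadcast it and expand with flatMap, which produces the same pairs as cartesian without its m x n partition blow-up; a sketch with stand-in data:

import org.apache.spark.{SparkConf, SparkContext}

object CartesianViaBroadcast {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cartesian-broadcast"))

    val a = sc.parallelize(1 to 1000)            // stands in for the 30 MB RDD A
    val b = sc.parallelize(Seq("x", "y", "z"))   // stands in for the 7 MB RDD B

    // Broadcast B once and pair it locally with each element of A
    val bLocal = sc.broadcast(b.collect())
    val pairs  = a.flatMap(x => bLocal.value.map(y => (x, y)))  // same result as a.cartesian(b)

    println(pairs.count())
  }
}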

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
d use "lsof" on > one of the spark executors (perhaps run it in a for loop, writing the > output to separate files) until it fails and see which files are being > opened, if there's anything that seems to be taking up a clear majority > that might key you in on the culprit. >

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
n Wednesday, January 6, 2016 4:00 AM, Priya Ch < > learnings.chitt...@gmail.com> wrote: > > > Running 'lsof' will let us know the open files, but how do we come to know > the root cause behind opening too many files? > > Thanks, > Padma CH > > On Wed, J

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-05 Thread Priya Ch
the "too many open files" > exception. > > > On Tuesday, January 5, 2016 8:03 AM, Priya Ch < > learnings.chitt...@gmail.com> wrote: > > > Can some one throw light on this ? > > Regards, > Padma Ch > > On Mon, Dec 28, 2015 at 3:59 PM, Priya Ch &l

Re: passing SparkContext as parameter

2015-09-21 Thread Priya Ch
, 2015 at 3:06 PM, Petr Novak <oss.mli...@gmail.com> wrote: > add @transient? > > On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch <learnings.chitt...@gmail.com> > wrote: > >> Hello All, >> >> How can I pass SparkContext as a parameter to a method in an obje
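
A small sketch covering both suggestions in the thread: passing the SparkContext as an ordinary driver-side argument, and marking it @transient when it lives as a field of a serializable class so closures shipped to executors never try to serialize it (class and method names here are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

// The SparkContext is a driver-side object; @transient keeps it out of the
// serialized form of this class when its closures are shipped to executors.
class Pipeline(@transient val sc: SparkContext) extends Serializable {
  def totalWords(lines: Seq[String]): Long =
    sc.parallelize(lines).flatMap(_.split(" ")).count()
}

object PipelineApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ctx-param"))
    val pipeline = new Pipeline(sc)
    println(pipeline.totalWords(Seq("hello spark", "pass the context")))
    sc.stop()
  }
}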

Re: Spark Streaming..Exception

2015-09-14 Thread Priya Ch
; true. What is the possible solution for this? Is this a bug in Spark 1.3.0? Would changing the scheduling mode to Standalone or Mesos work fine? Could someone please share their views on this? On Sat, Sep 12, 2015 at 11:04 PM, Priya Ch <learnings.chitt...@gmail.com> wrote: > Hello A

Spark Streaming..Exception

2015-09-12 Thread Priya Ch
Hello All, When I push messages into Kafka and read them into the streaming application, I see the following exception. I am running the application on YARN and am nowhere broadcasting the message within the application. I am simply reading the message, parsing it, and populating fields in a class, and then
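
A minimal sketch of the flow described (Spark 1.3 receiver-based Kafka API), with a hypothetical topic, ZooKeeper quorum, and message schema:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

case class Event(id: String, value: Double)   // hypothetical message schema

object KafkaParse {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-parse"), Seconds(5))

    // hypothetical ZooKeeper quorum, consumer group, and topic
    val messages = KafkaUtils.createStream(ssc, "zk-host:2181", "parse-group", Map("events" -> 1))

    val events = messages.map(_._2)            // drop the Kafka key
      .map(_.split(","))
      .map(a => Event(a(0), a(1).toDouble))    // populate the class from fields

    events.print()
    ssc.start()
    ssc.awaitTermination()
  }
}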

Fwd: Writing streaming data to cassandra creates duplicates

2015-08-04 Thread Priya Ch
key. Hope that helps. Greetings, Juan 2015-07-30 10:50 GMT+02:00 Priya Ch learnings.chitt...@gmail.com: Hi All, Can someone throw insights on this? On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi TD, Thanks for the info. I have a scenario like this: I am reading the data from a Kafka topic. Let's say Kafka has 3 partitions for the topic. In my

Re: Writing streaming data to cassandra creates duplicates

2015-07-30 Thread Priya Ch
Hi All, Can someone throw insights on this? On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi TD, Thanks for the info. I have a scenario like this: I am reading the data from a Kafka topic. Let's say Kafka has 3 partitions for the topic. In my

Fwd: Writing streaming data to cassandra creates duplicates

2015-07-28 Thread Priya Ch
. This will guard against multiple attempts to run the task that inserts into Cassandra. See http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations TD On Sun, Jul 26, 2015 at 11:19 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi All, I have

Writing streaming data to cassandra creates duplicates

2015-07-26 Thread Priya Ch
Hi All, I have a problem when writing streaming data to Cassandra. Our existing product is on an Oracle DB, in which, while writing data, locks are maintained so that duplicates in the DB are avoided. But as Spark has a parallel processing architecture, if more than 1 thread is trying to write the same
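
One common way to make such writes safe, sketched with a hypothetical keyspace/table and the spark-cassandra-connector: key each row on a deterministic primary key so that retried tasks and concurrent writers upsert the same Cassandra row instead of creating duplicates:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

case class Ticket(ticketId: String, amount: Double)   // hypothetical record

object CassandraUpsert {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-upsert")
      .set("spark.cassandra.connection.host", "127.0.0.1")   // hypothetical host
    val sc = new SparkContext(conf)

    // Two writes with the same key: with ticket_id as the partition key in the
    // (hypothetical, pre-created) bookings.tickets table, the second write is
    // just an upsert of the same row, so no duplicate appears.
    val tickets = sc.parallelize(Seq(Ticket("t-1", 10.0), Ticket("t-1", 10.0)))
    tickets.saveToCassandra("bookings", "tickets")
  }
}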

Spark exception when sending message to akka actor

2014-12-22 Thread Priya Ch
Hi All, I have Akka remote actors running on 2 nodes. I submitted the Spark application from node1. In the Spark code, in one of the RDDs, I am sending a message to the actor running on node1. My Spark code is as follows: class ActorClient extends Actor with Serializable { import context._ val
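
A possible alternative pattern, sketched with hypothetical host/port/actor names and the pre-2.4 Akka API: instead of serializing an Actor into the RDD closure, look the remote actor up by path on the executors and send results from there (this assumes Akka remoting is enabled in the executor-side configuration):

import akka.actor.ActorSystem
import org.apache.spark.{SparkConf, SparkContext}

object NotifyActor {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("actor-notify"))
    val results = sc.parallelize(1 to 100).map(_ * 2)

    results.foreachPartition { nums =>
      // Look the remote actor up by path on the executor; in practice the
      // ActorSystem would be cached per executor rather than built per partition.
      val system = ActorSystem("client")
      val remote = system.actorSelection("akka.tcp://server@node1:2552/user/receiver")
      nums.foreach(n => remote ! n.toString)
      system.shutdown()   // pre-Akka-2.4 API
    }
  }
}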

1gb file processing...task doesn't launch on all the nodes...Unseen exception

2014-11-14 Thread Priya Ch
Hi All, We have set up a 2-node cluster (NODE-DSRV05 and NODE-DSRV02); each node has 32 GB RAM, 1 TB of hard disk capacity, and 8 CPU cores. We have set up HDFS, which has 2 TB capacity, with a block size of 256 MB. When we try to process a 1 GB file on Spark, we see the following exception
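
One thing worth checking, sketched with a hypothetical path: a 1 GB file with 256 MB blocks yields only 4 input splits, so very few tasks are scheduled; requesting more partitions up front or repartitioning spreads the work across both nodes:

import org.apache.spark.{SparkConf, SparkContext}

object SpreadWork {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spread-work"))

    // 1 GB / 256 MB blocks = 4 splits by default; ask for more partitions so
    // tasks land on both nodes' cores.
    val lines  = sc.textFile("hdfs:///data/input.txt", minPartitions = 16)
    val spread = lines.repartition(32)   // ~2 tasks per core on 2 x 8 cores

    println(spread.count())
  }
}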

Breeze Library usage in Spark

2014-10-03 Thread Priya Ch
Hi Team, When I am trying to use DenseMatrix from the Breeze library in Spark, it throws the following error: java.lang.NoClassDefFoundError: breeze/storage/Zero. Can someone help me with this? Thanks, Padma Ch

Fwd: Breeze Library usage in Spark

2014-10-03 Thread Priya Ch
of breeze to the classpath? In Spark 1.0, we use Breeze 0.7, and in Spark 1.1 we use 0.9. If the Breeze version you used is different from the one that comes with Spark, you might see a class-not-found error. -Xiangrui On Fri, Oct 3, 2014 at 4:22 AM, Priya Ch learnings.chitt...@gmail.com wrote: Hi Team
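
A minimal sketch of the suggested fix for an sbt build targeting Spark 1.1 (versions taken from the reply above): pin Breeze to the same 0.9 release that ships with Spark, so the compile-time and runtime classpaths agree:

// build.sbt (sketch): keep the Breeze version in sync with the Spark release in use
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-mllib" % "1.1.0" % "provided",
  "org.scalanlp"     %% "breeze"      % "0.9"
)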

Subscription request for developer community

2014-06-12 Thread Priya Ch
Please accept the request