notebook connecting Spark On Yarn

2017-02-15 Thread Sachin Aggarwal
am allowing in my cluster, so that if a new kernel starts it will get at least one container for its application master. It can be dynamic, priority based. If there is no container left, then YARN can preempt some containers and provide them to new requests. -- Thanks & Regards Sachin Aggarwal 7760502772
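A minimal sketch of the Spark-side settings such a notebook kernel could run with (assuming Spark 2.x on YARN with the external shuffle service; names and values are illustrative, and the preemption policy itself is cluster-side YARN configuration not shown here):

  import org.apache.spark.{SparkConf, SparkContext}

  // Keep each kernel's YARN footprint small so a new kernel can always obtain a
  // container for its application master; idle executors are released back to YARN
  // and can be preempted or reassigned to new requests.
  val conf = new SparkConf()
    .setMaster("yarn")                                // usually supplied by the notebook launcher
    .setAppName("notebook-kernel")
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true")     // required for dynamic allocation
    .set("spark.dynamicAllocation.minExecutors", "0")
    .set("spark.dynamicAllocation.maxExecutors", "4")

  val sc = new SparkContext(conf)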

Re: Spark Streaming with Redis

2016-05-24 Thread Sachin Aggarwal
pointers to > solve this problem. > > Thanks in advance. > > Cheers, > Pari > -- Thanks & Regards Sachin Aggarwal 7760502772

Re: Re: Re: How to change output mode to Update

2016-05-18 Thread Sachin Aggarwal
Sorry, my mistake, I gave the wrong id. Here is the correct one: https://issues.apache.org/jira/browse/SPARK-15183 On Wed, May 18, 2016 at 11:19 AM, Todd <bit1...@163.com> wrote: > Hi Sachin, > > Could you please give the url of jira-15146? Thanks! > At 2016

Re: Re: How to change output mode to Update

2016-05-17 Thread Sachin Aggarwal
Hi, there is some code I have added in jira-15146, please have a look at it; I have not finished it. You can use the same code in your example as of now. On 18-May-2016 10:46 AM, "Saisai Shao" wrote: > > .mode(SaveMode.Overwrite) > > From my understanding mode is not supported
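For context, the output mode later became part of the public DataStreamWriter API in Spark 2.x Structured Streaming; a minimal sketch of update mode with an illustrative socket source and console sink (this is not the code attached to the jira):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("update-mode-sketch").getOrCreate()
  import spark.implicits._

  val words = spark.readStream.format("socket")
    .option("host", "localhost").option("port", 9999).load()
    .as[String].flatMap(_.split(" "))

  val counts = words.groupBy("value").count()

  val query = counts.writeStream
    .outputMode("update")    // emit only the rows changed since the last trigger
    .format("console")
    .start()

  query.awaitTermination()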

Re: Spark structured streaming is Micro batch?

2016-05-06 Thread Sachin Aggarwal
ara Phatak >> http://datamantra.io/ >> > > > > -- > Thanks > Deepak > www.bigdatabig.com > www.keosha.net > -- Thanks & Regards Sachin Aggarwal 7760502772

Re: error: value toDS is not a member of Seq[Int] SQL

2016-04-27 Thread Sachin Aggarwal
scala> val ds = Seq(1, 2, 3).toDS()
ds: org.apache.spark.sql.Dataset[Int] = [value: int]

scala> ds.map(_ + 1).collect() // Returns: Array(2, 3, 4)
res0: Array[Int] = Array(2, 3, 4)

On Wed, Apr 27, 2016 at 4:01 PM, shengshanzhang <shengshanzh...@icloud.com> wrote: > 1.6.1 > > On 27 Apr 2016, at 6:28 PM, Sachin

Re: error: value toDS is not a member of Seq[Int] SQL

2016-04-27 Thread Sachin Aggarwal
> who can tell me why and how to fix this?
>
> scala> val ds = Seq(1, 2, 3).toDS()
> <console>:35: error: value toDS is not a member of Seq[Int]
>        val ds = Seq(1, 2, 3).toDS()
>
> Thank you a lot!
-- Thanks & Regards Sachin Aggarwal 7760502772
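One common cause of this error is that the SQLContext implicits are not in scope, since toDS() is added to local Seqs by an implicit conversion; a minimal sketch for the 1.6.1 shell (assuming sqlContext is the usual shell-provided SQLContext):

  // brings in the conversion that adds toDS()/toDF() to local collections
  import sqlContext.implicits._

  val ds = Seq(1, 2, 3).toDS()
  ds.map(_ + 1).collect()   // Array(2, 3, 4)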

Re: Need Streaming output to single HDFS File

2016-04-12 Thread Sachin Aggarwal
treaming and I want to write the > streaming output to a single file. dstream.saveAsTextFiles() is creating > files in different folders. Is there a way to write to a single folder? or > else if written to different folders, how do I merge them? > Thanks, > Padma Ch > -- Thanks & Regards Sachin Aggarwal 7760502772
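One commonly suggested workaround, sketched here (this is not necessarily the reply from the thread; host, port, batch interval and output path are placeholders): write each batch as a single part file under one parent folder via foreachRDD, then merge the per-batch files afterwards, for example with hadoop fs -getmerge.

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("single-folder-output")
  val ssc = new StreamingContext(conf, Seconds(10))
  val dstream = ssc.socketTextStream("localhost", 9999)

  // coalesce(1) gives one part file per batch; all batches share one parent folder
  dstream.foreachRDD { (rdd, time) =>
    if (!rdd.isEmpty())
      rdd.coalesce(1).saveAsTextFile(s"/user/out/stream/batch-${time.milliseconds}")
  }

  ssc.start()
  ssc.awaitTermination()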

Re: multiple splits fails

2016-04-05 Thread Sachin Aggarwal
ile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > > On 5 April 2016 at 09:06, Sachin Aggarwal <different.sac...@gmail.com> > wrote: > >> Hey , >> >> I have changed your example itself t

Re: multiple splits fails

2016-04-05 Thread Sachin Aggarwal
one as suggested? > > > > > > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzad

Re: Spark Streaming - graceful shutdown when stream has no more data

2016-03-01 Thread Sachin Aggarwal
> val master = args(0)
> val conf = new SparkConf().setMaster(master).setAppName("StreamingLogEnhanced")
> // Create a StreamingContext with a n second batch size
> val ssc = new StreamingContext(conf, Seconds(10))
> // Create a DStream from all the input on port
> val log = Logger.getLogger(getClass.getName)
>
> sys.ShutdownHookThread {
>   log.info("Gracefully stopping Spark Streaming Application")
>   ssc.stop(true, true)
>   log.info("Application stopped")
> }
> val lines = ssc.socketTextStream("localhost", )
> // Create a count of log hits by ip
> var ipCounts = countByIp(lines)
> ipCounts.print()
>
> // start our streaming context and wait for it to "finish"
> ssc.start()
> // Wait for 600 seconds then exit
> ssc.awaitTermination(1*600)
> ssc.stop()
> }
>
> def countByIp(lines: DStream[String]) = {
>   val parser = new AccessLogParser
>   val accessLogDStream = lines.map(line => parser.parseRecord(line))
>   val ipDStream = accessLogDStream.map(entry => (entry.get.clientIpAddress, 1))
>   ipDStream.reduceByKey((x, y) => x + y)
> }
> }
>
> Thanks for any suggestions in advance.
-- Thanks & Regards Sachin Aggarwal 7760502772
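One pattern often used for stopping when the stream runs dry, sketched under stated assumptions (ssc is the StreamingContext from the quoted code above; the marker path and poll interval are illustrative, and this is not necessarily the answer given in the thread): poll an external stop marker instead of a fixed timeout, then stop gracefully so in-flight batches complete.

  import org.apache.hadoop.fs.{FileSystem, Path}

  val stopFlag = new Path("/tmp/stop_streaming_app")   // created externally when input is exhausted
  val fs = FileSystem.get(ssc.sparkContext.hadoopConfiguration)

  var stopped = false
  while (!stopped) {
    // true if the context terminated on its own within the 10 s window
    stopped = ssc.awaitTerminationOrTimeout(10000)
    if (!stopped && fs.exists(stopFlag)) {
      ssc.stop(stopSparkContext = true, stopGracefully = true)
      stopped = true
    }
  }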

explaination for parent.slideDuration in ReducedWindowedDStream

2016-02-18 Thread Sachin Aggarwal
logDebug("# old RDDs = " + oldRDDs.size)
// Get the RDDs of the reduced values in "new time steps"
val newRDDs = reducedStream.slice(previousWindow.endTime + parent.slideDuration, currentWindow.endTime)
logDebug("# new RDDs = " + newRDDs.size)
-- Thanks & Regards Sachin Aggarwal 7760502772
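For context, ReducedWindowedDStream backs the incremental variant of reduceByKeyAndWindow, where values sliding out of the window are subtracted and values sliding in are added; a small sketch (assuming pairs is an existing DStream[(String, Int)]; durations are illustrative):

  import org.apache.spark.streaming.Seconds

  val windowedCounts = pairs.reduceByKeyAndWindow(
    (a: Int, b: Int) => a + b,   // add the "new RDDs" entering the window
    (a: Int, b: Int) => a - b,   // subtract the "old RDDs" leaving the window
    Seconds(30),                 // window duration
    Seconds(10)                  // slide duration; parent.slideDuration above is the
                                 // parent DStream's own slide, i.e. the batch interval here
  )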

Re: PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]

2016-02-04 Thread Sachin Aggarwal
t;>> override def run() { receive() } >>>>> }.start() >>>>> } >>>>> >>>>> def onStop() { >>>>> // There is nothing much to do as the thread calling receive() >>>>> // is designed to stop by itself isStopped() returns false >>>>> } >>>>> >>>>> /** Create a socket connection and receive data until receiver is >>>>> stopped >>>>> */ >>>>> private def receive() { >>>>> while(!isStopped()) { >>>>> store("I am a dummy source " + Random.nextInt(10)) >>>>> Thread.sleep((1000.toDouble / ratePerSec).toInt) >>>>> } >>>>> } >>>>> } >>>>> {code} >>>>> >>>>> The given issue resides in the following >>>>> *MapWithStateRDDRecord.updateRecordWithData*, starting line 55, in the >>>>> following code block: >>>>> >>>>> {code} >>>>> dataIterator.foreach { case (key, value) => >>>>> wrappedState.wrap(newStateMap.get(key)) >>>>> val returned = mappingFunction(batchTime, key, Some(value), >>>>> wrappedState) >>>>> if (wrappedState.isRemoved) { >>>>> newStateMap.remove(key) >>>>> } else if (wrappedState.isUpdated || >>>>> timeoutThresholdTime.isDefined) >>>>> /* <--- problem is here */ { >>>>> newStateMap.put(key, wrappedState.get(), >>>>> batchTime.milliseconds) >>>>> } >>>>> mappedData ++= returned >>>>> } >>>>> {code} >>>>> >>>>> In case the stream has a timeout set, but the state wasn't set at all, >>>>> the >>>>> "else-if" will still follow through because the timeout is defined but >>>>> "wrappedState" is empty and wasn't set. >>>>> >>>>> If it is mandatory to update state for each entry of *mapWithState*, >>>>> then >>>>> this code should throw a better exception than >>>>> "NoSuchElementException", >>>>> which doesn't really saw anything to the developer. >>>>> >>>>> I haven't provided a fix myself because I'm not familiar with the spark >>>>> implementation, but it seems to be there needs to either be an extra >>>>> check >>>>> if the state is set, or as previously stated a better exception >>>>> message. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://apache-spark-user-list.1001560.n3.nabble.com/PairDStreamFunctions-mapWithState-fails-in-case-timeout-is-set-without-updating-State-S-tp26147.html >>>>> Sent from the Apache Spark User List mailing list archive at >>>>> Nabble.com. >>>>> >>>>> - >>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>>>> For additional commands, e-mail: user-h...@spark.apache.org >>>>> >>>>> >>>> >>> >>> >>> -- >>> Best Regards, >>> Yuval Itzchakov. >>> >> >> -- Thanks & Regards Sachin Aggarwal 7760502772

Re: PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]

2016-02-04 Thread Sachin Aggarwal
). In the AWS Management Console, select Security Groups (left navigation bar), select the quick-start group, the Inbound tab and add port 8080. Make sure you click “Add Rule” and then “Apply Rule Changes”. On Fri, Feb 5, 2016 at 1:14 AM, Sachin Aggarwal <different.sac...@gmail.com> wrote: > i thin

Re: PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]

2016-02-04 Thread Sachin Aggarwal
I am sorry for the spam, I replied in the wrong thread, sleepy head :-( On Fri, Feb 5, 2016 at 1:15 AM, Sachin Aggarwal <different.sac...@gmail.com> wrote: > > http://coenraets.org/blog/2011/11/set-up-an-amazon-ec2-instance-with-tomcat-and-mysql-5-minutes-tutorial/ > > The default Tomca

Explaination for info shown in UI

2016-01-28 Thread Sachin Aggarwal
at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 47 ms | 1/1 (1 skipped) | 4/4 (3 skipped)
219 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 48 ms | 2/2 | 4/4
-- Thanks & Regards Sachin Aggarwal 7760502772

Column Aliases are Ignored in callUDF while using struct()

2015-12-03 Thread Sachin Aggarwal
Hi All, need help guys, I need a workaround for this situation.

*case where this works:*

val TestDoc1 = sqlContext.createDataFrame(Seq(("sachin aggarwal", "1"), ("Rishabh", "2"))).toDF("myText", "id")
TestDoc1.select(callUDF("
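A self-contained sketch of the pattern the thread describes (Spark 1.x era; the UDF name extractText and the aliases text/docId are illustrative, not taken from the original mail): aliases applied inside struct() are what the report says get ignored when the struct is passed to a registered UDF.

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.functions.{callUDF, col, struct}

  // a registered UDF that receives the struct column as a Row
  sqlContext.udf.register("extractText", (r: Row) => r.getString(0))

  val TestDoc1 = sqlContext.createDataFrame(Seq(("sachin aggarwal", "1"), ("Rishabh", "2")))
    .toDF("myText", "id")

  // aliases set inside struct(); per the thread, these are the names being dropped
  TestDoc1.select(
    callUDF("extractText", struct(col("myText").as("text"), col("id").as("docId")))
  ).show()

Accessing the struct field by position here sidesteps the naming question; the thread's complaint concerns the field names the UDF actually observes.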

Re: Column Aliases are Ignored in callUDF while using struct()

2015-12-03 Thread Sachin Aggarwal
at 4:13 PM, Sachin Aggarwal < > different.sac...@gmail.com> wrote: > >> >> Hi All, >> >> need help guys, I need a work around for this situation >> >> *case where this works:* >> >> val TestDoc1 = sqlContext.createDataFrame(Seq(("sachi