How to insert a DataFrame into HBase
Hi, I need to insert a DataFrame into HBase using Scala code. Can anyone guide me on how to achieve this? Any help would be much appreciated. Thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-df-in-HBASE-tp25891.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
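[A hedged sketch of one common approach, not a tested answer: map each Row to an HBase Put and write with saveAsNewAPIHadoopDataset. The table name "myTable", column family "cf", and the two-column (id, value) schema are placeholders, and hbase-client/hbase-server must be on the classpath.]

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.DataFrame

def writeToHBase(df: DataFrame): Unit = {
  // HBase connection details (zookeeper quorum etc.) come from hbase-site.xml
  val conf = HBaseConfiguration.create()
  conf.set(TableOutputFormat.OUTPUT_TABLE, "myTable") // placeholder table name
  val job = Job.getInstance(conf)
  job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

  // Convert each Row into (rowkey, Put) and write the pairs out
  df.rdd.map { row =>
    val put = new Put(Bytes.toBytes(row.getAs[String]("id")))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"),
            Bytes.toBytes(row.getAs[String]("value")))
    (new ImmutableBytesWritable, put)
  }.saveAsNewAPIHadoopDataset(job.getConfiguration)
}
```

This keeps the write distributed (one HBase client per partition) rather than collecting the DataFrame to the driver.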
Streaming of CoAP Resources
I am currently working with the IoT CoAP protocol. I accessed a server on localhost through the Copper Firefox plugin, then added a resource with "GET" functionality on the server. After that I made its client a streaming source. Here is the code of the client:

  class CustomReceiver(test: String) extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2)
      with Logging with Serializable {

    @volatile private var stopped = false

    override def onStart() {
      val client = new CoapClient("ip/resource")
      val text = client.get().getResponseText()
      store(text)
    }

    override def onStop(): Unit = synchronized {
      try {
        stopped = true
      } catch {
        case e: Exception => println("exception caught: " + e)
      }
    }
  }

But I am facing a problem: during streaming it reads the resource only once. After that it fetches empty RDDs and completes its batches, and if the resource's value changes in the meantime, the change is never read. Am I doing something wrong? Is there some other way to read the value whenever the resource changes that I can handle in my custom receiver, or any idea how to GET the value continuously during streaming? Any help is much awaited and appreciated. Thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-of-COAP-Resources-tp25084.html
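[A sketch, not tested against a live CoAP server: because onStart runs only once and must return quickly, a receiver that should see later changes needs to poll (or observe) on its own thread. This assumes the Eclipse Californium CoapClient; the URI and interval are placeholders.]

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import org.eclipse.californium.core.CoapClient

class PollingCoapReceiver(uri: String, pollIntervalMs: Long)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // onStart must not block: start a background thread that keeps polling
    new Thread("CoAP Poller") {
      override def run(): Unit = {
        val client = new CoapClient(uri)
        while (!isStopped()) {
          val response = client.get()          // re-issue the GET each cycle
          if (response != null) store(response.getResponseText)
          Thread.sleep(pollIntervalMs)
        }
      }
    }.start()
  }

  // The polling thread checks isStopped(), so nothing extra to do here
  override def onStop(): Unit = {}
}
```

If the server supports the CoAP Observe option, `client.observe(...)` with a handler that calls `store` would push changes instead of polling, but the polling loop above is the simpler fix for "read once, then empty batches".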
Twitter Streaming using the Twitter Public Streaming API and Apache Spark
Hi, I want to fetch PUBLIC tweets (not tied to any particular account) containing a particular HASHTAG ("#CocaCola" in my case) from Twitter. I made an app on Twitter to get the credentials and then used the Twitter Public Streaming API. Below is the piece of code:

  val config = new twitter4j.conf.ConfigurationBuilder()
    .setOAuthConsumerKey("***")
    .setOAuthConsumerSecret("***")
    .setOAuthAccessToken("***")
    .setOAuthAccessTokenSecret("***")
    .build
  val twitter_auth = new TwitterFactory(config)
  val a = new twitter4j.auth.OAuthAuthorization(config)
  val atwitter: Option[twitter4j.auth.Authorization] =
    Some(twitter_auth.getInstance(a).getAuthorization())
  val sparkConf = new SparkConf().setAppName("TwitterPublicStreaming").setMaster("local")
  val ssc = new StreamingContext(sparkConf, Seconds(1))
  val filters: Seq[String] = "#CocaCola" :: Nil
  val stream = TwitterUtils.createStream(ssc, atwitter, filters, StorageLevel.MEMORY_AND_DISK_2)
  val data = stream.window(Seconds(1), Seconds(1))
  data.print()
  ssc.start()
  ssc.awaitTermination()

But most of the time it doesn't fetch tweets; it shows an empty RDD as the output. Is there anything wrong? Can anyone point out the mistake? Thanks in anticipation.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Twitter-Streming-using-Twitter-Public-Streaming-API-and-Apache-Spark-tp24687.html
Re: ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
Okay. Then do you have any idea how to avoid this error? Thanks

On Tue, Aug 11, 2015 at 12:08 AM, Tathagata Das t...@databricks.com wrote:
> I think this may be expected. When the streaming context is stopped without
> the SparkContext, the receivers are stopped by the driver. The receiver then
> sends back the message that it has been stopped, and this is (probably
> incorrectly) logged at ERROR level.

On Sun, Aug 9, 2015 at 12:52 AM, Sadaf sa...@platalytics.com wrote:
> Hi. When I tried to stop Spark Streaming using ssc.stop(false, true), it gave
> the following error:
>
>   ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
>   15/08/07 13:41:11 WARN ReceiverSupervisorImpl: Stopped executor without error
>   15/08/07 13:41:20 WARN WriteAheadLogManager : Failed to write to write ahead log
>
> I've implemented a StreamingListener and a custom receiver. Does anyone have
> any idea about this? Thanks :)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-ReceiverTracker-Deregistered-receiver-for-stream-0-Stopped-by-driver-tp24183.html
ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
Hi. When I tried to stop Spark Streaming using ssc.stop(false, true), it gave the following error:

  ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
  15/08/07 13:41:11 WARN ReceiverSupervisorImpl: Stopped executor without error
  15/08/07 13:41:20 WARN WriteAheadLogManager : Failed to write to write ahead log

I've implemented a StreamingListener and a custom receiver. Does anyone have any idea about this? Thanks :)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-ReceiverTracker-Deregistered-receiver-for-stream-0-Stopped-by-driver-tp24183.html
how to stop twitter-spark streaming
Hi all. I am working with Spark Streaming and Twitter's user API. I used this code to stop streaming:

  ssc.addStreamingListener(new StreamingListener {
    var count = 1
    override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted) {
      count += 1
      if (count == 5) {
        ssc.stop(true, true)
      }
    }
  })

and also overrode the onStop method in my custom receiver to stop streaming. But it gives the following exception:

  java.lang.NullPointerException: Inflater has been closed
    at java.util.zip.Inflater.ensureOpen(Inflater.java:389)
    at java.util.zip.Inflater.inflate(Inflater.java:257)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:154)
    at java.io.BufferedReader.readLine(BufferedReader.java:317)
    at java.io.BufferedReader.readLine(BufferedReader.java:382)
    at twitter4j.StatusStreamBase.handleNextElement(StatusStreamBase.java:85)
    at twitter4j.StatusStreamImpl.next(StatusStreamImpl.java:57)
    at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:478)
  15/08/06 15:50:43 ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver
  15/08/06 15:50:43 WARN ReceiverSupervisorImpl: Stopped executor without error
  15/08/06 15:50:50 WARN WriteAheadLogManager : Failed to write to write ahead log

Does anyone know the cause of this exception? Thanks :)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-stop-twitter-spark-streaming-tp24150.html
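[A sketch under assumptions: `ssc` is an active StreamingContext. Calling `ssc.stop` from inside the listener callback can block the listener bus and tear the twitter4j stream down mid-read, which is one plausible source of the "Inflater has been closed" noise during shutdown. A common pattern is to only set a flag in the listener and stop the context from a separate thread.]

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

val batches = new AtomicInteger(0)
@volatile var shouldStop = false

ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    // Only signal here; never call ssc.stop from the listener thread
    if (batches.incrementAndGet() >= 5) shouldStop = true
  }
})

new Thread("streaming-stopper") {
  override def run(): Unit = {
    while (!shouldStop) Thread.sleep(500)
    // Graceful stop from outside the listener bus
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}.start()
```

Even with a graceful stop, twitter4j's consumer thread may still log a final exception as its socket closes; that part is usually harmless.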
Spark Streaming - Checkpointing issue
Hi. I've done Twitter streaming using Twitter's streaming user API and Spark Streaming, and it runs successfully on my local machine. But when I run this program on a cluster in local mode, it only runs successfully the very first time. After that it gives the following exception:

  Exception in thread "main" org.apache.spark.SparkException: Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.

And SPARK_CLASSPATH is already unset! I have to create a new checkpoint directory each time to make it run successfully; otherwise it throws the above exception. Can anyone help me resolve this issue? Thanks :)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-CheckPointing-issue-tp24139.html
Twitter live Streaming
Hi. Is there any way to get all old tweets, going back to when the account was created, using Spark Streaming and Twitter's API? Currently my connector only shows tweets that are posted after the program starts. I've built this using Spark Streaming and a custom receiver with the Twitter user API. Thanks in anticipation.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Twitter-live-Streaming-tp24124.html
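[A hedged sketch, assuming twitter4j is on the classpath with credentials in twitter4j.properties: the streaming API only delivers tweets posted while the connection is open, so historical tweets have to come from the REST API (e.g. the user timeline, fetched page by page), after which they can be parallelized into an RDD if needed. The screen name and page count are placeholders; Twitter also caps how far back the user timeline goes.]

```scala
import twitter4j.{Paging, Status, TwitterFactory}
import scala.collection.JavaConverters._

def fetchOldTweets(screenName: String, pages: Int): Seq[Status] = {
  // REST client, configured from twitter4j.properties
  val twitter = TwitterFactory.getSingleton
  // Page through the user's timeline, 200 tweets per page
  (1 to pages).flatMap { page =>
    twitter.getUserTimeline(screenName, new Paging(page, 200)).asScala
  }
}
```

The result is a plain `Seq[Status]`, so `sc.parallelize(fetchOldTweets(...))` would bring the history into Spark alongside the live stream.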
Twitter Connector-Spark Streaming
Hi. I am writing a Twitter connector using Spark Streaming, but it fetches random tweets. Is there any way to receive the tweets of a particular account only? I made an app on Twitter and used the credentials as given below:

  def managingCredentials(): Option[twitter4j.auth.Authorization] = {
    object auth {
      val config = new twitter4j.conf.ConfigurationBuilder()
        .setOAuthConsumerKey("***")
        .setOAuthConsumerSecret("***")
        .setOAuthAccessToken("***")
        .setOAuthAccessTokenSecret("***")
        .build
    }
    val twitter_auth = new TwitterFactory(auth.config)
    val a = new twitter4j.auth.OAuthAuthorization(auth.config)
    val atwitter: Option[twitter4j.auth.Authorization] =
      Some(twitter_auth.getInstance(a).getAuthorization())
    atwitter
  }

Thanks :)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Twitter-Connector-Spark-Streaming-tp24078.html
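[An untested sketch: the filters passed to TwitterUtils.createStream are keyword *track* filters, not account filters. twitter4j's FilterQuery can instead follow a numeric user id; here shown with a raw TwitterStream, which could be wrapped in a custom Spark receiver that calls store(status.getText) in onStatus. "userId" is a placeholder for the account's numeric id, and credentials are read from twitter4j.properties.]

```scala
import twitter4j.{FilterQuery, StallWarning, Status, StatusDeletionNotice,
                  StatusListener, TwitterStreamFactory}

def followAccount(userId: Long): Unit = {
  val stream = new TwitterStreamFactory().getInstance()
  stream.addListener(new StatusListener {
    // Only tweets from (or retweeting/replying to) the followed id arrive here
    override def onStatus(status: Status): Unit = println(status.getText)
    override def onDeletionNotice(n: StatusDeletionNotice): Unit = ()
    override def onTrackLimitationNotice(i: Int): Unit = ()
    override def onScrubGeo(userId: Long, upToStatusId: Long): Unit = ()
    override def onStallWarning(w: StallWarning): Unit = ()
    override def onException(e: Exception): Unit = e.printStackTrace()
  })
  // follow() restricts the stream to the given user id(s)
  stream.filter(new FilterQuery().follow(userId))
}
```

Note that follow-based filtering also includes retweets of and replies to that account, so an additional `status.getUser.getId == userId` check may be wanted in onStatus.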
Error in starting Spark Streaming Context
Hi. I am new to Spark Streaming and writing code for a Twitter connector. When I run this code more than once, it gives the following exception. I have to create a new HDFS directory for checkpointing each time to make it run successfully, and moreover it doesn't get stopped.

  ERROR StreamingContext: Error starting the context, marking it as stopped
  org.apache.spark.SparkException: org.apache.spark.streaming.dstream.WindowedDStream@532d0784 has not been initialized
    at org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:321)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
    at scala.Option.orElse(Option.scala:257)
    at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:339)
    at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:120)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:227)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:222)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.streaming.scheduler.JobGenerator.restart(JobGenerator.scala:222)
    at org.apache.spark.streaming.scheduler.JobGenerator.start(JobGenerator.scala:92)
    at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:73)
    at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:588)
    at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:586)
    at twitter.streamingSpark$.twitterConnector(App.scala:38)
    at twitter.streamingSpark$.main(App.scala:26)
    at twitter.streamingSpark.main(App.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The relevant code is:

  def twitterConnector(): Unit = {
    val atwitter = managingCredentials()
    val ssc = StreamingContext.getOrCreate(hdfsDirectory, () => managingContext())
    fetchTweets(ssc, atwitter)
    ssc.start() // Start the computation
    ssc.awaitTermination()
  }

  def managingContext(): StreamingContext = {
    // making the Spark context
    val conf = new SparkConf().setMaster("local[*]").setAppName("twitterConnector")
    val ssc = new StreamingContext(conf, Seconds(1))
    val sqlContext = new org.apache.spark.sql.SQLContext(ssc.sparkContext)
    import sqlContext.implicits._
    // checkpointing
    ssc.checkpoint(hdfsDirectory)
    ssc
  }

  def fetchTweets(ssc: StreamingContext, atwitter: Option[twitter4j.auth.Authorization]): Unit = {
    val tweets = TwitterUtils.createStream(ssc, atwitter, Nil, StorageLevel.MEMORY_AND_DISK_2)
    val twt = tweets.window(Seconds(10), Seconds(10))
    // checkpoint duration
    // twt.checkpoint(new Duration(1000))
    // processing
    case class Tweet(createdAt: Long, text: String)
    twt.map(status => Tweet(status.getCreatedAt().getTime() / 1000, status.getText()))
    twt.print()
  }

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-in-starting-Spark-Streaming-Context-tp24063.html
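[A hedged sketch of the getOrCreate pattern (the checkpoint path is a placeholder): the factory function passed to StreamingContext.getOrCreate must build the *entire* DStream graph. On a second run the context is restored from the checkpoint, the factory is skipped, and any transformations set up outside it are never re-attached, which is consistent with a "WindowedDStream ... has not been initialized" failure on restart.]

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/twitterCheckpointDir" // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setMaster("local[*]").setAppName("twitterConnector")
  val ssc = new StreamingContext(conf, Seconds(1))
  ssc.checkpoint(checkpointDir)
  // Build ALL DStreams inside this factory, e.g.:
  // val tweets = TwitterUtils.createStream(ssc, atwitter, Nil, StorageLevel.MEMORY_AND_DISK_2)
  // tweets.window(Seconds(10), Seconds(10)).print()
  ssc
}

// First run: createContext() executes. Later runs: the graph is restored
// from the checkpoint and createContext() is NOT called again.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

Moving the equivalent of fetchTweets inside the factory, rather than calling it after getOrCreate, is the key change.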
Re: error in twitter streaming
Thanks for the suggestion, Akashsihag. I've tried this solution and unfortunately it gives the same error.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/error-in-twitter-streaming-tp24030p24056.html
Spark Streaming
Hi, I am new to Spark Streaming and writing code for a Twitter connector. I am facing the following exception:

  ERROR StreamingContext: Error starting the context, marking it as stopped
  org.apache.spark.SparkException: org.apache.spark.streaming.dstream.WindowedDStream@532d0784 has not been initialized
    at org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:321)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
    at scala.Option.orElse(Option.scala:257)
    at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:339)
    at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:120)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:227)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:222)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.streaming.scheduler.JobGenerator.restart(JobGenerator.scala:222)
    at org.apache.spark.streaming.scheduler.JobGenerator.start(JobGenerator.scala:92)
    at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:73)
    at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:588)
    at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:586)
    at twitter.streamingSpark$.twitterConnector(App.scala:38)
    at twitter.streamingSpark$.main(App.scala:26)
    at twitter.streamingSpark.main(App.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The relevant code is:

  def twitterConnector(): Unit = {
    val atwitter = managingCredentials()
    val ssc = StreamingContext.getOrCreate(
      "hdfs://192.168.23.109:9000/home/cloud9/twitterCheckpointDir",
      () => managingContext())
    fetchTweets(ssc, atwitter)
    ssc.start() // Start the computation
    ssc.awaitTermination()
  }

  def managingContext(): StreamingContext = {
    // making the Spark context
    val conf = new SparkConf().setMaster("local[*]").setAppName("twitterConnector")
    val ssc = new StreamingContext(conf, Seconds(1))
    val sqlContext = new org.apache.spark.sql.SQLContext(ssc.sparkContext)
    import sqlContext.implicits._
    // checkpointing
    ssc.checkpoint("hdfs://192.168.23.109:9000/home/cloud9/twitterCheckpointDir")
    ssc
  }

  def fetchTweets(ssc: StreamingContext, atwitter: Option[twitter4j.auth.Authorization]): Unit = {
    val tweets = TwitterUtils.createStream(ssc, atwitter, Nil, StorageLevel.MEMORY_AND_DISK_2)
    val twt = tweets.window(Seconds(10), Seconds(10))
    // checkpoint duration
    // twt.checkpoint(new Duration(1000))
    // processing
    case class Tweet(createdAt: Long, text: String)
    twt.map(status => Tweet(status.getCreatedAt().getTime() / 1000, status.getText()))
    twt.print()
  }

Can anyone help me in this regard?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-tp24058.html
Checkpoint issue in spark streaming
Hi all. I am writing a Twitter connector using Spark Streaming. I have written the following code to maintain a checkpoint:

  val ssc = StreamingContext.getOrCreate(
    "hdfs://192.168.23.109:9000/home/cloud9/twitterCheckpoint",
    () => managingContext())

  def managingContext(): StreamingContext = {
    // making the Spark context
    val conf = new SparkConf().setMaster("local[*]").setAppName("twitterConnector")
    val ssc = new StreamingContext(conf, Seconds(1))
    val sqlContext = new org.apache.spark.sql.SQLContext(ssc.sparkContext)
    import sqlContext.implicits._
    // checkpointing
    ssc.checkpoint("hdfs://192.168.23.109:9000/home/cloud9/twitterCheckpoint")
    ssc
  }

But it gives the following error:

  java.lang.IllegalArgumentException: requirement failed: WindowedDStream has been marked for checkpointing but the storage level has not been set to enable persisting. Please use DStream.persist() to set the storage level to use memory for better checkpointing performance.

I have also set the storage level through the following code:

  TwitterUtils.createStream(ssc, None, null, StorageLevel.MEMORY_AND_DISK_2)

Can anyone help me in this regard? Thanks :)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Checkpoint-issue-in-spark-streaming-tp24031.html
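[A sketch under assumptions (`ssc` and `atwitter` as in the post; window and checkpoint durations are placeholders): the storage level passed to createStream applies only to the *input* DStream. The error message points at the windowed DStream, which needs its own storage level set via persist() before it can be checkpointed.]

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Duration, Seconds}
import org.apache.spark.streaming.twitter.TwitterUtils

val tweets = TwitterUtils.createStream(ssc, atwitter, Nil, StorageLevel.MEMORY_AND_DISK_2)
val twt = tweets.window(Seconds(10), Seconds(10))
// The windowed stream is a separate DStream: give it a storage level
// before marking it for checkpointing, as the error message asks.
twt.persist(StorageLevel.MEMORY_ONLY_SER)
twt.checkpoint(new Duration(10000))
twt.print()
```

Note the checkpoint interval here is a multiple of the window duration; intervals shorter than the slide duration tend to trigger further requirement failures.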