RE: Can't start master on Windows 7
Hi Jacek,

To run a Spark master on my Windows box, I've created a .bat file with contents something like:

.\bin\spark-class.cmd org.apache.spark.deploy.master.Master --host

For the worker:

.\bin\spark-class.cmd org.apache.spark.deploy.worker.Worker spark://:7077

To wrap these in services, I've used yasw or nssm.

Thanks,
Tim

-----Original Message-----
From: Jacek Laskowski [mailto:ja...@japila.pl]
Sent: Tuesday, 1 December 2015 4:18 AM
To: Shuo Wang
Cc: user
Subject: Re: Can't start master on Windows 7

On Fri, Nov 27, 2015 at 4:27 PM, Shuo Wang wrote:
> I am trying to use the start-master.sh script on windows 7.

From http://spark.apache.org/docs/latest/spark-standalone.html:

"Note: The launch scripts do not currently support Windows. To run a Spark cluster on Windows, start the master and workers by hand."

Can you start the command by hand? Just copy and paste the command from the logs. Mind the spaces!

Jacek
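For reference, a minimal sketch of the two .bat files described above; the <master-ip> placeholder and the file names are illustrative and not part of the original message:

  REM start-master.bat (run from the Spark installation directory)
  .\bin\spark-class.cmd org.apache.spark.deploy.master.Master --host <master-ip>

  REM start-worker.bat - the spark:// URL must match the host given to the master
  .\bin\spark-class.cmd org.apache.spark.deploy.worker.Worker spark://<master-ip>:7077

These can then be installed as Windows services with a wrapper such as nssm.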
RE: how to group timestamp data and filter on it
Hi LCassa,

Try: map to pair, then reduce by key. The Spark documentation is a pretty good reference for this, and there are plenty of word-count examples on the internet.

Warm regards,
TimB

From: Cassa L [mailto:lcas...@gmail.com]
Sent: Thursday, 19 November 2015 11:27 AM
To: user
Subject: how to group timestamp data and filter on it

Hi,

I have a data stream (JavaDStream) in the following format:

timestamp=second1, map(key1=value1, key2=value2)
timestamp=second2, map(key1=value3, key2=value4)
timestamp=second2, map(key1=value1, key2=value5)

I want to group the data by 'timestamp' first and then filter each RDD for key1=value1 or key1=value3, etc. Each of the rows above is represented by a POJO in the RDD, like:

public class Data {
    long timestamp;
    Map<String, String> map;
}

How do I do this in Spark? I am trying to figure out whether I need to use map or flatMap, etc.

Thanks,
LCassa
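A minimal Scala sketch of the "map to pair, then group/reduce by key" approach Tim suggests (the question uses the Java API, but the shape is the same); the Data case class mirrors the POJO above, the filter values come from the example rows, and everything else is illustrative:

  import org.apache.spark.streaming.dstream.DStream

  // Mirrors the POJO from the question: a timestamp plus a key/value map.
  case class Data(timestamp: Long, map: Map[String, String])

  def groupAndFilter(stream: DStream[Data]): DStream[(Long, Iterable[Data])] =
    stream
      .map(d => (d.timestamp, d))   // key each record by its timestamp
      .groupByKey()                 // collect all records sharing a timestamp
      .mapValues(_.filter(d =>      // keep only key1=value1 or key1=value3
        d.map.get("key1").exists(v => v == "value1" || v == "value3")))

If only a per-timestamp aggregate is needed, reduceByKey avoids materialising the full groups.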
RE: TightVNC - Application Monitor (right pane)
Hi,

I have a Spark Kafka streaming application that works when I run it with a local Spark context, but not with a remote one. My code consists of:

1. A spring-boot application that creates the context
2. A shaded jar file containing all of my Spark code

On my PC (Windows), I have a Spark (1.5.1) master and worker running. The entry point for my application is the start() method. The code is:

@throws(classOf[Exception])
def start {
  val ssc: StreamingContext = createStreamingContext
  val messagesRDD = createKafkaDStream(ssc, "myTopic", 2)
  def datasRDD = messagesRDD.map((line: String) => MapFunctions.lineToSparkEventData(line))
  def count = datasRDD.count()
  datasRDD.print(1)
  ssc.start
  ssc.awaitTermination
}

private def createStreamingContext: StreamingContext = {
  System.setProperty(HADOOP_HOME_DIR, configContainer.getHadoopHomeDir)
  System.setProperty("spark.streaming.concurrentJobs", String.valueOf(configContainer.getStreamingConcurrentJobs))
  def sparkConf = createSparkConf()
  val ssc = new StreamingContext(sparkConf, Durations.seconds(configContainer.getStreamingContextDurationSeconds))
  ssc.sparkContext.setJobGroup("main_streaming_job", "streaming context start")
  ssc.sparkContext.setLocalProperty("spark.scheduler.pool", "real_time_pool")
  ssc
}

private def createSparkConf(): SparkConf = {
  def masterString = "spark://<>:7077"
  def conf = new SparkConf().setMaster(masterString).setAppName("devAppRem") // This is not working
  // def conf = new SparkConf().setMaster("local[4]").setAppName("devAppLocal") // This IS working
  conf.set("spark.scheduler.allocation.file", "D:\\valid_path_to\\fairscheduler.xml")
  val pathToShadedApplicationJar: String = configContainer.getApplicationJarPaths.get(0)
  val jars: Array[String] = Array[String](pathToShadedApplicationJar)
  conf.setJars(jars)
  conf.set("spark.scheduler.mode", "FAIR")
}

private def createKafkaDStream(ssc: StreamingContext, topics: String, numThreads: Int): DStream[String] = {
  val zkQuorum: String = configContainer.getZkQuorum
  val groupId: String = configContainer.getGroupId
  val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
  def lines = KafkaUtils.createStream(ssc, zkQuorum, groupId, topicMap, StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
  lines
}
}

The error that I get is:

2015-11-18 10:41:19.191 WARN 3044 --- [result-getter-3] o.apache.spark.scheduler.TaskSetManager : Lost task 0.0 in stage 2.0 (TID 70, 169.254.95.56): java.io.IOException: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
    at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
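The ClassNotFoundException suggests the executors on the remote worker cannot see the spark-streaming-kafka classes; with a local[...] master everything runs in the driver JVM, which would explain why the local context works. A hedged sketch of one thing to try, assuming the shaded jar does not actually bundle the Kafka integration classes: register the spark-streaming-kafka assembly jar with the executors alongside the application jar via setJars (the jar path below is a placeholder, not from the original message):

  // Illustrative only: ship the Kafka integration jar next to the shaded application jar.
  val jars = Array(
    pathToShadedApplicationJar,
    "D:\\path\\to\\spark-streaming-kafka-assembly_2.10-1.5.1.jar" // placeholder path
  )
  conf.setJars(jars)

Checking the shaded jar itself (for example with jar tf) for org.apache.spark.streaming.kafka.KafkaReceiver is another quick way to narrow this down.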