UNSUBSCRIBE

2015-12-14 Thread Tim Barthram
UNSUBSCRIBE Thanks




RE: Can't start master on Windows 7

2015-11-30 Thread Tim Barthram
Hi Jacek,

To run a Spark master on my Windows box, I've created a .bat file with contents something like:

.\bin\spark-class.cmd org.apache.spark.deploy.master.Master --host 


For the worker:

.\bin\spark-class.cmd org.apache.spark.deploy.worker.Worker spark://:7077


To wrap these in services, I've used yasw or nssm.

Thanks,
Tim




-Original Message-
From: Jacek Laskowski [mailto:ja...@japila.pl] 
Sent: Tuesday, 1 December 2015 4:18 AM
To: Shuo Wang
Cc: user
Subject: Re: Can't start master on Windows 7

On Fri, Nov 27, 2015 at 4:27 PM, Shuo Wang  wrote:

> I am trying to use the start-master.sh script on windows 7.

From http://spark.apache.org/docs/latest/spark-standalone.html:

"Note: The launch scripts do not currently support Windows. To run a
Spark cluster on Windows, start the master and workers by hand."

Can you start the command by hand? Just copy and paste the command
from the logs. Mind the spaces!

Jacek




RE: how to group timestamp data and filter on it

2015-11-18 Thread Tim Barthram
Hi LCassa,

Try:

Map to pair, then reduce by key.

The Spark documentation is a pretty good reference for this, and there are plenty of word-count examples on the internet.
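
A minimal sketch of that approach in Scala (the question below uses the Java API, and the Data class and the key1=value1 filter here are only illustrative):

import org.apache.spark.streaming.dstream.DStream

// Illustrative stand-in for the Data POJO described in the question below.
case class Data(timestamp: Long, map: Map[String, String])

// Map each record to a (timestamp, record) pair, group the records that share a
// timestamp, then keep only the records whose map has key1=value1.
def groupAndFilter(stream: DStream[Data]): DStream[(Long, Iterable[Data])] =
  stream
    .map(d => (d.timestamp, d))
    .groupByKey()
    .mapValues(group => group.filter(d => d.map.get("key1").exists(_ == "value1")))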

Warm regards,
TimB


From: Cassa L [mailto:lcas...@gmail.com]
Sent: Thursday, 19 November 2015 11:27 AM
To: user
Subject: how to group timestamp data and filter on it

Hi,
I have a data stream (JavaDStream) in the following format:
timestamp=second1, map(key1=value1, key2=value2)
timestamp=second2, map(key1=value3, key2=value4)
timestamp=second2, map(key1=value1, key2=value5)

I want to group the data by 'timestamp' first and then filter each RDD for key1=value1 or key1=value3, etc.
Each of the above rows represents a POJO in the RDD like:
public class Data {
    long timestamp;
    Map map;
}
How do I do this in Spark? I am trying to figure out whether I need to use map or flatMap, etc.
Thanks,
LCassa




RE: TightVNC - Application Monitor (right pane)

2015-11-17 Thread Tim Barthram
Hi,

I have a Spark Kafka streaming application that works when I run it with a local Spark context, but not with a remote one.

My code consists of:
1.  A Spring Boot application that creates the context
2.  A shaded jar file containing all of my Spark code

On my PC (Windows), I have a Spark (1.5.1) master and worker running.

The entry point for my application is the start() method.

The code is:

  @throws(classOf[Exception])
  def start {
    val ssc: StreamingContext = createStreamingContext

    val messagesRDD = createKafkaDStream(ssc, "myTopic", 2)

    def datasRDD = messagesRDD.map((line: String) => MapFunctions.lineToSparkEventData(line))

    def count = datasRDD.count()
    datasRDD.print(1)

    ssc.start
    ssc.awaitTermination
  }

  private def createStreamingContext: StreamingContext = {
    System.setProperty(HADOOP_HOME_DIR, configContainer.getHadoopHomeDir)
    System.setProperty("spark.streaming.concurrentJobs", String.valueOf(configContainer.getStreamingConcurrentJobs))
    def sparkConf = createSparkConf()

    val ssc = new StreamingContext(sparkConf, Durations.seconds(configContainer.getStreamingContextDurationSeconds))
    ssc.sparkContext.setJobGroup("main_streaming_job", "streaming context start")
    ssc.sparkContext.setLocalProperty("spark.scheduler.pool", "real_time_pool")
    ssc
  }

  private def createSparkConf(): SparkConf = {
    def masterString = "spark://<>:7077"
    def conf = new SparkConf().setMaster(masterString).setAppName("devAppRem")  // This is not working
    // def conf = new SparkConf().setMaster("local[4]").setAppName("devAppLocal")  // This IS working

    conf.set("spark.scheduler.allocation.file", "D:\\valid_path_to\\fairscheduler.xml")

    val pathToShadedApplicationJar: String = configContainer.getApplicationJarPaths.get(0)
    val jars: Array[String] = Array[String](pathToShadedApplicationJar)
    conf.setJars(jars)

    conf.set("spark.scheduler.mode", "FAIR")
  }

  private def createKafkaDStream(ssc: StreamingContext, topics: String, numThreads: Int): DStream[String] = {
    val zkQuorum: String = configContainer.getZkQuorum
    val groupId: String = configContainer.getGroupId
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    def lines = KafkaUtils.createStream(ssc, zkQuorum, groupId, topicMap, StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
    lines
  }
}


The error that I get is:

2015-11-18 10:41:19.191  WARN 3044 --- [result-getter-3] o.apache.spark.scheduler.TaskSetManager  : Lost task 0.0 in stage 2.0 (TID 70, 169.254.95.56): java.io.IOException: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
    at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at
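
One detail worth noting in createSparkConf above: conf is declared with def rather than val, so every reference to conf builds a brand-new SparkConf, and the allocation-file setting and the setJars call are applied to instances that are then thrown away, while the SparkConf actually returned only carries the master, app name, and scheduler mode. Whether that alone explains the ClassNotFoundException depends on what ended up in the shaded jar, but a sketch of the same method with a single val instance (reusing the configContainer calls and the placeholder master from the code above) would look something like:

  private def createSparkConf(): SparkConf = {
    // One val, so every setting below lands on the same SparkConf instance.
    val conf = new SparkConf()
      .setMaster("spark://<>:7077")   // master host elided in the original post
      .setAppName("devAppRem")

    conf.set("spark.scheduler.allocation.file", "D:\\valid_path_to\\fairscheduler.xml")
    conf.set("spark.scheduler.mode", "FAIR")

    // Ship the shaded application jar to the executors; for the Kafka receiver to be
    // resolvable there, the spark-streaming-kafka classes need to be inside this jar
    // (or otherwise on the executor classpath).
    val pathToShadedApplicationJar: String = configContainer.getApplicationJarPaths.get(0)
    conf.setJars(Array(pathToShadedApplicationJar))

    conf
  }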