Re: Invalidating/Remove complete mapWithState state

2017-04-17 Thread Matthias Niehoff
…point and restart? On Mon, 17 Apr 2017 at 10:13 pm, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: > Hi everybody, is there a way to completely invalidate or remove the state used by mapWithState, not only for a given…

Invalidating/Remove complete mapWithState state

2017-04-17 Thread Matthias Niehoff
…and therefore takes too long. I tried to unpersist the RDD retrieved by stateSnapshots ( stateSnapshots().transform(_.unpersist()) ), but this did not work as expected. Thank you, Matthias -- Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting codecentric AG | Hochstraße 11
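There seems to be no single call that wipes the whole mapWithState store, but the mapping function itself can drop keys via State.remove(). A pure sketch with a stand-in State class — FakeState and the "missing value = tombstone" convention are illustrative, not Spark API:

```scala
// Minimal stand-in for org.apache.spark.streaming.State, to show per-key removal.
final class FakeState[S](private var value: Option[S]) {
  def exists: Boolean = value.isDefined
  def option: Option[S] = value
  def update(s: S): Unit = { value = Some(s) }
  def remove(): Unit = { value = None }
}

// Mapping function in the mapWithState style: a missing value drops the key's state.
def trackingFunc(key: String, value: Option[Int], state: FakeState[Int]): Option[Int] =
  value match {
    case Some(v) =>
      val updated = state.option.getOrElse(0) + v
      state.update(updated)
      Some(updated)
    case None =>
      if (state.exists) state.remove() // invalidate this key's state
      None
  }
```

Wiping everything then amounts to feeding a tombstone per key, or simply restarting the job with a fresh checkpoint directory.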

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-17 Thread Matthias Niehoff
…weight, so it is cached on executors. Group id is part of the cache key. I assumed Kafka users would use different group ids for consumers they wanted to be distinct, since otherwise it would cause problems even with the normal Kafka consume…

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
…Koeninger <c...@koeninger.org>: > Just out of curiosity, have you tried using separate group ids for the separate streams? On Oct 11, 2016 4:46 AM, "Matthias Niehoff" <matthias.niehoff@codecentric.de> wrote: >> I stripped down the job to…
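The cache-key remark later in this thread suggests the fix: give each direct stream its own group.id, since the cached consumers on the executors are keyed by it. A minimal sketch — broker address and group ids are illustrative:

```scala
// One kafkaParams map per direct stream, each with a distinct group.id.
def kafkaParamsFor(groupId: String): Map[String, Object] = Map(
  "bootstrap.servers" -> "localhost:9092",
  "group.id" -> groupId,
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val paramsA = kafkaParamsFor("job-stream-a") // first KafkaUtils.createDirectStream
val paramsB = kafkaParamsFor("job-stream-b") // second, independent stream
```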

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
…multiple times, until about one minute has passed. I think this class is responsible for the endless loop, scheduling the microbatches, but I do not know exactly what it does and why it has a problem with multiple Kafka direct streams. 2016-10-11 11:46 GMT+02:00 Matthias Niehoff <matthias.n…

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
…JobScheduler: Added jobs for time 1476100946000 ms 16/10/10 14:03:26 INFO MapPartitionsRDD: Removing RDD 889 from persistence list 16/10/10 14:03:26 INFO JobScheduler: Starting job streaming job 1476100946000 ms.0 from job set of time 1476100946000 ms 2016-10-11 9:28 GMT+02:00 Matthias Niehoff <matthi…

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
…(avroSchema: String, recordMapper: GenericRecord => EventType): DStream[EventType] = { serializedAdRequestStream.mapPartitions { iteratorOfMessages => val schema: Schema = new Schema.Parser().parse(avroSchema) val recordInjection = GenericAvroCodecs.toBinary(schema)…
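The mapPartitions shape in this snippet is the right place for the schema work; a pure model showing the expensive decoder built once per partition rather than once per record (plain UTF-8 decoding stands in for the Schema.Parser / GenericAvroCodecs setup of the real job):

```scala
// Counts how often the "decoder" is constructed, to show it happens per partition.
var parsersBuilt = 0

def decodePartition(records: Iterator[Array[Byte]]): Iterator[String] = {
  parsersBuilt += 1 // stands in for new Schema.Parser().parse(avroSchema)
  val decode: Array[Byte] => String = bytes => new String(bytes, "UTF-8")
  records.map(decode) // lazily applied as the partition is consumed
}

val out = decodePartition(Iterator("a".getBytes("UTF-8"), "b".getBytes("UTF-8"))).toList
```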

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-10 Thread Matthias Niehoff
…share? On Tue, Oct 4, 2016 at 2:18 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: > Hi, sorry for the late reply, a public holiday in Germany. > Yes, it's using a unique group id which no other job or consumer group is…

Re: Problems with new experimental Kafka Consumer for 0.10

2016-09-28 Thread Matthias Niehoff
2016-09-27 18:55 GMT+02:00 Cody Koeninger <c...@koeninger.org>: > What's the actual stacktrace / exception you're getting related to commit failure? On Tue, Sep 27, 2016 at 9:37 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: >> Hi everybody,…

Problems with new experimental Kafka Consumer for 0.10

2016-09-27 Thread Matthias Niehoff
…"auto.offset.reset" -> "largest", "enable.auto.commit" -> "false", "max.poll.records" -> "1000" -- Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland tel: +49…
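One detail worth flagging in the quoted settings: "largest"/"smallest" are values of the old 0.8 consumer; the new 0.10 consumer expects "latest"/"earliest". A corrected parameter map — broker address and group id are placeholders:

```scala
// Parameter map for the new (0.10) Kafka consumer used by the direct stream.
val kafkaParams: Map[String, Object] = Map(
  "bootstrap.servers" -> "broker:9092",
  "group.id" -> "my-streaming-job",
  "auto.offset.reset" -> "latest", // not "largest" (old-consumer value)
  "enable.auto.commit" -> (false: java.lang.Boolean),
  "max.poll.records" -> "1000"
)
```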

Re: How spark decides whether to do BroadcastHashJoin or SortMergeJoin

2016-07-22 Thread Matthias Niehoff
…http://apache-spark-user-list.1001560.n3.nabble.com/How-spark-decides-whether-to-do-BroadcastHashJoin-or-SortMergeJoin-tp27369.html
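The question in the subject has a short answer in Spark's planner: a side whose estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default) is broadcast; otherwise a sort-merge join is planned. A simplified pure-Scala model of that decision — the real rule also checks broadcast hints and join types:

```scala
// Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024

// Broadcast when the smaller side fits under the threshold.
def chooseJoin(leftBytes: Long, rightBytes: Long): String =
  if (math.min(leftBytes, rightBytes) <= autoBroadcastJoinThreshold) "BroadcastHashJoin"
  else "SortMergeJoin"
```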

Structured Streaming and Microbatches

2016-07-13 Thread Matthias Niehoff
Hi everybody, as far as I understand, with the new Structured Streaming API the output will not be processed every x seconds anymore. Instead the data will be processed as soon as it arrives, though there might be a delay due to the processing time for the data. A small example: data comes in and the…
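The scheduling described here can be sketched as a pure model (simplified by assumption): with a fixed trigger interval — ProcessingTime("10 seconds") in the real writeStream API — a batch starts at the next interval boundary after the previous batch finishes; with no trigger, it starts immediately:

```scala
// prevBatchEndMs: when the previous micro-batch finished.
// triggerIntervalMs: the fixed trigger interval, or <= 0 for "no trigger".
def nextBatchStart(prevBatchEndMs: Long, triggerIntervalMs: Long): Long =
  if (triggerIntervalMs <= 0) prevBatchEndMs // no trigger: as soon as possible
  else {
    val boundaries = (prevBatchEndMs + triggerIntervalMs - 1) / triggerIntervalMs
    boundaries * triggerIntervalMs // round up to the next trigger boundary
  }
```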

Substract two DStreams

2016-06-15 Thread Matthias Niehoff
…between two DStreams? Thank you
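One way to subtract streams batch by batch is stream1.transformWith(stream2, (r1, r2) => r1.subtract(r2)), which pairs up the two streams' RDDs per micro-batch. A pure model of that per-batch operation:

```scala
// Keep the elements of the first batch that do not occur in the second,
// mirroring RDD.subtract inside transformWith.
def subtractBatch[A](batch1: Seq[A], batch2: Seq[A]): Seq[A] = {
  val toRemove = batch2.toSet
  batch1.filterNot(toRemove)
}
```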

Re: about an exception when receiving data from kafka topic using Direct mode of Spark Streaming

2016-05-25 Thread Matthias Niehoff
(amazonRating) > println("amazon product with rating sent to kafka cluster..." + > amazonRating.toString) > System.exit(0) > } > > } > } > > > I have written a stack overflow post > <http://stackoverflow.com/questions/37303202/about-an-

Sliding Average over Window in Spark Streaming

2016-05-06 Thread Matthias Niehoff
…to keep all the values in the window to compute the average. One way would be to add every new value to a list in the reduce method and then do the avg computation in a separate map, but this seems kind of ugly. Do you have an idea how to solve this? Thanks!
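The list can be avoided entirely by tracking (sum, count) pairs: map each raw value to a pair, then combine pairs — the shape reduceByKeyAndWindow wants, with `combine` as the reduce function and `uncombine` as the inverse function for values leaving the window. Agg is illustrative, not a Spark class:

```scala
// Running aggregate for a window: no raw values retained.
case class Agg(sum: Double, count: Long) {
  def avg: Double = if (count == 0) 0.0 else sum / count
}

def one(v: Double): Agg = Agg(v, 1)                                        // map step
def combine(a: Agg, b: Agg): Agg = Agg(a.sum + b.sum, a.count + b.count)   // reduceFunc
def uncombine(a: Agg, b: Agg): Agg = Agg(a.sum - b.sum, a.count - b.count) // invReduceFunc

val window1 = List(1.0, 2.0, 3.0).map(one).reduce(combine)
val window2 = uncombine(combine(window1, one(4.0)), one(1.0)) // slide: 4.0 in, 1.0 out
```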

Re: Please assist: Spark 1.5.2 / cannot find StateSpec / State

2016-04-13 Thread Matthias Niehoff
…+= "org.apache.spark" %% "spark-streaming-flume" % "1.3.0" % "provided" ... But compilation fails, mentioning that the classes StateSpec and State are not found. Could someone please point me to the right packages to refer to if I want…
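For reference, StateSpec and State shipped with mapWithState in Spark 1.6.0, inside the regular spark-streaming artifact — a 1.3.x or 1.5.2 dependency cannot see them. A build.sbt sketch (version is simply the first release that has the classes):

```scala
// build.sbt: mapWithState / StateSpec / State require spark-streaming >= 1.6.0
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided"
// and in the code: import org.apache.spark.streaming.{State, StateSpec}
```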

Do not wrap result of a UDAF in an Struct

2016-03-29 Thread Matthias Niehoff
…buffer1.getSeq[String](0) ++ buffer2.getSeq[String](0) } def evaluate(buffer: Row): Any = { buffer } }
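The struct wrapping in the subject comes from `evaluate` returning the whole buffer Row; returning the field inside it — buffer.getSeq[String](0) in the real UDAF — yields a bare array column. A pure model of the fix, with Row replaced by a Seq[Any] stand-in (not the Spark classes):

```scala
type Buffer = Seq[Any] // stand-in for org.apache.spark.sql.Row

// Merge step of the UDAF: concatenate the string lists held in field 0.
def merge(b1: Buffer, b2: Buffer): Buffer =
  Seq(b1.head.asInstanceOf[Seq[String]] ++ b2.head.asInstanceOf[Seq[String]])

// Return the field, not the buffer itself, to avoid the struct wrapper.
def evaluate(buffer: Buffer): Any = buffer.head
```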

DataFrames UDAF with array and struct

2016-03-23 Thread Matthias Niehoff
…is the type of the list? Or in other words: what is the mapping of StructType with StructFields into Scala collection/data types? Thanks!
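Hedging to the Spark 1.6-era SQL docs: StructType maps to Row and ArrayType(t) to a Seq of t's mapping, so an array of structs reaches the UDAF as Seq[Row], read via buffer.getSeq[Row](0). RowLike below is a hypothetical stand-in just to illustrate the shape:

```scala
// Positional-access record, like org.apache.spark.sql.Row.
case class RowLike(fields: Any*) {
  def get(i: Int): Any = fields(i)
}

// An ArrayType(StructType(name, count)) column arrives as a Seq of rows.
val arrayOfStruct: Seq[RowLike] = Seq(RowLike("a", 1), RowLike("b", 2))
val firstName = arrayOfStruct.head.get(0)
```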

Re: Spark job for Reading time series data from Cassandra

2016-03-10 Thread Matthias Niehoff

Re: Streaming job delays

2016-03-10 Thread Matthias Niehoff
…http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-job-delays-tp26433.html

Re: Streaming job delays

2016-03-09 Thread Matthias Niehoff

Re: Add Jars to Master/Worker classpath

2016-03-03 Thread Matthias Niehoff
…Sumedh Wale <sw...@snappydata.io>: > On Wednesday 02 March 2016 09:39 PM, Matthias Niehoff wrote: >> no, not to driver and executor but to the master and worker instances of the Spark standalone cluster > Why exactly does adding jars to driver/executor extraClassPath…

Re: Add Jars to Master/Worker classpath

2016-03-02 Thread Matthias Niehoff
no, not to the driver and executor but to the master and worker instances of the Spark standalone cluster. On 2 March 2016 at 17:05, Igor Berman <igor.ber...@gmail.com> wrote: > spark.driver.extraClassPath > spark.executor.extraClassPath > 2016-03-02 18:01 GMT+02:0…

Add Jars to Master/Worker classpath

2016-03-02 Thread Matthias Niehoff
…also using --driver-java-options and spark.executor.extraClassPath, which leads to exceptions when running our apps in standalone cluster mode. So what is the best way to add jars to the master and worker classpath? Thank you
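A hedged sketch for a Spark 1.x standalone cluster (worth verifying against the exact release): the daemon launch scripts read conf/spark-env.sh, so a jar exported there ends up on the master/worker JVM classpath, unlike spark.{driver,executor}.extraClassPath, which only affects the application's own JVMs. Paths are placeholders:

```shell
# In conf/spark-env.sh on every node of the standalone cluster:
export SPARK_CLASSPATH="/opt/extra-jars/my-lib.jar:$SPARK_CLASSPATH"

# Then restart the daemons so they pick it up:
sbin/stop-all.sh && sbin/start-all.sh
```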

Re: Using jar bundled log4j.xml on worker nodes

2016-02-05 Thread Matthias Niehoff
…On Thu, Feb 4, 2016 at 9:06 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: >> Hello everybody, we've bundled our log4j.xml into our jar (in the classpath root). I've added the log4j.xml to the spark-de…

Using jar bundled log4j.xml on worker nodes

2016-02-04 Thread Matthias Niehoff
…are using Spark 1.5.2. Thank you!
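A hedged sketch of the usual workaround from that era (flag and property names per the Spark 1.5 configuration docs; the jar name is a placeholder): rather than relying on the copy bundled in the jar, ship the file alongside the job and point both JVMs at it:

```shell
# Distribute log4j.xml to the executors and override the default log config.
spark-submit \
  --files log4j.xml \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" \
  --driver-java-options "-Dlog4j.configuration=file:log4j.xml" \
  my-app.jar
```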

Logs of Custom Receiver

2015-11-30 Thread Matthias Niehoff
…but no following statements in the start() method and other methods called from there. (All logs at the same level.) Where do I find these log statements? Thanks!

Dynamic Resource Allocation with Spark Streaming (Standalone Cluster, Spark 1.5.1)

2015-10-26 Thread Matthias Niehoff
Hello everybody, I have a few (~15) Spark Streaming jobs which have load peaks as well as long times with a low load. So I thought the new Dynamic Resource Allocation for Standalone Clusters might be helpful (SPARK-4751). I have a test "cluster" with 1 worker consisting of 4 executors with 2
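For reference, a sketch of the switches SPARK-4751's standalone support needs (Spark 1.5-era property names, illustrative values): dynamic allocation itself plus the external shuffle service, which every standalone worker must run so executors can be released without losing shuffle data:

```shell
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=4 \
  my-streaming-app.jar
```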