Re: Invalidating/Remove complete mapWithState state

2017-04-17 Thread Matthias Niehoff
…point and restart? On Mon, 17 Apr 2017 at 10:13 pm, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: > Hi everybody, is there a way to completely invalidate or remove the state used by mapWithState, not only for a given…

Invalidating/Remove complete mapWithState state

2017-04-17 Thread Matthias Niehoff
…and therefore takes too long. I tried to unpersist the RDD retrieved by stateSnapshots ( stateSnapshots().transform(_.unpersist()) ), but this did not work as expected. Thank you, Matthias -- Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting codecentric AG | Hochstraße 11
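There seems to be no single call that wipes the whole mapWithState store, but the mapping function itself can drop keys via State.remove(). A pure sketch with a stand-in State class — FakeState and the "missing value = tombstone" convention are illustrative, not Spark API:

```scala
// Minimal stand-in for org.apache.spark.streaming.State, to show per-key removal.
final class FakeState[S](private var value: Option[S]) {
  def exists: Boolean = value.isDefined
  def option: Option[S] = value
  def update(s: S): Unit = { value = Some(s) }
  def remove(): Unit = { value = None }
}

// Mapping function in the mapWithState style: a missing value drops the key's state.
def trackingFunc(key: String, value: Option[Int], state: FakeState[Int]): Option[Int] =
  value match {
    case Some(v) =>
      val updated = state.option.getOrElse(0) + v
      state.update(updated)
      Some(updated)
    case None =>
      if (state.exists) state.remove() // invalidate this key's state
      None
  }
```

Wiping everything then amounts to feeding a tombstone per key, or simply restarting the job with a fresh checkpoint directory.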

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-17 Thread Matthias Niehoff
…weight, so it is cached on executors. Group id is part of the cache key. I assumed Kafka users would use different group ids for consumers they wanted to be distinct, since otherwise it would cause problems even with the normal Kafka consume…

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
…Koeninger <c...@koeninger.org>: > Just out of curiosity, have you tried using separate group ids for the separate streams? On Oct 11, 2016 4:46 AM, "Matthias Niehoff" <matthias.niehoff@codecentric.de> wrote: >> I stripped down the job to…
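The cache-key remark later in this thread suggests the fix: give each direct stream its own group.id, since the cached consumers on the executors are keyed by it. A minimal sketch — broker address and group ids are illustrative:

```scala
// One kafkaParams map per direct stream, each with a distinct group.id.
def kafkaParamsFor(groupId: String): Map[String, Object] = Map(
  "bootstrap.servers" -> "localhost:9092",
  "group.id" -> groupId,
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val paramsA = kafkaParamsFor("job-stream-a") // first KafkaUtils.createDirectStream
val paramsB = kafkaParamsFor("job-stream-b") // second, independent stream
```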

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
…multiple times, until about one minute has passed. I think this class is responsible for the endless loop, scheduling the microbatches, but I do not know exactly what it does and why it has a problem with multiple Kafka direct streams. 2016-10-11 11:46 GMT+02:00 Matthias Niehoff <matthias.n…

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
…JobScheduler: Added jobs for time 1476100946000 ms 16/10/10 14:03:26 INFO MapPartitionsRDD: Removing RDD 889 from persistence list 16/10/10 14:03:26 INFO JobScheduler: Starting job streaming job 1476100946000 ms.0 from job set of time 1476100946000 ms 2016-10-11 9:28 GMT+02:00 Matthias Niehoff <matthi…

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
…(avroSchema: String, recordMapper: GenericRecord => EventType): DStream[EventType] = { serializedAdRequestStream.mapPartitions { iteratorOfMessages => val schema: Schema = new Schema.Parser().parse(avroSchema) val recordInjection = GenericAvroCodecs.toBinary(schema)…
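The mapPartitions shape in this snippet is the right place for the schema work; a pure model showing the expensive decoder built once per partition rather than once per record (plain UTF-8 decoding stands in for the Schema.Parser / GenericAvroCodecs setup of the real job):

```scala
// Counts how often the "decoder" is constructed, to show it happens per partition.
var parsersBuilt = 0

def decodePartition(records: Iterator[Array[Byte]]): Iterator[String] = {
  parsersBuilt += 1 // stands in for new Schema.Parser().parse(avroSchema)
  val decode: Array[Byte] => String = bytes => new String(bytes, "UTF-8")
  records.map(decode) // lazily applied as the partition is consumed
}

val out = decodePartition(Iterator("a".getBytes("UTF-8"), "b".getBytes("UTF-8"))).toList
```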

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-10 Thread Matthias Niehoff
…share? On Tue, Oct 4, 2016 at 2:18 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: > Hi, sorry for the late reply, a public holiday in Germany. > Yes, it's using a unique group id which no other job or consumer group is…

Re: Problems with new experimental Kafka Consumer for 0.10

2016-09-28 Thread Matthias Niehoff
2016-09-27 18:55 GMT+02:00 Cody Koeninger <c...@koeninger.org>: > What's the actual stacktrace / exception you're getting related to commit failure? On Tue, Sep 27, 2016 at 9:37 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: >> Hi everybody,…

Problems with new experimental Kafka Consumer for 0.10

2016-09-27 Thread Matthias Niehoff
…"auto.offset.reset" -> "largest", "enable.auto.commit" -> "false", "max.poll.records" -> "1000" -- Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland tel: +49…
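One detail worth flagging in the quoted settings: "largest"/"smallest" are values of the old 0.8 consumer; the new 0.10 consumer expects "latest"/"earliest". A corrected parameter map — broker address and group id are placeholders:

```scala
// Parameter map for the new (0.10) Kafka consumer used by the direct stream.
val kafkaParams: Map[String, Object] = Map(
  "bootstrap.servers" -> "broker:9092",
  "group.id" -> "my-streaming-job",
  "auto.offset.reset" -> "latest", // not "largest" (old-consumer value)
  "enable.auto.commit" -> (false: java.lang.Boolean),
  "max.poll.records" -> "1000"
)
```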

Re: How spark decides whether to do BroadcastHashJoin or SortMergeJoin

2016-07-22 Thread Matthias Niehoff
…http://apache-spark-user-list.1001560.n3.nabble.com/How-spark-decides-whether-to-do-BroadcastHashJoin-or-SortMergeJoin-tp27369.html
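The question in the subject has a short answer in Spark's planner: a side whose estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default) is broadcast; otherwise a sort-merge join is planned. A simplified pure-Scala model of that decision — the real rule also checks broadcast hints and join types:

```scala
// Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024

// Broadcast when the smaller side fits under the threshold.
def chooseJoin(leftBytes: Long, rightBytes: Long): String =
  if (math.min(leftBytes, rightBytes) <= autoBroadcastJoinThreshold) "BroadcastHashJoin"
  else "SortMergeJoin"
```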

Structured Streaming and Microbatches

2016-07-13 Thread Matthias Niehoff
Hi everybody, as far as I understand, with the new Structured Streaming API the output will not be processed every x seconds anymore. Instead the data will be processed as soon as it arrives, though there might be a delay due to the processing time for the data. A small example: data comes in and the…
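The scheduling described here can be sketched as a pure model (simplified by assumption): with a fixed trigger interval — ProcessingTime("10 seconds") in the real writeStream API — a batch starts at the next interval boundary after the previous batch finishes; with no trigger, it starts immediately:

```scala
// prevBatchEndMs: when the previous micro-batch finished.
// triggerIntervalMs: the fixed trigger interval, or <= 0 for "no trigger".
def nextBatchStart(prevBatchEndMs: Long, triggerIntervalMs: Long): Long =
  if (triggerIntervalMs <= 0) prevBatchEndMs // no trigger: as soon as possible
  else {
    val boundaries = (prevBatchEndMs + triggerIntervalMs - 1) / triggerIntervalMs
    boundaries * triggerIntervalMs // round up to the next trigger boundary
  }
```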

Substract two DStreams

2016-06-15 Thread Matthias Niehoff
…between two DStreams? Thank you
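One way to subtract streams batch by batch is stream1.transformWith(stream2, (r1, r2) => r1.subtract(r2)), which pairs up the two streams' RDDs per micro-batch. A pure model of that per-batch operation:

```scala
// Keep the elements of the first batch that do not occur in the second,
// mirroring RDD.subtract inside transformWith.
def subtractBatch[A](batch1: Seq[A], batch2: Seq[A]): Seq[A] = {
  val toRemove = batch2.toSet
  batch1.filterNot(toRemove)
}
```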

Re: about an exception when receiving data from kafka topic using Direct mode of Spark Streaming

2016-05-25 Thread Matthias Niehoff
(amazonRating) > println("amazon product with rating sent to kafka cluster..." + > amazonRating.toString) > System.exit(0) > } > > } > } > > > I have written a stack overflow post > <http://stackoverflow.com/questions/37303202/about-an-

Sliding Average over Window in Spark Streaming

2016-05-06 Thread Matthias Niehoff
…to keep all the values in the window to compute the average. One way would be to add every new value to a list in the reduce method and then do the avg computation in a separate map, but this seems kind of ugly. Do you have an idea how to solve this? Thanks!
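The list can be avoided entirely by tracking (sum, count) pairs: map each raw value to a pair, then combine pairs — the shape reduceByKeyAndWindow wants, with `combine` as the reduce function and `uncombine` as the inverse function for values leaving the window. Agg is illustrative, not a Spark class:

```scala
// Running aggregate for a window: no raw values retained.
case class Agg(sum: Double, count: Long) {
  def avg: Double = if (count == 0) 0.0 else sum / count
}

def one(v: Double): Agg = Agg(v, 1)                                        // map step
def combine(a: Agg, b: Agg): Agg = Agg(a.sum + b.sum, a.count + b.count)   // reduceFunc
def uncombine(a: Agg, b: Agg): Agg = Agg(a.sum - b.sum, a.count - b.count) // invReduceFunc

val window1 = List(1.0, 2.0, 3.0).map(one).reduce(combine)
val window2 = uncombine(combine(window1, one(4.0)), one(1.0)) // slide: 4.0 in, 1.0 out
```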

Re: Please assist: Spark 1.5.2 / cannot find StateSpec / State

2016-04-13 Thread Matthias Niehoff
…+= "org.apache.spark" %% "spark-streaming-flume" % "1.3.0" % "provided" ... But compilation fails, mentioning that the classes StateSpec and State are not found. Could someone please point me to the right packages to refer to if I want…
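For reference, StateSpec and State shipped with mapWithState in Spark 1.6.0, inside the regular spark-streaming artifact — a 1.3.x or 1.5.2 dependency cannot see them. A build.sbt sketch (version is simply the first release that has the classes):

```scala
// build.sbt: mapWithState / StateSpec / State require spark-streaming >= 1.6.0
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided"
// and in the code: import org.apache.spark.streaming.{State, StateSpec}
```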

Do not wrap result of a UDAF in an Struct

2016-03-29 Thread Matthias Niehoff
…buffer1.getSeq[String](0) ++ buffer2.getSeq[String](0) } def evaluate(buffer: Row): Any = { buffer } }
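The struct wrapping in the subject comes from `evaluate` returning the whole buffer Row; returning the field inside it — buffer.getSeq[String](0) in the real UDAF — yields a bare array column. A pure model of the fix, with Row replaced by a Seq[Any] stand-in (not the Spark classes):

```scala
type Buffer = Seq[Any] // stand-in for org.apache.spark.sql.Row

// Merge step of the UDAF: concatenate the string lists held in field 0.
def merge(b1: Buffer, b2: Buffer): Buffer =
  Seq(b1.head.asInstanceOf[Seq[String]] ++ b2.head.asInstanceOf[Seq[String]])

// Return the field, not the buffer itself, to avoid the struct wrapper.
def evaluate(buffer: Buffer): Any = buffer.head
```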

DataFrames UDAF with array and struct

2016-03-23 Thread Matthias Niehoff
…is the type of the list? Or in other words: what is the mapping of StructType with StructFields into Scala collection/data types? Thanks!
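Hedging to the Spark 1.6-era SQL docs: StructType maps to Row and ArrayType(t) to a Seq of t's mapping, so an array of structs reaches the UDAF as Seq[Row], read via buffer.getSeq[Row](0). RowLike below is a hypothetical stand-in just to illustrate the shape:

```scala
// Positional-access record, like org.apache.spark.sql.Row.
case class RowLike(fields: Any*) {
  def get(i: Int): Any = fields(i)
}

// An ArrayType(StructType(name, count)) column arrives as a Seq of rows.
val arrayOfStruct: Seq[RowLike] = Seq(RowLike("a", 1), RowLike("b", 2))
val firstName = arrayOfStruct.head.get(0)
```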

Re: Spark job for Reading time series data from Cassandra

2016-03-10 Thread Matthias Niehoff

Re: Streaming job delays

2016-03-10 Thread Matthias Niehoff
…http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-job-delays-tp26433.html

Re: Streaming job delays

2016-03-09 Thread Matthias Niehoff

Re: Add Jars to Master/Worker classpath

2016-03-03 Thread Matthias Niehoff
…Sumedh Wale <sw...@snappydata.io>: > On Wednesday 02 March 2016 09:39 PM, Matthias Niehoff wrote: >> no, not to driver and executor but to the master and worker instances of the Spark standalone cluster > Why exactly does adding jars to driver/executor extraClassPath…

Re: Add Jars to Master/Worker classpath

2016-03-02 Thread Matthias Niehoff
no, not to the driver and executor but to the master and worker instances of the Spark standalone cluster. On 2 March 2016 at 17:05, Igor Berman <igor.ber...@gmail.com> wrote: > spark.driver.extraClassPath > spark.executor.extraClassPath > 2016-03-02 18:01 GMT+02:0…

Add Jars to Master/Worker classpath

2016-03-02 Thread Matthias Niehoff
…also using --driver-java-options and spark.executor.extraClassPath, which leads to exceptions when running our apps in standalone cluster mode. So what is the best way to add jars to the master and worker classpath? Thank you
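A hedged sketch for a Spark 1.x standalone cluster (worth verifying against the exact release): the daemon launch scripts read conf/spark-env.sh, so a jar exported there ends up on the master/worker JVM classpath, unlike spark.{driver,executor}.extraClassPath, which only affects the application's own JVMs. Paths are placeholders:

```shell
# In conf/spark-env.sh on every node of the standalone cluster:
export SPARK_CLASSPATH="/opt/extra-jars/my-lib.jar:$SPARK_CLASSPATH"

# Then restart the daemons so they pick it up:
sbin/stop-all.sh && sbin/start-all.sh
```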

Re: Using jar bundled log4j.xml on worker nodes

2016-02-05 Thread Matthias Niehoff
…On Thu, Feb 4, 2016 at 9:06 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote: >> Hello everybody, we've bundled our log4j.xml into our jar (in the classpath root). I've added the log4j.xml to the spark-de…

Using jar bundled log4j.xml on worker nodes

2016-02-04 Thread Matthias Niehoff
…are using Spark 1.5.2. Thank you!
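A hedged sketch of the usual workaround from that era (flag and property names per the Spark 1.5 configuration docs; the jar name is a placeholder): rather than relying on the copy bundled in the jar, ship the file alongside the job and point both JVMs at it:

```shell
# Distribute log4j.xml to the executors and override the default log config.
spark-submit \
  --files log4j.xml \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" \
  --driver-java-options "-Dlog4j.configuration=file:log4j.xml" \
  my-app.jar
```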

Logs of Custom Receiver

2015-11-30 Thread Matthias Niehoff
…but no following statements in the start() method and other methods called from there. (All logs at the same level.) Where do I find these log statements? Thanks!

Dynamic Resource Allocation with Spark Streaming (Standalone Cluster, Spark 1.5.1)

2015-10-26 Thread Matthias Niehoff
Hello everybody, I have a few (~15) Spark Streaming jobs which have load peaks as well as long times with a low load. So I thought the new Dynamic Resource Allocation for Standalone Clusters might be helpful (SPARK-4751). I have a test "cluster" with 1 worker consisting of 4 executors with 2
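For reference, a sketch of the switches SPARK-4751's standalone support needs (Spark 1.5-era property names, illustrative values): dynamic allocation itself plus the external shuffle service, which every standalone worker must run so executors can be released without losing shuffle data:

```shell
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=4 \
  my-streaming-app.jar
```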