> point and restart?
>
> On Mon, 17 Apr 2017 at 10:13 pm, Matthias Niehoff <
> matthias.nieh...@codecentric.de> wrote:
>
>> Hi everybody,
>>
is there a way to completely invalidate or remove the state used by
mapWithState, not only for a given
and therefore takes too long.
I tried to unpersist the RDD retrieved by stateSnapshots
(stateSnapshots().transform(_.unpersist())), but this did not work as
expected.
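(For context: I know about the per-key timeout, roughly like the sketch
below with made-up names, but that only expires state key by key, not the
whole state at once.)

  import org.apache.spark.streaming.{Minutes, State, StateSpec}

  // made-up update function; the relevant part is the .timeout(...) below
  def update(key: String, value: Option[Long], state: State[Long]): Option[Long] = {
    if (state.isTimingOut()) {
      None // key is being expired; calling state.update here is not allowed
    } else {
      val sum = value.getOrElse(0L) + state.getOption.getOrElse(0L)
      state.update(sum)
      Some(sum)
    }
  }

  val spec = StateSpec.function(update _).timeout(Minutes(10))
  val stateful = keyedStream.mapWithState(spec) // keyedStream: DStream[(String, Long)]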
Thank you,
Matthias
--
Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
codecentric AG | Hochstraße 11
> weight, so it is cached on executors. Group id is part of the cache key. I
> assumed kafka users would use different group ids for consumers they wanted
> to be distinct, since otherwise it would cause problems even with the
> normal kafka consumer
Koeninger <c...@koeninger.org>:
> Just out of curiosity, have you tried using separate group ids for the
> separate streams?
>
> On Oct 11, 2016 4:46 AM, "Matthias Niehoff" <matthias.niehoff@codecentric.
> de> wrote:
>
>> I stripped down the job to
multiple times, until about one minute has passed. I think this class is
responsible for the endless loop scheduling the microbatches, but I do not
know exactly what it does or why it has a problem with multiple Kafka
Direct Streams.
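For reference, a stripped-down sketch of how we create the streams (broker,
topic, and group names are made up here), one direct stream per topic with
its own group id:

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.streaming.kafka010._

  def directStream(topic: String, groupId: String) = {
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> groupId, // distinct per stream
      "enable.auto.commit" -> (false: java.lang.Boolean))
    KafkaUtils.createDirectStream[String, String](
      ssc, // the StreamingContext
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq(topic), kafkaParams))
  }

  val streamA = directStream("topicA", "job-topicA")
  val streamB = directStream("topicB", "job-topicB")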
2016-10-11 11:46 GMT+02:00 Matthias Niehoff <matthias.n
JobScheduler: Added jobs for time 1476100946000 ms
16/10/10 14:03:26 INFO MapPartitionsRDD: Removing RDD 889 from persistence list
16/10/10 14:03:26 INFO JobScheduler: Starting job streaming job 1476100946000 ms.0 from job set of time 1476100946000 ms
2016-10-11 9:28 GMT+02:00 Matthias Niehoff <matthi
...(avroSchema: String,
    recordMapper: GenericRecord => EventType): DStream[EventType] = {
  serializedAdRequestStream.mapPartitions { iteratorOfMessages =>
    // parse the Avro schema and build the binary codec once per partition
    val schema: Schema = new Schema.Parser().parse(avroSchema)
    val recordInjection = GenericAvroCodecs.toBinary(schema)
share?
>
> On Tue, Oct 4, 2016 at 2:18 AM, Matthias Niehoff
> <matthias.nieh...@codecentric.de> wrote:
> > Hi,
> > sorry for the late reply. A public holiday in Germany.
> >
> > Yes, it's using a unique group id which no other job or consumer group is
> >
2016-09-27 18:55 GMT+02:00 Cody Koeninger <c...@koeninger.org>:
> What's the actual stacktrace / exception you're getting related to
> commit failure?
>
> On Tue, Sep 27, 2016 at 9:37 AM, Matthias Niehoff
> <matthias.nieh...@codecentric.de> wrote:
> > Hi everybody,
>
"auto.offset.reset" -> "largest",
"enable.auto.commit" -> "false",
"max.poll.records" -> "1000"
--
Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
tel: +49
Hi everybody,
as far as I understand, with the new Structured Streaming API the output is
no longer processed every x seconds. Instead the data is processed as soon
as it arrives, although there may be a delay due to the processing time for
the data.
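To illustrate the trigger behaviour I mean (a sketch, assuming Spark 2.2's
Trigger API and a streaming DataFrame df):

  import org.apache.spark.sql.streaming.Trigger

  // default: the next micro-batch starts as soon as the previous one finishes
  val asap = df.writeStream.format("console").start()

  // fixed interval: a micro-batch is started every 10 seconds instead
  val every10s = df.writeStream
    .trigger(Trigger.ProcessingTime("10 seconds"))
    .format("console")
    .start()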
A small example:
Data comes in and the
between two DStreams?
Thank you
--
Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
172.1702676
www.codecentric.de | blog.codecentric.de
> (amazonRating)
> println("amazon product with rating sent to kafka cluster..." +
> amazonRating.toString)
> System.exit(0)
> }
>
> }
> }
>
>
> I have written a stack overflow post
> <http://stackoverflow.com/questions/37303202/about-an-
to keep all the values in the window to compute the average. One way would
be to add every new value to a list in the reduce method and then do the
avg computation in a separate map, but this seems kind of ugly.
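A minimal sketch of that list-based variant (assuming a DStream[Double]
named values and a made-up 60s window sliding every 10s):

  import org.apache.spark.streaming.Seconds

  // collect all values of the window into a list, then average separately
  val avg = values
    .map(v => List(v))
    .reduceByWindow(_ ++ _, Seconds(60), Seconds(10))
    .map(xs => xs.sum / xs.size)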
Do you have an idea how to solve this?
Thanks!
--
Matthias Niehoff | IT-Consultant
+= "org.apache.spark" %% "spark-streaming-flume" %
> "1.3.0" % "provided"
> ...
> But compilation fails, mentioning that the classes StateSpec and State are
> not found.
>
> Could someone please point me to the right packages to refer to if I want
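StateSpec and State only exist since Spark 1.6.0 and live in the core
streaming module, so something along these lines should be enough (a sketch):

  // StateSpec/State are in org.apache.spark.streaming since Spark 1.6.0
  libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided"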
  ...(0) ++ buffer2.getSeq[String](0)
  }

  def evaluate(buffer: Row): Any = {
    buffer
  }
}
--
Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil:
is the type of the list? Or in
other words: what is the mapping of a StructType with StructFields to Scala
collection/data types?
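(What I have pieced together so far, as a sketch with a made-up schema:
StructType comes back as a Row, ArrayType as a Seq, MapType as a Map, and
primitives as their Scala counterparts. Is that right?)

  import org.apache.spark.sql.Row

  // assuming schema: struct<address: struct<...>, tags: array<string>>
  val address: Row = row.getStruct(0)           // StructType -> Row
  val tags: Seq[String] = row.getSeq[String](1) // ArrayType  -> Seq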
Thanks!
--
Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
tel: +49 (0) 721.9595-681
Sumedh Wale <sw...@snappydata.io>:
> On Wednesday 02 March 2016 09:39 PM, Matthias Niehoff wrote:
>
> no, not to driver and executor but to the master and worker instances of
> the spark standalone cluster
>
>
> Why exactly does adding jars to driver/executor extraClassPath
no, not to driver and executor but to the master and worker instances of
the spark standalone cluster
On 2 March 2016 at 17:05, Igor Berman <igor.ber...@gmail.com> wrote:
> spark.driver.extraClassPath
> spark.executor.extraClassPath
>
> 2016-03-02 18:01 GMT+02:0
We are also using
--driver-java-options
and spark.executor.extraClassPath, which leads to exceptions when running
our apps in standalone cluster mode.
So what is the best way to add jars to the master and worker classpath?
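(The only workaround I can think of is the deprecated SPARK_CLASSPATH; my
assumption is that it is also picked up by the daemon JVMs. A sketch:)

  # conf/spark-env.sh on every master and worker machine
  # assumption: the deprecated SPARK_CLASSPATH is read by the daemons as well
  SPARK_CLASSPATH="/opt/extra-jars/*"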
Thank you
--
Matthias Niehoff | IT-Consultant | Agile Software Factory | Cons
>
> On Thu, Feb 4, 2016 at 9:06 AM, Matthias Niehoff <
> matthias.nieh...@codecentric.de> wrote:
>
>> Hello everybody,
>>
>> we’ve bundled our log4j.xml into our jar (in the classpath root).
>>
>> I’ve added the log4j.xml to the spark-de
We are using Spark 1.5.2.
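For completeness, what we pass on submit (a sketch; paths are placeholders):

  spark-submit \
    --files /local/path/log4j.xml \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.xml" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" \
    ...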
Thank you!
--
Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
172.1702676
www.codecentric.de | blog.codecentric.de
but no following statements in the
start() method and other methods called from there (all logging at the same
level).
Where do I find these log statements?
Thanks!
--
Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe
Hello everybody,
I have a few (~15) Spark Streaming jobs which have load peaks as well as
long periods of low load. So I thought the new Dynamic Resource
Allocation for standalone clusters might be helpful (SPARK-4751).
I have a test "cluster" with 1 worker consisting of 4 executors with 2
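The settings I plan to start from (a sketch for spark-defaults.conf; dynamic
allocation also requires the external shuffle service on each worker):

  spark.dynamicAllocation.enabled       true
  spark.shuffle.service.enabled         true
  spark.dynamicAllocation.minExecutors  1
  spark.dynamicAllocation.maxExecutors  4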