I've always had a question about Trigger.Once that I never got around to
asking or testing for myself. If you have a 24/7 stream to a Kafka topic,
will Trigger.Once get the last offset(s) when it starts and then quit once
it hits those offset(s), or will the job run until no new messages are
added to the
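For reference, a Trigger.Once query reading from Kafka looks roughly like the sketch below. This is a configuration-style sketch only, not a runnable program on its own: it assumes a Spark 2.2+ cluster with the spark-sql-kafka connector on the classpath, and the broker address, topic name, and output paths are placeholders.

```scala
// Sketch only: assumes a Spark 2.2+ cluster with the spark-sql-kafka
// connector; broker, topic, and paths below are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("trigger-once-demo").getOrCreate()

val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder
  .option("subscribe", "events")                    // placeholder topic
  .load()

// Trigger.Once processes a single micro-batch and stops; the checkpoint
// records the offsets reached so the next run resumes from there.
val query = kafkaDf.writeStream
  .format("parquet")
  .option("path", "/data/out")               // placeholder
  .option("checkpointLocation", "/data/chk") // placeholder
  .trigger(Trigger.Once())
  .start()

query.awaitTermination()
```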
Hi,
I have a Spark Scala program created and compiled with Maven. It works
fine. It basically does the following:
1. Reads an xml file from HDFS location
2. Creates a DF on top of what it reads
3. Creates a new DF with some columns renamed, etc.
4. Creates a new DF for rejected rows
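The four steps might look like the sketch below. The spark-xml format, the rowTag, the column names, and the rejection rule are all assumptions for illustration; it needs a Spark cluster and the spark-xml package to run.

```scala
// Sketch of the four steps; assumes the spark-xml package
// (com.databricks:spark-xml) and hypothetical column/rule names.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("xml-etl").getOrCreate()

// 1. Read an XML file from HDFS ("record" rowTag is a placeholder).
val raw = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "record")
  .load("hdfs:///path/to/input.xml") // placeholder path

// 2-3. DF with some columns renamed (hypothetical names).
val renamed = raw.withColumnRenamed("old_name", "new_name")

// 4. Split off rejected rows; the rule here is a made-up example.
val isGood = col("amount") > 0
val good = renamed.filter(isGood)
val rejected = renamed.filter(!isGood)
```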
Hi Mich!
I think you can combine the good/rejected handling into one method that
internally:
- Creates good/rejected DFs given an input DF and rules/predicates to
apply to it.
- Creates a third DF containing the good rows plus the rejected rows with
the bad columns nulled out.
-
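To make that concrete, here is the shape such a reusable method could take. It is shown over a plain Scala collection so it runs anywhere; with Spark you would pass Column predicates and use df.filter(rule) / df.filter(!rule) instead. All names here are hypothetical.

```scala
// Plain-Scala sketch of the reusable split; names are hypothetical.
// A row is "good" only if it satisfies every rule; everything else is rejected.
case class SplitResult[T](good: Seq[T], rejected: Seq[T])

def splitByRules[T](rows: Seq[T], rules: Seq[T => Boolean]): SplitResult[T] = {
  val (good, rejected) = rows.partition(row => rules.forall(rule => rule(row)))
  SplitResult(good, rejected)
}

// Example: keep positive even numbers.
val result = splitByRules(Seq(-2, 1, 2, 4), Seq[Int => Boolean](_ > 0, _ % 2 == 0))
// result.good == Seq(2, 4), result.rejected == Seq(-2, 1)
```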
I neglected to include the rationale: the assumption is that this will be a
repeatedly needed process, so a reusable method would be helpful. The
predicates/input rules that are supported will need to be flexible enough to
cover the range of input data domains and use cases. For my workflows
the
Hi Alex, I read the book; it is a good one, but I don't see the things I
strongly want to understand.
You are right about partitions and tasks.
1. How to use coalesce with Spark Structured Streaming?
Also, I want to ask a few more questions:
2. How to restrict the number of executors on structured
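On both questions, a sketch of the relevant knobs follows; the values are examples, not recommendations, and it assumes a Spark cluster to actually run.

```scala
// Sketch: example values, not recommendations.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("tuning-demo")
  // 2. Cap executors: an upper bound under dynamic allocation ...
  .config("spark.dynamicAllocation.maxExecutors", "4")
  // ... or a fixed count with static allocation (equivalent to
  // --num-executors 4 on spark-submit).
  .config("spark.executor.instances", "4")
  .getOrCreate()

val stream = spark.readStream.format("rate").load()

// 1. coalesce on a streaming DF cuts the number of partitions (and thus
// tasks) in each micro-batch without a full shuffle; repartition(n) also
// works but shuffles.
val narrowed = stream.coalesce(2)
```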
If I understand correctly, Trigger.Once executes only one micro-batch and
then terminates; that's all. Your understanding of Structured Streaming
applies there as well.
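A toy model of those semantics (not Spark code; every name here is invented): the run snapshots the end offset when the trigger fires, processes that one batch, and stops, so records appended afterwards wait for the next run.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for a Kafka topic partition.
final class ToyTopic {
  private val log = ArrayBuffer[String]()
  def append(msg: String): Unit = log += msg
  def endOffset: Int = log.size
  def read(from: Int, until: Int): Seq[String] = log.slice(from, until).toSeq
}

// One Trigger.Once-style run: snapshot the end offset, process
// [checkpoint, end), then terminate, returning the batch and the
// new checkpoint.
def runOnce(topic: ToyTopic, checkpoint: Int): (Seq[String], Int) = {
  val end = topic.endOffset // offsets captured when the trigger fires
  (topic.read(checkpoint, end), end)
}

val topic = new ToyTopic
val (firstBatch, secondBatch) = {
  topic.append("a"); topic.append("b")
  val (batch1, cp1) = runOnce(topic, 0) // sees "a", "b", then stops
  topic.append("c")                     // arrives after the trigger fired
  val (batch2, _) = runOnce(topic, cp1) // only the next run picks up "c"
  (batch1, batch2)
}
```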
It's like a hybrid approach: it brings the incremental processing of
micro-batches but with the processing interval of a batch job. That said,