Hi,
We have a Structured Streaming application, and we are facing a memory leak
while caching in the foreachBatch block.
We unpersist on every iteration, and we also verify via
"spark.sparkContext.getPersistentRDDs" that we don't have unnecessary
cached data.
We also noted in the profiler that many ...
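A simplified sketch of the cache/unpersist pattern in question (not the actual application code; it assumes a streaming DataFrame "df" and a SparkSession "spark"):

  import org.apache.spark.sql.DataFrame

  // every micro-batch is cached, reused, and explicitly unpersisted
  val processBatch: (DataFrame, Long) => Unit = (batch, batchId) => {
    batch.persist()
    // ... several aggregations / writes over the same cached batch ...
    batch.unpersist()
    // sanity check that nothing stays cached between batches
    println(s"persistent RDDs after batch $batchId: ${spark.sparkContext.getPersistentRDDs.size}")
  }

  df.writeStream
    .foreachBatch(processBatch)
    .start()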
I have a Spark batch application writing to ADLS Gen2 (hierarchical namespace).
When designing the application I was sure Spark would perform a global
commit once the job is committed, but what it actually does is commit on
each task, meaning *once a task completes writing, it moves its output from
temp to the target storage*.
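This matches the v2 FileOutputCommitter algorithm, which moves each task's output at task commit. A sketch of forcing the v1 algorithm instead, which defers all moves to a single job-level commit (assuming the default FileOutputCommitter is in use; app name is hypothetical):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("adls-gen2-batch-write")
    // v1: tasks write under a temporary attempt directory and the output is
    // moved to the target directory only at job commit (a "global" commit);
    // v2 moves each task's output as soon as that task commits.
    .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "1")
    .getOrCreate()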
Hi,
What is the expected behavior if the streaming query is stopped after the write
commit and before the read commit? Should I expect data duplication?
Thanks.
Hi,
I'm developing a new Spark connector using the Data Source V2 API (Spark 3.1.1).
I noticed that the planInputPartitions method (in MicroBatchStream) is
called twice every micro-batch.
What is the motivation/reason for this?
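A stripped-down sketch of the kind of MicroBatchStream involved (names hypothetical, not the real connector); a log line in planInputPartitions is enough to see the two calls per batch:

  import org.apache.spark.sql.connector.read.{InputPartition, PartitionReaderFactory}
  import org.apache.spark.sql.connector.read.streaming.{MicroBatchStream, Offset}

  class MyMicroBatchStream extends MicroBatchStream {
    override def initialOffset(): Offset = ???
    override def latestOffset(): Offset = ???
    override def deserializeOffset(json: String): Offset = ???

    override def planInputPartitions(start: Offset, end: Offset): Array[InputPartition] = {
      println(s"planInputPartitions($start, $end)")  // observed twice per micro-batch
      Array.empty
    }

    override def createReaderFactory(): PartitionReaderFactory = ???

    override def commit(end: Offset): Unit = ()
    override def stop(): Unit = ()
  }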
Thanks,
Kineret
Hi,
I would like to support data locality in Spark data source v2. How can I
give Spark the ability to read and process the data on the same node?
I didn't find any interface that supports 'getPreferredLocations' (or an
equivalent).
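For reference, in the Spark 3.x connector API the per-partition class seems to be the intended place for this, via InputPartition.preferredLocations(); a minimal sketch (names hypothetical):

  import org.apache.spark.sql.connector.read.InputPartition

  // each partition reports the host(s) holding its data, so the scheduler can
  // try to run the corresponding reader task on one of those nodes
  case class MyInputPartition(shardId: Int, hosts: Seq[String]) extends InputPartition {
    override def preferredLocations(): Array[String] = hosts.toArray
  }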
Thanks!
Hi All,
I get a 'No plan for EventTimeWatermark' error while running a query with
column pruning, using Structured Streaming with a custom data source that
implements Spark Data Source V2.
My data source reader implementation, which handles the schemas, includes
the following:
class MyDataSourceReader extends ..
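A rough sketch of how such a reader typically wires up schema handling and column pruning in the Spark 2.3 DSv2 API (hypothetical columns, not the actual implementation; a streaming reader would also mix in MicroBatchReader):

  import java.util.{Collections, List => JList}
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.sources.v2.reader.{DataReaderFactory, DataSourceReader, SupportsPushDownRequiredColumns}
  import org.apache.spark.sql.types.StructType

  class MyDataSourceReader extends DataSourceReader with SupportsPushDownRequiredColumns {

    // full schema of the source (hypothetical columns)
    private val fullSchema = new StructType()
      .add("timestamp", "timestamp")
      .add("a", "string")
      .add("b", "long")

    // Spark calls pruneColumns with the columns the query actually needs;
    // readSchema must then report exactly that pruned schema
    private var requiredSchema: StructType = fullSchema

    override def pruneColumns(requiredSchema: StructType): Unit =
      this.requiredSchema = requiredSchema

    override def readSchema(): StructType = requiredSchema

    override def createDataReaderFactories(): JList[DataReaderFactory[Row]] =
      Collections.emptyList()
  }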
I'm writing a Spark data source v2 in Spark 2.3 and I want to support
writeStream. What should I do in order to do so?
My DefaultSource class:
class MyDefaultSource extends DataSourceV2 with ReadSupport with
WriteSupport with MicroBatchReadSupport { ..
Which interface is missing?
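My guess is that the missing piece is StreamWriteSupport, since WriteSupport only covers batch writes. A sketch of the class with it added (method bodies elided; package paths as I understand the 2.3 API):

  import java.util.Optional

  import org.apache.spark.sql.SaveMode
  import org.apache.spark.sql.sources.v2._
  import org.apache.spark.sql.sources.v2.reader.DataSourceReader
  import org.apache.spark.sql.sources.v2.reader.streaming.MicroBatchReader
  import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
  import org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter
  import org.apache.spark.sql.streaming.OutputMode
  import org.apache.spark.sql.types.StructType

  class MyDefaultSource extends DataSourceV2
      with ReadSupport with WriteSupport
      with MicroBatchReadSupport
      with StreamWriteSupport {  // the streaming-write entry point

    // batch read / batch write / micro-batch read, as in the original class
    override def createReader(options: DataSourceOptions): DataSourceReader = ???

    override def createWriter(jobId: String, schema: StructType, mode: SaveMode,
        options: DataSourceOptions): Optional[DataSourceWriter] = ???

    override def createMicroBatchReader(schema: Optional[StructType],
        checkpointLocation: String, options: DataSourceOptions): MicroBatchReader = ???

    // called when the source is used with df.writeStream ... .start()
    override def createStreamWriter(queryId: String, schema: StructType,
        mode: OutputMode, options: DataSourceOptions): StreamWriter = ???
  }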
I try to read a stream using my custom data source (v2, using Spark 2.3),
and it fails *in the second iteration* with the following exception while
reading pruned columns:
Query [id=xxx, runId=yyy] terminated with exception: assertion failed:
Invalid batch: a#660,b#661L,c#662,d#663,... 26 more
I have the same problem as described in the following StackOverflow
question (but nobody has answered it).
https://stackoverflow.com/questions/51103634/spark-streaming-schema-mismatch-using-microbatchreader-with-columns-pruning
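For completeness, the failing shape is roughly the following (hypothetical source name, assumes a SparkSession "spark"): a streaming read where only a subset of the columns is selected, which triggers the column pruning path.

  // streaming read from the custom source, selecting only some of the columns
  val query = spark.readStream
    .format("my-custom-source")
    .load()
    .select("a", "b")
    .writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/repro")
    .start()

  query.awaitTermination()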
Any idea how to solve it (using Spark 2.3)?
Thanks,