Hi Sean and Holden,
I decided it was best to send an email so I could share all my findings
with the team. I think it should be relatively easy to fix with updates, but
I am not that good at working on the repo. I tried, but ran into some
roadblocks that were going to take some time to figure
We are trying to switch from Postgres to Spark's built-in Hive with the
Thrift server as the data sink for persisting the ML result data, in the
hope that Hive would improve the ML pipeline performance. However, it
turned out that Hive took significantly longer to persist
dataframes (via
Hi,
I'm developing a new Spark connector using the Data Source V2 API (Spark 3.1.1).
I noticed that the planInputPartitions method (in MicroBatchStream) is
called twice in every micro-batch.
What is the motivation/reason for this?
Thanks,
Kineret