Chris, a SPIP sounds good to me. I agree with Li that it wouldn't be too
difficult to extend the current functionality to transfer multiple
DataFrames. For the SPIP, I would keep it more high-level; I don't
think it's necessary to include details of the Python worker -- we can hash
that out later.
Thanks Chris, look forward to it.
I think sending multiple DataFrames to the Python worker requires some
changes but shouldn't be too difficult. We can probably do something like:
[numberOfDataFrames][FirstDataFrameInArrowFormat][SecondDataFrameInArrowFormat]
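A minimal sketch of that framing on the JVM side, assuming each DataFrame has
already been serialized to Arrow IPC bytes. The per-payload length prefix is an
extra assumption of mine, not part of the format above (Arrow streams are
self-delimiting, but a prefix lets the worker split the buffer without parsing
Arrow):

```scala
import java.io.DataOutputStream

// Hypothetical framing: [numberOfDataFrames][len][bytes][len][bytes]...
// `payloads` holds each DataFrame already serialized to Arrow IPC bytes.
def writeDataFrames(out: DataOutputStream, payloads: Seq[Array[Byte]]): Unit = {
  out.writeInt(payloads.length)   // [numberOfDataFrames]
  payloads.foreach { bytes =>
    out.writeInt(bytes.length)    // length prefix (my assumption) so the
    out.write(bytes)              // worker can split payloads cheaply
  }
  out.flush()
}
```

The Python worker would then mirror this: read the count, then loop, consuming
one length-prefixed Arrow payload per DataFrame.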
Like `RDD.map`, you can throw whatever exceptions you need, and they will be
propagated to the driver side and fail the Spark job.
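For example, here is a sketch of a partition reader that throws mid-scan,
written against the Spark 2.4-era DataSourceV2 interfaces (these have since
been reshuffled); the failure scenario and reader internals are invented for
illustration:

```scala
import java.io.IOException
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.v2.reader.InputPartitionReader

// Hypothetical reader that fails after three rows. The IOException fails
// the running task; once Spark exhausts its task retries, the job fails
// and the exception reaches the driver as the cause of a SparkException.
class FailingPartitionReader extends InputPartitionReader[InternalRow] {
  private var rowsRead = 0

  override def next(): Boolean = {
    rowsRead += 1
    if (rowsRead > 3) {
      throw new IOException("corrupt record encountered in input file")
    }
    true
  }

  override def get(): InternalRow = InternalRow.empty

  override def close(): Unit = ()
}
```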
On Mon, Apr 8, 2019 at 3:10 PM Andrew Melo wrote:
> Hello,
>
> I'm developing a (Java) DataSourceV2 to read a columnar file format
> popular in a number of physical sciences
Hi,
I created the PR https://github.com/apache/spark/pull/24299 a few days ago
and the JIRA is still unassigned; how can I be sure that the JIRA/PR will be
taken into account?
Kind regards
--
Taoufik Dachraoui
Hi,
Just to say, I really do think this is useful and am currently working on a
SPIP to formally propose it. One concern I do have, however, is that the
current Arrow serialization code is tied to passing a single DataFrame through
as the UDF parameter, so any modification to allow multiple DataFrames would
mean changes to that serialization path.
Hello,
I'm developing a (Java) DataSourceV2 to read a columnar file format
popular in a number of physical sciences (https://root.cern.ch/). (I
also understand that the API isn't fixed and is subject to change.)
My question is -- what is the expected way to transmit exceptions from
the DataSource up to the driver?