Ryan, thanks for putting together the list!

Generally, there are only a few tweaks to the data source v2 API in 2.4, so
upgrading to Spark 2.4 shouldn't be too hard if you already have a data
source v2 implementation.

However, we do want to make some big API changes to data source v2 in the
next release, e.g.:
   - SPARK-25390 <https://issues.apache.org/jira/browse/SPARK-25390>: data
   source V2 API refactoring
   - SPARK-25531 <https://issues.apache.org/jira/browse/SPARK-25531>: new
   write APIs for data source v2
   - SPARK-24252 <https://issues.apache.org/jira/browse/SPARK-24252>: Add
   catalog support


On Tue, Oct 2, 2018 at 1:11 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Hi Assaf,
> The major changes to the V2 API that you linked to aren’t going into 2.4.
> Those will be in the next release because they weren’t finished in time for
> 2.4.
>
> Here are the major updates that will be in 2.4:
>
>    - SPARK-23323 <https://issues.apache.org/jira/browse/SPARK-23323>: The
>    output commit coordinator is used by default to ensure only one attempt of
>    each task commits.
>    - SPARK-23325 <https://issues.apache.org/jira/browse/SPARK-23325> and
>    SPARK-24971 <https://issues.apache.org/jira/browse/SPARK-24971>:
>    Readers should always produce InternalRow instead of Row or UnsafeRow; see
>    SPARK-23325 for detail.
>    - SPARK-24990 <https://issues.apache.org/jira/browse/SPARK-24990>:
>    ReadSupportWithSchema was removed; the user-supplied schema option was
>    added to ReadSupport.
>    - SPARK-24073 <https://issues.apache.org/jira/browse/SPARK-24073>:
>    Read splits are now called InputPartition and a few methods were also
>    renamed for clarity.
>    - SPARK-25127 <https://issues.apache.org/jira/browse/SPARK-25127>:
>    SupportsPushDownCatalystFilters was removed because it leaked Expression in
>    the public API. V2 always uses the Filter API now.
>    - SPARK-24478 <https://issues.apache.org/jira/browse/SPARK-24478>:
>    Push down is now done when converting to a physical plan.
>
> I think there are also quite a few updates for the streaming side, but I’m
> not as familiar with those so I’ll let someone else jump in with a summary.
>
> rb
>
> On Mon, Oct 1, 2018 at 9:51 AM assaf.mendelson <assaf.mendel...@rsa.com>
> wrote:
>
>> Hi all,
>> I understood from previous threads that the Data Source V2 API will see
>> some changes in Spark 2.4.0. However, I can't seem to find what these
>> changes are.
>>
>> Is there some documentation which summarizes the changes?
>>
>> The only mention I seem to find is this pull request:
>> https://github.com/apache/spark/pull/22009. Is this all of it?
>>
>> Thanks,
>>     Assaf.
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
