RD,
          Trying to figure out whether any regressions are expected between
the reader and the data. Bypassing metadata is easy for us because the data
is in a separate directory; the ETL pipeline can point the reader config to
the correct location.
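
For reference, a minimal sketch of what that reader config amounts to on
our end (the table location below is hypothetical): vanilla Spark's
built-in Parquet datasource pointed at the table's data directory,
skipping the Iceberg metadata layer entirely.

    import org.apache.spark.sql.SparkSession

    // Hypothetical table location. Iceberg keeps data files under
    // <table>/data and metadata under <table>/metadata, so reading the
    // data directory with the plain Parquet source bypasses the metadata.
    val icebergBasePath = "hdfs:///warehouse/db/events"

    val spark = SparkSession.builder()
      .appName("read-iceberg-data-directly")
      .getOrCreate()

    val df = spark.read
      .format("parquet")
      .load(icebergBasePath + "/data")

    df.printSchema()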

On Wed, May 15, 2019 at 5:14 PM RD <rdsr...@gmail.com> wrote:

> Is backporting the relevant datasource patches to Spark 2.3 a non-starter?
> If this were doable, I believe it would be much simpler than bypassing
> Iceberg metadata to read the files directly.
>
> -R
>
> On Wed, May 15, 2019 at 3:02 PM Gautam <gautamkows...@gmail.com> wrote:
>
>> Just wanted to add: from what I have tested so far, this works fine with
>> vanilla Spark reading Iceberg data.
>>
>> On Wed, May 15, 2019 at 2:59 PM Gautam <gautamkows...@gmail.com> wrote:
>>
>>> Hello There,
>>>                     I am currently doing some testing of vanilla Spark
>>> readers' ability to read Iceberg-generated data. This is from both an
>>> Iceberg/Parquet reader interoperability standpoint and a Spark version
>>> backward-compatibility standpoint (e.g., Spark distributions running
>>> v2.3.x, which don't support the Iceberg DataSource, vs. those running
>>> 2.4.x).
>>>
>>> To be clear, I am talking about doing the following on data written by
>>> Iceberg:
>>>
>>> spark.read.format("parquet").load($icebergBasePath + "/data")
>>>
>>> Can I safely assume this will continue to work? If not, what could be
>>> the reasons and the associated risks?
>>>
>>> This would be good to know because these things often come up in
>>> migration-path discussions and when evaluating the costs of generating
>>> and keeping two copies of the same data.
>>>
>>> thanks,
>>> - Gautam.
>>>
>>
