Re: Announcing Delta Lake 0.2.0

2019-06-21 Thread Michael Armbrust
> Thanks for confirmation. We are using the workaround to create a separate Hive external table STORED AS PARQUET with the exact location of Delta table. Our use case is batch-driven and we are running VACUUM with 0 retention after every batch is completed. Do you see any potential problem

Re: Announcing Delta Lake 0.2.0

2019-06-21 Thread ayan guha
Hi, thanks for the confirmation. We are using the workaround of creating a separate Hive external table STORED AS PARQUET at the exact location of the Delta table. Our use case is batch-driven, and we run VACUUM with 0 retention after every batch completes. Do you see any potential problem with
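
[Editor's sketch, not part of the original message] The workaround described here amounts to registering a plain Parquet external table in Hive over the Delta table's directory. The Python below only builds such a Hive DDL statement as a string, as a minimal sketch. The function name, table name, columns, and HDFS path are all hypothetical and do not come from the thread. A table defined this way reads every Parquet file in the directory and ignores the Delta transaction log, which is presumably why the poster vacuums stale files after each batch.

```python
def hive_external_table_ddl(table, columns, location):
    """Build a Hive DDL statement for an external Parquet table.

    All arguments are illustrative placeholders, not values from the thread.
    columns is a list of (column_name, hive_type) pairs.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # One "`name` TYPE" entry per column, each on its own indented line.
    column_list = ",\n  ".join(
        f"`{name}` {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_list}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, schema, and path, chosen only for illustration.
ddl = hive_external_table_ddl(
    table="analytics.events_parquet",
    columns=[
        ("event_id", "BIGINT"),
        ("event_time", "TIMESTAMP"),
        ("payload", "STRING"),
    ],
    location="hdfs:///data/delta/events",
)
print(ddl)
```

The generated statement would be run in Hive itself, not through Delta Lake. For a partitioned table, the DDL would also need a PARTITIONED BY clause and the partitions would have to be registered in Hive, which this sketch leaves out.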

Re: Announcing Delta Lake 0.2.0

2019-06-21 Thread Tathagata Das
@ayan guha @Gourav Sengupta Delta Lake OSS currently does not support defining tables in the Hive metastore using DDL commands. We are hoping to add the necessary compatibility fixes in Apache Spark to make Delta Lake work with tables and DDL commands, so we will support them in a future release.

Re: Announcing Delta Lake 0.2.0

2019-06-21 Thread Gourav Sengupta
Hi Ayan, I may be wrong about this, but I think that Delta files are in Parquet format. But I am sure that you have already checked this. Am I missing something? Regards, Gourav Sengupta On Fri, Jun 21, 2019 at 6:39 AM ayan guha wrote: > Hi > We used spark.sql to create a table using DELTA.

Re: Announcing Delta Lake 0.2.0

2019-06-20 Thread ayan guha
Hi, we used spark.sql to create a table using DELTA. We also have a Hive metastore attached to the Spark session. Hence, a table gets created in the Hive metastore. We then tried to query the table from Hive. We faced the following issues: 1. SERDE is SequenceFile, should have been Parquet 2. Schema

Re: Announcing Delta Lake 0.2.0

2019-06-20 Thread Gourav Sengupta
Hi Liwen, thanks a ton, I think that there is a difference between a storage class and metastore, just like there is a difference between a database and file system and coffee and cup. It will be wonderful to keep the focus on the fantastic opportunity that Delta creates for us :) Regards,

Re: Announcing Delta Lake 0.2.0

2019-06-20 Thread Liwen Sun
Hi James, Right now we don't have plans for having a catalog component as part of Delta Lake, but we are looking to support Hive metastore and also DDL commands in the near future. Thanks, Liwen On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios wrote: > Is there a plan to have a business

Re: Announcing Delta Lake 0.2.0

2019-06-20 Thread Li Gao
Lyft recently open sourced a data discovery tool called Amundsen that can serve many of the data catalog needs. https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9 https://github.com/lyft/amundsenmetadatalibrary You still need HMS to store the data schema though.

Re: Announcing Delta Lake 0.2.0

2019-06-20 Thread James Cotrotsios
Is there a plan to have a business catalog component for the Data Lake? If not, how would someone make a proposal to create an open source project related to that? I would be interested in building out an open source data catalog that would use the Hive metadata store as a baseline for technical

Re: Announcing Delta Lake 0.2.0

2019-06-20 Thread Gourav Sengupta
Hi Ayan, Delta is obviously well thought through; it has been available in Databricks for the last year and a half now, I think, and besides that it is from some of the best minds at work :) But what may not be well tested in Delta is its availability as a storage class for Hive. How about your

Re: Announcing Delta Lake 0.2.0

2019-06-20 Thread Gourav Sengupta
Hi Liwen, it's done: https://github.com/delta-io/delta/issues/73 Please let me know whether the description looks fine. I can also contribute to the test cases if required. Regards, Gourav On Thu, Jun 20, 2019 at 12:52 AM Liwen Sun wrote: > Hi Gourav, > > Thanks for the suggestion.

Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread ayan guha
Hi, we are using Delta features. The only problem we have faced so far is that Hive cannot read Delta outputs by itself (even if the Hive metastore is shared). However, if we create a Hive external table pointing to the folder (and run VACUUM), it can read the data. Other than that, the feature looks

Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Liwen Sun
Hi Gourav, Thanks for the suggestion. Please open a GitHub issue at https://github.com/delta-io/delta/issues to describe your use case and requirements for "external tables" so we can better track this feature and also get feedback from the community. Regards, Liwen On Wed, Jun 19, 2019 at

Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Gourav Sengupta
Hi, does Delta support external tables? I think that most users will be needing this. Regards, Gourav On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun wrote: > We are delighted to announce the availability of Delta Lake 0.2.0! > > To try out Delta Lake 0.2.0, please follow the Delta Lake

Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Gourav Sengupta
Hi, this is fantastic :) Regards, Gourav Sengupta On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun wrote: > We are delighted to announce the availability of Delta Lake 0.2.0! > > To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart: > https://docs.delta.io/0.2.0/quick-start.html > >

Announcing Delta Lake 0.2.0

2019-06-19 Thread Liwen Sun
We are delighted to announce the availability of Delta Lake 0.2.0! To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart: https://docs.delta.io/0.2.0/quick-start.html To view the release notes: https://github.com/delta-io/delta/releases/tag/v0.2.0 This release introduces two main