Hi
We used spark.sql to create a table using DELTA. We also have a Hive
metastore attached to the Spark session, so a table gets created in the
Hive metastore. We then tried to query the table from Hive and faced the
following issues:
1. The SERDE is SequenceFile; it should have been Parquet
2. Schema
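For context, a minimal sketch of the kind of DDL that leads to this situation (the table name, columns, and path are hypothetical): Spark registers the Delta table in the attached Hive metastore, but the metastore entry carries Delta-specific metadata rather than a plain Parquet SERDE, so querying it directly from Hive can fail or show an unexpected schema.

```sql
-- Hypothetical example: create a Delta table through Spark SQL.
-- Spark registers it in the attached Hive metastore, but Hive
-- cannot read the entry as an ordinary Parquet table.
CREATE TABLE events (
  id   BIGINT,
  ts   TIMESTAMP,
  data STRING
)
USING DELTA
LOCATION '/mnt/datalake/events';
```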
Hi Liwen,
Thanks a ton. I think there is a difference between a storage class
and a metastore, just as there is a difference between a database and a
file system, or between a cup and coffee.
It will be wonderful to keep the focus on the fantastic opportunity that
Delta creates for us :)
Regards,
Hi James,
Right now we don't have plans for a catalog component as part of
Delta Lake, but we are looking to support the Hive metastore and also DDL
commands in the near future.
Thanks,
Liwen
On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios
wrote:
> Is there a plan to have a business
Lyft recently open sourced a data discovery tool called Amundsen that can
serve many of the data catalog needs.
https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9
https://github.com/lyft/amundsenmetadatalibrary
You still need HMS to store the data schema though.
I have a Spark cluster in each of two data centers. Cluster B
is 6 times slower than cluster A: I ran the same job on both clusters and
the time difference is 6x. I used the same config and Spark
2.3.3 on both. The Spark UI displays the slave nodes, but when I
check
Is there a plan to have a business catalog component for the Data Lake? If
not, how would someone make a proposal to create an open source project
related to that? I would be interested in building out an open source data
catalog that would use the Hive metadata store as a baseline for technical
Hi Ayan,
Delta is obviously well thought through; it's been available in Databricks
for about a year and a half now, I think, and besides that it is from some
of the best minds at work :)
But what may not be well tested in Delta is its availability as a storage
class for Hive.
How about your
Hi Liwen,
it's done: https://github.com/delta-io/delta/issues/73
Please let me know if the description looks fine. I can also
contribute to the test cases if required.
Regards,
Gourav
On Thu, Jun 20, 2019 at 12:52 AM Liwen Sun wrote:
> Hi Gourav,
>
> Thanks for the suggestion.
Hi Community,
I am still looking for an answer to this question.
I am running a cluster using Spark 2.3.1, but I am wondering if it is safe
to include Spark 2.4.1 and use new features such as higher-order functions.
Thank you.
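For reference, the higher-order functions mentioned above were added in Spark 2.4 and look like this in Spark SQL; this is just an illustrative sketch with made-up column data:

```sql
-- Hypothetical illustration of Spark 2.4 higher-order functions.
-- transform applies a lambda to each array element;
-- filter keeps only the elements matching a predicate.
SELECT
  transform(scores, x -> x + 1)  AS bumped,  -- [2, 3, 4] for [1, 2, 3]
  filter(scores, x -> x % 2 = 0) AS evens    -- [2] for [1, 2, 3]
FROM (SELECT array(1, 2, 3) AS scores);
```

Mixing a 2.3.x cluster runtime with 2.4.x features is generally not safe, since these functions did not exist in the 2.3 analyzer.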