Exposing custom functions defined over a scala Dataframe to Python.

2020-08-13 Thread rahul kumar
Hello Devs, I have been working on a custom Spark connector. This connector is implemented in Scala. So far, I have defined a custom join-like function over a DataFrame. Now I want to expose it in PySpark as well. I did some investigation in the Spark project and it appears to me that all

Re: SPIP: Catalog API for view metadata

2020-08-13 Thread Walaa Eldin Moustafa
+1 to making views as special forms of tables. Sometimes a table can be converted to a view to hide some of the implementation details while not impacting readers (provided that the write path is controlled). Also, views can be defined on top of either other views or base tables, so the less

Re: Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-08-13 Thread Jason Moore
Thank you so much! Any update on getting the RC1 up for vote? Jason. From: 郑瑞峰 Sent: Wednesday, 5 August 2020 12:54 PM To: Jason Moore ; Spark dev list Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release Hi all, I am going to prepare the release of 3.0.1 RC1,

Re: SPIP: Catalog API for view metadata

2020-08-13 Thread Burak Yavuz
My high-level comment here is that as a naive person, I would expect a View to be a special form of Table that SupportsRead but doesn't SupportsWrite. loadTable in the TableCatalog API should load both tables and views. This way you avoid multiple RPCs to a catalog or data source or metastore, and

Re: SPIP: Catalog API for view metadata

2020-08-13 Thread John Zhuge
Thanks Ryan. ViewCatalog API mimics TableCatalog API including how shared namespace is handled: - The doc for createView states "it will throw ViewAlreadyExistsException when a view or table

Re: SPIP: Catalog API for view metadata

2020-08-13 Thread Ryan Blue
I agree with Wenchen that we need to be clear about resolution and behavior. For example, I think that we would agree that CREATE VIEW catalog.schema.name should fail when there is a table named catalog.schema.name. We’ve already included this behavior in the documentation for the TableCatalog API

Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-13 Thread Michel Sumbul
Hi guys, Has anyone tried Spark3 on k8s reading data from HDFS encrypted with KMS in HA mode (with Kerberos)? I have a wordcount job running with Spark3 reading data on HDFS (Hadoop 3.1), everything secured with Kerberos. Everything works fine if the data folder is not encrypted (spark on k8s). If

Re: Companies and Organizations listing for datapipelines.com

2020-08-13 Thread Sean Owen
I thought I replied but maybe not - yes that is fine, you just need to open a pull request to update it at https://github.com/apache/spark-website/ On Thu, Aug 13, 2020 at 12:03 AM wrote: > > Hi there, > > Just following up on my previous email since I haven't heard back. Some more > info about

Spark 3: creating schema for hive metastore hangs forever

2020-08-13 Thread Tomas Bartalos
Hello, I'm using spark-3.0.0-bin-hadoop3.2 with custom hive metastore DB (postgres). I'm setting the "autoCreateAll" flag to true, so hive is creating its relational schema on first use. The problem is there is a deadlock and the query hangs forever: *Tx1* (*holds lock on TBLS relation*,