Re: Does Spark SQL support GRANT/REVOKE operations on Tables?

2020-06-10 Thread Bill Glennon
Hi,

Should the grant statement be the one below?

GRANT INSERT ON TABLE table_priv1 TO user2;

and not

GRANT INSERT ON table_priv1 TO USER user2;


Hope this helps.

Regards,
Bill

On Wed, Jun 10, 2020 at 11:47 PM Nasrulla Khan Haris
 wrote:

> I did enable the auth-related configs in hive-site.xml as per the document
> below. I tried this on Spark 2.4.4. Is it supported?
>
> https://cwiki.apache.org/confluence/display/Hive/Storage+Based+Authorization+in+the+Metastore+Server


RE: Does Spark SQL support GRANT/REVOKE operations on Tables?

2020-06-10 Thread Nasrulla Khan Haris
I did enable the auth-related configs in hive-site.xml as per the document
below. I tried this on Spark 2.4.4. Is it supported?

https://cwiki.apache.org/confluence/display/Hive/Storage+Based+Authorization+in+the+Metastore+Server


Re: [ANNOUNCE] Apache Spark 2.4.6 released

2020-06-10 Thread Takeshi Yamamuro
Congrats and thanks, Holden!

Bests,
Takeshi


-- 
---
Takeshi Yamamuro


Re: [ANNOUNCE] Apache Spark 2.4.6 released

2020-06-10 Thread Dongjoon Hyun
Thank you so much, Holden! :)

>


Re: [ANNOUNCE] Apache Spark 2.4.6 released

2020-06-10 Thread Hyukjin Kwon
Yay!



[ANNOUNCE] Apache Spark 2.4.6 released

2020-06-10 Thread Holden Karau
We are happy to announce the availability of Spark 2.4.6!

Spark 2.4.6 is a maintenance release containing stability, correctness, and
security fixes.
This release is based on the branch-2.4 maintenance branch of Spark. We
strongly recommend that all 2.4 users upgrade to this stable release.

To download Spark 2.4.6, head over to the download page:
http://spark.apache.org/downloads.html
Spark 2.4.6 is also available in Maven Central, PyPI, and CRAN.

Note that you might need to clear your browser cache or use
`Private`/`Incognito` mode, depending on your browser.

To view the release notes:
https://spark.apache.org/releases/spark-release-2.4.6.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.


Does Spark SQL support GRANT/REVOKE operations on Tables?

2020-06-10 Thread Nasrulla Khan Haris
Hi Spark users,

I see REVOKE/GRANT operations in the list of supported operations, but when
I run them on a table I see:

Error: org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: GRANT(line 1, pos 0)


== SQL ==
GRANT INSERT ON table_priv1 TO USER user2
^^^

  at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:41)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitFailNativeCommand$1.apply(SparkSqlParser.scala:1047)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitFailNativeCommand$1.apply(SparkSqlParser.scala:1038)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitFailNativeCommand(SparkSqlParser.scala:1038)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitFailNativeCommand(SparkSqlParser.scala:55)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$FailNativeCommandContext.accept(SqlBaseParser.java:782)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:72)
  at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:72)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:71)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:70)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:100)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
  ... 49 elided


Error: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'dir' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'WITH', 
'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 
'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 
'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 
'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 
'LOAD'}(line 1, pos 0)

== SQL ==



Appreciate your response.

Thanks,
NKH


Broadcast join data reuse

2020-06-10 Thread tcondie
We have a case where the data is small enough to be broadcast and is joined
with multiple tables in a single plan. Looking at the physical plan, I do
not see anything that indicates whether the broadcast is done only once,
i.e., whether the BroadcastExchange is being reused so that the data is not
redistributed from scratch. Could someone with insight into the physical
plan strategy for such a case confirm whether previously broadcast data is
reused, or whether subsequent BroadcastExchange steps are done from scratch?
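
A minimal sketch of how one might check this, using toy data and a local
session (all names below are hypothetical): with exchange reuse enabled
(the default, spark.sql.exchange.reuse=true), a broadcast side shared
across joins should appear as a ReusedExchange node in the extended plan.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("broadcast-reuse-check")
  .getOrCreate()
import spark.implicits._

// Toy data: one small side joined against two larger tables in one plan.
val small = Seq((1L, "a"), (2L, "b")).toDF("id", "v")
val big1  = spark.range(1000).toDF("id")
val big2  = spark.range(2000).toDF("id")

val plan = big1.join(broadcast(small), "id")
  .union(big2.join(broadcast(small), "id"))

// If the exchange is shared, the second join's broadcast side shows up as
// a ReusedExchange node referencing the first BroadcastExchange; otherwise
// two independent BroadcastExchange nodes are built.
plan.explain(true)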


Thanks and best regards,

Tyson



Accessing Teradata DW data from Spark

2020-06-10 Thread Mich Talebzadeh
Much like accessing Oracle data, one can use the power of Spark on Teradata
via JDBC drivers.

I have seen connection examples in some articles, which indicates this
process is pretty mature.

My question is whether anyone has done this work, and how performance in
Spark compares with running the same code on Teradata itself. For example,
with Oracle one can force parallel reads by setting numPartitions:
// minID and maxID are the bounds of the partition column, computed beforehand
val s = HiveContext.read.format("jdbc").options(
  Map("url" -> _ORACLEserver,
      "dbtable" -> "(SELECT ID FROM scratchpad.dummy4)",
      "partitionColumn" -> "ID",
      "lowerBound" -> minID,
      "upperBound" -> maxID,
      "numPartitions" -> "5",
      "user" -> _username,
      "password" -> _password)).load()

As both Oracle and Teradata are data warehouses, this may work. The
intention is to read from Teradata initially as a tactical step, and to use
Hadoop/Hive/Spark as the strategic platform.

Obviously the underlying tables will differ between Hive and Teradata;
however, the SQL to fetch, slice and dice the data will be similar.

Let me know your thoughts.

Thanks



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Issue with pyspark query

2020-06-10 Thread Tzahi File
Hi,

This is a general question regarding moving a Spark SQL query to PySpark; if
needed I will add some more detail from the error log and the query syntax.
I'm trying to move a Spark SQL query to run through PySpark.
The query syntax and Spark configuration are the same.
For some reason the query fails to run through PySpark with a Java heap
space error.
In the Spark SQL query I'm using INSERT OVERWRITE ... PARTITION, while in
PySpark I'm using a DataFrame to write the data to a specific location in S3.
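
For reference, a minimal sketch of the DataFrame-side equivalent of INSERT
OVERWRITE ... PARTITION (Scala shown, though the same API and setting exist
in PySpark; the partition column and output path below are placeholders):

// Sketch only: "dt" and the S3 path are hypothetical. Without this setting,
// mode("overwrite") replaces every partition under the target path, not
// just the partitions being written (dynamic mode needs Spark 2.3+).
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

resultDF.write
  .mode("overwrite")
  .partitionBy("dt")
  .parquet("s3://bucket/path/")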

Are there any differences in the configuration that you think I might need
to change?


Thanks,

-- 
Tzahi
Data Engineer