Re: Override jars in spark submit

2019-06-19 Thread Keith Chapman
Hi Naresh,

You could use "--conf spark.driver.extraClassPath=". Note
that the jar will not be shipped to the executors; if it's a class that is
needed on the executors as well, you should provide "--conf
spark.executor.extraClassPath=". Note that if you do
provide the executor extraClassPath, the jar file needs to be present on all
the executors.
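
For illustration, a spark-submit invocation along those lines might look like
the following (just a sketch; the main class, application jar and the path
/opt/jars/hive-exec-custom.jar are placeholders, and that jar would have to
exist at the same path on every executor node):

spark-submit \
  --class com.example.MyApp \
  --conf spark.driver.extraClassPath=/opt/jars/hive-exec-custom.jar \
  --conf spark.executor.extraClassPath=/opt/jars/hive-exec-custom.jar \
  my-app.jar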

Regards,
Keith.

http://keith-chapman.com


On Wed, Jun 19, 2019 at 8:57 PM naresh Goud 
wrote:

> Hello All,
>
> How can we override jars in spark submit?
> We have hive-exec-spark jar which is available as part of default spark
> cluster jars.
> We wanted to override above mentioned jar in spark submit with latest
> version jar.
> How do we do that ?
>
>
> Thank you,
> Naresh
> --
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
>
>


connecting spark with mysql

2019-06-19 Thread ya
Hi everyone,

I am trying to manipulate MySQL tables from Spark. I do not want to move these 
tables from MySQL into Spark, as they can easily get very big; ideally the data 
stays in the database where it is stored. For me, Spark is only used to speed up 
the read and write process (I am more a data analyst than an application 
developer), so I did not install Hadoop. People here have helped me a lot, but I 
still cannot connect MySQL to Spark. Possible reasons include, for instance, the 
Java version, the location of the Java files, the location of the connector 
files, the MySQL version, the location of environment variables, the use of JDBC 
or ODBC, and so on. My questions are:

1. Do we need to install hadoop and java before installing spark?

2. Which versions of these packages are known to be stable for a successful 
installation and connection, if anyone has experience with this? (The solutions 
online may have worked on older versions of these packages, but they do not seem 
to work anymore in my case; I’m on a Mac, by the way.)

3. So far, the only approach I have tried successfully is to use the sqldf 
package from SparkR to connect to MySQL. But does that mean Spark is actually 
doing the work (and speeding up the process) when I run SQL queries through the 
sqldf package in SparkR?
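
(For reference, the usual way to read a MySQL table in place from Spark, without
copying the data first, is Spark's JDBC data source. A minimal PySpark sketch
follows, assuming a SparkSession named spark (e.g. the pyspark shell), that the
MySQL Connector/J jar has been made available to Spark (e.g. via the --jars
option), and where the host, database, table, credentials and driver class are
all placeholders:

# read a MySQL table in place through Spark's JDBC data source
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/mydb")
      .option("dbtable", "mytable")
      .option("user", "myuser")
      .option("password", "mypassword")
      .option("driver", "com.mysql.cj.jdbc.Driver")  # class name depends on the connector version
      .load())
df.show()

Queries can then be expressed against df with Spark SQL or the DataFrame API.)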

I hope I described my questions clearly. Thank you very much for the help.

Best regards,

YA

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark SQL

2019-06-19 Thread naresh Goud
Just to make it clearer: Spark SQL uses the Hive metastore but runs queries with
its own engine; it does not use the Hive execution engine.

Please correct me if that’s not true.
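
For illustration, a minimal PySpark sketch of that setup (assuming a
hive-site.xml pointing at the metastore is on Spark's classpath, and that the
table name mydb.some_table is a placeholder for a table that already exists in
the metastore):

from pyspark.sql import SparkSession

# use the Hive metastore as the catalog, but execute with Spark's own engine
spark = (SparkSession.builder
         .appName("hive-metastore-example")
         .enableHiveSupport()
         .getOrCreate())

df = spark.sql("SELECT count(*) FROM mydb.some_table")
df.explain()  # the physical plan shows Spark/Catalyst operators, not Hive/MapReduce stages
df.show()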



On Mon, Jun 10, 2019 at 2:29 PM Russell Spitzer 
wrote:

> Spark can use the HiveMetastore as a catalog, but it doesn't use the hive
> parser or optimization engine. Instead it uses Catalyst, see
> https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html
>
> On Mon, Jun 10, 2019 at 2:07 PM naresh Goud 
> wrote:
>
>> Hi Team,
>>
>> Is Spark Sql uses hive engine to run queries ?
>> My understanding that spark sql uses hive meta store to get metadata
>> information to run queries.
>>
>> Thank you,
>> Naresh
>> --
>> Thanks,
>> Naresh
>> www.linkedin.com/in/naresh-dulam
>> http://hadoopandspark.blogspot.com/
>>
>> --
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/


Override jars in spark submit

2019-06-19 Thread naresh Goud
Hello All,

How can we override jars in spark-submit?
We have the hive-exec-spark jar, which is available as part of the default Spark
cluster jars.
We want to override the above-mentioned jar in spark-submit with a newer version
of the jar.
How do we do that?


Thank you,
Naresh
-- 
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/


Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread ayan guha
Hi

We are using the Delta features. The only problem we have faced so far is that
Hive cannot read Delta output by itself (even if the Hive metastore is shared).
However, if we create a Hive external table pointing to the folder (and run
Vacuum), Hive can read the data.

Other than that, the feature looks good and well thought out. We are doing
volume testing now.

Best
Ayan

On Thu, Jun 20, 2019 at 9:52 AM Liwen Sun  wrote:

> Hi Gourav,
>
> Thanks for the suggestion. Please open a Github issue at
> https://github.com/delta-io/delta/issues to describe your use case and
> requirements for "external tables" so we can better track this feature and
> also get feedback from the community.
>
> Regards,
> Liwen
>
> On Wed, Jun 19, 2019 at 12:11 PM Gourav Sengupta <
> gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> does Delta support external tables? I think that most users will be
>> needing this.
>>
>>
>> Regards,
>> Gourav
>>
>> On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun 
>> wrote:
>>
>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>
>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>> https://docs.delta.io/0.2.0/quick-start.html
>>>
>>> To view the release notes:
>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>
>>> This release introduces two main features:
>>>
>>> *Cloud storage support*
>>> In addition to HDFS, you can now configure Delta Lake to read and write
>>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>>> For configuration instructions, please see:
>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>
>>> *Improved concurrency*
>>> Delta Lake now allows concurrent append-only writes while still ensuring
>>> serializability. For concurrency control in Delta Lake, please see:
>>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>>
>>> We have also greatly expanded the test coverage as part of this release.
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this release.
>>>
>>> Best regards,
>>> Liwen Sun
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Delta Lake Users and Developers" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to delta-users+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/delta-users/CAE4dWq9g90NkUr_SLs2J6kFPbOpxx4wy6MEgb%3DQ5pBxkUcK%2B-A%40mail.gmail.com
>>> 
>>> .
>>>
>>

-- 
Best Regards,
Ayan Guha


Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Liwen Sun
Hi Gourav,

Thanks for the suggestion. Please open a Github issue at
https://github.com/delta-io/delta/issues to describe your use case and
requirements for "external tables" so we can better track this feature and
also get feedback from the community.

Regards,
Liwen

On Wed, Jun 19, 2019 at 12:11 PM Gourav Sengupta 
wrote:

> Hi,
>
> does Delta support external tables? I think that most users will be
> needing this.
>
>
> Regards,
> Gourav
>
> On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun 
> wrote:
>
>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>
>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>> https://docs.delta.io/0.2.0/quick-start.html
>>
>> To view the release notes:
>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>
>> This release introduces two main features:
>>
>> *Cloud storage support*
>> In addition to HDFS, you can now configure Delta Lake to read and write
>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>> For configuration instructions, please see:
>> https://docs.delta.io/0.2.0/delta-storage.html
>>
>> *Improved concurrency*
>> Delta Lake now allows concurrent append-only writes while still ensuring
>> serializability. For concurrency control in Delta Lake, please see:
>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>
>> We have also greatly expanded the test coverage as part of this release.
>>
>> We would like to acknowledge all community members for contributing to
>> this release.
>>
>> Best regards,
>> Liwen Sun
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Delta Lake Users and Developers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to delta-users+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/delta-users/CAE4dWq9g90NkUr_SLs2J6kFPbOpxx4wy6MEgb%3DQ5pBxkUcK%2B-A%40mail.gmail.com
>> 
>> .
>>
>


Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Gourav Sengupta
Hi,

does Delta support external tables? I think that most users will be needing
this.


Regards,
Gourav

On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun  wrote:

> We are delighted to announce the availability of Delta Lake 0.2.0!
>
> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
> https://docs.delta.io/0.2.0/quick-start.html
>
> To view the release notes:
> https://github.com/delta-io/delta/releases/tag/v0.2.0
>
> This release introduces two main features:
>
> *Cloud storage support*
> In addition to HDFS, you can now configure Delta Lake to read and write
> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
> For configuration instructions, please see:
> https://docs.delta.io/0.2.0/delta-storage.html
>
> *Improved concurrency*
> Delta Lake now allows concurrent append-only writes while still ensuring
> serializability. For concurrency control in Delta Lake, please see:
> https://docs.delta.io/0.2.0/delta-concurrency.html
>
> We have also greatly expanded the test coverage as part of this release.
>
> We would like to acknowledge all community members for contributing to
> this release.
>
> Best regards,
> Liwen Sun
>
> --
> You received this message because you are subscribed to the Google Groups
> "Delta Lake Users and Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to delta-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/delta-users/CAE4dWq9g90NkUr_SLs2J6kFPbOpxx4wy6MEgb%3DQ5pBxkUcK%2B-A%40mail.gmail.com
> 
> .
>


Re: Announcing Delta Lake 0.2.0

2019-06-19 Thread Gourav Sengupta
Hi,

this is fantastic :)

Regards,
Gourav Sengupta

On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun  wrote:

> We are delighted to announce the availability of Delta Lake 0.2.0!
>
> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
> https://docs.delta.io/0.2.0/quick-start.html
>
> To view the release notes:
> https://github.com/delta-io/delta/releases/tag/v0.2.0
>
> This release introduces two main features:
>
> *Cloud storage support*
> In addition to HDFS, you can now configure Delta Lake to read and write
> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
> For configuration instructions, please see:
> https://docs.delta.io/0.2.0/delta-storage.html
>
> *Improved concurrency*
> Delta Lake now allows concurrent append-only writes while still ensuring
> serializability. For concurrency control in Delta Lake, please see:
> https://docs.delta.io/0.2.0/delta-concurrency.html
>
> We have also greatly expanded the test coverage as part of this release.
>
> We would like to acknowledge all community members for contributing to
> this release.
>
> Best regards,
> Liwen Sun
>
> --
> You received this message because you are subscribed to the Google Groups
> "Delta Lake Users and Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to delta-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/delta-users/CAE4dWq9g90NkUr_SLs2J6kFPbOpxx4wy6MEgb%3DQ5pBxkUcK%2B-A%40mail.gmail.com
> 
> .
>


Announcing Delta Lake 0.2.0

2019-06-19 Thread Liwen Sun
We are delighted to announce the availability of Delta Lake 0.2.0!

To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
https://docs.delta.io/0.2.0/quick-start.html

To view the release notes:
https://github.com/delta-io/delta/releases/tag/v0.2.0

This release introduces two main features:

*Cloud storage support*
In addition to HDFS, you can now configure Delta Lake to read and write
data on cloud storage services such as Amazon S3 and Azure Blob Storage.
For configuration instructions, please see:
https://docs.delta.io/0.2.0/delta-storage.html
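
As a quick illustration, a minimal PySpark sketch of writing an existing
DataFrame df out as a Delta table on S3 and reading it back (the bucket path is
a placeholder, and the storage-specific settings described in the doc above
still need to be configured):

df.write.format("delta").mode("append").save("s3a://my-bucket/events")
events = spark.read.format("delta").load("s3a://my-bucket/events")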

*Improved concurrency*
Delta Lake now allows concurrent append-only writes while still ensuring
serializability. For concurrency control in Delta Lake, please see:
https://docs.delta.io/0.2.0/delta-concurrency.html

We have also greatly expanded the test coverage as part of this release.

We would like to acknowledge all community members for contributing to this
release.

Best regards,
Liwen Sun


pyspark cached dataframe shows deserialized at StorageLevel

2019-06-19 Thread Mitsutoshi Kiuchi

Hi,

The Spark documentation says: "Since the data is always serialized on the
Python side, all the constants use the serialized formats."


http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.StorageLevel

But when I cache a DataFrame and look at its StorageLevel, it shows that the
cached DataFrame is deserialized. That looks like behavior that contradicts the
documentation; could anyone please comment on it?


Here is code to reproduce.

from pyspark.sql.functions import col, length

# 'spark' is the SparkSession provided by the pyspark shell
source = [
    ("frighten",), ("watch",), ("dish",),
    ("reflect",), ("serious",), ("summer",),
    ("embrace",), ("transition",), ("Venus",)]
dfa = spark.createDataFrame(source, ["word"]).withColumn("num", length(col("word")))

dfa2 = dfa.cache()

dfa.storageLevel   # shows "StorageLevel(True, True, False, True, 1)", i.e. deserialized=True
dfa2.storageLevel  # same as above
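
For comparison, an explicit level can also be requested with persist() instead
of cache(). A small sketch (StorageLevel here is the pyspark constant; whether
the reported level then matches the documented serialized form is exactly the
question above):

from pyspark import StorageLevel

# request an explicit storage level on a DataFrame that has not been cached yet
dfb = spark.createDataFrame(source, ["word"])
dfb.persist(StorageLevel.MEMORY_AND_DISK)
dfb.storageLevel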


Thanks for reading!

Mitsutoshi Kiuchi

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



RE: tcps oracle connection from spark

2019-06-19 Thread Luca Canali
Connecting to Oracle from Spark using the TCPS protocol works OK for me.
Maybe try to turn debugging on with -Djavax.net.debug=all?
See also:
https://blogs.oracle.com/dev2dev/ssl-connection-to-oracle-db-using-jdbc%2c-tlsv12%2c-jks-or-oracle-wallets
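
If the job is launched with spark-submit, one way to pass that flag (just a
sketch; the rest of the command line is a placeholder) is through the driver's
extra JVM options:

spark-submit --conf "spark.driver.extraJavaOptions=-Djavax.net.debug=all" ...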

Regards,
L.

From: Richard Xin 
Sent: Wednesday, June 19, 2019 00:51
To: User 
Subject: Re: tcps oracle connection from spark

and btw, same connection string works fine when used in SQL Developer.

On Tuesday, June 18, 2019, 03:49:24 PM PDT, Richard Xin
<richardxin...@yahoo.com> wrote:


Hi, I need help with a TCPS Oracle connection from Spark (version:
spark-2.4.0-bin-hadoop2.7).


Properties prop = new Properties();
prop.putAll(sparkOracle);  // username/password

prop.put("javax.net.ssl.trustStore", "path to root.jks");
prop.put("javax.net.ssl.trustStorePassword", "password_here");

df.write()
  .mode(SaveMode.Append)
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .jdbc("jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcps)(HOST=host.mycomapny.com)(PORT=1234)))(CONNECT_DATA=(SERVICE_NAME=service_name)))",
        "tableName", prop);


note "PROTOCOL=tcps" in the connection string.

The code worked fine for "tcp" hosts, but some of our servers use "tcps" only. I
got the following errors when hitting the Oracle tcps hosts; can someone shed
some light? Thanks a lot!

Exception in thread "main" java.sql.SQLRecoverableException: IO Error: Remote host terminated the handshake
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:682)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:715)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:385)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:30)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:564)
at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:48)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:506)
at com.apple.jmet.pallas.data_migration.DirectMigrationWConfig.main(DirectMigrationWConfig.java:103)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at 

[webinar] TFX Chicago Taxi example on Mini Kubeflow (MiniKF)

2019-06-19 Thread Chris Pavlou

Hi all,

I would like to invite you to our webinar "Kubeflow Pipelines on-prem". 
It will take place on Friday, June 21 at 9am Pacific Time. You can 
register here:


https://zoom.us/webinar/register/WN_j_HJbkISTluMckyyr706eg

We are going to demonstrate the end-to-end TFX Chicago Taxi example 
running on-prem using MiniKF. MiniKF is a single-node deployment of 
Kubeflow that you can install on-prem or on your laptop.


https://www.kubeflow.org/docs/started/getting-started-minikf/

Looking forward to seeing you on Friday!

Best,
Chris

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Unsubscribe

2019-06-19 Thread Tushar Marne
-- 
Tushar Marne
9011062432


Re: Ask for ARM CI for spark

2019-06-19 Thread Tianhua huang
Thanks for your reply.

As I said before, I ran into some problems building and testing Spark on an
aarch64 server, so it would be better to have ARM CI to make sure Spark is
compatible with AArch64 platforms.

I’m from the OpenLab team (https://openlabtesting.org/), a community that does
open source project testing. We can provide some Arm virtual machines to AMPLab
Jenkins, and we also have a developer team that is willing to work on this; we
are willing to maintain the build CI jobs and address the CI issues.
What do you think?


Thanks for your attention.

On Wed, Jun 19, 2019 at 6:39 AM shane knapp  wrote:

> yeah, we don't have any aarch64 systems for testing...  this has been
> asked before but is currently pretty low on our priority list as we don't
> have the hardware.
>
> sorry,
>
> shane
>
> On Mon, Jun 10, 2019 at 7:08 PM Tianhua huang 
> wrote:
>
>> Hi, sorry to disturb you.
>> The CI testing for apache spark is supported by AMPLab Jenkins, and I
>> find there are some computers(most of them are Linux (amd64) arch) for
>> the CI development, but seems there is no Aarch64 computer for spark CI
>> testing. Recently, I build and run test for spark(master and branch-2.4) on
>> my arm server, and unfortunately there are some problems, for example, ut
>> test is failed due to a LEVELDBJNI native package, the details for java
>> test see http://paste.openstack.org/show/752063/ and python test see
>> http://paste.openstack.org/show/752709/
>> So I have a question about the ARM CI testing for spark, is there any
>> plan to support it? Thank you very much and I will wait for your reply!
>>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


RE: Unable to run simple spark-sql

2019-06-19 Thread Nirmal Kumar
Hi Raymond,

I cross-checked hive/conf/hive-site.xml and spark2/conf/hive-site.xml.
The same value is shown by the Ambari Hive config.
The value seems correct here:

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/apps/hive/warehouse</value>
  </property>

Problem:
Spark is trying to create a local directory under the home directory of the hive
user (/home/hive/).
Why is it referring to the local file system, and where is that path coming from?

Thanks,
Nirmal

From: Raymond Honderdors 
Sent: 19 June 2019 11:18
To: Nirmal Kumar 
Cc: user 
Subject: Re: Unable to run simple spark-sql

Hi Nirmal,
I came across the following article:
"https://stackoverflow.com/questions/47497003/why-is-hive-creating-tables-in-the-local-file-system"
(and an updated reference link:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration).
You should check "hive.metastore.warehouse.dir" in the Hive config files.


On Tue, Jun 18, 2019 at 8:09 PM Nirmal Kumar
<nirmal.ku...@impetus.co.in> wrote:
Just an update on the thread: the cluster is kerberized.

I'm trying to execute the query as a different user, xyz, not hive.
It seems to be a permission issue: the user xyz is trying to create a directory
in the /home/hive directory.

Do I need some impersonation setting?

Thanks,
Nirmal

Get Outlook for Android


From: Nirmal Kumar
Sent: Tuesday, June 18, 2019 5:56:06 PM
To: Raymond Honderdors; Nirmal Kumar
Cc: user
Subject: RE: Unable to run simple spark-sql

Hi Raymond,

Permissions on HDFS are 777:
drwxrwxrwx   - impadmin hdfs  0 2019-06-13 16:09 /home/hive/spark-warehouse


But it’s pointing to a local file system:
Exception in thread "main" java.lang.IllegalStateException: Cannot create staging directory 'file:/home/hive/spark-warehouse/testdb.db/employee_orc/.hive-staging_hive_2019-06-18_16-08-21_448_1691186175028734135-1'
Thanks,
-Nirmal


From: Raymond Honderdors 
Sent: 18 June 2019 17:52
To: Nirmal Kumar 
Cc: user 
Subject: Re: Unable to run simple spark-sql

Hi,
Can you check the permissions of the user running Spark
on the HDFS folder where it tries to create the table?

On Tue, Jun 18, 2019, 15:05 Nirmal Kumar <nirmal.ku...@impetus.co.in> wrote:
Hi List,

I tried running the following sample Java code using Spark2 version 2.0.0 on 
YARN (HDP-2.5.0.0)

public class SparkSQLTest {
  public static void main(String[] args) {
SparkSession sparkSession = SparkSession.builder().master("yarn")
.config("spark.sql.warehouse.dir", "/apps/hive/warehouse")
.config("hive.metastore.uris", "thrift://x:9083")
.config("spark.driver.extraJavaOptions", "-Dhdp.version=2.5.0.0-1245")