Re: Unsubscribe

2020-08-26 Thread Annabel Melongo
Thanks, Stephen

On Wednesday, August 26, 2020, 07:07:05 PM PDT, Stephen Coy wrote:
 
 The instructions for all Apache mail lists are in the mail headers:

List-Unsubscribe: 




On 27 Aug 2020, at 7:49 am, Jeff Evans  wrote:
That is not how you unsubscribe.  See here for instructions: 
https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e
On Wed, Aug 26, 2020, 4:22 PM Annabel Melongo wrote:

Please remove me from the mailing list


This email contains confidential information of and is the copyright of 
Infomedia. It must not be forwarded, amended or disclosed without consent of 
the sender. If you received this message by mistake, please advise the sender 
and delete all copies. Security of transmission on the internet cannot be 
guaranteed, could be infected, intercepted, or corrupted and you should ensure 
you have suitable antivirus protection in place. By sending us your or any 
third party personal details, you consent to (or confirm you have obtained 
consent from such third parties) to Infomedia’s privacy policy. 
http://www.infomedia.com.au/privacy-policy/  

Re: Unsubscribe

2020-08-26 Thread Stephen Coy
The instructions for all Apache mail lists are in the mail headers:


List-Unsubscribe: 



On 27 Aug 2020, at 7:49 am, Jeff Evans <jeffrey.wayne.ev...@gmail.com> wrote:

That is not how you unsubscribe.  See here for instructions: 
https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e

On Wed, Aug 26, 2020, 4:22 PM Annabel Melongo <melongo_anna...@yahoo.com.invalid> wrote:
Please remove me from the mailing list



Re: Unsubscribe

2020-08-26 Thread Jeff Evans
That is not how you unsubscribe.  See here for instructions:
https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e

On Wed, Aug 26, 2020, 4:22 PM Annabel Melongo wrote:

> Please remove me from the mailing list
>


Unsubscribe

2020-08-26 Thread Annabel Melongo
Please remove me from the mailing list

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
And this is a test using the Oracle-supplied Java sample DataSourceSample.java,
with slight amendments for login/password and table. It connects OK:

hduser@rhes76: /home/hduser/dba/bin/ADW/src> javac -classpath
./ojdbc8.jar:. DataSourceSample.java
hduser@rhes76: /home/hduser/dba/bin/ADW/src> java -classpath ./ojdbc8.jar:.
DataSourceSample
AArray = [B@57d5872c
AArray = [B@667a738
AArray = [B@2145433b
Driver Name: Oracle JDBC driver
Driver Version: 18.3.0.0.0
Default Row Prefetch Value is: 20
Database Username is: SCRATCHPAD

DATETAKEN           WEIGHT
------------------- ------
2017-09-07 07:22:09 74.7
2017-09-08 07:26:18 74.8
2017-09-09 07:15:53 75
2017-09-10 07:53:30 75.9
2017-09-11 07:21:49 75.8
2017-09-12 07:31:27 75.6
2017-09-26 07:11:26 75.4
2017-09-27 07:22:48 75.6
2017-09-28 07:15:52 75.4
2017-09-29 07:30:40 74.9



Regards,


LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 26 Aug 2020 at 21:58, Mich Talebzadeh 
wrote:

> Hi Kuassi,
>
> This is the error. It is only a test, running in local mode.
>
> scala> val driverName = "oracle.jdbc.OracleDriver"
> driverName: String = oracle.jdbc.OracleDriver
>
> scala> var url = "jdbc:oracle:thin:@mydb_high
> ?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"
> url: String = jdbc:oracle:thin:@mydb_high
> ?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess
> scala> var _username = "scratchpad"
> _username: String = scratchpad
> scala> var _password = "xx"  // no special characters
> _password: String = xxx
> scala> var _dbschema = "SCRATCHPAD"
> _dbschema: String = SCRATCHPAD
> scala> var _dbtable = "LL_18201960"
> _dbtable: String = LL_18201960
> scala> var e:SQLException = null
> e: java.sql.SQLException = null
> scala> var connection:Connection = null
> connection: java.sql.Connection = null
> scala> var metadata:DatabaseMetaData = null
> metadata: java.sql.DatabaseMetaData = null
> scala> val prop = new java.util.Properties
> prop: java.util.Properties = {}
> scala> prop.setProperty("user", _username)
> res1: Object = null
> scala> prop.setProperty("password",_password)
> res2: Object = null
> scala> // Check Oracle is accessible
>
> scala> try {
>  |   connection = DriverManager.getConnection(url, _username,
> _password)
>  | } catch {
>  |   case e: SQLException => e.printStackTrace
>  |   connection.close()
>  | }
> java.sql.SQLRecoverableException: IO Error: Invalid connection string
> format, a valid format is: "host:port:sid"
> at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
> at
> oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
> at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
> at
> oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
> at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
>
> Is this related to Oracle or Spark? Do I need to set up another connection
> parameter etc?
>
>
>
> Cheers
>
>
>
>
>
>
> On Wed, 26 Aug 2020 at 21:09,  wrote:
>
>> Mich,
>>
>> All looks fine.
>> Perhaps some special chars in username or password?
>>
> >> It is recommended not to use characters like '@' or '.' in your
> >> password.
>>
>> Best, Kuassi
>>
>> On 8/26/20 12:52 PM, Mich Talebzadeh wrote:
>>
>> Thanks Kuassi.
>>
> >> This is the version of the jar file that works OK with the JDBC connection
> >> via Java to ADW
>>
>> unzip -p ojdbc8.jar META-INF/MANIFEST.MF
>> Manifest-Version: 1.0
>> Implementation-Title: JDBC
> >> Implementation-Version: 18.3.0.0.0
>> sealed: true
>> Specification-Vendor: Sun Microsystems Inc.
>> Specification-Title: JDBC
>> Class-Path: oraclepki.jar
>> Implementation-Vendor: Oracle Corporation
>> Main-Class: oracle.jdbc.OracleDriver
>> Ant-Version: Apache Ant 1.7.1
>> Repository-Id: JAVAVM_18.1.0.0.0_LINUX.X64_180620
>> Created-By: 25.171-b11 (Oracle Corporation)
>> Specification-Version: 4.0
>>
> >> And this is the setting for TNS_ADMIN
> >>
> >> echo ${TNS_ADMIN}
> >> /home/hduser/dba/bin/ADW/DBAccess
> >>
> >> hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess> cat ojdbc.properties
> >> # Connection property while using Oracle wallets.
>>

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
Hi Kuassi,

This is the error. It is only a test, running in local mode.

scala> val driverName = "oracle.jdbc.OracleDriver"
driverName: String = oracle.jdbc.OracleDriver

scala> var url = "jdbc:oracle:thin:@mydb_high
?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"
url: String = jdbc:oracle:thin:@mydb_high
?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess
scala> var _username = "scratchpad"
_username: String = scratchpad
scala> var _password = "xx"  // no special characters
_password: String = xxx
scala> var _dbschema = "SCRATCHPAD"
_dbschema: String = SCRATCHPAD
scala> var _dbtable = "LL_18201960"
_dbtable: String = LL_18201960
scala> var e:SQLException = null
e: java.sql.SQLException = null
scala> var connection:Connection = null
connection: java.sql.Connection = null
scala> var metadata:DatabaseMetaData = null
metadata: java.sql.DatabaseMetaData = null
scala> val prop = new java.util.Properties
prop: java.util.Properties = {}
scala> prop.setProperty("user", _username)
res1: Object = null
scala> prop.setProperty("password",_password)
res2: Object = null
scala> // Check Oracle is accessible

scala> try {
 |   connection = DriverManager.getConnection(url, _username,
_password)
 | } catch {
 |   case e: SQLException => e.printStackTrace
 |   connection.close()
 | }
java.sql.SQLRecoverableException: IO Error: Invalid connection string
format, a valid format is: "host:port:sid"
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
at
oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
at
oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
at java.sql.DriverManager.getConnection(DriverManager.java:664)

Is this related to Oracle or Spark? Do I need to set up another connection
parameter etc?
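A point worth checking (an assumption on my part, not something confirmed in this thread): in the transcript above, the URL literal wraps onto a second line before `?TNS_ADMIN=`. If that wrap put a real newline or space inside the string, the Oracle driver would fall back to parsing it as a `host:port:sid` string and fail with exactly this error. A tiny sketch of ruling that cause out:

```java
// Hypothetical check: a JDBC URL should contain no whitespace at all, so
// collapsing any embedded whitespace makes a pasted-across-lines literal
// safe before handing it to DriverManager or Spark.
public class UrlCheck {
    static String normalize(String url) {
        // Strip every run of whitespace, including stray newlines from the REPL.
        return url.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String pasted = "jdbc:oracle:thin:@mydb_high\n?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess";
        System.out.println(normalize(pasted));
    }
}
```

If the normalised URL still fails, the whitespace theory is eliminated and the wallet/classpath side becomes the more likely culprit.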



Cheers






On Wed, 26 Aug 2020 at 21:09,  wrote:

> Mich,
>
> All looks fine.
> Perhaps some special chars in username or password?
>
> It is recommended not to use characters like '@' or '.' in your
> password.
>
> Best, Kuassi
>
> On 8/26/20 12:52 PM, Mich Talebzadeh wrote:
>
> Thanks Kuassi.
>
> This is the version of the jar file that works OK with the JDBC connection
> via Java to ADW
>
> unzip -p ojdbc8.jar META-INF/MANIFEST.MF
> Manifest-Version: 1.0
> Implementation-Title: JDBC
> Implementation-Version: 18.3.0.0.0
> sealed: true
> Specification-Vendor: Sun Microsystems Inc.
> Specification-Title: JDBC
> Class-Path: oraclepki.jar
> Implementation-Vendor: Oracle Corporation
> Main-Class: oracle.jdbc.OracleDriver
> Ant-Version: Apache Ant 1.7.1
> Repository-Id: JAVAVM_18.1.0.0.0_LINUX.X64_180620
> Created-By: 25.171-b11 (Oracle Corporation)
> Specification-Version: 4.0
>
> And this is the setting for TNS_ADMIN
>
> echo ${TNS_ADMIN}
> /home/hduser/dba/bin/ADW/DBAccess
>
> hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess> cat ojdbc.properties
> # Connection property while using Oracle wallets.
>
> oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
> # FOLLOW THESE STEPS FOR USING JKS
> # (1) Uncomment the following properties to use JKS.
> # (2) Comment out the oracle.net.wallet_location property above
> # (3) Set the correct password for both trustStorePassword and
> keyStorePassword.
> # It's the password you specified when downloading the wallet from OCI
> Console or the Service Console.
> #javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
> #javax.net.ssl.trustStorePassword=
> #javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
> #javax.net.ssl.keyStorePassword=
> hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess>
>
> Regards,
>
> Mich
>
>
>
>
>
> On Wed, 26 Aug 2020 at 20:16,  wrote:
>
>> Hi,
>>
>> Which release is the ojdbc8.jar from: 12c, 18c or 19c? I'd recommend
>> ojdbc8.jar from the latest release.
>> One more thing to pay 

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread kuassi . mensah
FWIW, here are our write-ups on Java connectivity to Database Cloud
Services:
https://www.oracle.com/database/technologies/appdev/jdbc-db-cloud.html


Kuassi

On 8/26/20 1:50 PM, Mich Talebzadeh wrote:

Thanks Jorn,

Only running in REPL in local mode

This works fine connecting with ojdbc6.jar to Oracle 12c.

cheers







On Wed, 26 Aug 2020 at 21:39, Jörn Franke wrote:


Is the directory available on all nodes ?


On 26.08.2020 at 22:08, kuassi.men...@oracle.com wrote:



Mich,

All looks fine.
Perhaps some special chars in username or password?


It is recommended not to use characters like '@' or '.' in
your password.

Best, Kuassi
On 8/26/20 12:52 PM, Mich Talebzadeh wrote:

Thanks Kuassi.

This is the version of the jar file that works OK with the JDBC
connection via Java to ADW

unzip -p ojdbc8.jar META-INF/MANIFEST.MF
Manifest-Version: 1.0
Implementation-Title: JDBC
Implementation-Version: 18.3.0.0.0
sealed: true
Specification-Vendor: Sun Microsystems Inc.
Specification-Title: JDBC
Class-Path: oraclepki.jar
Implementation-Vendor: Oracle Corporation
Main-Class: oracle.jdbc.OracleDriver
Ant-Version: Apache Ant 1.7.1
Repository-Id: JAVAVM_18.1.0.0.0_LINUX.X64_180620
Created-By: 25.171-b11 (Oracle Corporation)
Specification-Version: 4.0

And this is the setting for TNS_ADMIN

echo ${TNS_ADMIN}
/home/hduser/dba/bin/ADW/DBAccess

hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess> cat
ojdbc.properties
# Connection property while using Oracle wallets.

oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
# FOLLOW THESE STEPS FOR USING JKS
# (1) Uncomment the following properties to use JKS.
# (2) Comment out the oracle.net.wallet_location property above
# (3) Set the correct password for both trustStorePassword and
keyStorePassword.
# It's the password you specified when downloading the wallet
from OCI Console or the Service Console.
#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
#javax.net.ssl.trustStorePassword=
#javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
#javax.net.ssl.keyStorePassword=
hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess>

Regards,

Mich




On Wed, 26 Aug 2020 at 20:16, <kuassi.men...@oracle.com> wrote:

Hi,

Which release is the ojdbc8.jar from: 12c, 18c or 19c?
I'd recommend ojdbc8.jar from the latest release.
One more thing to pay attention to is the content of the
ojdbc.properties file (part of the unzipped wallet)
Make sure that ojdbc.properties file has been configured to
use Oracle Wallet, as follows (i.e., anything related to JKS
commented out)


oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
#javax.net.ssl.trustStorePassword=
#javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
#javax.net.ssl.keyStorePassword=

Alternatively, if you want to use JKS, then you need to
comment out the first line and un-comment the other lines
and set the values.

Kuassi

On 8/26/20 11:58 AM, Mich Talebzadeh wrote:

Hi,

The connection from Spark to Oracle 12c etc. is well
established using ojdbc6.jar.

I am attempting to connect to Oracle Autonomous Data
warehouse (ADW) version

Oracle Database 19c

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
Thanks Jorn,

Only running in REPL in local mode

This works fine connecting with ojdbc6.jar to Oracle 12c.

cheers







On Wed, 26 Aug 2020 at 21:39, Jörn Franke  wrote:

> Is the directory available on all nodes ?
>
> On 26.08.2020 at 22:08, kuassi.men...@oracle.com wrote:
>
> 
>
> Mich,
>
> All looks fine.
> Perhaps some special chars in username or password?
>
> It is recommended not to use characters like '@' or '.' in your
> password.
>
> Best, Kuassi
>
> On 8/26/20 12:52 PM, Mich Talebzadeh wrote:
>
> Thanks Kuassi.
>
> This is the version of the jar file that works OK with the JDBC connection
> via Java to ADW
>
> unzip -p ojdbc8.jar META-INF/MANIFEST.MF
> Manifest-Version: 1.0
> Implementation-Title: JDBC
> Implementation-Version: 18.3.0.0.0
> sealed: true
> Specification-Vendor: Sun Microsystems Inc.
> Specification-Title: JDBC
> Class-Path: oraclepki.jar
> Implementation-Vendor: Oracle Corporation
> Main-Class: oracle.jdbc.OracleDriver
> Ant-Version: Apache Ant 1.7.1
> Repository-Id: JAVAVM_18.1.0.0.0_LINUX.X64_180620
> Created-By: 25.171-b11 (Oracle Corporation)
> Specification-Version: 4.0
>
> And this is the setting for TNS_ADMIN
>
> echo ${TNS_ADMIN}
> /home/hduser/dba/bin/ADW/DBAccess
>
> hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess> cat ojdbc.properties
> # Connection property while using Oracle wallets.
>
> oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
> # FOLLOW THESE STEPS FOR USING JKS
> # (1) Uncomment the following properties to use JKS.
> # (2) Comment out the oracle.net.wallet_location property above
> # (3) Set the correct password for both trustStorePassword and
> keyStorePassword.
> # It's the password you specified when downloading the wallet from OCI
> Console or the Service Console.
> #javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
> #javax.net.ssl.trustStorePassword=
> #javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
> #javax.net.ssl.keyStorePassword=
> hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess>
>
> Regards,
>
> Mich
>
>
>
>
>
> On Wed, 26 Aug 2020 at 20:16,  wrote:
>
>> Hi,
>>
>> Which release is the ojdbc8.jar from: 12c, 18c or 19c? I'd recommend
>> ojdbc8.jar from the latest release.
>> One more thing to pay attention to is the content of the ojdbc.properties
>> file (part of the unzipped wallet)
>> Make sure that ojdbc.properties file has been configured to use Oracle
>> Wallet, as follows (i.e., anything related to JKS commented out)
>>
>>
>> oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
>> #javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
>> #javax.net.ssl.trustStorePassword=
>> #javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
>> #javax.net.ssl.keyStorePassword=
>>
>> Alternatively, if you want to use JKS, then you need to comment out the
>> first line and un-comment the other lines and set the values.
>>
>> Kuassi
>> On 8/26/20 11:58 AM, Mich Talebzadeh wrote:
>>
>> Hi,
>>
>> The connection from Spark to Oracle 12c etc. is well established using
>> ojdbc6.jar.
>>
>> I am attempting to connect to Oracle Autonomous Data warehouse (ADW)
>> version
>>
>> Oracle Database 19c Enterprise Edition Release 19.0.0.0.0
>>
>> Oracle documentation suggests using ojdbc8.jar to
>> connect to the database with the following URL format using Oracle Wallet
>>
>> "jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"
>>
>> This works fine through JAVA itself but throws an error with
>> Spark version 2.4.3.
>>
>> The connection string is defined as follows
>>
>> val url = "jdbc:oracle:thin:@mydb_high
>> ?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"
>>
>> 

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Jörn Franke
Is the directory available on all nodes ?

> On 26.08.2020 at 22:08, kuassi.men...@oracle.com wrote:
> 
> 
> Mich,
> 
> All looks fine.
> Perhaps some special chars in username or password?
> 
>> It is recommended not to use characters like '@' or '.' in your password.
> Best, Kuassi
> On 8/26/20 12:52 PM, Mich Talebzadeh wrote:
>> Thanks Kuassi.
>> 
>> This is the version of the jar file that works OK with the JDBC connection
>> via Java to ADW
>> 
>> unzip -p ojdbc8.jar META-INF/MANIFEST.MF
>> Manifest-Version: 1.0
>> Implementation-Title: JDBC
>> Implementation-Version: 18.3.0.0.0
>> sealed: true
>> Specification-Vendor: Sun Microsystems Inc.
>> Specification-Title: JDBC
>> Class-Path: oraclepki.jar
>> Implementation-Vendor: Oracle Corporation
>> Main-Class: oracle.jdbc.OracleDriver
>> Ant-Version: Apache Ant 1.7.1
>> Repository-Id: JAVAVM_18.1.0.0.0_LINUX.X64_180620
>> Created-By: 25.171-b11 (Oracle Corporation)
>> Specification-Version: 4.0
>> 
>> And this is the setting for TNS_ADMIN
>> 
>> echo ${TNS_ADMIN}
>> /home/hduser/dba/bin/ADW/DBAccess
>> 
>> hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess> cat ojdbc.properties
>> # Connection property while using Oracle wallets.
>> oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
>> # FOLLOW THESE STEPS FOR USING JKS
>> # (1) Uncomment the following properties to use JKS.
>> # (2) Comment out the oracle.net.wallet_location property above
>> # (3) Set the correct password for both trustStorePassword and 
>> keyStorePassword.
>> # It's the password you specified when downloading the wallet from OCI 
>> Console or the Service Console.
>> #javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
>> #javax.net.ssl.trustStorePassword=
>> #javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
>> #javax.net.ssl.keyStorePassword=hduser@rhes76: 
>> /home/hduser/dba/bin/ADW/DBAccess>
>> 
>> Regards,
>> 
>> Mich
>> 
>>  
>> 
>> 
>> On Wed, 26 Aug 2020 at 20:16,  wrote:
>>> Hi,
>>> 
>>> Which release is the ojdbc8.jar from: 12c, 18c or 19c? I'd recommend
>>> ojdbc8.jar from the latest release.
>>> One more thing to pay attention to is the content of the ojdbc.properties 
>>> file (part of the unzipped wallet)
>>> Make sure that ojdbc.properties file has been configured to use Oracle 
>>> Wallet, as follows (i.e., anything related to JKS commented out)
>>> 
>>> oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
>>> #javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
>>> #javax.net.ssl.trustStorePassword=
>>> #javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
>>> #javax.net.ssl.keyStorePassword=
>>> 
>>> Alternatively, if you want to use JKS, then you need to comment out the
>>> first line and un-comment the other lines and set the values.
>>> 
>>> Kuassi
>>> 
>>> On 8/26/20 11:58 AM, Mich Talebzadeh wrote:
 Hi,
 
The connection from Spark to Oracle 12c etc. is well established using
ojdbc6.jar.
 
 I am attempting to connect to Oracle Autonomous Data warehouse (ADW) 
 version 
 
 Oracle Database 19c Enterprise Edition Release 19.0.0.0.0
 
Oracle documentation suggests using ojdbc8.jar to connect to the database with
 the following URL format using Oracle Wallet
 
 "jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"
 
 This works fine through JAVA itself but throws an error with Spark version 
 2.4.3.
 
 The connection string is defined as follows
 
 val url = 
 "jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"
 
where the DBAccess directory is the unzipped wallet for Wallet_mydb.zip as
created by the ADW connection.
 
The thing is that this works through a normal connection via Java code using
the same URL
 
 So the question is whether there is a dependency in Spark JDBC connection 
 to the ojdbc.
 
 The error I am getting is:
 
 java.sql.SQLRecoverableException: IO Error: Invalid connection string 
 format, a valid format is: "host:port:sid"
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
at
oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
 at 
 oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
 at 

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread kuassi . mensah

Mich,

All looks fine.
Perhaps some special chars in username or password?

It is recommended not to use characters like '@' or '.' in your
password.

Best, Kuassi

On 8/26/20 12:52 PM, Mich Talebzadeh wrote:

Thanks Kuassi.

This is the version of the jar file that works OK with the JDBC connection
via Java to ADW


unzip -p ojdbc8.jar META-INF/MANIFEST.MF
Manifest-Version: 1.0
Implementation-Title: JDBC
Implementation-Version: 18.3.0.0.0
sealed: true
Specification-Vendor: Sun Microsystems Inc.
Specification-Title: JDBC
Class-Path: oraclepki.jar
Implementation-Vendor: Oracle Corporation
Main-Class: oracle.jdbc.OracleDriver
Ant-Version: Apache Ant 1.7.1
Repository-Id: JAVAVM_18.1.0.0.0_LINUX.X64_180620
Created-By: 25.171-b11 (Oracle Corporation)
Specification-Version: 4.0

And this is the setting for TNS_ADMIN

echo ${TNS_ADMIN}
/home/hduser/dba/bin/ADW/DBAccess

hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess> cat ojdbc.properties
# Connection property while using Oracle wallets.
oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
# FOLLOW THESE STEPS FOR USING JKS
# (1) Uncomment the following properties to use JKS.
# (2) Comment out the oracle.net.wallet_location property above
# (3) Set the correct password for both trustStorePassword and
keyStorePassword.
# It's the password you specified when downloading the wallet from
OCI Console or the Service Console.

#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
#javax.net.ssl.trustStorePassword=
#javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
#javax.net.ssl.keyStorePassword=
hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess>
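Kuassi's wallet-versus-JKS rule can be sanity-checked mechanically from the text of ojdbc.properties. A rough sketch (the class name and the rules encoded here are mine, inferred from this thread, not an Oracle tool):

```java
import java.util.Arrays;
import java.util.List;

// Sketch based on the advice in this thread: for wallet-based connections,
// the oracle.net.wallet_location line must be active and every
// javax.net.ssl.* (JKS) line must be commented out.
public class WalletConfigCheck {
    static boolean walletModeOk(List<String> lines) {
        boolean walletActive = false;
        boolean jksActive = false;
        for (String raw : lines) {
            String line = raw.trim();
            // Commented lines start with '#' and therefore match neither prefix.
            if (line.startsWith("oracle.net.wallet_location=")) walletActive = true;
            if (line.startsWith("javax.net.ssl.")) jksActive = true;
        }
        return walletActive && !jksActive;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "# Connection property while using Oracle wallets.",
            "oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))",
            "#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks",
            "#javax.net.ssl.trustStorePassword=");
        System.out.println(walletModeOk(sample)); // prints true for a wallet-mode file
    }
}
```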


Regards,

Mich





On Wed, 26 Aug 2020 at 20:16, <kuassi.men...@oracle.com> wrote:


Hi,

Which release is the ojdbc8.jar from: 12c, 18c or 19c? I'd
recommend ojdbc8.jar from the latest release.
One more thing to pay attention to is the content of the
ojdbc.properties file (part of the unzipped wallet)
Make sure that ojdbc.properties file has been configured to use
Oracle Wallet, as follows (i.e., anything related to JKS commented
out)


oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
#javax.net.ssl.trustStorePassword=
#javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
#javax.net.ssl.keyStorePassword=

Alternatively, if you want to use JKS, then you need to comment
out the first line and un-comment the other lines and set the values.

Kuassi

On 8/26/20 11:58 AM, Mich Talebzadeh wrote:

Hi,

The connection from Spark to Oracle 12c etc. is well established
using ojdbc6.jar.

I am attempting to connect to Oracle Autonomous Data warehouse
(ADW) version

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0

Oracle documentation suggests using ojdbc8.jar to
connect to the database with the following URL format using
Oracle Wallet

"jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"

This works fine through JAVA itself but throws an error with
Spark version 2.4.3.

The connection string is defined as follows

val url =
"jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"

where the DBAccess directory is the unzipped wallet for
Wallet_mydb.zip as created by the ADW connection.

The thing is that this works through a normal connection via Java
code using the same URL

So the question is whether there is a dependency in Spark JDBC
connection to the ojdbc.

The error I am getting is:

java.sql.SQLRecoverableException: IO Error: Invalid connection
string format, a valid format is: "host:port:sid"
        at
oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
        at
oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
        at
oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
        at

oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
        at

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
Thanks Kuassi.

This is the version of the jar file that works OK with the JDBC connection
via Java to ADW

unzip -p ojdbc8.jar META-INF/MANIFEST.MF
Manifest-Version: 1.0
Implementation-Title: JDBC
Implementation-Version: 18.3.0.0.0
sealed: true
Specification-Vendor: Sun Microsystems Inc.
Specification-Title: JDBC
Class-Path: oraclepki.jar
Implementation-Vendor: Oracle Corporation
Main-Class: oracle.jdbc.OracleDriver
Ant-Version: Apache Ant 1.7.1
Repository-Id: JAVAVM_18.1.0.0.0_LINUX.X64_180620
Created-By: 25.171-b11 (Oracle Corporation)
Specification-Version: 4.0

And this is the setting for TNS_ADMIN

echo ${TNS_ADMIN}
/home/hduser/dba/bin/ADW/DBAccess

hduser@rhes76: /home/hduser/dba/bin/ADW/DBAccess> cat ojdbc.properties
# Connection property while using Oracle wallets.
*oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))*
*# FOLLOW THESE STEPS FOR USING JKS*
*# (1) Uncomment the following properties to use JKS.*
*# (2) Comment out the oracle.net.wallet_location property above*
*# (3) Set the correct password for both trustStorePassword and
keyStorePassword.*
*# It's the password you specified when downloading the wallet from OCI
Console or the Service Console.*
*#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks*
*#javax.net.ssl.trustStorePassword=*
*#javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks*
*#javax.net.ssl.keyStorePassword=hduser@rhes76:
/home/hduser/dba/bin/ADW/DBAccess>*
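For what it is worth, the same driver and wallet visibility has to exist inside Spark as in the standalone Java test. A minimal sketch of spark-defaults.conf entries for that (the jar paths are assumptions for illustration, not the actual locations on this host):

```
# Assumed jar locations; point these at the real ojdbc8.jar (18c+) and oraclepki.jar.
spark.jars                       /home/hduser/jars/ojdbc8.jar,/home/hduser/jars/oraclepki.jar
# Let the Oracle driver locate tnsnames.ora and the wallet on driver and executors.
spark.driver.extraJavaOptions    -Doracle.net.tns_admin=/home/hduser/dba/bin/ADW/DBAccess
spark.executor.extraJavaOptions  -Doracle.net.tns_admin=/home/hduser/dba/bin/ADW/DBAccess
```

If an older ojdbc jar also sits on Spark's classpath, it can shadow the 18.3 driver, which would explain Spark behaving differently from the standalone Java test.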

Regards,

Mich

LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 26 Aug 2020 at 20:16,  wrote:

> Hi,
>
> Which release is the ojdbc8.jar from: 12c, 18c or 19c? I'd recommend
> ojdbc8.jar from the latest release.
> One more thing to pay attention to is the content of the ojdbc.properties
> file (part of the unzipped wallet).
> Make sure that the ojdbc.properties file has been configured to use Oracle
> Wallet, as follows (i.e., anything related to JKS commented out):
>
>
> oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
> #javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
> #javax.net.ssl.trustStorePassword=
> #javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
> #javax.net.ssl.keyStorePassword=
>
> Alternatively, if you want to use JKS, then you need to comment out the
> first line, un-comment the other lines, and set the values.
>
> Kuassi
> On 8/26/20 11:58 AM, Mich Talebzadeh wrote:
>
> Hi,
>
> Connections from Spark to Oracle 12c etc. are well established using
> ojdbc6.jar.
>
> I am attempting to connect to Oracle Autonomous Data warehouse (ADW)
> version
>
> *Oracle Database 19c Enterprise Edition Release 19.0.0.0.0*
>
> The Oracle documentation suggests using ojdbc8.jar to
> connect to the database with the following URL format using an Oracle Wallet:
>
> "jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"
>
> This works fine through JAVA itself but throws an error with Spark version
> 2.4.3.
>
> The connection string is defined as follows
>
> val url = "jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"
>
> where the DBAccess directory is the unzipped wallet for Wallet_mydb.zip as
> created for the ADW connection.
>
> The thing is that this works through a normal connection via Java code using
> the same URL.
>
> So the question is whether there is a dependency in the Spark JDBC connection
> on the ojdbc driver version.
>
> The error I am getting is:
>
> java.sql.SQLRecoverableException: IO Error: Invalid connection string
> format, a valid format is: "host:port:sid"
> at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
> at
> oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
> at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
> at
> oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
> at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
>
> This Oracle doc explains the connectivity.
>
> The unzipped wallet has the following files:
>
>  ls DBAccess/
> README  cwallet.sso  ewallet.p12  keystore.jks  ojdbc.properties
> sqlnet.ora  tnsnames.ora  truststore.jks
>
>
> Thanks
>
> 

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread kuassi . mensah

Hi,

Which release is the ojdbc8.jar from: 12c, 18c or 19c? I'd
recommend ojdbc8.jar from the latest release.
One more thing to pay attention to is the content of the
ojdbc.properties file (part of the unzipped wallet).
Make sure that the ojdbc.properties file has been configured to use Oracle
Wallet, as follows (i.e., anything related to JKS commented out):


oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
#javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
#javax.net.ssl.trustStorePassword=
#javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
#javax.net.ssl.keyStorePassword=

Alternatively, if you want to use JKS, then you need to comment out the
first line, un-comment the other lines, and set the values.
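Spelled out, the JKS variant of ojdbc.properties would look like this sketch (the password placeholders are assumptions; they stand for the password chosen when the wallet was downloaded):

```
# Wallet entry commented out; JKS entries enabled with placeholder passwords.
#oracle.net.wallet_location=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=${TNS_ADMIN})))
javax.net.ssl.trustStore=${TNS_ADMIN}/truststore.jks
javax.net.ssl.trustStorePassword=<wallet download password>
javax.net.ssl.keyStore=${TNS_ADMIN}/keystore.jks
javax.net.ssl.keyStorePassword=<wallet download password>
```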


Kuassi

On 8/26/20 11:58 AM, Mich Talebzadeh wrote:

Hi,

Connections from Spark to Oracle 12c etc. are well established using
ojdbc6.jar.


I am attempting to connect to Oracle Autonomous Data warehouse (ADW) 
version


*Oracle Database 19c Enterprise Edition Release 19.0.0.0.0*

The Oracle documentation suggests using ojdbc8.jar to
connect to the database with the following URL format using an Oracle Wallet:


"jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"

This works fine through JAVA itself but throws an error with 
Spark version 2.4.3.


The connection string is defined as follows

val url = 
"jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"


where the DBAccess directory is the unzipped wallet for Wallet_mydb.zip as
created for the ADW connection.


The thing is that this works through a normal connection via Java
code using the same URL.


So the question is whether there is a dependency in the Spark JDBC
connection on the ojdbc driver version.


The error I am getting is:

java.sql.SQLRecoverableException: IO Error: Invalid connection string 
format, a valid format is: "host:port:sid"

        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
        at
oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
        at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
        at 
oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)

        at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)

This Oracle doc explains the connectivity.


The unzipped wallet has the following files:

 ls DBAccess/
README cwallet.sso  ewallet.p12  keystore.jks ojdbc.properties  
sqlnet.ora  tnsnames.ora truststore.jks



Thanks

Mich



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw








Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
Hi,

Connections from Spark to Oracle 12c etc. are well established using
ojdbc6.jar.

I am attempting to connect to Oracle Autonomous Data warehouse (ADW)
version

*Oracle Database 19c Enterprise Edition Release 19.0.0.0.0*

The Oracle documentation suggests using ojdbc8.jar to connect to the database
with the following URL format using an Oracle Wallet:

"jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"

This works fine through JAVA itself but throws an error with Spark version
2.4.3.

The connection string is defined as follows

val url = "jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess"

where the DBAccess directory is the unzipped wallet for Wallet_mydb.zip as
created for the ADW connection.

The thing is that this works through a normal connection via Java code using
the same URL.

So the question is whether there is a dependency in the Spark JDBC connection
on the ojdbc driver version.

The error I am getting is:

java.sql.SQLRecoverableException: IO Error: Invalid connection string
format, a valid format is: "host:port:sid"
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
at
oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
at
oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
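The error message quoting "host:port:sid" is the telling detail: the legacy thin-driver connect string has exactly that three-part shape, while the ADW URL carries a TNS alias plus a TNS_ADMIN parameter that only newer ojdbc builds understand. A small self-contained sketch of the distinction (my own illustrative check, not Oracle's actual parser):

```java
// Illustrative only: distinguishes the legacy "host:port:sid" connect data an
// old ojdbc expects from the TNS-alias-with-TNS_ADMIN form used for ADW wallets.
public class ConnectStringCheck {

    // Strip the thin-driver prefix to leave the connect data.
    static String connectData(String url) {
        return url.replaceFirst("^jdbc:oracle:thin:@", "");
    }

    // True when the connect data has the legacy host:port:sid shape.
    static boolean isLegacySidFormat(String data) {
        String[] parts = data.split(":");
        return parts.length == 3 && !parts[1].isEmpty()
                && parts[1].chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        String adwUrl = "jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/home/hduser/dba/bin/ADW/DBAccess";
        String oldUrl = "jdbc:oracle:thin:@myhost:1521:ORCL";
        System.out.println(isLegacySidFormat(connectData(oldUrl))); // true
        System.out.println(isLegacySidFormat(connectData(adwUrl))); // false
    }
}
```

If Spark resolves an older ojdbc jar from its classpath, it would reject the TNS-alias URL with exactly this message, even though the standalone Java run with the 18.3 driver accepts it.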

This Oracle doc explains the connectivity.

The unzipped wallet has the following files:

 ls DBAccess/
README  cwallet.sso  ewallet.p12  keystore.jks  ojdbc.properties
sqlnet.ora  tnsnames.ora  truststore.jks


Thanks

Mich



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw







Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Gourav Sengupta
Hi,

Can you try using EMRFS? Your study looks good; best of luck.

Regards
Gourav

On Wed, 26 Aug 2020, 12:37 Rao, Abhishek (Nokia - IN/Bangalore), <
abhishek@nokia.com> wrote:

> Yeah… not sure if I'm missing any configuration that is causing this
> issue. Any suggestions?
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> From: Gourav Sengupta
> Sent: Wednesday, August 26, 2020 2:35 PM
> To: Rao, Abhishek (Nokia - IN/Bangalore)
> Cc: user@spark.apache.org
> Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC DS
> Queries
>
>
>
> Hi,
>
>
>
> So the results do not make sense.
>
>
>
>
>
> Regards,
>
> Gourav
>
>
>
> On Wed, Aug 26, 2020 at 9:04 AM Rao, Abhishek (Nokia - IN/Bangalore) <
> abhishek@nokia.com> wrote:
>
> Hi Gourav,
>
>
>
> Yes. We’re using s3a.
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> From: Gourav Sengupta
> Sent: Wednesday, August 26, 2020 1:18 PM
> To: Rao, Abhishek (Nokia - IN/Bangalore)
> Cc: user@spark.apache.org
> Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC DS
> Queries
>
>
>
> Hi,
>
>
>
> are you using s3a, which is not using EMRFS? In that case, these results
> do not make sense to me.
>
>
>
> Regards,
>
> Gourav Sengupta
>
>
>
> On Mon, Aug 24, 2020 at 12:52 PM Rao, Abhishek (Nokia - IN/Bangalore) <
> abhishek@nokia.com> wrote:
>
> Hi All,
>
>
>
> We’re doing some performance comparisons between Spark querying data on
> HDFS vs Spark querying data on S3 (Ceph Object Store used for S3 storage)
> using standard TPC DS Queries. We are observing that Spark 3.0 with S3 is
> consuming significantly larger duration for some set of queries when
> compared with HDFS.
>
> We also ran similar queries with Spark 2.4.5 querying data from S3, and we
> see that for this set of queries the time taken by Spark 2.4.5 is lower than
> that of Spark 3.0, which looks very strange.
>
> Below are the details of 9 queries where Spark 3.0 is taking >5 times the
> duration for running queries on S3 when compared to Hadoop.
>
>
>
> Environment Details:
>
>    - Spark running on Kubernetes
>    - TPC DS Scale Factor: 500 GB
>    - Hadoop 3.x
>    - Same CPU and memory used for all executions
>
>
>
> Query | Spark 3.0 S3 (sec) | Spark 3.0 Hadoop (sec) | Spark 2.4.5 S3 (sec) | Spark 3.0 HDFS vs S3 (factor) | Spark 2.4.5 S3 vs Spark 3.0 S3 (factor) | Table involved
> 9     | 880.129 | 106.109 | 147.65  | 8.294574 | 5.960914 | store_sales
> 44    | 129.618 | 23.747  | 103.916 | 5.458289 | 1.247334 | store_sales
> 58    | 142.113 | 20.996  | 33.936  | 6.768575 | 4.187677 | store_sales
> 62    | 32.519  | 5.425   | 14.809  | 5.994286 | 2.195894 | web_sales
> 76    | 138.765 | 20.73   | 49.892  | 6.693922 | 2.781308 | store_sales
> 88    | 475.824 | 48.2    | 94.382  | 9.871867 | 5.04147  | store_sales
> 90    | 53.896  | 6.804   | 18.11   | 7.921223 | 2.976035 | web_sales
> 94    | 241.172 | 43.49   | 81.181  | 5.545459 | 2.970794 | web_sales
> 96    | 67.059  | 10.396  | 15.993  | 6.450462 | 4.193022 | store_sales
>
>
>
> When we analysed it further, we see that all these queries are performing
> operations either on store_sales or web_sales tables and Spark 3 with S3
> seems to be downloading much more data from storage when compared to Spark
> 3 with Hadoop or Spark 2.4.5 with S3 and this is resulting in more time for
> query completion. I’m attaching the screen shots of Driver UI for one such
> instance (Query 9) for reference.
>
> Also attached the spark configurations (Spark 3.0) used for these tests.
>
>
>
> We’re not sure why Spark 3.0 on S3 is having this behaviour. Any inputs on
> what we’re missing?
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


RE: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Yeah… not sure if I'm missing any configuration that is causing this issue.
Any suggestions?

Thanks and Regards,
Abhishek

From: Gourav Sengupta 
Sent: Wednesday, August 26, 2020 2:35 PM
To: Rao, Abhishek (Nokia - IN/Bangalore) 
Cc: user@spark.apache.org
Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

Hi,

So the results do not make sense.


Regards,
Gourav

On Wed, Aug 26, 2020 at 9:04 AM Rao, Abhishek (Nokia - IN/Bangalore)
<abhishek@nokia.com> wrote:
Hi Gourav,

Yes. We’re using s3a.

Thanks and Regards,
Abhishek

From: Gourav Sengupta <gourav.sengu...@gmail.com>
Sent: Wednesday, August 26, 2020 1:18 PM
To: Rao, Abhishek (Nokia - IN/Bangalore) <abhishek@nokia.com>
Cc: user@spark.apache.org
Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

Hi,

are you using s3a, which is not using EMRFS? In that case, these results do
not make sense to me.

Regards,
Gourav Sengupta

On Mon, Aug 24, 2020 at 12:52 PM Rao, Abhishek (Nokia - IN/Bangalore)
<abhishek@nokia.com> wrote:
Hi All,

We’re doing some performance comparisons between Spark querying data on HDFS vs 
Spark querying data on S3 (Ceph Object Store used for S3 storage) using 
standard TPC DS Queries. We are observing that Spark 3.0 with S3 is consuming 
significantly larger duration for some set of queries when compared with HDFS.
We also ran similar queries with Spark 2.4.5 querying data from S3, and we see
that for this set of queries the time taken by Spark 2.4.5 is lower than that
of Spark 3.0, which looks very strange.
Below are the details of 9 queries where Spark 3.0 is taking >5 times the 
duration for running queries on S3 when compared to Hadoop.

Environment Details:

  *   Spark running on Kubernetes
  *   TPC DS Scale Factor: 500 GB
  *   Hadoop 3.x
  *   Same CPU and memory used for all executions

Query | Spark 3.0 S3 (sec) | Spark 3.0 Hadoop (sec) | Spark 2.4.5 S3 (sec) | Spark 3.0 HDFS vs S3 (factor) | Spark 2.4.5 S3 vs Spark 3.0 S3 (factor) | Table involved
9     | 880.129 | 106.109 | 147.65  | 8.294574 | 5.960914 | store_sales
44    | 129.618 | 23.747  | 103.916 | 5.458289 | 1.247334 | store_sales
58    | 142.113 | 20.996  | 33.936  | 6.768575 | 4.187677 | store_sales
62    | 32.519  | 5.425   | 14.809  | 5.994286 | 2.195894 | web_sales
76    | 138.765 | 20.73   | 49.892  | 6.693922 | 2.781308 | store_sales
88    | 475.824 | 48.2    | 94.382  | 9.871867 | 5.04147  | store_sales
90    | 53.896  | 6.804   | 18.11   | 7.921223 | 2.976035 | web_sales
94    | 241.172 | 43.49   | 81.181  | 5.545459 | 2.970794 | web_sales
96    | 67.059  | 10.396  | 15.993  | 6.450462 | 4.193022 | store_sales
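For clarity about how the factor columns above are derived: both are ratios against the Spark 3.0-on-S3 time. A quick sketch reproducing query 9's row:

```java
// Recomputes the two factor columns for query 9:
// factor = (Spark 3.0 on S3 time) / (comparison run's time).
public class SlowdownFactors {

    static double factor(double spark30OnS3Seconds, double comparisonSeconds) {
        return spark30OnS3Seconds / comparisonSeconds;
    }

    public static void main(String[] args) {
        // Query 9 timings in seconds, taken from the table above.
        double s3 = 880.129, hdfs = 106.109, spark245 = 147.65;
        System.out.printf("Spark 3.0 HDFS vs S3:     %.6f%n", factor(s3, hdfs));     // 8.294574
        System.out.printf("Spark 2.4.5 vs 3.0 on S3: %.6f%n", factor(s3, spark245)); // 5.960914
    }
}
```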

When we analysed it further, we see that all these queries are performing 
operations either on store_sales or web_sales tables and Spark 3 with S3 seems 
to be downloading much more data from storage when compared to Spark 3 with 
Hadoop or Spark 2.4.5 with S3 and this is resulting in more time for query 
completion. I’m attaching the screen shots of Driver UI for one such instance 
(Query 9) for reference.
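One configuration angle worth checking (an assumption prompted by the extra bytes downloaded, not a confirmed diagnosis): with the Hadoop 3.x s3a connector, the default sequential input policy can re-read large ranges whenever a columnar reader such as Parquet seeks, so switching the read policy to random sometimes reduces the data fetched dramatically, e.g. in spark-defaults.conf:

```
# s3a read policy for seek-heavy columnar formats (Hadoop 3.x s3a option).
spark.hadoop.fs.s3a.experimental.input.fadvise  random
```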
Also attached the spark configurations (Spark 3.0) used for these tests.

We’re not sure why Spark 3.0 on S3 is having this behaviour. Any inputs on what 
we’re missing?

Thanks and Regards,
Abhishek




Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Gourav Sengupta
Hi,

So the results do not make sense.


Regards,
Gourav

On Wed, Aug 26, 2020 at 9:04 AM Rao, Abhishek (Nokia - IN/Bangalore) <
abhishek@nokia.com> wrote:

> Hi Gourav,
>
>
>
> Yes. We’re using s3a.
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> From: Gourav Sengupta
> Sent: Wednesday, August 26, 2020 1:18 PM
> To: Rao, Abhishek (Nokia - IN/Bangalore)
> Cc: user@spark.apache.org
> Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC DS
> Queries
>
>
>
> Hi,
>
>
>
> are you using s3a, which is not using EMRFS? In that case, these results
> do not make sense to me.
>
>
>
> Regards,
>
> Gourav Sengupta
>
>
>
> On Mon, Aug 24, 2020 at 12:52 PM Rao, Abhishek (Nokia - IN/Bangalore) <
> abhishek@nokia.com> wrote:
>
> Hi All,
>
>
>
> We’re doing some performance comparisons between Spark querying data on
> HDFS vs Spark querying data on S3 (Ceph Object Store used for S3 storage)
> using standard TPC DS Queries. We are observing that Spark 3.0 with S3 is
> consuming significantly larger duration for some set of queries when
> compared with HDFS.
>
> We also ran similar queries with Spark 2.4.5 querying data from S3, and we
> see that for this set of queries the time taken by Spark 2.4.5 is lower than
> that of Spark 3.0, which looks very strange.
>
> Below are the details of 9 queries where Spark 3.0 is taking >5 times the
> duration for running queries on S3 when compared to Hadoop.
>
>
>
> Environment Details:
>
>    - Spark running on Kubernetes
>    - TPC DS Scale Factor: 500 GB
>    - Hadoop 3.x
>    - Same CPU and memory used for all executions
>
>
>
> Query | Spark 3.0 S3 (sec) | Spark 3.0 Hadoop (sec) | Spark 2.4.5 S3 (sec) | Spark 3.0 HDFS vs S3 (factor) | Spark 2.4.5 S3 vs Spark 3.0 S3 (factor) | Table involved
> 9     | 880.129 | 106.109 | 147.65  | 8.294574 | 5.960914 | store_sales
> 44    | 129.618 | 23.747  | 103.916 | 5.458289 | 1.247334 | store_sales
> 58    | 142.113 | 20.996  | 33.936  | 6.768575 | 4.187677 | store_sales
> 62    | 32.519  | 5.425   | 14.809  | 5.994286 | 2.195894 | web_sales
> 76    | 138.765 | 20.73   | 49.892  | 6.693922 | 2.781308 | store_sales
> 88    | 475.824 | 48.2    | 94.382  | 9.871867 | 5.04147  | store_sales
> 90    | 53.896  | 6.804   | 18.11   | 7.921223 | 2.976035 | web_sales
> 94    | 241.172 | 43.49   | 81.181  | 5.545459 | 2.970794 | web_sales
> 96    | 67.059  | 10.396  | 15.993  | 6.450462 | 4.193022 | store_sales
>
>
>
> When we analysed it further, we see that all these queries are performing
> operations either on store_sales or web_sales tables and Spark 3 with S3
> seems to be downloading much more data from storage when compared to Spark
> 3 with Hadoop or Spark 2.4.5 with S3 and this is resulting in more time for
> query completion. I’m attaching the screen shots of Driver UI for one such
> instance (Query 9) for reference.
>
> Also attached the spark configurations (Spark 3.0) used for these tests.
>
>
>
> We’re not sure why Spark 3.0 on S3 is having this behaviour. Any inputs on
> what we’re missing?
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
>
>
>


RE: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi Gourav,

Yes. We’re using s3a.

Thanks and Regards,
Abhishek

From: Gourav Sengupta 
Sent: Wednesday, August 26, 2020 1:18 PM
To: Rao, Abhishek (Nokia - IN/Bangalore) 
Cc: user@spark.apache.org
Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

Hi,

are you using s3a, which is not using EMRFS? In that case, these results do
not make sense to me.

Regards,
Gourav Sengupta

On Mon, Aug 24, 2020 at 12:52 PM Rao, Abhishek (Nokia - IN/Bangalore)
<abhishek@nokia.com> wrote:
Hi All,

We’re doing some performance comparisons between Spark querying data on HDFS vs 
Spark querying data on S3 (Ceph Object Store used for S3 storage) using 
standard TPC DS Queries. We are observing that Spark 3.0 with S3 is consuming 
significantly larger duration for some set of queries when compared with HDFS.
We also ran similar queries with Spark 2.4.5 querying data from S3, and we see
that for this set of queries the time taken by Spark 2.4.5 is lower than that
of Spark 3.0, which looks very strange.
Below are the details of 9 queries where Spark 3.0 is taking >5 times the 
duration for running queries on S3 when compared to Hadoop.

Environment Details:

  *   Spark running on Kubernetes
  *   TPC DS Scale Factor: 500 GB
  *   Hadoop 3.x
  *   Same CPU and memory used for all executions

Query | Spark 3.0 S3 (sec) | Spark 3.0 Hadoop (sec) | Spark 2.4.5 S3 (sec) | Spark 3.0 HDFS vs S3 (factor) | Spark 2.4.5 S3 vs Spark 3.0 S3 (factor) | Table involved
9     | 880.129 | 106.109 | 147.65  | 8.294574 | 5.960914 | store_sales
44    | 129.618 | 23.747  | 103.916 | 5.458289 | 1.247334 | store_sales
58    | 142.113 | 20.996  | 33.936  | 6.768575 | 4.187677 | store_sales
62    | 32.519  | 5.425   | 14.809  | 5.994286 | 2.195894 | web_sales
76    | 138.765 | 20.73   | 49.892  | 6.693922 | 2.781308 | store_sales
88    | 475.824 | 48.2    | 94.382  | 9.871867 | 5.04147  | store_sales
90    | 53.896  | 6.804   | 18.11   | 7.921223 | 2.976035 | web_sales
94    | 241.172 | 43.49   | 81.181  | 5.545459 | 2.970794 | web_sales
96    | 67.059  | 10.396  | 15.993  | 6.450462 | 4.193022 | store_sales

When we analysed it further, we see that all these queries are performing 
operations either on store_sales or web_sales tables and Spark 3 with S3 seems 
to be downloading much more data from storage when compared to Spark 3 with 
Hadoop or Spark 2.4.5 with S3 and this is resulting in more time for query 
completion. I’m attaching the screen shots of Driver UI for one such instance 
(Query 9) for reference.
Also attached the spark configurations (Spark 3.0) used for these tests.

We’re not sure why Spark 3.0 on S3 is having this behaviour. Any inputs on what 
we’re missing?

Thanks and Regards,
Abhishek




Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Gourav Sengupta
Hi,

are you using s3a, which is not using EMRFS? In that case, these results
do not make sense to me.

Regards,
Gourav Sengupta

On Mon, Aug 24, 2020 at 12:52 PM Rao, Abhishek (Nokia - IN/Bangalore) <
abhishek@nokia.com> wrote:

> Hi All,
>
>
>
> We’re doing some performance comparisons between Spark querying data on
> HDFS vs Spark querying data on S3 (Ceph Object Store used for S3 storage)
> using standard TPC DS Queries. We are observing that Spark 3.0 with S3 is
> consuming significantly larger duration for some set of queries when
> compared with HDFS.
>
> We also ran similar queries with Spark 2.4.5 querying data from S3, and we
> see that for this set of queries the time taken by Spark 2.4.5 is lower than
> that of Spark 3.0, which looks very strange.
>
> Below are the details of 9 queries where Spark 3.0 is taking >5 times the
> duration for running queries on S3 when compared to Hadoop.
>
>
>
> Environment Details:
>
>    - Spark running on Kubernetes
>    - TPC DS Scale Factor: 500 GB
>    - Hadoop 3.x
>    - Same CPU and memory used for all executions
>
>
>
> Query | Spark 3.0 S3 (sec) | Spark 3.0 Hadoop (sec) | Spark 2.4.5 S3 (sec) | Spark 3.0 HDFS vs S3 (factor) | Spark 2.4.5 S3 vs Spark 3.0 S3 (factor) | Table involved
> 9     | 880.129 | 106.109 | 147.65  | 8.294574 | 5.960914 | store_sales
> 44    | 129.618 | 23.747  | 103.916 | 5.458289 | 1.247334 | store_sales
> 58    | 142.113 | 20.996  | 33.936  | 6.768575 | 4.187677 | store_sales
> 62    | 32.519  | 5.425   | 14.809  | 5.994286 | 2.195894 | web_sales
> 76    | 138.765 | 20.73   | 49.892  | 6.693922 | 2.781308 | store_sales
> 88    | 475.824 | 48.2    | 94.382  | 9.871867 | 5.04147  | store_sales
> 90    | 53.896  | 6.804   | 18.11   | 7.921223 | 2.976035 | web_sales
> 94    | 241.172 | 43.49   | 81.181  | 5.545459 | 2.970794 | web_sales
> 96    | 67.059  | 10.396  | 15.993  | 6.450462 | 4.193022 | store_sales
>
>
>
> When we analysed it further, we see that all these queries are performing
> operations either on store_sales or web_sales tables and Spark 3 with S3
> seems to be downloading much more data from storage when compared to Spark
> 3 with Hadoop or Spark 2.4.5 with S3 and this is resulting in more time for
> query completion. I’m attaching the screen shots of Driver UI for one such
> instance (Query 9) for reference.
>
> Also attached the spark configurations (Spark 3.0) used for these tests.
>
>
>
> We’re not sure why Spark 3.0 on S3 is having this behaviour. Any inputs on
> what we’re missing?
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>