[MLlib] StackOverflow Error spark mllib 1.6.1 FpGrowth Algorithm for Association rule generation

2016-12-27 Thread Maitray Thaker
Hi,
I am getting a StackOverflowError when I run the FPGrowth algorithm on my
21 million transactions with a low support threshold, since I want almost
every product's association with every other product. I know the problem is
caused by the algorithm's long recursive lineage, but I don't know how to get
around it. I also don't know whether RDD checkpointing is done internally by
the algorithm or not. Please suggest a solution.
Thanks.
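
A minimal sketch of the checkpointing workaround being asked about, assuming
spark-shell (so sc is the SparkContext), MLlib 1.6.1 and whitespace-separated
baskets; the paths and parameter values below are purely illustrative:

import org.apache.spark.mllib.fpm.FPGrowth

// Hypothetical checkpoint directory; checkpointing truncates the long recursive
// lineage that can otherwise trigger a StackOverflowError when the DAG is serialised.
sc.setCheckpointDir("hdfs:///tmp/fpgrowth-checkpoints")

// Hypothetical input: one basket per line, items separated by spaces
val transactions = sc.textFile("hdfs:///data/transactions")
  .map(_.trim.split(' '))
  .cache()
transactions.checkpoint()
transactions.count()   // run an action so the checkpoint is actually materialised

val model = new FPGrowth()
  .setMinSupport(0.001)     // low support threshold (illustrative value)
  .setNumPartitions(200)    // spread the conditional FP-trees across executors
  .run(transactions)

model.generateAssociationRules(0.8).collect().foreach { rule =>
  println(rule.antecedent.mkString(",") + " => " +
          rule.consequent.mkString(",") + " : " + rule.confidence)
}

If the overflow happens on the driver while the plan is being built, increasing
the driver stack size (for example spark-submit --driver-java-options "-Xss16m")
is another lever that is sometimes used alongside checkpointing.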




How to integrate Apache Kafka with Spark?

2016-12-27 Thread sathyanarayanan mudhaliyar
How do I take input from Apache Kafka into Apache Spark Streaming for
stream processing?

-sathya
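
A minimal sketch of the receiver-less (direct) approach, assuming Spark 1.6 with
the spark-streaming-kafka artifact on the classpath; the broker addresses and
topic name are hypothetical:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToSpark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToSpark")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical broker list and topic set
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("events")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Each record is a (key, value) pair; process the values per micro-batch
    stream.map(_._2).foreachRDD { rdd =>
      println(s"Received ${rdd.count()} records in this batch")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

The Kafka integration guide in the Spark Streaming documentation covers both
this direct approach and the older receiver-based createStream API; the Spark
2.x equivalent lives in the spark-streaming-kafka-0-10 module.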


Re: Spark streaming with Yarn: executors not fully utilized

2016-12-27 Thread Nishant Kumar
I have updated my question:

http://stackoverflow.com/questions/41345552/spark-streaming-with-yarn-executors-not-fully-utilized

On Wed, Dec 28, 2016 at 9:49 AM, Nishant Kumar 
wrote:

> Hi,
>
> I am running spark streaming with Yarn with -
>
> *spark-submit --master yarn --deploy-mode cluster --num-executors 2 
> --executor-memory 8g --driver-memory 2g --executor-cores 8 ..*
>
> I am consuming Kafka through the DirectStream approach (no receiver). I have
> 2 topics (each with 3 partitions).
>
> I repartition the RDD (I have one DStream) into 16 parts (assuming number of
> executors * number of cores = 2 * 8 = 16 *is that correct?*) and then I do
> foreachPartition, write each partition to a local file, and send it
> to another server (not Spark) through HTTP (using an Apache HttpClient
> multi-part POST).
>
> *When I checked the details of this step (or job, is that the correct naming?)
> in the Spark UI, it showed that all 16 tasks executed on a single executor,
> 8 tasks at a time.*
>
> This is Spark UI details -
>
> *Details for Stage 717 (Attempt 0)*
>
> Index  ID  Attempt Status  Locality Level  Executor ID / Host  Launch Time 
> Duration  GC Time Shuffle Read Size / Records Errors
> 0  5080  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:46 2 s 11 ms 313.3 KB / 6137
> 1  5081  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:46 2 s 11 ms 328.5 KB / 6452
> 2  5082  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:46 2 s 11 ms 324.3 KB / 6364
> 3  5083  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:46 2 s 11 ms 321.5 KB / 6306
> 4  5084  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:46 2 s 11 ms 324.8 KB / 6364
> 5  5085  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:46 2 s 11 ms 320.8 KB / 6307
> 6  5086  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:46 2 s 11 ms 323.4 KB / 6356
> 7  5087  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:46 3 s 11 ms 316.8 KB / 6207
> 8  5088  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:48 2 s   317.7 KB / 6245
> 9  5089  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:48 2 s   320.4 KB / 6280
> 10  5090  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:48 2 s   323.0 KB / 6334
> 11  5091  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:48 2 s   323.7 KB / 6371
> 12  5092  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:48 2 s   316.7 KB / 6218
> 13  5093  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:48 2 s   321.0 KB / 6301
> 14  5094  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name  2016/12/27 
> 12:11:48 2 s   321.4 KB / 6304
>
> I was expecting it to execute 16 parallel tasks (2 executors * 8 cores) on
> one or more executors. I think I am missing something. Please help.
>
>
> --
> *Nishant Kumar*
> Senior Software Engineer
>
> Phone: +91 80088 42030
> Skype: nishant.kumar_applift
>
>
> *AppLift India*
> 107/3, 80 Feet Main Road,
> Koramangala 4th Block,
> Bangalore - 560034
> www.AppLift.com 
>



-- 
*Nishant Kumar*
Senior Software Engineer

Phone: +91 80088 42030
Skype: nishant.kumar_applift


*AppLift India*
107/3, 80 Feet Main Road,
Koramangala 4th Block,
Bangalore - 560034
www.AppLift.com 



Spark streaming with Yarn: executors not fully utilized

2016-12-27 Thread Nishant Kumar
Hi,

I am running spark streaming with Yarn with -

*spark-submit --master yarn --deploy-mode cluster --num-executors 2
--executor-memory 8g --driver-memory 2g --executor-cores 8 ..*

I am consuming Kafka through the DirectStream approach (no receiver). I have 2
topics (each with 3 partitions).

I repartition the RDD (I have one DStream) into 16 parts (assuming number of
executors * number of cores = 2 * 8 = 16 *is that correct?*) and then I do
foreachPartition, write each partition to a local file, and send it
to another server (not Spark) through HTTP (using an Apache HttpClient
multi-part POST).

*When I checked the details of this step (or job, is that the correct naming?) in
the Spark UI, it showed that all 16 tasks executed on a single executor, 8
tasks at a time.*

This is Spark UI details -

*Details for Stage 717 (Attempt 0)*

Index  ID  Attempt Status  Locality Level  Executor ID / Host  Launch
Time Duration  GC Time Shuffle Read Size / Records Errors
0  5080  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:46 2 s 11 ms 313.3 KB / 6137
1  5081  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:46 2 s 11 ms 328.5 KB / 6452
2  5082  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:46 2 s 11 ms 324.3 KB / 6364
3  5083  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:46 2 s 11 ms 321.5 KB / 6306
4  5084  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:46 2 s 11 ms 324.8 KB / 6364
5  5085  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:46 2 s 11 ms 320.8 KB / 6307
6  5086  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:46 2 s 11 ms 323.4 KB / 6356
7  5087  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:46 3 s 11 ms 316.8 KB / 6207
8  5088  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:48 2 s   317.7 KB / 6245
9  5089  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:48 2 s   320.4 KB / 6280
10  5090  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:48 2 s   323.0 KB / 6334
11  5091  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:48 2 s   323.7 KB / 6371
12  5092  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:48 2 s   316.7 KB / 6218
13  5093  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:48 2 s   321.0 KB / 6301
14  5094  0 SUCCESS NODE_LOCAL  1 / executor1_machine_host_name
2016/12/27 12:11:48 2 s   321.4 KB / 6304

I was expecting it to execute 16 parallel tasks (2 executors * 8 cores) on
one or more executors. I think I am missing something. Please help.
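
For reference, the write-and-upload step described above boils down to
something like this sketch; the RDD comes from the Kafka DStream via
foreachRDD, and postMultipart stands in for the Apache HttpClient multipart
upload (both hypothetical names here):

import java.io.{File, PrintWriter}
import org.apache.spark.rdd.RDD

// One micro-batch of (key, value) records taken from the Kafka DStream
def sendBatch(batch: RDD[(String, String)]): Unit = {
  batch.repartition(16).foreachPartition { records =>    // 2 executors * 8 cores = 16
    val tmp = File.createTempFile("batch-", ".dat")       // local file on the executor
    val out = new PrintWriter(tmp)
    try records.foreach { case (_, value) => out.println(value) } finally out.close()
    // postMultipart("http://other-server/upload", tmp)   // hypothetical multipart POST
    tmp.delete()
  }
}

With 16 partitions and 16 total cores the scheduler can run up to 16 tasks at
once; where they actually land is also influenced by data locality preferences
(the NODE_LOCAL entries above) and settings such as spark.locality.wait.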


-- 
*Nishant Kumar*
Senior Software Engineer

Phone: +91 80088 42030
Skype: nishant.kumar_applift


*AppLift India*
107/3, 80 Feet Main Road,
Koramangala 4th Block,
Bangalore - 560034
www.AppLift.com 



Re: Location for the additional jar files in Spark

2016-12-27 Thread Divya Gehlot
Hi Mich ,

Have you set SPARK_CLASSPATH in spark-env.sh?
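
(For reference, that would be a single line in conf/spark-env.sh on the edge
host, using the jar path from this thread; as noted elsewhere in the thread, it
must not be combined with spark.executor.extraClassPath or SparkContext
initialisation fails.)

# conf/spark-env.sh
export SPARK_CLASSPATH=/home/hduser/jars/ojdbc6.jar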


Thanks,
Divya

On 27 December 2016 at 17:33, Mich Talebzadeh 
wrote:

> When one runs in Local mode (one JVM) on an edge host (the host user
> accesses the cluster), it is possible to put additional jar file say
> accessing Oracle RDBMS tables in $SPARK_CLASSPATH. This works
>
> export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar
>
> Normally a group of users can have read access to a shared directory like
> above and once they log in their shell will invoke an environment file that
> will have the above classpath plus additional parameters like $JAVA_HOME
> etc are set up for them.
>
> However, if the user chooses to run spark through spark-submit with yarn,
> then the only way this will work in my research is to add the jar file as
> follows on every node of Spark cluster
>
> in $SPARK_HOME/conf/spark-defaults.conf
>
> Add the jar path to the following:
>
> spark.executor.extraClassPath   /user_jars/ojdbc6.jar
>
> Note that setting both spark.executor.extraClassPath and SPARK_CLASSPATH
> will cause initialisation error
>
> ERROR SparkContext: Error initializing SparkContext.
> org.apache.spark.SparkException: Found both spark.executor.extraClassPath
> and SPARK_CLASSPATH. Use only the former.
>
> I was wondering if there are other ways of making this work in YARN mode,
> where every node of cluster will require this JAR file?
>
> Thanks
>


Re: unsubscribe

2016-12-27 Thread Minikek
Once you are in, there is no way out… :-)

> On Dec 27, 2016, at 7:37 PM, Kyle Kelley  wrote:
> 
> You are now in position 238 for unsubscription. If you wish for your
> subscription to occur immediately, please email
> dev-unsubscr...@spark.apache.org
> 
> Best wishes.
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 





Re: [Spark 2.0.2 HDFS]: no data locality

2016-12-27 Thread Sun Rui
Although the Spark task scheduler is aware of rack-level data locality, it 
seems that only YARN implements the support for it. However, node-level 
locality can still work for Standalone.

It is not necessary to copy the hadoop config files into the Spark CONF 
directory. Set HADOOP_CONF_DIR to point to the conf directory of your Hadoop.

Data locality involves both task data locality and executor data locality.
Executor data locality is only supported on YARN with executor dynamic
allocation enabled. For Standalone, by default, a Spark application will
acquire all available cores in the cluster, generally meaning there is at
least one executor on each node, in which case task data locality can work
because a task can be dispatched to an executor on any of its preferred nodes
for execution.

For your case, have you set spark.cores.max to limit the cores acquired, which
would mean executors are available on only a subset of the cluster nodes?
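
A minimal sketch of what that looks like from the submitting host; the master
URL, Hadoop conf path and application jar below are purely illustrative:

# Point Spark at the Hadoop configuration instead of copying the XML files
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Without spark.cores.max a standalone application acquires all available cores
# (so every worker gets an executor); capping it can leave nodes without one.
spark-submit --master spark://master-host:7077 \
  --conf spark.cores.max=16 \
  my-app.jar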

> On Dec 27, 2016, at 01:39, Karamba  wrote:
> 
> Hi,
> 
> I am running a couple of docker hosts, each with an HDFS and a spark
> worker in a spark standalone cluster.
> In order to get data locality awareness, I would like to configure Racks
> for each host, so that a spark worker container knows from which hdfs
> node container it should load its data. Does this make sense?
> 
> I configured HDFS container nodes via the core-site.xml in
> $HADOOP_HOME/etc and this works. hdfs dfsadmin -printTopology shows my
> setup.
> 
> I configured SPARK the same way. I placed core-site.xml and
> hdfs-site.xml in the SPARK_CONF_DIR ... BUT this has no effect.
> 
> Submitting a spark job via spark-submit to the spark-master that loads
> from HDFS just has Data locality ANY.
> 
> It would be great if anybody would help me getting the right configuration!
> 
> Thanks and best regards,
> on
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 






RE: Location for the additional jar files in Spark

2016-12-27 Thread Mendelson, Assaf
You should probably add --driver-class-path with the jar as well. In theory
--jars should add it to the driver too, but in my experience it does not (I
think there was a JIRA open on it). In any case you can find it on Stack
Overflow, see
http://stackoverflow.com/questions/40995943/connect-to-oracle-db-using-pyspark/41000181#41000181.
Another thing you might want to try is adding the driver option to the read,
see
http://stackoverflow.com/questions/36326066/working-with-jdbc-jar-in-pyspark/36328672#36328672.
Assaf
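
A minimal sketch of that combination, using the jar path and connection details
quoted later in this thread, plus the standard Oracle driver class name
(oracle.jdbc.OracleDriver):

spark-shell --jars /home/hduser/jars/ojdbc6.jar \
            --driver-class-path /home/hduser/jars/ojdbc6.jar

and then, inside the shell, passing the driver class explicitly to the JDBC
reader:

val df = spark.read.format("jdbc").options(Map(
  "url"      -> "jdbc:oracle:thin:@rhes564:1521:mydb12",
  "driver"   -> "oracle.jdbc.OracleDriver",   // explicit JDBC driver class
  "dbtable"  -> "scratchpad.dummy",
  "user"     -> "scratchpad",
  "password" -> "oracle")).load()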

From: Léo Biscassi [mailto:leo.bisca...@gmail.com]
Sent: Tuesday, December 27, 2016 2:59 PM
To: Mich Talebzadeh; Deepak Sharma
Cc: user @spark
Subject: Re: Location for the additional jar files in Spark


Hi all,
I have the same problem with spark 2.0.2.

Best regards,

On Tue, Dec 27, 2016, 9:40 AM Mich Talebzadeh 
> wrote:
Thanks Deppak

but get the same error unfortunately

ADD_JARS="/home/hduser/jars/ojdbc6.jar" spark-shell
Spark context Web UI available at http://50.140.197.217:4041
Spark context available as 'sc' (master = local[*], app id = 
local-1482842478988).

Spark session available as 'spark'.
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
  /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
HiveContext: org.apache.spark.sql.hive.HiveContext = 
org.apache.spark.sql.hive.HiveContext@a323a5b
scala> //val sqlContext = new HiveContext(sc)
scala> println ("\nStarted at"); spark.sql("SELECT 
FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') 
").collect.foreach(println)
Started at
[27/12/2016 12:41:43.43]
scala> //
scala> var _ORACLEserver= "jdbc:oracle:thin:@rhes564:1521:mydb12"
_ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
scala> var _username = "scratchpad"
_username: String = scratchpad
scala> var _password = "oracle"
_password: String = oracle
scala> //
scala> val s = HiveContext.read.format("jdbc").options(
 | Map("url" -> _ORACLEserver,
 | "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED, 
RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
 | "partitionColumn" -> "ID",
 | "lowerBound" -> "1",
 | "upperBound" -> "1",
 | "numPartitions" -> "10",
 | "user" -> _username,
 | "password" -> _password)).load
java.sql.SQLException: No suitable driver
  at java.sql.DriverManager.getDriver(DriverManager.java:315)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
  at scala.Option.getOrElse(Option.scala:121)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:53)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:117)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
  at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
  ... 56 elided


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 27 December 2016 at 11:37, Deepak Sharma 
> wrote:
How about this:
ADD_JARS="/home/hduser/jars/ojdbc6.jar" spark-shell

Thanks
Deepak

On Tue, Dec 27, 2016 at 5:04 PM, Mich Talebzadeh 
> wrote:
Ok I tried this but no luck

spark-shell --jars /home/hduser/jars/ojdbc6.jar
Spark context Web UI available at http://50.140.197.217:4041
Spark context available as 'sc' (master = local[*], app id = 
local-1482838526271).
Spark session available as 'spark'.
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
  /_/
Using Scala version 2.11.8 (Java 

Re: Location for the additional jar files in Spark

2016-12-27 Thread Léo Biscassi
Hi all,
I have the same problem with spark 2.0.2.

Best regards,

On Tue, Dec 27, 2016, 9:40 AM Mich Talebzadeh 
wrote:

> Thanks Deppak
>
> but get the same error unfortunately
>
> ADD_JARS="/home/hduser/jars/ojdbc6.jar" spark-shell
> Spark context Web UI available at http://50.140.197.217:4041
> Spark context available as 'sc' (master = local[*], app id =
> local-1482842478988).
>
> Spark session available as 'spark'.
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 2.0.0
>   /_/
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.8.0_77)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> warning: there was one deprecation warning; re-run with -deprecation for
> details
> HiveContext: org.apache.spark.sql.hive.HiveContext =
> org.apache.spark.sql.hive.HiveContext@a323a5b
> scala> //val sqlContext = new HiveContext(sc)
> scala> println ("\nStarted at"); spark.sql("SELECT
> FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss')
> ").collect.foreach(println)
> Started at
> [27/12/2016 12:41:43.43]
> scala> //
> scala> var _ORACLEserver= "jdbc:oracle:thin:@rhes564:1521:mydb12"
> _ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
> scala> var _username = "scratchpad"
> _username: String = scratchpad
> scala> var _password = "oracle"
> _password: String = oracle
> scala> //
> scala> val s = HiveContext.read.format("jdbc").options(
>  | Map("url" -> _ORACLEserver,
>  | "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED,
> RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
>  | "partitionColumn" -> "ID",
>  | "lowerBound" -> "1",
>  | "upperBound" -> "1",
>  | "numPartitions" -> "10",
>  | "user" -> _username,
>  | "password" -> _password)).load
> java.sql.SQLException: No suitable driver
>   at java.sql.DriverManager.getDriver(DriverManager.java:315)
>   at
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
>   at
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
>   at scala.Option.getOrElse(Option.scala:121)
>   at
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:53)
>   at
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
>   at
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:117)
>   at
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
>   at
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
>   ... 56 elided
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 27 December 2016 at 11:37, Deepak Sharma  wrote:
>
> How about this:
> ADD_JARS="/home/hduser/jars/ojdbc6.jar" spark-shell
>
> Thanks
> Deepak
>
> On Tue, Dec 27, 2016 at 5:04 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
> Ok I tried this but no luck
>
> spark-shell --jars /home/hduser/jars/ojdbc6.jar
> Spark context Web UI available at http://50.140.197.217:4041
> Spark context available as 'sc' (master = local[*], app id =
> local-1482838526271).
> Spark session available as 'spark'.
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 2.0.0
>   /_/
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.8.0_77)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> warning: there was one deprecation warning; re-run with -deprecation for
> details
> HiveContext: org.apache.spark.sql.hive.HiveContext =
> org.apache.spark.sql.hive.HiveContext@ad0bb4e
> scala> //val sqlContext = new HiveContext(sc)
> scala> println ("\nStarted at"); spark.sql("SELECT
> FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss')
> ").collect.foreach(println)
> Started at
> [27/12/2016 

Re: Location for the additional jar files in Spark

2016-12-27 Thread Mich Talebzadeh
Thanks Deepak

but I get the same error, unfortunately.

ADD_JARS="/home/hduser/jars/ojdbc6.jar" spark-shell
Spark context Web UI available at http://50.140.197.217:4041
Spark context available as 'sc' (master = local[*], app id =
local-1482842478988).
Spark session available as 'spark'.
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
  /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for
details
HiveContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@a323a5b
scala> //val sqlContext = new HiveContext(sc)
scala> println ("\nStarted at"); spark.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss')
").collect.foreach(println)
Started at
[27/12/2016 12:41:43.43]
scala> //
scala> var _ORACLEserver= "jdbc:oracle:thin:@rhes564:1521:mydb12"
_ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
scala> var _username = "scratchpad"
_username: String = scratchpad
scala> var _password = "oracle"
_password: String = oracle
scala> //
scala> val s = HiveContext.read.format("jdbc").options(
 | Map("url" -> _ORACLEserver,
 | "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED,
RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
 | "partitionColumn" -> "ID",
 | "lowerBound" -> "1",
 | "upperBound" -> "1",
 | "numPartitions" -> "10",
 | "user" -> _username,
 | "password" -> _password)).load
java.sql.SQLException: No suitable driver
  at java.sql.DriverManager.getDriver(DriverManager.java:315)
  at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
  at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
  at scala.Option.getOrElse(Option.scala:121)
  at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:53)
  at
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
  at
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
  at
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
  at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
  ... 56 elided

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 27 December 2016 at 11:37, Deepak Sharma  wrote:

> How about this:
> ADD_JARS="/home/hduser/jars/ojdbc6.jar" spark-shell
>
> Thanks
> Deepak
>
> On Tue, Dec 27, 2016 at 5:04 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Ok I tried this but no luck
>>
>> spark-shell --jars /home/hduser/jars/ojdbc6.jar
>> Spark context Web UI available at http://50.140.197.217:4041
>> Spark context available as 'sc' (master = local[*], app id =
>> local-1482838526271).
>> Spark session available as 'spark'.
>> Welcome to
>>     __
>>  / __/__  ___ _/ /__
>> _\ \/ _ \/ _ `/ __/  '_/
>>/___/ .__/\_,_/_/ /_/\_\   version 2.0.0
>>   /_/
>> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java
>> 1.8.0_77)
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> warning: there was one deprecation warning; re-run with -deprecation for
>> details
>> HiveContext: org.apache.spark.sql.hive.HiveContext =
>> org.apache.spark.sql.hive.HiveContext@ad0bb4e
>> scala> //val sqlContext = new HiveContext(sc)
>> scala> println ("\nStarted at"); spark.sql("SELECT
>> FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss')
>> ").collect.foreach(println)
>> Started at
>> [27/12/2016 11:36:26.26]
>> scala> //
>> scala> var _ORACLEserver= "jdbc:oracle:thin:@rhes564:1521:mydb12"
>> _ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
>> scala> var _username = "scratchpad"
>> _username: String = scratchpad
>> scala> var _password = "oracle"
>> _password: String = 

Re: Location for the additional jar files in Spark

2016-12-27 Thread Deepak Sharma
How about this:
ADD_JARS="/home/hduser/jars/ojdbc6.jar" spark-shell

Thanks
Deepak

On Tue, Dec 27, 2016 at 5:04 PM, Mich Talebzadeh 
wrote:

> Ok I tried this but no luck
>
> spark-shell --jars /home/hduser/jars/ojdbc6.jar
> Spark context Web UI available at http://50.140.197.217:4041
> Spark context available as 'sc' (master = local[*], app id =
> local-1482838526271).
> Spark session available as 'spark'.
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 2.0.0
>   /_/
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.8.0_77)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> warning: there was one deprecation warning; re-run with -deprecation for
> details
> HiveContext: org.apache.spark.sql.hive.HiveContext =
> org.apache.spark.sql.hive.HiveContext@ad0bb4e
> scala> //val sqlContext = new HiveContext(sc)
> scala> println ("\nStarted at"); spark.sql("SELECT
> FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss')
> ").collect.foreach(println)
> Started at
> [27/12/2016 11:36:26.26]
> scala> //
> scala> var _ORACLEserver= "jdbc:oracle:thin:@rhes564:1521:mydb12"
> _ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
> scala> var _username = "scratchpad"
> _username: String = scratchpad
> scala> var _password = "oracle"
> _password: String = oracle
> scala> //
> scala> val s = HiveContext.read.format("jdbc").options(
>  | Map("url" -> _ORACLEserver,
>  | "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED,
> RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
>  | "partitionColumn" -> "ID",
>  | "lowerBound" -> "1",
>  | "upperBound" -> "1",
>  | "numPartitions" -> "10",
>  | "user" -> _username,
>  | "password" -> _password)).load
> java.sql.SQLException: No suitable driver
>   at java.sql.DriverManager.getDriver(DriverManager.java:315)
>   at org.apache.spark.sql.execution.datasources.jdbc.
> JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
>   at org.apache.spark.sql.execution.datasources.jdbc.
> JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.
> createConnectionFactory(JdbcUtils.scala:53)
>   at org.apache.spark.sql.execution.datasources.jdbc.
> JDBCRDD$.resolveTable(JDBCRDD.scala:123)
>   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(
> JDBCRelation.scala:117)
>   at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.
> createRelation(JdbcRelationProvider.scala:53)
>   at org.apache.spark.sql.execution.datasources.
> DataSource.resolveRelation(DataSource.scala:315)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
>   ... 56 elided
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 27 December 2016 at 11:23, Deepak Sharma  wrote:
>
>> I meant ADD_JARS as you said --jars is not working for you with
>> spark-shell.
>>
>> Thanks
>> Deepak
>>
>> On Tue, Dec 27, 2016 at 4:51 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Ok just to be clear do you mean
>>>
>>> ADD_JARS="~/jars/ojdbc6.jar" spark-shell
>>>
>>> or
>>>
>>> spark-shell --jars $ADD_JARS
>>>
>>>
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>> On 27 December 2016 at 10:30, Deepak Sharma 
>>> wrote:
>>>
 It works for me with spark 1.6 (--jars)
 Please try this:
 ADD_JARS="<>" spark-shell

 Thanks
 Deepak

 On Tue, Dec 27, 2016 at 

Re: Location for the additional jar files in Spark

2016-12-27 Thread Mich Talebzadeh
Ok I tried this but no luck

spark-shell --jars /home/hduser/jars/ojdbc6.jar
Spark context Web UI available at http://50.140.197.217:4041
Spark context available as 'sc' (master = local[*], app id =
local-1482838526271).
Spark session available as 'spark'.
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
  /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for
details
HiveContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@ad0bb4e
scala> //val sqlContext = new HiveContext(sc)
scala> println ("\nStarted at"); spark.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss')
").collect.foreach(println)
Started at
[27/12/2016 11:36:26.26]
scala> //
scala> var _ORACLEserver= "jdbc:oracle:thin:@rhes564:1521:mydb12"
_ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
scala> var _username = "scratchpad"
_username: String = scratchpad
scala> var _password = "oracle"
_password: String = oracle
scala> //
scala> val s = HiveContext.read.format("jdbc").options(
 | Map("url" -> _ORACLEserver,
 | "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED,
RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
 | "partitionColumn" -> "ID",
 | "lowerBound" -> "1",
 | "upperBound" -> "1",
 | "numPartitions" -> "10",
 | "user" -> _username,
 | "password" -> _password)).load
java.sql.SQLException: No suitable driver
  at java.sql.DriverManager.getDriver(DriverManager.java:315)
  at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
  at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
  at scala.Option.getOrElse(Option.scala:121)
  at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:53)
  at
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
  at
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
  at
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
  at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
  ... 56 elided




Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 27 December 2016 at 11:23, Deepak Sharma  wrote:

> I meant ADD_JARS as you said --jars is not working for you with
> spark-shell.
>
> Thanks
> Deepak
>
> On Tue, Dec 27, 2016 at 4:51 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Ok just to be clear do you mean
>>
>> ADD_JARS="~/jars/ojdbc6.jar" spark-shell
>>
>> or
>>
>> spark-shell --jars $ADD_JARS
>>
>>
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 27 December 2016 at 10:30, Deepak Sharma 
>> wrote:
>>
>>> It works for me with spark 1.6 (--jars)
>>> Please try this:
>>> ADD_JARS="<>" spark-shell
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Tue, Dec 27, 2016 at 3:49 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Thanks.

 The problem is that with spark-shell --jars does not work! This is
 Spark 2 accessing Oracle 12c

 spark-shell --jars /home/hduser/jars/ojdbc6.jar

 It comes back with

 java.sql.SQLException: No suitable driver

 unfortunately

 and spark-shell uses spark-submit 

Re: Location for the additional jar files in Spark

2016-12-27 Thread Deepak Sharma
I meant ADD_JARS as you said --jars is not working for you with spark-shell.

Thanks
Deepak

On Tue, Dec 27, 2016 at 4:51 PM, Mich Talebzadeh 
wrote:

> Ok just to be clear do you mean
>
> ADD_JARS="~/jars/ojdbc6.jar" spark-shell
>
> or
>
> spark-shell --jars $ADD_JARS
>
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 27 December 2016 at 10:30, Deepak Sharma  wrote:
>
>> It works for me with spark 1.6 (--jars)
>> Please try this:
>> ADD_JARS="<>" spark-shell
>>
>> Thanks
>> Deepak
>>
>> On Tue, Dec 27, 2016 at 3:49 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thanks.
>>>
>>> The problem is that with spark-shell --jars does not work! This is Spark
>>> 2 accessing Oracle 12c
>>>
>>> spark-shell --jars /home/hduser/jars/ojdbc6.jar
>>>
>>> It comes back with
>>>
>>> java.sql.SQLException: No suitable driver
>>>
>>> unfortunately
>>>
>>> and spark-shell uses spark-submit under the bonnet if you look at the
>>> shell file
>>>
>>> "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main
>>> --name "Spark shell" "$@"
>>>
>>>
>>> hm
>>>
>>>
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>> On 27 December 2016 at 09:52, Deepak Sharma 
>>> wrote:
>>>
 Hi Mich
 You can copy the jar to shared location and use --jars command line
 argument of spark-submit.
 Who so ever needs  access to this jar , can refer to the shared path
 and access it using --jars argument.

 Thanks
 Deepak

 On Tue, Dec 27, 2016 at 3:03 PM, Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> When one runs in Local mode (one JVM) on an edge host (the host user
> accesses the cluster), it is possible to put additional jar file say
> accessing Oracle RDBMS tables in $SPARK_CLASSPATH. This works
>
> export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar
>
> Normally a group of users can have read access to a shared directory
> like above and once they log in their shell will invoke an environment 
> file
> that will have the above classpath plus additional parameters like
> $JAVA_HOME etc are set up for them.
>
> However, if the user chooses to run spark through spark-submit with
> yarn, then the only way this will work in my research is to add the jar
> file as follows on every node of Spark cluster
>
> in $SPARK_HOME/conf/spark-defaults.conf
>
> Add the jar path to the following:
>
> spark.executor.extraClassPath   /user_jars/ojdbc6.jar
>
> Note that setting both spark.executor.extraClassPath and
> SPARK_CLASSPATH
> will cause initialisation error
>
> ERROR SparkContext: Error initializing SparkContext.
> org.apache.spark.SparkException: Found both
> spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
>
> I was wondering if there are other ways of making this work in YARN
> mode, where every node of cluster will require this JAR file?
>
> Thanks
>



 --
 Thanks
 Deepak
 www.bigdatabig.com
 www.keosha.net

>>>
>>>
>>
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net
>>
>
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Re: Location for the additional jar files in Spark

2016-12-27 Thread Mich Talebzadeh
Ok just to be clear do you mean

ADD_JARS="~/jars/ojdbc6.jar" spark-shell

or

spark-shell --jars $ADD_JARS


Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 27 December 2016 at 10:30, Deepak Sharma  wrote:

> It works for me with spark 1.6 (--jars)
> Please try this:
> ADD_JARS="<>" spark-shell
>
> Thanks
> Deepak
>
> On Tue, Dec 27, 2016 at 3:49 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Thanks.
>>
>> The problem is that with spark-shell --jars does not work! This is Spark
>> 2 accessing Oracle 12c
>>
>> spark-shell --jars /home/hduser/jars/ojdbc6.jar
>>
>> It comes back with
>>
>> java.sql.SQLException: No suitable driver
>>
>> unfortunately
>>
>> and spark-shell uses spark-submit under the bonnet if you look at the
>> shell file
>>
>> "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main
>> --name "Spark shell" "$@"
>>
>>
>> hm
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 27 December 2016 at 09:52, Deepak Sharma 
>> wrote:
>>
>>> Hi Mich
>>> You can copy the jar to shared location and use --jars command line
>>> argument of spark-submit.
>>> Who so ever needs  access to this jar , can refer to the shared path and
>>> access it using --jars argument.
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Tue, Dec 27, 2016 at 3:03 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 When one runs in Local mode (one JVM) on an edge host (the host user
 accesses the cluster), it is possible to put additional jar file say
 accessing Oracle RDBMS tables in $SPARK_CLASSPATH. This works

 export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar

 Normally a group of users can have read access to a shared directory
 like above and once they log in their shell will invoke an environment file
 that will have the above classpath plus additional parameters like
 $JAVA_HOME etc are set up for them.

 However, if the user chooses to run spark through spark-submit with
 yarn, then the only way this will work in my research is to add the jar
 file as follows on every node of Spark cluster

 in $SPARK_HOME/conf/spark-defaults.conf

 Add the jar path to the following:

 spark.executor.extraClassPath   /user_jars/ojdbc6.jar

 Note that setting both spark.executor.extraClassPath and
 SPARK_CLASSPATH
 will cause initialisation error

 ERROR SparkContext: Error initializing SparkContext.
 org.apache.spark.SparkException: Found both
 spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.

 I was wondering if there are other ways of making this work in YARN
 mode, where every node of cluster will require this JAR file?

 Thanks

>>>
>>>
>>>
>>> --
>>> Thanks
>>> Deepak
>>> www.bigdatabig.com
>>> www.keosha.net
>>>
>>
>>
>
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>


Re: Location for the additional jar files in Spark

2016-12-27 Thread Deepak Sharma
It works for me with spark 1.6 (--jars)
Please try this:
ADD_JARS="<>" spark-shell

Thanks
Deepak

On Tue, Dec 27, 2016 at 3:49 PM, Mich Talebzadeh 
wrote:

> Thanks.
>
> The problem is that with spark-shell --jars does not work! This is Spark 2
> accessing Oracle 12c
>
> spark-shell --jars /home/hduser/jars/ojdbc6.jar
>
> It comes back with
>
> java.sql.SQLException: No suitable driver
>
> unfortunately
>
> and spark-shell uses spark-submit under the bonnet if you look at the
> shell file
>
> "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main
> --name "Spark shell" "$@"
>
>
> hm
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 27 December 2016 at 09:52, Deepak Sharma  wrote:
>
>> Hi Mich
>> You can copy the jar to shared location and use --jars command line
>> argument of spark-submit.
>> Who so ever needs  access to this jar , can refer to the shared path and
>> access it using --jars argument.
>>
>> Thanks
>> Deepak
>>
>> On Tue, Dec 27, 2016 at 3:03 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> When one runs in Local mode (one JVM) on an edge host (the host user
>>> accesses the cluster), it is possible to put additional jar file say
>>> accessing Oracle RDBMS tables in $SPARK_CLASSPATH. This works
>>>
>>> export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar
>>>
>>> Normally a group of users can have read access to a shared directory
>>> like above and once they log in their shell will invoke an environment file
>>> that will have the above classpath plus additional parameters like
>>> $JAVA_HOME etc are set up for them.
>>>
>>> However, if the user chooses to run spark through spark-submit with
>>> yarn, then the only way this will work in my research is to add the jar
>>> file as follows on every node of Spark cluster
>>>
>>> in $SPARK_HOME/conf/spark-defaults.conf
>>>
>>> Add the jar path to the following:
>>>
>>> spark.executor.extraClassPath   /user_jars/ojdbc6.jar
>>>
>>> Note that setting both spark.executor.extraClassPath and SPARK_CLASSPATH
>>> will cause initialisation error
>>>
>>> ERROR SparkContext: Error initializing SparkContext.
>>> org.apache.spark.SparkException: Found both
>>> spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
>>>
>>> I was wondering if there are other ways of making this work in YARN
>>> mode, where every node of cluster will require this JAR file?
>>>
>>> Thanks
>>>
>>
>>
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net
>>
>
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Re: Location for the additional jar files in Spark

2016-12-27 Thread Mich Talebzadeh
Thanks.

The problem is that --jars does not work with spark-shell. This is Spark 2
accessing Oracle 12c.

spark-shell --jars /home/hduser/jars/ojdbc6.jar

It comes back with

java.sql.SQLException: No suitable driver

unfortunately

and spark-shell uses spark-submit under the bonnet if you look at the shell
file

"${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name
"Spark shell" "$@"


hm





Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 27 December 2016 at 09:52, Deepak Sharma  wrote:

> Hi Mich
> You can copy the jar to shared location and use --jars command line
> argument of spark-submit.
> Who so ever needs  access to this jar , can refer to the shared path and
> access it using --jars argument.
>
> Thanks
> Deepak
>
> On Tue, Dec 27, 2016 at 3:03 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> When one runs in Local mode (one JVM) on an edge host (the host user
>> accesses the cluster), it is possible to put additional jar file say
>> accessing Oracle RDBMS tables in $SPARK_CLASSPATH. This works
>>
>> export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar
>>
>> Normally a group of users can have read access to a shared directory like
>> above and once they log in their shell will invoke an environment file that
>> will have the above classpath plus additional parameters like $JAVA_HOME
>> etc are set up for them.
>>
>> However, if the user chooses to run spark through spark-submit with yarn,
>> then the only way this will work in my research is to add the jar file as
>> follows on every node of Spark cluster
>>
>> in $SPARK_HOME/conf/spark-defaults.conf
>>
>> Add the jar path to the following:
>>
>> spark.executor.extraClassPath   /user_jars/ojdbc6.jar
>>
>> Note that setting both spark.executor.extraClassPath and SPARK_CLASSPATH
>> will cause initialisation error
>>
>> ERROR SparkContext: Error initializing SparkContext.
>> org.apache.spark.SparkException: Found both
>> spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
>>
>> I was wondering if there are other ways of making this work in YARN mode,
>> where every node of cluster will require this JAR file?
>>
>> Thanks
>>
>
>
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>


Re: Location for the additional jar files in Spark

2016-12-27 Thread Deepak Sharma
Hi Mich
You can copy the jar to a shared location and use the --jars command line
argument of spark-submit.
Whoever needs access to this jar can refer to the shared path and
pass it using the --jars argument.

Thanks
Deepak

On Tue, Dec 27, 2016 at 3:03 PM, Mich Talebzadeh 
wrote:

> When one runs in Local mode (one JVM) on an edge host (the host user
> accesses the cluster), it is possible to put additional jar file say
> accessing Oracle RDBMS tables in $SPARK_CLASSPATH. This works
>
> export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar
>
> Normally a group of users can have read access to a shared directory like
> above and once they log in their shell will invoke an environment file that
> will have the above classpath plus additional parameters like $JAVA_HOME
> etc are set up for them.
>
> However, if the user chooses to run spark through spark-submit with yarn,
> then the only way this will work in my research is to add the jar file as
> follows on every node of Spark cluster
>
> in $SPARK_HOME/conf/spark-defaults.conf
>
> Add the jar path to the following:
>
> spark.executor.extraClassPath   /user_jars/ojdbc6.jar
>
> Note that setting both spark.executor.extraClassPath and SPARK_CLASSPATH
> will cause initialisation error
>
> ERROR SparkContext: Error initializing SparkContext.
> org.apache.spark.SparkException: Found both spark.executor.extraClassPath
> and SPARK_CLASSPATH. Use only the former.
>
> I was wondering if there are other ways of making this work in YARN mode,
> where every node of cluster will require this JAR file?
>
> Thanks
>



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Re: Location for the additional jar files in Spark

2016-12-27 Thread Sebastian Piu
I take it you don't want to use the --jars option, to avoid moving them every
time?

On Tue, 27 Dec 2016, 10:33 Mich Talebzadeh, 
wrote:

> When one runs in Local mode (one JVM) on an edge host (the host user
> accesses the cluster), it is possible to put additional jar file say
> accessing Oracle RDBMS tables in $SPARK_CLASSPATH. This works
>
> export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar
>
> Normally a group of users can have read access to a shared directory like
> above and once they log in their shell will invoke an environment file that
> will have the above classpath plus additional parameters like $JAVA_HOME
> etc are set up for them.
>
> However, if the user chooses to run spark through spark-submit with yarn,
> then the only way this will work in my research is to add the jar file as
> follows on every node of Spark cluster
>
> in $SPARK_HOME/conf/spark-defaults.conf
>
> Add the jar path to the following:
>
> spark.executor.extraClassPath   /user_jars/ojdbc6.jar
>
> Note that setting both spark.executor.extraClassPath and SPARK_CLASSPATH
> will cause initialisation error
>
> ERROR SparkContext: Error initializing SparkContext.
> org.apache.spark.SparkException: Found both spark.executor.extraClassPath
> and SPARK_CLASSPATH. Use only the former.
>
> I was wondering if there are other ways of making this work in YARN mode,
> where every node of cluster will require this JAR file?
>
> Thanks
>


Location for the additional jar files in Spark

2016-12-27 Thread Mich Talebzadeh
When one runs in Local mode (one JVM) on an edge host (the host from which the
user accesses the cluster), it is possible to put an additional jar file, say
one for accessing Oracle RDBMS tables, in $SPARK_CLASSPATH. This works:

export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar

Normally a group of users has read access to a shared directory like the one
above, and once they log in their shell invokes an environment file that sets
up the above classpath plus additional parameters like $JAVA_HOME for them.

However, if the user chooses to run Spark through spark-submit with YARN, then
the only way this works, in my research, is to add the jar file as follows on
every node of the Spark cluster:

in $SPARK_HOME/conf/spark-defaults.conf

Add the jar path to the following:

spark.executor.extraClassPath   /user_jars/ojdbc6.jar

Note that setting both spark.executor.extraClassPath and SPARK_CLASSPATH
will cause an initialisation error:

ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Found both spark.executor.extraClassPath
and SPARK_CLASSPATH. Use only the former.

I was wondering if there are other ways of making this work in YARN mode,
short of every node of the cluster requiring this JAR file?

Thanks
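
For comparison, a sketch of one alternative that avoids touching every node: in
YARN mode --jars ships the jar to the executors through the distributed cache,
and --driver-class-path (or spark.driver.extraClassPath) covers the driver. The
class name and application jar below are hypothetical:

spark-submit --master yarn --deploy-mode client \
  --jars /home/hduser/jars/ojdbc6.jar \
  --driver-class-path /home/hduser/jars/ojdbc6.jar \
  --class com.example.MyOracleJob \
  my-app.jar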