Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
at 22:43, Mich Talebzadeh wrote: > Thanks again all. > > Hi Sean, > > As I understood from your statement, you are suggesting just use > --packages without worrying about individual jar dependencies?

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Ivy resolution figure it out. It is not true that everything in .ivy2 is > on the classpath. > > On Tue, Oct 20, 2020 at 3:48 PM Mich Talebzadeh > wrote: > >> Hi Nicolas, >> >> I removed ~/.ivy2 and reran the spark job with the package included (the >> on

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
One way to think of this is --packages is better when you have third-party > dependencies and --jars is better when you have custom in-house built jars. > > On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh < > mich.talebza...@gmail.com> > > wrote: > > >

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
wrote: > --jars adds only that jar > --packages adds the jar and its dependencies listed in Maven > > On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh < > mich.talebza...@gmail.com> wrote: > >> Hi, >> >> I have a scenario that I use in Spark submit as fol

Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Hi, I have a scenario that I use in Spark submit as follows: spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar, */home/hduser/jars/spark-bigquery_2.11-0.2.6.jar* As you can see the jar files needed
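A sketch of the two submission styles under discussion; the Maven coordinate is an assumption inferred from the jar file name (spark-bigquery_2.11-0.2.6.jar):

    # --jars: every jar, including each transitive dependency, must be listed by hand
    spark-submit \
      --driver-class-path /home/hduser/jars/ddhybrid.jar \
      --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,/home/hduser/jars/spark-bigquery_2.11-0.2.6.jar

    # --packages: give the Maven coordinate and let Ivy resolve the dependency tree
    spark-submit \
      --driver-class-path /home/hduser/jars/ddhybrid.jar \
      --packages com.github.samelamin:spark-bigquery_2.11:0.2.6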

Re: Count distinct and driver memory

2020-10-19 Thread Mich Talebzadeh
Best to check this in Spark GUI under storage and see what is causing the issue. HTH

Re: Scala vs Python for ETL with Spark

2020-10-15 Thread Mich Talebzadeh
On Sun, 11 Oct 2020 at 20:46, Mich Talebzadeh wrote: > Hi, > > With regard to your statement below > > "...technology choices are agnostic to use cases according to you" > > If I may say, I do not think that was the message implied. What was

Re: How to Scale Streaming Application to Multiple Workers

2020-10-15 Thread Mich Talebzadeh
Hi, This in general depends on how many topics you want to process at the same time and whether this is done on-premise running Spark in cluster mode. Have you looked at Spark GUI to see if one worker (one JVM) is adequate for the task? Also how these small files are read and processed. Is it

Re: The equivalent of Scala mapping in Pyspark

2020-10-15 Thread Mich Talebzadeh
parquet format. If table exists, new rows are appended. Any feedback will be much appreciated (negative or positive so to speak). Thanks, Mich

The equivalent of Scala mapping in Pyspark

2020-10-13 Thread Mich Talebzadeh
Hi, I generate an array of random data and create a DF in Spark Scala as follows val end = start + numRows - 1 println(" starting at ID = " + start + " , ending on = " + end) val usedFunctions = new UsedFunctions val text = (start to end).map(i => (
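A runnable sketch of the pattern, assuming a local SparkSession; UsedFunctions in the original post is a custom class, so plain scala.util.Random calls stand in for it here:

    import org.apache.spark.sql.SparkSession
    import scala.util.Random

    val spark = SparkSession.builder.master("local[*]").appName("randomDF").getOrCreate()
    import spark.implicits._

    val start = 1
    val numRows = 10
    val end = start + numRows - 1

    // one tuple per id; toDF infers the schema from the tuple types
    val df = (start to end).map(i =>
      (i, Random.alphanumeric.take(10).mkString, Random.nextDouble * 100)
    ).toDF("id", "random_string", "random_double")
    df.show(false)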

The simplest Syntax for spark/Scala collect.foreach(println) in Pyspark

2020-10-12 Thread Mich Talebzadeh
Hi In Spark/Scala one can do scala> println ("\nStarted at"); spark.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss') ").collect.foreach(println) Started at [12/10/2020 22:29:19.19] I believe foreach(println) is a special syntax in this case. I can also do a verbose one

Re: Spark as computing engine vs spark cluster

2020-10-12 Thread Mich Talebzadeh
Hi Santosh, Generally speaking, there are two ways of making a process faster: 1. Do more intelligent work by creating indexes, cubes etc thus reducing the processing time 2. Throw hardware and memory at it using something like Spark multi-cluster with fully managed cloud service

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
s according to you? This is > interesting, really interesting. Perhaps I stand corrected. > > Regards, > Gourav > > On Sun, Oct 11, 2020 at 5:00 PM Mich Talebzadeh > wrote: > >> if we take Spark and its massive parallel processing and in-memory >> cache away, then one c

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
became head of machine learning >>>>>> somewhere else and he loved C and Python. So Python was a gift in >>>>>> disguise. >>>>>> I think Python appeals to those who are very familiar with CLI and shell >>>>>> programming

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Stephen Boesch, wrote: >> >>> I agree with Wim's assessment of data engineering / ETL vs Data >>> Science. I wrote pipelines/frameworks for large companies and scala was >>> a much better choice. But for ad-hoc work interfacing directly with data >>> sc

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Mich Talebzadeh
to Python just for the sake of it. Disclaimer: These are opinions and not facts so to speak :) Cheers, Mich On Fri, 9 Oct 2020 at 21:56, Mich Talebzadeh wrote: > I have come across occasions when the teams use Python with Spark for ETL, > for example processing data from S3 b

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
e > so there won't be a big difference between python and scala. > > On Fri, Oct 9, 2020 at 3:57 PM Mich Talebzadeh > wrote: > >> I have come across occasions when the teams use Python with Spark for >> ETL, for example processing data from S3 buckets into Snowflake with

Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
I have come across occasions when the teams use Python with Spark for ETL, for example processing data from S3 buckets into Snowflake with Spark. The only reason I think they are choosing Python as opposed to Scala is because they are more familiar with Python. Since Spark is written in Scala,

Reading BigQuery data from Spark in Google Dataproc

2020-10-05 Thread Mich Talebzadeh
Hi, I have tested a few JDBC BigQuery providers like Progress Direct and Simba but none of them seem to work properly through Spark. The only way I can read and write to BigQuery is through the Spark BigQuery API using the following scenario spark-shell
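For reference, a sketch of the read/write path with the Google spark-bigquery connector (the spark-bigquery-latest.jar named in the spark-submit threads above); project, dataset and bucket names are placeholders:

    // assumes an active SparkSession `spark` with the connector on the classpath
    val df = spark.read
      .format("bigquery")
      .option("table", "myproject.mydataset.mytable")
      .load()

    // the connector stages writes through a GCS bucket
    df.write
      .format("bigquery")
      .option("table", "myproject.mydataset.mytable_out")
      .option("temporaryGcsBucket", "my-staging-bucket")
      .save()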

Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-02 Thread Mich Talebzadeh
Spark execution. > It doesn't seem like it helps though - you are just swallowing the cause. > Just let it fly? > > On Fri, Oct 2, 2020 at 9:34 AM Mich Talebzadeh > wrote: > >> As a side question consider the following JDBC read >> >> >> val lowerBound =

Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-02 Thread Mich Talebzadeh
On Fri, 2 Oct 2020 at 05:33, Mich Talebzadeh wrote: > Many thanks Russell. That worked > > val *HiveDF* = Try(spark.read. >

Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Mich Talebzadeh
> > option("password", HybridServerPassword). > > load()) match { > > *case Success(validDf) => validDf* > >case Failure(e) => throw new Exception("Error > Encountered reading Hive table") > >

Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Mich Talebzadeh
vars and it ends up ambiguous. Just rename > one. > > On Thu, Oct 1, 2020, 5:02 PM Mich Talebzadeh > wrote: > >> Hi, >> >> >> Spark version 2.3.3 on Google Dataproc >> >> >> I am trying to use databricks to other databases >> >> &

Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Mich Talebzadeh
Hi, Spark version 2.3.3 on Google Dataproc I am trying to use databricks to other databases https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html to read from Hive table on Prem using Spark in Cloud This works OK without a Try enclosure. import spark.implicits._ import
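A sketch of the pattern the thread converges on: the "recursive value ... needs type" error appears when the pattern variable reuses the name of the val being defined, so give it a different name. Connection details are placeholders:

    import scala.util.{Try, Success, Failure}

    val url = "jdbc:hive2://hiveHost:10000/default"   // placeholder
    val dbUser = "..."                                // placeholder credentials
    val dbPassword = "..."

    val HiveDF = Try(
      spark.read
        .format("jdbc")
        .option("url", url)
        .option("dbtable", "schema.table")            // placeholder
        .option("user", dbUser)
        .option("password", dbPassword)
        .load()
    ) match {
      case Success(validDf) => validDf                // NOT case Success(HiveDF)
      case Failure(e) => throw new Exception("Error encountered reading Hive table", e)
    }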

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-28 Thread Mich Talebzadeh
On Thu, 27 Aug 2020 at 17:34, wrote: > Mich, > > That's right, referring to you guys. >

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-27 Thread Mich Talebzadeh
> 18.3 jar. > You can ask them to use either full URL or tns alias format URL with > tns_admin path set as either connection property or system property. > > Regards, Kuassi > > On 8/26/20 2:11 PM, Mich Talebzadeh wrote: > > And this is a test using Oracle supplied

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
On Wed, 26 Aug 2020 at 21:58, Mich Talebzadeh wrote: > Hi Kuassi, > > This is the error. Only test running on local mode >

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
On Wed, 26 Aug 2020 at 21:09, wrote: > Mich, > > All looks fine. > Perhaps some special chars in username or password? > > it is recommended not to use such characters like '@', '.' in your > password. >

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
special chars in username or password? > > it is recommended not to use such characters like '@', '.' in your > password. > > Best, Kuassi > > On 8/26/20 12:52 PM, Mich Talebzadeh wrote: > > Thanks Kuassi. > > This is the version of jar file that works OK with JDBC connection via

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
> *#javax.net.ssl.keyStorePassword=* > > Alternatively, if you want to use JKS, then you need to comment out the > first line and un-comment the other lines and set the values. > > Kuassi > On 8/26/20 11:58 AM, Mich Talebzadeh wrote: > > Hi, > > The connectio

Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
Hi, The connection from Spark to Oracle 12c etc. is well established using ojdbc6.jar. I am attempting to connect to Oracle Autonomous Data warehouse (ADW) version *Oracle Database 19c Enterprise Edition Release 19.0.0.0.0* The Oracle documentation suggests using ojdbc8.jar to connect to the database
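A sketch, assuming the wallet-based setup Kuassi describes in the replies: point oracle.net.tns_admin at the unzipped ADW wallet directory and use a tnsnames.ora alias in the URL (paths, alias and credentials are placeholders):

    // ojdbc8.jar must be on the driver and executor classpath
    System.setProperty("oracle.net.tns_admin", "/home/hduser/wallet_adw")

    val adwDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@adw_high")    // alias from tnsnames.ora
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("dbtable", "schema.table")
      .option("user", "ADMIN")
      .option("password", "...")
      .load()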

Accessing Teradata DW data from Spark

2020-06-10 Thread Mich Talebzadeh
Much like accessing Oracle data, one can utilise the power of Spark on Teradata via JDBC drivers. I have seen connections described in some articles, which indicates this process is pretty mature. My question is whether anyone has done this work and how performance is in Spark vis-a-vis

Re: ETL Using Spark

2020-05-21 Thread Mich Talebzadeh
Ok 1. What information are you fetching from MSSQL. Is this reference data? 2. What information are you processing through Spark via topics? 3. Assuming you are combining data from MSSQL and Spark and enriching it are you posting back to another table in the same database?

Re: Unit testing Spark/Scala code with Mockito

2020-05-20 Thread Mich Talebzadeh
On Wed, 20 May 2020 at 11:58, Mich Talebzadeh wrote: > Hi, > > I have a spark job that reads an XML file from HDFS, processes it and ports > data to Hive tables, one good and one exception table > > The Code itself works fine. I need to create Unit Te

Unit testing Spark/Scala code with Mockito

2020-05-20 Thread Mich Talebzadeh
Hi, I have a spark job that reads an XML file from HDFS, processes it and ports data to Hive tables, one good and one exception table. The code itself works fine. I need to create a Unit Test with Mockito for it. A unit test should test
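One common approach, sketched here as an assumption rather than the thread's outcome: test the transformation logic against a local SparkSession (ScalaTest 3.1+ style) and keep Mockito for collaborators that are not DataFrames; names and columns are illustrative:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.length
    import org.scalatest.funsuite.AnyFunSuite

    class XmlToHiveSpec extends AnyFunSuite {
      lazy val spark: SparkSession =
        SparkSession.builder.master("local[2]").appName("unit-test").getOrCreate()
      import spark.implicits._

      test("invalid mobile numbers land in the exception set") {
        val input = Seq("7123456789", "99").toDF("target_mobile_no")
        val rejected = input.filter(length($"target_mobile_no") =!= 10)
        assert(rejected.count() === 1)
      }
    }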

Re: [Spark SQL][reopen SPARK-16951]:Alternative implementation of NOT IN to Anti-join

2020-05-12 Thread Mich Talebzadeh
Hi Linna, Please provide a background to it and your solution. The assumption is that there is a solution, as suggested. Thanks,

Re: AnalysisException - Infer schema for the Parquet path

2020-05-09 Thread Mich Talebzadeh
equest"). load("/tmp/broadcast.xml")) match { case Success(df) => df case Failure(exception) => throw new Exception("foo") } HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAA

Re: Which SQL flavor does Spark SQL follow?

2020-05-06 Thread Mich Talebzadeh
It closely follows Hive SQL; its analytical functions are similar to Oracle's. Anyway, if you know SQL well (as opposed to being a Java programmer turned SQL writer) you should be OK. HTH

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
Hi Brandon. In dealing with df case Failure(e) => throw new Exception("foo") Can one print the Exception message? Thanks

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
t").load("/tmp/broadcast.xml")) match {case Success(df) => df case Failure(e) => throw new Exception("foo")} df: org.apache.spark.sql.DataFrame = [brand: string, ocis_party_id: bigint ... 6 more fields] regards, Dr Mich Talebzadeh LinkedIn * https

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
quot;, "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match {case Success(df) => df case Failure(e) => throw new Exception("foo")} Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEA

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
uot;sms_request").load("/tmp/broadcast.xml")) Match {case Success(df) => df case Failure(e) => throw new Exception("foo")} ^ :47: error: not found: value Failure val df = Try(spark.read.format("com.databricks.spark.xml").op

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
None | } | } res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string, ocis_party_id: bigint ... 6 more fields]) scala> scala> df.printSchema :48: error: not found: value df df.printSchema data frame seems to be lost! Thanks,

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
} . . def xmlFileExists(hdfsDirectory: String): Boolean = { val hadoopConf = new org.apache.hadoop.conf.Configuration() val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf) fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory)) } And checked it. It works.

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
Thanks Brandon! I should have remembered that. Basically the code gets out with sys.exit(1) if it cannot find the file. I guess there is no easy way of validating a DF except actioning it by show(1,0) etc. and checking if it works? Regards,

Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
printStackTrace sys.exit() } Now the issue I have is what if the xml file /tmp/broadcast.xml does not exist or is deleted? I won't be able to catch the error until the hive table is populated. Of course I can write a shell script to check if the file exists before running the job or put small c
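Pulling the thread's pieces together, a sketch: check HDFS up front, then wrap the read in Try so a failure surfaces before any Hive load; e.getMessage answers the later question about printing the exception:

    import scala.util.{Try, Success, Failure}

    def xmlFileExists(hdfsDirectory: String): Boolean = {
      val hadoopConf = new org.apache.hadoop.conf.Configuration()
      val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
      fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory))
    }

    if (!xmlFileExists("/tmp/broadcast.xml")) {
      println("/tmp/broadcast.xml not found")
      sys.exit(1)
    }

    val df = Try(
      spark.read
        .format("com.databricks.spark.xml")
        .option("rootTag", "hierarchy")
        .option("rowTag", "sms_request")
        .load("/tmp/broadcast.xml")
    ) match {                                   // lower-case match, not Match
      case Success(validDf) => validDf
      case Failure(e) => println(e.getMessage); throw e
    }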

Modularising Spark/Scala program

2020-05-02 Thread Mich Talebzadeh
into the main table again using a tmp table. I was wondering if this is the correct approach? Thanks,

Re: Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)). filter("length(target_mobile_no) != broadcastStagingConfig.mobileNoLength OR substring(target_mobile_no,1,1) != broadcastStagingConfig.ukMobileNoStart"

Re: Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
substring(target_mobile_no,1,1) != ${broadcastStagingConfig.ukMobileNoStart}") Thanks

Re: Lightbend Scala professional training & certification

2020-04-29 Thread Mich Talebzadeh
I don't think that will be free!

Lightbend Scala professional training & certification

2020-04-29 Thread Mich Talebzadeh
Hi, Has anyone had experience of taking training courses with Lightbend training <https://www.lightbend.com/services/training> on Scala? I believe they are offering free Scala courses and certifications. Thanks,

Re: Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
Hi Zhang, Yes the SQL way worked fine val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)). filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'") Many thanks,

Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
ubstring(col("target_mobile_no"),1,1) !=== "7") ^ :49: error: value || is not a member of Int filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mob

Re: Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Mich Talebzadeh

Re: Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Mich Talebzadeh
Unfortunately that did not work. Any other suggestions? Thanks

Re: Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Mich Talebzadeh
Thanks Neeraj, I'll check it out!

Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Mich Talebzadeh

Static and dynamic partition loads in Hive table through Spark

2020-04-26 Thread Mich Talebzadeh
The *partition* it will belong to will only be known after the row is inserted. Is this assertion correct? Thanks,

Re: How to pass a constant value to a partitioned hive table in spark

2020-04-19 Thread Mich Talebzadeh
;"" org.apache.spark.sql.catalyst.parser.ParseException: missing STRING at ','(line 2, pos 85) == SQL == INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId = broadcastValue, brand) -^^^ SELECT ocis_party_id AS partyId , target_mobile_no AS pho

Re: Save Spark dataframe as dynamic partitioned table in Hive

2020-04-16 Thread Mich Talebzadeh
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642) ... 55 elided The thing is that if I replace broadcastId = broadcastValue with broadcastId = " 123456789" it works!

Re: Going it alone.

2020-04-16 Thread Mich Talebzadeh
Good for you. Right move.

Re: wot no toggle ?

2020-04-16 Thread Mich Talebzadeh
In the UK it is an offence to troll and they will cut one off through the service provider, or impose prison terms. Whoever does it should take this as a stern warning.

Re: Going it alone.

2020-04-16 Thread Mich Talebzadeh
I refer you to the answer I gave in a similar thread. Cheers,

Re: How to pass a constant value to a partitioned hive table in spark

2020-04-16 Thread Mich Talebzadeh
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69) at org.apache.spark.sql.SparkSession

How to pass a constant value to a partitioned hive table in spark

2020-04-16 Thread Mich Talebzadeh
ocis_party_id AS partyId , target_mobile_no AS phoneNumber FROM tmp scala> spark.sql($sqltext) :41: error: not found: value $sqltext spark.sql($sqltext) Any ideas? Thanks
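A sketch of the fix the thread points to: interpolate the partition value into the SQL text as a string literal, and pass the variable itself to spark.sql ($sqltext is not valid Scala outside a string). The brand column in the select list is an assumption:

    val broadcastValue = "123456789"   // placeholder partition value

    val sqltext = s"""
      INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId = "$broadcastValue", brand)
      SELECT ocis_party_id AS partyId
           , target_mobile_no AS phoneNumber
           , brand
      FROM tmp
    """
    spark.sql(sqltext)   // sqltext, not $sqltext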

Save Spark dataframe as dynamic partitioned table in Hive

2020-04-15 Thread Mich Talebzadeh
(line 2, pos 85) == SQL == INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId = broadcastValue, brand = dummy) -^^^ SELECT ocis_party_id AS partyId , target_mobile_no AS phoneN

Re: Spark hangs while reading from jdbc - does nothing

2020-04-11 Thread Mich Talebzadeh
AND a.sid = b.sid AND a.username is not null --AND (a.last_call_et < 3600 or a.status = 'ACTIVE') --AND CURRENT_DATE - logon_time > 0 --AND a.sid NOT IN ( select sid from v$mystat where rownum=1) -- exclude me --AND (b.block_gets + b.consistent_gets) > 0 ORDER

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
hospitals in two-three weeks, there is nothing to stop us building something pretty quick, modular, pluggable and self-contained. HTH,

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Thank you for your remarks. Points taken.

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Thanks, but nobody claimed we can fix it. However, we can all contribute to it. When it utilizes the cloud then it becomes a global digitization issue. HTH

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Thanks. Agreed, computers are not the end but a means to an end. We all have to start from somewhere. It all helps. HTH

can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-23 Thread Mich Talebzadeh
On Mon, 17 Feb 2020 at 22:27, Mich Talebzadeh wrote: > I stripped everything from the jar list. This is all I have > > spark-shell --jars shc-core-1.1.1-2

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
Muthu Jayakumar wrote: > I suspect the spark job is somehow having an incorrect (newer) version of > json4s in the classpath. json4s 3.5.3 is the utmost version that can be > used. > > Thanks, > Muthu > > On Mon, Feb 17, 2020, 06:43 Mich Talebzadeh >

Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
Hi, Spark version 2.4.3, Hbase 1.2.7. Data is stored in Hbase as JSON; an example row was attached as an image (omitted here). I am trying to read this table in Spark Scala import org.apache.spark.sql.{SQLContext, _} import org.apache.spark.sql.execution.datasources.hbase._ import
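Per Muthu's diagnosis in the replies, the error comes from a newer json4s on the classpath than the shc-core connector tolerates; a hedged build.sbt sketch of the pin (version numbers taken from the thread):

    // build.sbt: pin json4s so a newer copy never shadows 3.5.3
    libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.5.3"
    // shc-core itself supplied at runtime, as in the thread:
    //   spark-shell --jars shc-core-1.1.1-2.1-s_2.11.jar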

Re: OrderBy Year and Month is not displaying correctly

2020-01-06 Thread Mich Talebzadeh
|year|month|incoming Per Month|outgoing Per Month|
+----+-----+------------------+------------------+
|2019|    9|13,958.58         |17,920.31         |
|2019|   10|10,029.00         |10,067.52         |
|2019|   11|4,032.30          |4,225.30          |
|2019|   12|742.00            |814.49            |

OrderBy Year and Month is not displaying correctly

2020-01-05 Thread Mich Talebzadeh
however the orderBy is not correct, as I expect to see the 2010 record and 2019 records in the order of year and month. Any suggestions? Thanks,
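A likely cause, offered as an assumption: year and month held as strings sort lexically, so casting them in the orderBy restores calendar order (summaryDF is a placeholder name):

    import org.apache.spark.sql.functions.col

    val ordered = summaryDF.orderBy(col("year").cast("int"), col("month").cast("int"))
    ordered.show(false)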

Spark streaming when a node or nodes go down

2019-12-11 Thread Mich Talebzadeh
single node? Is that correct? Thanks

Re: org.apache.spark.util.SparkUncaughtExceptionHandler

2019-10-10 Thread Mich Talebzadeh
with the executor size (typically 6-10%). HTH

Re: Control Sqoop job from Spark job

2019-09-02 Thread Mich Talebzadeh
= HiveContext.read.format("jdbc").options( Map("url" -> _ORACLEserver, "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)", "partitionColumn" -> "
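The snippet above is truncated; a sketch of the full partitioned-JDBC read pattern it shows, with bounds, partition count and credentials as placeholders:

    val oracleDF = HiveContext.read.format("jdbc").options(Map(
      "url" -> _ORACLEserver,
      "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
      "partitionColumn" -> "ID",          // numeric column Spark splits on
      "lowerBound" -> minID.toString,     // placeholder bounds
      "upperBound" -> maxID.toString,
      "numPartitions" -> "4",
      "user" -> dbUser,
      "password" -> dbPassword
    )).load()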

Re: Control Sqoop job from Spark job

2019-08-31 Thread Mich Talebzadeh
Spark is an excellent ETL tool to lift data from source and put it in target. Spark uses a JDBC connection similar to Sqoop. I don't see the need for Sqoop with Spark here. Where is the source (Oracle, MSSQL, etc.) and target (Hive?) here? HTH

Re: Questions for platform to choose

2019-08-21 Thread Mich Talebzadeh
latency architecture must be treated within that context. Have a look at this article of mine: https://www.linkedin.com/pulse/real-time-processing-trade-data-kafka-flume-spark-talebzadeh-ph-d-/ HTH

Re: Unable to write data from Spark into a Hive Managed table

2019-08-09 Thread Mich Talebzadeh
, TRIM(transactiontype) , TRIM(description) FROM """ spark.sql(sqltext) HTH

Re: Spark SQL reads all leaf directories on a partitioned Hive table

2019-08-08 Thread Mich Talebzadeh
also need the others as well, using soft links:
cd $SPARK_HOME/conf
ls -l
hive-site.xml -> ${HIVE_HOME}/conf/hive-site.xml
core-site.xml -> ${HADOOP_HOME}/etc/hadoop/core-site.xml
hdfs-site.xml -> ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml

Sharing ideas on using Databricks Delta Lake

2019-08-07 Thread Mich Talebzadeh
I'm sure someone can explain this. Regards,

Re: Hive external table not working in sparkSQL when subdirectories are present

2019-08-07 Thread Mich Talebzadeh
Have you updated partition statistics by any chance? I assume you can access the table and data through Hive itself? HTH

How to read configuration file parameters in Spark without mapping each parameter

2019-08-06 Thread Mich Talebzadeh
Hi, Assume that I have a configuration file as below with static parameters, some Strings, Integer and Double:
md_AerospikeAerospike {
  dbHost = "rhes75"
  dbPort = "3000"
  dbConnection = "trading_user_RW"
  namespace = "trading"
  dbSetRead = "MARKETDATAAEROSPIKEBATCH"
  dbSetWrite =
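The block above matches Typesafe Config (HOCON) syntax; a sketch of reading the whole section without one getString per parameter, assuming that library:

    import com.typesafe.config.ConfigFactory
    import scala.collection.JavaConverters._

    val conf = ConfigFactory.load().getConfig("md_AerospikeAerospike")

    // pull every key in the section into a Map in one pass
    val params: Map[String, String] =
      conf.entrySet.asScala.map(e => e.getKey -> conf.getString(e.getKey)).toMap

    val dbHost = params("dbHost")   // "rhes75"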

Re: Hive external table not working in sparkSQL when subdirectories are present

2019-08-06 Thread Mich Talebzadeh
Which versions of Spark and Hive are you using? What will happen if you use parquet tables instead? HTH

Re: Reading configuration file in Spark Scala throws error

2019-08-04 Thread Mich Talebzadeh
t;priceWatch") val op_type = conf.getInt("op_type") val currency = conf.getString("currency") val tickerType = conf.getString("tickerType") val tickerClass = conf.getString("tickerClass") val tickerStatus = conf.getString(

Re: Reading configuration file in Spark Scala throws error

2019-08-04 Thread Mich Talebzadeh
rhes75 schemaRegistryURL = http://rhes75:8081 tickerType = short zooKeeperClientPort = 2181 priceWatch = 300 batchInterval = 2 op_type = 1 sparkStreamingReceiverMaxRateValue = 0 dbDatabase = trading Two things please. They are read in a different order and secondly the String values are not

Reading configuration file in Spark Scala throws error

2019-08-03 Thread Mich Talebzadeh
bootstrapServers = "rhes75:9092" } I appreciate any hint. Thanks,

Re: Spark SaveMode

2019-07-21 Thread Mich Talebzadeh
broadcast(connectionProperties) val saveMode = SaveMode.Append sql(sqltext).write.mode(saveMode).jdbc(_ORACLEserver, _dbschema+"."+_dbtable, connectionProperties) HTH

Re: Spark SaveMode

2019-07-20 Thread Mich Talebzadeh
" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)", "partitionColumn" -> "ID", "lowerBound" -> minID, "upperBound" -> maxID, "numPartitions&quo

Re: Spark SaveMode

2019-07-20 Thread Mich Talebzadeh
Oracle table into a DF and do a result set with the Oracle DF and your DF and insert only those records into Oracle. HTH
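A sketch of the de-duplication idea in this reply: anti-join against what Oracle already holds, then append only the missing rows (key column and names are placeholders):

    import org.apache.spark.sql.SaveMode

    val existingDF = spark.read.jdbc(_ORACLEserver, "schema.table", connectionProperties)
    val newRows = resultDF.join(existingDF, Seq("ID"), "left_anti")   // rows Oracle lacks

    newRows.write
      .mode(SaveMode.Append)
      .jdbc(_ORACLEserver, "schema.table", connectionProperties)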

Re: Spark dataset to explode json string

2019-07-19 Thread Mich Talebzadeh
values var rowkey = row._2.split(',').view(0).split(':').view(1).toString.drop(1).dropRight(1).trim var ticker = row._2.split(',').view(1). split(':').view(1).toString.drop(1).dropRight(1).trim var timeissued = row._2.split(',').view(2). toString.substri

Re: Spark dataset to explode json string

2019-07-19 Thread Mich Talebzadeh
Sure. Do you have an example of a record from Cassandra read into df by any chance? Only columns that need to go into Oracle. df.select('col1, 'col2, 'jsonCol).take(1).foreach(println) HTH

Re: Spark dataset to explode json string

2019-07-19 Thread Mich Talebzadeh
, 'priceInfo.getItem("timeissued").as("timeissued") , 'priceInfo.getItem("price").as("price") , 'priceInfo.getItem("currency").as("currency") , 'operation.getItem("o

Reading JSON RDD in Spark Streaming

2019-06-18 Thread Mich Talebzadeh
t;timeissued":"2019-06-18T22:10:26", "price":555.75}) {"rowkey":"ba7e6bdc-2a92-4dc3-8e28-a75e1a7d58f2" "SBRY" //corrrect "2019-06-18T22 // missing half 555.75} // incorrect Is there any way reading JSON data systematically? Thanks Dr Mic
