substitution invocator for a variable in PyCharm sql

2020-12-07 Thread Mich Talebzadeh
In Spark/Scala you can use 's' substitution invocator for a variable in sql call, for example var sqltext = s""" INSERT INTO TABLE ${broadcastStagingConfig.broadcastTable} PARTITION (broadcastId = ${broadcastStagingConfig.broadcastValue},brand) SELECT
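The pattern referenced above — Scala's `s` string interpolator substituting variables into a SQL string — can be sketched as follows (table name and partition value are illustrative placeholders, not the poster's actual config):

```scala
// Minimal sketch of Scala s-interpolation inside a Spark SQL string.
// broadcastTable and broadcastValue stand in for the
// broadcastStagingConfig fields quoted in the thread.
val broadcastTable = "michtest.BroadcastStaging"
val broadcastValue = "20201207"

val sqltext = s"""
  INSERT INTO TABLE $broadcastTable PARTITION (broadcastId = '$broadcastValue', brand)
  SELECT partyId, phoneNumber, brand FROM tmp
"""
// spark.sql(sqltext)   // run against an active SparkSession
```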

Re: In windows 10, accessing Hive from PySpark with PyCharm throws error

2020-12-04 Thread Mich Talebzadeh
path may be different). So add this path to > your PATH environmental variable in your command shell before running > spark-submit again. > > -- ND > On 12/3/20 6:28 PM, Mich Talebzadeh wrote: > > This is becoming serious pain. > > using powershell I am using spark

Re: In windows 10, accessing Hive from PySpark with PyCharm throws error

2020-12-03 Thread Mich Talebzadeh
te: > Apparently this is a OS dynamic lib link error. Make sure you have the > LD_LIBRARY_PATH (in Linux) or PATH (windows) set up properly for the right > .so or .dll file... > On 12/2/20 5:31 PM, Mich Talebzadeh wrote: > > Hi, > > I have a simple code that tries to creat

In windows 10, accessing Hive from PySpark with PyCharm throws error

2020-12-02 Thread Mich Talebzadeh
Hi, I have a simple code that tries to create Hive derby database as follows: from pyspark import SparkContext from pyspark.sql import SQLContext from pyspark.sql import HiveContext from pyspark.sql import SparkSession from pyspark.sql import Row from pyspark.sql.types import StringType,

Separating storage from compute layer with Spark and data warehouses offering ML capabilities

2020-11-29 Thread Mich Talebzadeh
This is a generic question with regard to an optimum design. Many Cloud Data Warehouses like Google BigQuery (BQ) or Oracle Autonomous Data Warehouse (ADW), nowadays

Error in PyCharm with PySpark

2020-11-26 Thread Mich Talebzadeh
Hi, I do not know why I am getting this error in Pycharm! if __name__ == "__main__" : contract_json_path = os.path. \ join("../", "../", "conf/contractterms_app.json") default_json_path = os.path.join( "../", "../",

spark-sql on windows throws Exception in thread "main" java.lang.UnsatisfiedLinkError:

2020-11-16 Thread Mich Talebzadeh
Need to create some hive test tables for pyCharm SPARK_HOME is set up as D:\temp\spark-3.0.1-bin-hadoop2.7 HADOOP_HOME is c:\hadoop\ spark-shell works. Trying to run spark-sql, I get the following errors PS C:\tmp\hive> spark-sql log4j:WARN No appenders could be found for logger

Re: PyCharm IDE throws spark error

2020-11-15 Thread Mich Talebzadeh
ri, 13 Nov 2020 at 23:25, Wim Van Leuven wrote: > No Java installed? Or process can but find it? Java-home not set? > > On Fri, 13 Nov 2020 at 23:24, Mich Talebzadeh > wrote: > >> Hi, >> >> This is basically a simple module >> >> from pyspark import

PyCharm IDE throws spark error

2020-11-13 Thread Mich Talebzadeh
Hi, This is basically a simple module from pyspark import SparkContext from pyspark.sql import SQLContext from pyspark.sql import HiveContext from pyspark.sql import SparkSession from pyspark.sql import Row from pyspark.sql.types import StringType, ArrayType from pyspark.sql.functions import

Re: Path of jars added to a Spark Job - spark-submit // // Override jars in spark submit

2020-11-12 Thread Mich Talebzadeh
As I understand Spark expects the jar files to be available on all nodes or if applicable on HDFS directory Putting Spark Jar files on HDFS In Yarn mode, *it is important that Spark jar files are available throughout the Spark cluster*. I have spent a fair bit of time on this and I recommend

Creating hive table through df.write.mode("overwrite").saveAsTable("DB.TABLE")

2020-11-10 Thread Mich Talebzadeh
Hi, In Spark I specifically specify the format of the table to be created sqltext = """ CREATE TABLE test.randomDataPy( ID INT , CLUSTERED INT , SCATTERED INT , RANDOMISED INT , RANDOM_STRING VARCHAR(50) , SMALL_VC VARCHAR(50) , PADDING VARCHAR(4000)
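The approach described above — creating the Hive table with explicit column types before writing, rather than letting `saveAsTable` infer them — might look like this sketch (the storage clause and the follow-up write call are assumptions; column names are taken from the snippet):

```scala
// Pre-create the table with explicit types, then append into it.
val sqltext = """
  CREATE TABLE IF NOT EXISTS test.randomDataPy(
    ID INT,
    CLUSTERED INT,
    SCATTERED INT,
    RANDOMISED INT,
    RANDOM_STRING VARCHAR(50),
    SMALL_VC VARCHAR(50),
    PADDING VARCHAR(4000)
  )
  STORED AS PARQUET
"""
spark.sql(sqltext)
// df.write.mode("append").insertInto("test.randomDataPy")
```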

Re: repartition in Spark

2020-11-09 Thread Mich Talebzadeh
As a generic answer in a distributed environment like spark, making sure that data is distributed evenly among all nodes (assuming every node is the same or similar) can help performance repartition thus controls the data distribution among all nodes. However, it is not that straight forward.
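As a concrete illustration of the point above (a sketch only; the partition counts are arbitrary):

```scala
// repartition(n) performs a full shuffle and spreads rows evenly
// across n partitions -- useful to fix skew, but not free.
val evenDF = df.repartition(8)

// coalesce(n) merges partitions without a full shuffle: cheaper,
// but the remaining partitions can stay uneven.
val fewerDF = df.coalesce(4)
```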

Re: Custom JdbcConnectionProvider

2020-10-28 Thread Mich Talebzadeh
I think you can pickup your custom build driver from the command line itself Here I am using a custom build third-party driver to access Oracle Table on-premisses from cloud val jdbUrl = "jdbc:datadirect:ddhybrid://"+HybridServer+":"+HybridPort+";hybridDataPipelineDataSource="+

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Mich Talebzadeh
ut agree with Sean. That is mostly not true. > > In your previous posts you also mentioned this . The only reason we > sometimes have to bail out to Scala is for performance with certain udfs > > On Thu, 22 Oct 2020 at 23:11, Mich Talebzadeh > wrote: > >> Thanks for the

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
Thanks for the feedback Sean. Kind regards, Mich LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * *Disclaimer:* Use it at your own risk. Any and all

Re: Spark hive build and connectivity

2020-10-22 Thread Mich Talebzadeh
see that whenever i build spark with hive support (-Phive > -Phive-thriftserver) , it gets built with hive 2.3.7 jars. So , will it be > ok if i access tables created using my hive 3.2.1 cluster ? > - Do i have to add hive 3.2.1 jars to spark's (SPARK_DIST_CLASSPATH) ? > > >

Re: Spark hive build and connectivity

2020-10-22 Thread Mich Talebzadeh
Hi Ravi, What exactly are you trying to do? You want to enhance Spark SQl or you want to run Hive on Spark engine? HTH LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
. On Fri, 9 Oct 2020 at 21:56, Mich Talebzadeh wrote: > I have come across occasions when the teams use Python with Spark for ETL, > for example processing data from S3 buckets into Snowflake with Spark. > > The only reason I think they are choosing Python as opposed to Scala

Re: Why spark-submit works with package not with jar

2020-10-21 Thread Mich Talebzadeh
How about PySpark? What process can that go through to not depend on external repo access in production LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

Re: Why spark-submit works with package not with jar

2020-10-21 Thread Mich Talebzadeh
e internet or even the internal > proxying artefect repository. > > Also, wasn't uberjars an antipattern? For some reason I don't like them... > > Kind regards > -wim > > > > On Wed, 21 Oct 2020 at 01:06, Mich Talebzadeh > wrote: > >> Thanks again all. >> &g

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
s no 100% guarantee that conflicting dependencies are resolved in a > way that works in every single case, which you run into sometimes when > using incompatible libraries, but yes this is the point of --packages and > Ivy. > > On Tue, Oct 20, 2020 at 4:43 PM Mich Talebzadeh > wrot

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
at 22:43, Mich Talebzadeh wrote: > Thanks again all. > > Hi Sean, > > As I understood from your statement, you are suggesting just use > --packages without worrying about individual jar dependencies? > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Ivy resolution figure it out. It is not true that everything in .ivy2 is > on the classpath. > > On Tue, Oct 20, 2020 at 3:48 PM Mich Talebzadeh > wrote: > >> Hi Nicolas, >> >> I removed ~/.iv2 and reran the spark job with the package included (the >> on

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
One way to think of this is --packages is better when you have third > party > > dependency and --jars is better when you have custom in-house built jars. > > > > On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh < > mich.talebza...@gmail.com> > > wrote: > > >

Re: Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
wrote: > --jar Adds only that jar > --package adds the Jar and a it's dependencies listed in maven > > On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh < > mich.talebza...@gmail.com> wrote: > >> Hi, >> >> I have a scenario that I use in Spark submit as fol

Why spark-submit works with package not with jar

2020-10-20 Thread Mich Talebzadeh
Hi, I have a scenario that I use in Spark submit as follows: spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar, */home/hduser/jars/spark-bigquery_2.11-0.2.6.jar* As you can see the jar files needed

Re: Count distinct and driver memory

2020-10-19 Thread Mich Talebzadeh
Best to check this in Spark GUI under storage and see what is causing the issue. HTH LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * *Disclaimer:* Use it at

Re: Scala vs Python for ETL with Spark

2020-10-15 Thread Mich Talebzadeh
damage or destruction. On Sun, 11 Oct 2020 at 20:46, Mich Talebzadeh wrote: > Hi, > > With regard to your statement below > > ".technology choices are agnostic to use cases according to you" > > If I may say, I do not think that was the message implied. What was

Re: How to Scale Streaming Application to Multiple Workers

2020-10-15 Thread Mich Talebzadeh
Hi, This in general depends on how many topics you want to process at the same time and whether this is done on-premise running Spark in cluster mode. Have you looked at Spark GUI to see if one worker (one JVM) is adequate for the task? Also how these small files are read and processed. Is it

Re: The equivalent of Scala mapping in Pyspark

2020-10-15 Thread Mich Talebzadeh
rquet format. If table exists, new rows are appended. Any feedback will be much appreciated (negative or positive so to speak). Thanks, Mich *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on

The equivalent of Scala mapping in Pyspark

2020-10-13 Thread Mich Talebzadeh
Hi, I generate an array of random data and create a DF in Spark scala as follows val end = start + numRows - 1 println (" starting at ID = " + start + " , ending on = " + end ) val usedFunctions = new UsedFunctions *val text = ( start to end ).map(i =>* * (* *
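The pattern quoted above — mapping over a numeric range to generate rows — can be sketched like this (the column expressions are simplified placeholders for the `UsedFunctions` calls in the original):

```scala
import scala.util.Random

val start = 1
val numRows = 10
val end = start + numRows - 1
println(s" starting at ID = $start , ending on = $end ")

// One tuple per ID; convert to a DataFrame afterwards.
val text = (start to end).map(i =>
  (i.toString, Random.nextInt(100), Random.alphanumeric.take(10).mkString)
)
// val df = text.toDF("id", "scattered", "random_string")  // needs spark.implicits._
```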

The simplest Syntax for saprk/Scala collect.foreach(println) in Pyspark

2020-10-12 Thread Mich Talebzadeh
Hi In Spark/Scala one can do scala> println ("\nStarted at"); spark.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println) Started at [12/10/2020 22:29:19.19] I believe foreach(println) is a special syntax in this case. I can also do a verbose one

Re: Spark as computing engine vs spark cluster

2020-10-12 Thread Mich Talebzadeh
Hi Santosh, Generally speaking, there are two ways of making a process faster: 1. Do more intelligent work by creating indexes, cubes etc thus reducing the processing time 2. Throw hardware and memory at it using something like Spark multi-cluster with fully managed cloud service

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
s according to you? This is > interesting, really interesting. Perhaps I stand corrected. > > Regards, > Gourav > > On Sun, Oct 11, 2020 at 5:00 PM Mich Talebzadeh > wrote: > >> if we take Spark and its massive parallel processing and in-memory >> cache away, then one c

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
became head of machine learning >>>>>> somewhere else and he loved C and Python. So Python was a gift in >>>>>> disguise. >>>>>> I think Python appeals to those who are very familiar with CLI and shell >>>>>> programming

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Stephen Boesch, wrote: >> >>> I agree with Wim's assessment of data engineering / ETL vs Data >>> Science.I wrote pipelines/frameworks for large companies and scala was >>> a much better choice. But for ad-hoc work interfacing directly with data >>> sc

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Mich Talebzadeh
to Python just for the sake of it. Disclaimer: These are opinions and not facts so to speak :) Cheers, Mich On Fri, 9 Oct 2020 at 21:56, Mich Talebzadeh wrote: > I have come across occasions when the teams use Python with Spark for ETL, > for example processing data from S3 b

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
e > so there won't be a big difference between python and scala. > > On Fri, Oct 9, 2020 at 3:57 PM Mich Talebzadeh > wrote: > >> I have come across occasions when the teams use Python with Spark for >> ETL, for example processing data from S3 buckets into Snowflake with

Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
I have come across occasions when the teams use Python with Spark for ETL, for example processing data from S3 buckets into Snowflake with Spark. The only reason I think they are choosing Python as opposed to Scala is because they are more familiar with Python. Since Spark is written in Scala,

Reading BigQuery data from Spark in Google Dataproc

2020-10-05 Thread Mich Talebzadeh
Hi, I have testest few JDBC BigQuery providers like Progress Direct and Simba but none of them seem to work properly through Spark. The only way I can read and write to BigQuery is through Spark BigQuery API using the following scenario spark-shell

Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-02 Thread Mich Talebzadeh
ark execution. > It doesn't seem like it helps though - you are just swallowing the cause. > Just let it fly? > > On Fri, Oct 2, 2020 at 9:34 AM Mich Talebzadeh > wrote: > >> As a side question consider the following read JDBC read >> >> >> val lowerBound =

Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-02 Thread Mich Talebzadeh
technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Fri, 2 Oct 2020 at 05:33, Mich Talebzadeh wrote: > Many thanks Russell. That worked > > val *HiveDF* = Try(spark.read. >

Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Mich Talebzadeh
> > option("password", HybridServerPassword). > > load()) match { > > *case Success(validDf) => validDf* > >case Failure(e) => throw new Exception("Error > Encountered reading Hive table") > >

Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Mich Talebzadeh
vars and it ends up ambiguous. Just rename > one. > > On Thu, Oct 1, 2020, 5:02 PM Mich Talebzadeh > wrote: > >> Hi, >> >> >> Spark version 2.3.3 on Google Dataproc >> >> >> I am trying to use databricks to other databases >> >> &

Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Mich Talebzadeh
Hi, Spark version 2.3.3 on Google Dataproc I am trying to use databricks to other databases https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html to read from Hive table on Prem using Spark in Cloud This works OK without a Try enclosure. import spark.implicits._ import
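The `Try`/`Success`/`Failure` pattern this thread converges on can be sketched as below (connection options are placeholders; note the resolution later in the thread is to bind the result to a fresh name, since reusing an existing `val` is what produced the "recursive value for DF needs type" error):

```scala
import scala.util.{Try, Success, Failure}

// Wrap the JDBC read so a failure surfaces as a clean exception.
// Bind to a NEW name (HiveDF) rather than shadowing an existing val.
val HiveDF = Try(spark.read.
    format("jdbc").
    option("url", jdbcUrl).             // placeholder connection details
    option("dbtable", "schema.table").
    option("user", dbUser).
    option("password", dbPassword).
    load()) match {
  case Success(validDf) => validDf
  case Failure(e) =>
    throw new Exception("Error encountered reading Hive table", e)
}
```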

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-28 Thread Mich Talebzadeh
ch may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Thu, 27 Aug 2020 at 17:34, wrote: > Mich, > > That's right, referring to you guys. >

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-27 Thread Mich Talebzadeh
> 18.3 jar. > You can ask them to use either full URL or tns alias format URL with > tns_admin path set as either connection property or system property. > > Regards, Kuassi > > On 8/26/20 2:11 PM, Mich Talebzadeh wrote: > > And this is a test using Oracle supplied

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Wed, 26 Aug 2020 at 21:58, Mich Talebzadeh wrote: > Hi Kuassi, > > This is the error. Only test running on local mode >

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
y monetary damages arising from such loss, damage or destruction. On Wed, 26 Aug 2020 at 21:09, wrote: > Mich, > > All looks fine. > Perhaps some special chars in username or password? > > it is recommended not to use such characters like '@', '.' in your > password. > &

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
pecial chars in username or password? > > it is recommended not to use such characters like '@', '.' in your > password. > > Best, Kuassi > > On 8/26/20 12:52 PM, Mich Talebzadeh wrote: > > Thanks Kuassi. > > This is the version of jar file that work OK with JDBC connection via

Re: Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
> *#javax.net.ssl.keyStorePassword=* > > Alternatively, if you want to use JKS< then you need to comment out the > firts line and un-comment the other lines and set the values. > > Kuassi > On 8/26/20 11:58 AM, Mich Talebzadeh wrote: > > Hi, > > The connectio

Connecting to Oracle Autonomous Data warehouse (ADW) from Spark via JDBC

2020-08-26 Thread Mich Talebzadeh
Hi, The connection from Spark to Oracle 12c etc are well established using ojdb6.jar. I am attempting to connect to Oracle Autonomous Data warehouse (ADW) version *Oracle Database 19c Enterprise Edition Release 19.0.0.0.0* Oracle document suggest using ojdbc8.jar to connect to the database
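A minimal sketch of a JDBC read against Oracle with ojdbc8.jar on the classpath (the URL shown is the generic thin-driver form; for ADW the thread later settles on a TNS-alias URL with `tns_admin` set, which is not shown here):

```scala
// Assumes ojdbc8.jar supplied via --jars; URL, table and credentials
// are illustrative placeholders.
val oracleDF = spark.read.
  format("jdbc").
  option("url", "jdbc:oracle:thin:@//host:1521/service_name").
  option("driver", "oracle.jdbc.OracleDriver").
  option("dbtable", "scott.emp").
  option("user", dbUser).
  option("password", dbPassword).
  load()
```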

Accessing Teradata DW data from Spark

2020-06-10 Thread Mich Talebzadeh
Using JDBC drivers much like accessing Oracle data, one can utilise the power of Spark on Teradata via JDBC drivers. I have seen connections in some articles which indicates this process is pretty mature. My question is if anyone has done this work and how is performance in Spark vis-a-vis

Re: ETL Using Spark

2020-05-21 Thread Mich Talebzadeh
Ok 1. What information are you fetching from MSSQL. Is this reference data? 2. What information are you processing through Spark via topics? 3. Assuming you are combining data from MSSQL and Spark and enriching it are you posting back to another table in the same database?

Re: Unit testing Spark/Scala code with Mockito

2020-05-20 Thread Mich Talebzadeh
damage or destruction. On Wed, 20 May 2020 at 11:58, Mich Talebzadeh wrote: > Hi, > > I have a spark job that reads an XML file from HDFS, process it and port > data to Hive tables, one good and one exception table > > The Code itself works fine. I need to create Unit Te

Unit testing Spark/Scala code with Mockito

2020-05-20 Thread Mich Talebzadeh
Hi, I have a spark job that reads an XML file from HDFS, process it and port data to Hive tables, one good and one exception table The Code itself works fine. I need to create Unit Test with Mockito for it.. A unit test should test

Re: [Spark SQL][reopen SPARK-16951]:Alternative implementation of NOT IN to Anti-join

2020-05-12 Thread Mich Talebzadeh
Hi Linna, Please provide a background to it and your solution. The assumption is that there is a solution. as suggested. Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Re: AnalysisException - Infer schema for the Parquet path

2020-05-09 Thread Mich Talebzadeh
equest"). load("/tmp/broadcast.xml")) match { case Success(df) => df case Failure(exception) => throw new Exception("foo") } HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAA

Re: Which SQL flavor does Spark SQL follow?

2020-05-06 Thread Mich Talebzadeh
it closely follows Hive sql. from the analytical functions its is similar to Oracle. Anyway if you know good SQL as opposed to Java programmer turned to SQL writer you should be OK. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
Hi Brandon. In dealing with df case Failure(e) => throw new Exception("foo") Can one print the Exception message? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
t").load("/tmp/broadcast.xml")) match {case Success(df) => df case Failure(e) => throw new Exception("foo")} df: org.apache.spark.sql.DataFrame = [brand: string, ocis_party_id: bigint ... 6 more fields] regards, Dr Mich Talebzadeh LinkedIn * https

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
quot;, "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match {case Success(df) => df case Failure(e) => throw new Exception("foo")} Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEA

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
uot;sms_request").load("/tmp/broadcast.xml")) Match {case Success(df) => df case Failure(e) => throw new Exception("foo")} ^ :47: error: not found: value Failure val df = Try(spark.read.format("com.databricks.spark.xml").op

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
None | } | } res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string, ocis_party_id: bigint ... 6 more fields]) scala> scala> df.printSchema :48: error: not found: value df df.printSchema data frame seems to be lost! Thanks, Dr Mi

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
} . . def xmlFileExists(hdfsDirectory: String): Boolean = { val hadoopConf = new org.apache.hadoop.conf.Configuration() val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf) fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory)) } And checked it. It works. Dr Mich Talebzadeh LinkedI
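The helper quoted above, reassembled as a complete method (as in the snippet, it checks HDFS for the source file before attempting the read):

```scala
// Returns true if the given HDFS path exists.
def xmlFileExists(hdfsDirectory: String): Boolean = {
  val hadoopConf = new org.apache.hadoop.conf.Configuration()
  val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
  fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory))
}

// Usage: bail out early if the source file is missing.
// if (!xmlFileExists("/tmp/broadcast.xml")) sys.exit(1)
```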

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
Thanks Brandon! i should have remembered that. basically the code gets out with sys.exit(1) if it cannot find the file I guess there is no easy way of validating DF except actioning it by show(1,0) etc and checking if it works? Regards, Dr Mich Talebzadeh LinkedIn * https

Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
StackTrace sys.exit() } Now the issue I have is that what if the xml file /tmp/broadcast.xml does not exist or deleted? I won't be able to catch the error until the hive table is populated. Of course I can write a shell script to check if the file exist before running the job or put small c

Modularising Spark/Scala program

2020-05-02 Thread Mich Talebzadeh
into the main table again using tmp table I was wondering if this is correct approach? Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV

Re: Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
s parameters val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)). filter(*"*length(target_mobile_no) != broadcastStagingConfig.mobileNoLength OR substring(target_mobile_no,1,1) != broadcastStagingConfig.ukMobileNoStart*"

Re: Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
ile_no,1,1) != ${broadcastStagingConfig.ukMobileNoStart}") Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.

Re: Lightbend Scala professional training & certification

2020-04-29 Thread Mich Talebzadeh
I don't think that will be free! Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer

Lightbend Scala professional training & certification

2020-04-29 Thread Mich Talebzadeh
Hi, Has anyone had experience of taking training courses with Lightbend training <https://www.lightbend.com/services/training>on Scala I believe they are offering free Scala courses and certifications. Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/v

Re: Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
Hi Zhang, Yes the SQL way worked fine val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)). filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'") Many thanks, Dr Mich T
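For context, the working SQL-string filter from this reply, alongside the Column-expression form that also compiles (the operator is `=!=`, not the mistyped `!===` from the failing attempt earlier in the thread):

```scala
import org.apache.spark.sql.functions.{col, length, substring}
import org.apache.spark.sql.types.StringType

// SQL-string filter: the form that worked in the thread.
val rejectedDF = newDF.withColumn("target_mobile_no",
    col("target_mobile_no").cast(StringType)).
  filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'")

// Equivalent Column-expression form.
val rejectedDF2 = newDF.filter(
  length(col("target_mobile_no")) =!= 10 ||
  substring(col("target_mobile_no"), 1, 1) =!= "7")
```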

Filtering on multiple columns in spark

2020-04-29 Thread Mich Talebzadeh
ubstring(col("target_mobile_no"),1,1) !=== "7") ^ :49: error: value || is not a member of Int filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mob

Re: Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Mich Talebzadeh
ards, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own r

Re: Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Mich Talebzadeh
Unfortunately that did not work. any other suggestions? thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*

Re: Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Mich Talebzadeh
Thanks Neeraj, I'll check it out. ! Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer

Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Mich Talebzadeh
. Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own risk. Any a

Static and dynamic partition loads in Hive table through Spark

2020-04-26 Thread Mich Talebzadeh
tition* it will belong to will only be known after the row is inserted. Is this assertion correct? Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcP

Re: How to pass a constant value to a partitioned hive table in spark

2020-04-19 Thread Mich Talebzadeh
;"" org.apache.spark.sql.catalyst.parser.ParseException: missing STRING at ','(line 2, pos 85) == SQL == INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId = broadcastValue, brand) -^^^ SELECT ocis_party_id AS partyId , target_mobile_no AS pho

Re: Save Spark dataframe as dynamic partitioned table in Hive

2020-04-16 Thread Mich Talebzadeh
la:48) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642) ... 55 elided The thing is that if I replace broadcastId = broadcastValue with broadcastId = " 123456789" it works! Tha

Re: Going it alone.

2020-04-16 Thread Mich Talebzadeh
good for you. right move Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:* Use it at yo

Re: wot no toggle ?

2020-04-16 Thread Mich Talebzadeh
.* In UK it is an offence to troll and they will cut one through the service provider or prison terms. Whoever does it should take this as a stern warning.. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.

Re: Going it alone.

2020-04-16 Thread Mich Talebzadeh
I refer you to the answer I gave in similar thread. Cheers, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*

Re: How to pass a constant value to a partitioned hive table in spark

2020-04-16 Thread Mich Talebzadeh
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69) at org.apache.spark.sql.SparkSession

How to pass a constant value to a partitioned hive table in spark

2020-04-16 Thread Mich Talebzadeh
y_id AS partyId , target_mobile_no AS phoneNumber FROM tmp scala> spark.sql($sqltext) :41: error: not found: value $sqltext spark.sql($sqltext) Any ideas? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABU

Save Spark dataframe as dynamic partitioned table in Hive

2020-04-15 Thread Mich Talebzadeh
ne 2, pos 85) == SQL == INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId = broadcastValue, brand = dummy) -^^^ SELECT ocis_party_id AS partyId , target_mobile_no AS phoneN

Re: Spark hangs while reading from jdbc - does nothing

2020-04-11 Thread Mich Talebzadeh
D a.sid = b.sid AND a.username is not null --AND (a.last_call_et < 3600 or a.status = 'ACTIVE') --AND CURRENT_DATE - logon_time > 0 --AND a.sid NOT IN ( select sid from v$mystat where rownum=1) -- exclude me --AND (b.block_gets + b.consistent_gets) > 0 ORDE

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
hospitals in two-three weeks, there is nothing to stop us building something pretty quick, modular, pluggable and self contained. HTH, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Thank you for your remarks. Points taken. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disc

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Thanks but nobody claimed we can fix it. However, we can all contribute to it. When it utilizes the cloud then it become a global digitization issue. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Thanks. Agreed, computers are not the end but means to an end. We all have to start from somewhere. It all helps. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view

can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own risk. Any and all responsi

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-23 Thread Mich Talebzadeh
is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Mon, 17 Feb 2020 at 22:27, Mich Talebzadeh wrote: > I stripped everything from the jar list. This is all I have > > sspark-shell --jars shc-core-1.1.1-2

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
020 at 21:37, Mich Talebzadeh wrote: > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > >

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own risk. Any a

Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
2020 at 20:28, Muthu Jayakumar wrote: > I suspect the spark job is somehow having an incorrect (newer) version of > json4s in the classpath. json4s 3.5.3 is the utmost version that can be > used. > > Thanks, > Muthu > > On Mon, Feb 17, 2020, 06:43 Mich Talebzadeh >

Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-17 Thread Mich Talebzadeh
Hi, Spark version 2.4.3 Hbase 1.2.7 Data is stored in Hbase as Json. example of a row shown below [image: image.png] I am trying to read this table in Spark Scala import org.apache.spark.sql.{SQLContext, _} import org.apache.spark.sql.execution.datasources.hbase._ import

Re: OrderBy Year and Month is not displaying correctly

2020-01-06 Thread Mich Talebzadeh
coming Per Month|outgoing Per Month| ++-+--+--+ |2019|9|13,958.58 |17,920.31 | |2019|10 |10,029.00 |10,067.52 | |2019|11 |4,032.30 |4,225.30 | |2019|12 |742.00|814.49| |

OrderBy Year and Month is not displaying correctly

2020-01-05 Thread Mich Talebzadeh
however the orderby is not correct as I expect to see 2010 record and 2019 records in the order of year and month. Any suggestions? Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profil