
Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-02 Thread Mich Talebzadeh

Thanks Sean. I guess I was being pedantic. In any case if the source table
does not exist as spark.read is a collection, then it is going to fall over
one way or another!





Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-02 Thread Sean Owen

It would be quite trivial. None of that affects any of the Spark execution.
It doesn't seem like it helps though - you are just swallowing the cause.
Just let it fly?
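
To illustrate the point about swallowing the cause, here is a minimal sketch that runs without Spark. The readTable function is a hypothetical stand-in for the JDBC read (it always fails so the failure path is visible), and the object and method names are invented for illustration only. It shows two ways to avoid losing the original error: pass it along as the cause of the new exception, or simply let the original exception propagate.

```scala
import scala.util.{Failure, Success, Try}

object KeepTheCause {
  // Hypothetical stand-in for spark.read...load(); always fails for demonstration.
  def readTable(): String =
    throw new IllegalStateException("source table does not exist")

  // Option 1: wrap the error, but keep the original exception as the cause
  // so its message and stack trace are still available to the caller.
  def readWrapped(): String = Try(readTable()) match {
    case Success(value) => value
    case Failure(e) =>
      throw new RuntimeException("Error encountered reading Hive table", e)
  }

  // Option 2: "let it fly". Try(...).get rethrows the original exception unchanged,
  // which is equivalent to not using Try at all here.
  def readOrThrow(): String = Try(readTable()).get
}
```

With the first variant the caller sees the custom message and can still reach the underlying error through getCause; with the second the original exception arrives untouched.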


Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-02 Thread Mich Talebzadeh

As a side question, consider the following JDBC read:

val lowerBound = 1L
val upperBound = 100L
val numPartitions = 10
val partitionColumn = "id"

val HiveDF = Try(spark.read.
  format("jdbc").
  option("url", jdbcUrl).
  option("driver", HybridServerDriverName).
  option("dbtable", HiveSchema+"."+HiveTable).
  option("user", HybridServerUserName).
  option("password", HybridServerPassword).
  option("partitionColumn", partitionColumn).
  option("lowerBound", lowerBound).
  option("upperBound", upperBound).
  option("numPartitions", numPartitions).
  load()) match {
    case Success(df) => df
    case Failure(e) => throw new Exception("Error Encountered reading Hive table")
  }

Are there any performance implications of having a Try, Success, Failure
enclosure around the DF?

Thanks

 Mich


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Mich Talebzadeh

Many thanks Russell. That worked.

val HiveDF = Try(spark.read.
  format("jdbc").
  option("url", jdbcUrl).
  option("dbtable", HiveSchema+"."+HiveTable).
  option("user", HybridServerUserName).
  option("password", HybridServerPassword).
  load()) match {
    case Success(df) => df
    case Failure(e) => throw new Exception("Error Encountered reading Hive table")
  }

HiveDF: org.apache.spark.sql.DataFrame = [id: int, clustered: int ... 5 more fields]

I appreciate your help, Sean and Russell.


Mich


Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Russell Spitzer

You can't use df as the name of the return from the try and the name of the
match variable in success. You also probably want to match the name of the
variable in the match with the return from the match.

So

val df = Try(spark.read.
  format("jdbc").
  option("url", jdbcUrl).
  option("dbtable", HiveSchema+"."+HiveTable).
  option("user", HybridServerUserName).
  option("password", HybridServerPassword).
  load()) match {
    case Success(validDf) => validDf
    case Failure(e) => throw new Exception("Error Encountered reading Hive table")
  }
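
A note on why the rename helps, offered as a sketch rather than as part of the original reply: in Scala, an identifier inside a pattern that starts with an upper-case letter is treated as a reference to an existing value, not as a new variable binding. So `case Success(HiveDF)` refers back to the `val HiveDF` that is still being defined, which would explain the "recursive value HiveDF needs type" error, whereas a lower-case name such as df or validDf introduces a fresh variable. The loadTable function and the table name below are hypothetical stand-ins so the example runs without Spark.

```scala
import scala.util.{Failure, Success, Try}

object PatternNaming {
  // Hypothetical stand-in for spark.read.format("jdbc")...load().
  def loadTable(name: String): Seq[Int] =
    if (name.nonEmpty) Seq(1, 2, 3)
    else throw new IllegalArgumentException("table name must not be empty")

  // Does not compile ("recursive value HiveDF needs type"): the upper-case
  // name in the pattern is a reference to the val being defined.
  // val HiveDF = Try(loadTable("hypothetical_table")) match {
  //   case Success(HiveDF) => HiveDF
  //   case Failure(e)      => throw e
  // }

  // Compiles: the lower-case pattern variable df is a new binding,
  // so the outer val can keep any name, including HiveDF.
  val HiveDF = Try(loadTable("hypothetical_table")) match {
    case Success(df) => df
    case Failure(e) =>
      throw new Exception("Error encountered reading Hive table", e)
  }
}
```

This is also consistent with the lower-case `val df = ... case Success(df) => df` form, where the inner df simply shadows the outer one.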


Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Mich Talebzadeh

Many thanks Sean.

Maybe I misunderstood your point?

var DF = Try(spark.read.
  format("jdbc").
  option("url", jdbcUrl).
  option("dbtable", HiveSchema+"."+HiveTable).
  option("user", HybridServerUserName).
  option("password", HybridServerPassword).
  load()) match {
    case Success(DF) => HiveDF
    case Failure(e) => throw new Exception("Error Encountered reading Hive table")
  }

Still getting the error:

<console>:74: error: recursive method DF needs type
         case Success(DF) => HiveDF

Do I need to define DF as a DataFrame beforehand, because at that moment Spark
does not know what type DF is?

Thanks again



Re: Exception handling in Spark throws recursive value for DF needs type error

2020-10-01 Thread Sean Owen

You are reusing HiveDF for two vars and it ends up ambiguous. Just rename
one.

On Thu, Oct 1, 2020, 5:02 PM Mich Talebzadeh 
wrote:

> Hi,
>
> Spark version 2.3.3 on Google Dataproc
>
> I am trying to use JDBC to other databases
>
> https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
>
> to read from a Hive table on-prem using Spark in the cloud.
>
> This works OK without a Try enclosure.
>
> import spark.implicits._
> import scala.util.{Try, Success, Failure}
>
> val HiveDF = Try(spark.read.
>   format("jdbc").
>   option("url", jdbcUrl).
>   option("dbtable", HiveSchema+"."+HiveTable).
>   option("user", HybridServerUserName).
>   option("password", HybridServerPassword).
>   load()) match {
>     case Success(HiveDF) => HiveDF
>     case Failure(e) => throw new Exception("Error Encountered reading Hive table")
>   }
>
> However, with Try I am getting the following error:
>
> <console>:66: error: recursive value HiveDF needs type
>          case Success(HiveDF) => HiveDF
>
> Wondering what is causing this. I have used it before (say reading from an
> XML file) and it worked then.
>
> Thanks

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise

Sure, just do case Failure(e) => throw e
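
As a small sketch of that suggestion, the failure branch can print the message of the original exception and then rethrow it. The load function below is a hypothetical stand-in for the XML read so the example runs without Spark, and the object and method names are illustrative assumptions.

```scala
import scala.util.{Failure, Success, Try}

object ReportAndRethrow {
  // Hypothetical stand-in for spark.read.format(...).load(path).
  def load(path: String): String =
    if (path.nonEmpty) s"loaded $path"
    else throw new IllegalArgumentException("path must not be empty")

  def loadOrReport(path: String): String = Try(load(path)) match {
    case Success(df) => df
    case Failure(e) =>
      // Print the original exception's message, then rethrow the same exception
      // so the caller still gets the real cause and stack trace.
      System.err.println(s"Read failed: ${e.getMessage}")
      throw e
  }
}
```

Calling e.printStackTrace() instead of printing only getMessage would show the full trace if more detail is needed.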


Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh

Hi Brandon.

In dealing with

case Failure(e) => throw new Exception("foo")

can one print the exception message?

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh

OK looking promising thanks

scala> import scala.util.{Try, Success, Failure}
import scala.util.{Try, Success, Failure}

scala> val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
df: org.apache.spark.sql.DataFrame = [brand: string, ocis_party_id: bigint
... 6 more fields]


regards,


Dr Mich Talebzadeh




Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise

Match needs to be lower case “match”
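
For reference, a minimal sketch of the corrected form that covers both errors in the quoted session: the missing import (which produces "not found: value Try") and the capitalised keyword. The load function is a hypothetical stand-in for the spark.read call so the example runs without Spark.

```scala
// Without this import the REPL reports "not found: value Try"
// (and likewise for Success and Failure).
import scala.util.{Failure, Success, Try}

object MatchKeyword {
  // Hypothetical stand-in for spark.read.format(...).load(path).
  def load(path: String): String = s"loaded $path"

  // `match` is a lower-case keyword. Writing `Match` makes the compiler look for
  // a method called Match on the Try value, hence
  // "value Match is not a member of scala.util.Try[...]".
  val df: String = Try(load("/tmp/broadcast.xml")) match {
    case Success(value) => value
    case Failure(e)     => throw e
  }
}
```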

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 6:13 PM
To: Brandon Geise 
Cc: Todd Nist , "user @spark" 
Subject: Re: Exception handling in Spark

scala> import scala.util.{Try, Success, Failure}
import scala.util.{Try, Success, Failure}

scala> val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
<console>:48: error: value Match is not a member of scala.util.Try[org.apache.spark.sql.DataFrame]
   val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

Mich Talebzadeh

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

On Tue, 5 May 2020 at 23:10, Mich Talebzadeh  wrote:

This is what I get

scala> val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
:47: error: not found: value Try
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
^
:47: error: not found: value Success
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

 ^
:47: error: not found: value Failure
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

Dr Mich Talebzadeh


On Tue, 5 May 2020 at 23:03, Brandon Geise  wrote:

This is what I had in mind.  Can you give this approach a try?

val df = Try(spark.read.csv("")) match {

  case Success(df) => df

  case Failure(e) => throw new Exception("foo")

  }

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 5:17 PM
To: Todd Nist 
Cc: Brandon Geise , "user @spark" 

Subject: Re: Exception handling in Spark

I am trying this approach

val broadcastValue = "123456789"  // I assume this will be sent as a constant 
for the batch
// Create a DF on top of XML
try {
  val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")
  df
}  catch {
case ex: FileNotFoundException => {
println (s"\nFile /tmp/broadcast.xml not found\n")
None
}
   case unknown: Exception => {
 println(s"\n Error encountered $unknown\n")
None
}
}

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

But this does not work

scala> try {
 |   val df = spark.read.

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh

scala> import scala.util.{Try, Success, Failure}
import scala.util.{Try, Success, Failure}

scala> val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
<console>:48: error: value Match is not a member of
scala.util.Try[org.apache.spark.sql.DataFrame]
   val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}


Mich Talebzadeh







On Tue, 5 May 2020 at 23:10, Mich Talebzadeh 
wrote:

> This is what I get
>
> scala> val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
> :47: error: not found: value Try
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
> ^
> :47: error: not found: value Success
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>
>
>^
> :47: error: not found: value Failure
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>
> Dr Mich Talebzadeh
>
>
>
>
>
>
>
> On Tue, 5 May 2020 at 23:03, Brandon Geise  wrote:
>
>> This is what I had in mind.  Can you give this approach a try?
>>
>>
>>
>> val df = Try(spark.read.csv("")) match {
>>
>>   case Success(df) => df
>>
>>   case Failure(e) => throw new Exception("foo")
>>
>>   }
>>
>>
>>
>> *From: *Mich Talebzadeh 
>> *Date: *Tuesday, May 5, 2020 at 5:17 PM
>> *To: *Todd Nist 
>> *Cc: *Brandon Geise , "user @spark" <
>> user@spark.apache.org>
>> *Subject: *Re: Exception handling in Spark
>>
>>
>>
>> I am trying this approach
>>
>>
>>
>>
>> val broadcastValue = "123456789"  // I assume this will be sent as a
>> constant for the batch
>> // Create a DF on top of XML
>> try {
>>   val df = spark.read.
>> format("com.databricks.spark.xml").
>> option("rootTag", "hierarchy").
>> option("rowTag", "sms_request").
>> load("/tmp/broadcast.xml")
>>   df
>> }  catch {
>> case ex: Fi

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise

import scala.util.Try

import scala.util.Success

import scala.util.Failure

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 6:11 PM
To: Brandon Geise 
Cc: Todd Nist , "user @spark" 
Subject: Re: Exception handling in Spark

This is what I get

scala> val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
:47: error: not found: value Try
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
^
:47: error: not found: value Success
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

 ^
:47: error: not found: value Failure
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

Dr Mich Talebzadeh


On Tue, 5 May 2020 at 23:03, Brandon Geise  wrote:

This is what I had in mind.  Can you give this approach a try?

val df = Try(spark.read.csv("")) match {

  case Success(df) => df

  case Failure(e) => throw new Exception("foo")

  }

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 5:17 PM
To: Todd Nist 
Cc: Brandon Geise , "user @spark" 

Subject: Re: Exception handling in Spark

I am trying this approach

val broadcastValue = "123456789"  // I assume this will be sent as a constant 
for the batch
// Create a DF on top of XML
try {
  val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")
  df
}  catch {
case ex: FileNotFoundException => {
println (s"\nFile /tmp/broadcast.xml not found\n")
None
}
   case unknown: Exception => {
 println(s"\n Error encountered $unknown\n")
None
}
}

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

But this does not work

scala> try {
 |   val df = spark.read.
 | format("com.databricks.spark.xml").
 | option("rootTag", "hierarchy").
 | option("rowTag", "sms_request").
 | load("/tmp/broadcast.xml")
 |   Some(df)
 | }  catch {
 | case ex: FileNotFoundException => {
 | println (s"\nFile /tmp/broadcast.xml not found\n")
 | None
 | }
 |case unknown: Exception => {
 |  println(s"\n Error encountered $unknown\n")
 | None
 | }
 | }
res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string, 
ocis_party_id: bigint ... 6 more fields])

scala>

scala> df.printSchema
:48: error: not found: value df
   df.printSchema

data frame seems to be lost!

Thanks,

Dr Mich Talebzadeh


Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh

This is what I get

scala> val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
<console>:47: error: not found: value Try
       val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
^
<console>:47: error: not found: value Success
       val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
 ^
<console>:47: error: not found: value Failure
       val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}

Dr Mich Talebzadeh







On Tue, 5 May 2020 at 23:03, Brandon Geise  wrote:

> This is what I had in mind.  Can you give this approach a try?
>
>
>
> val df = Try(spark.read.csv("")) match {
>
>   case Success(df) => df
>
>   case Failure(e) => throw new Exception("foo")
>
>   }
>
>
>
> *From: *Mich Talebzadeh 
> *Date: *Tuesday, May 5, 2020 at 5:17 PM
> *To: *Todd Nist 
> *Cc: *Brandon Geise , "user @spark" <
> user@spark.apache.org>
> *Subject: *Re: Exception handling in Spark
>
>
>
> I am trying this approach
>
>
>
>
> val broadcastValue = "123456789"  // I assume this will be sent as a
> constant for the batch
> // Create a DF on top of XML
> try {
>   val df = spark.read.
> format("com.databricks.spark.xml").
> option("rootTag", "hierarchy").
> option("rowTag", "sms_request").
> load("/tmp/broadcast.xml")
>   df
> }  catch {
> case ex: FileNotFoundException => {
> println (s"\nFile /tmp/broadcast.xml not found\n")
> None
> }
>case unknown: Exception => {
>  println(s"\n Error encountered $unknown\n")
> None
> }
> }
>
> val newDF = df.withColumn("broadcastid", lit(broadcastValue))
>
> But this does not work
>
>
>
> scala> try {
>  |   val df = spark.read.
>  | format("com.databricks.spark.xml").
>  | option("rootTag", "hierarchy").
>  | option("rowTag", "sms_request").
>  | load("/tmp/broadcast.xml")
>  |   Some(df)
>  | }  catch {
>  | case ex: FileNotFoundException => {
>  | println (s"\nFile /tmp/broadcast.xml not found\n")
>  | None
>  | }
>  |case unknown: Exception => {
>  |  println(s"\n Error encountered $unknown\n")
>  | None
>  | }
>  | }
> res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string,
> ocis_party_id: bigint ... 6 more fields])
>
> scala>
>
> scala> df.printSchema
> :48: error: not found: value df
>df.printSchema
>
> data frame seems to be lost!
>
>
>
> Thanks,
>
>
>
> Dr Mich Talebzadeh
>
>
>

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise

This is what I had in mind.  Can you give this approach a try?

 

val df = Try(spark.read.csv("")) match {

  case Success(df) => df

  case Failure(e) => throw new Exception("foo")

  }
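
One caveat with the snippet above: `throw new Exception("foo")` discards the original exception, so the real reason for the failure is lost. A minimal sketch of keeping it, in plain Scala with no Spark dependency, follows. `read` and `readOrFail` are hypothetical names used only for illustration, and `read` stands in for the DataFrame read.

```scala
import scala.util.{Try, Success, Failure}

object KeepTheCause {
  // Hypothetical stand-in for the DataFrame read; always fails here.
  def read(path: String): Seq[String] =
    throw new java.io.FileNotFoundException(s"$path not found")

  def readOrFail(path: String): Seq[String] =
    Try(read(path)) match {
      case Success(rows) => rows
      // Pass the original exception as the cause so its message and stack trace survive.
      case Failure(e) => throw new RuntimeException(s"Error reading $path", e)
    }

  def main(args: Array[String]): Unit = {
    val outcome = Try(readOrFail("/tmp/broadcast.xml"))
    outcome.failed.foreach { e =>
      println(e.getMessage)          // message of the wrapping exception
      println(e.getCause.getMessage) // message of the original exception
    }
  }
}
```

If no extra context is needed, not wrapping at all and letting the original exception propagate is simpler still.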

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 5:17 PM
To: Todd Nist 
Cc: Brandon Geise , "user @spark" 

Subject: Re: Exception handling in Spark

 

I am trying this approach

 


val broadcastValue = "123456789"  // I assume this will be sent as a constant 
for the batch
// Create a DF on top of XML
try {
  val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")
  df
}  catch {
case ex: FileNotFoundException => {
println (s"\nFile /tmp/broadcast.xml not found\n")
None
}
   case unknown: Exception => {
 println(s"\n Error encountered $unknown\n")
None
}
}

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

But this does not work

 

scala> try {
 |   val df = spark.read.
 | format("com.databricks.spark.xml").
 | option("rootTag", "hierarchy").
 | option("rowTag", "sms_request").
 | load("/tmp/broadcast.xml")
 |   Some(df)
 | }  catch {
 | case ex: FileNotFoundException => {
 | println (s"\nFile /tmp/broadcast.xml not found\n")
 | None
 | }
 |case unknown: Exception => {
 |  println(s"\n Error encountered $unknown\n")
 | None
 | }
 | }
res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string, 
ocis_party_id: bigint ... 6 more fields])

scala>

scala> df.printSchema
:48: error: not found: value df
   df.printSchema
   
data frame seems to be lost!

 

Thanks,

 

Dr Mich Talebzadeh

 


 

 

 

On Tue, 5 May 2020 at 18:08, Mich Talebzadeh  wrote:

Thanks Todd. This is what I did before creating DF on top of that file

 

var exists = true

exists = xmlDirExists(broadcastStagingConfig.xmlFilePath)

if(!exists) {

  println(s"\n Error: The xml file ${ broadcastStagingConfig.xmlFilePath} does 
not exist, aborting!\n")

 sys.exit(1)

}

.

.

def xmlFileExists(hdfsDirectory: String): Boolean = {

   val hadoopConf = new org.apache.hadoop.conf.Configuration()

   val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)

   fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory))

 }

 

And checked it. It works.

 

Dr Mich Talebzadeh

 


 

 

 

On Tue, 5 May 2020 at 17:54, Todd Nist  wrote:

Could you do something like this prior to calling the action.

 

// Create FileSystem object from Hadoop Configuration

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// This methods returns Boolean (true - if file exists, false - if file doesn't 
exist

val fileExists = fs.exists(new Path(""))

if (fileExists) println("File exists!")

else println("File doesn't exist!")

 

Not sure that will help you or not, just a thought.

 

-Todd

 

 

 

 

On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh  
wrote:

Thanks  Brandon!

 

i should have remembered that.

 

basically the code gets out with sys.exit(1)  if it cannot find the file

 

I guess there is no easy way of validating DF except actioning it by show(1,0) 
etc and checking if it works?

 

Regards,


Dr Mich Talebzadeh

 


Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh

I am trying this approach


val broadcastValue = "123456789"  // I assume this will be sent as a constant for the batch
// Create a DF on top of XML
try {
  val df = spark.read.
    format("com.databricks.spark.xml").
    option("rootTag", "hierarchy").
    option("rowTag", "sms_request").
    load("/tmp/broadcast.xml")
  df
} catch {
  case ex: FileNotFoundException => {
    println(s"\nFile /tmp/broadcast.xml not found\n")
    None
  }
  case unknown: Exception => {
    println(s"\n Error encountered $unknown\n")
    None
  }
}

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

But this does not work

scala> try {
 |   val df = spark.read.
 | format("com.databricks.spark.xml").
 | option("rootTag", "hierarchy").
 | option("rowTag", "sms_request").
 | load("/tmp/broadcast.xml")
 |   Some(df)
 | }  catch {
 | case ex: FileNotFoundException => {
 | println (s"\nFile /tmp/broadcast.xml not found\n")
 | None
 | }
 |case unknown: Exception => {
 |  println(s"\n Error encountered $unknown\n")
 | None
 | }
 | }
res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string,
ocis_party_id: bigint ... 6 more fields])

scala>

scala> df.printSchema
<console>:48: error: not found: value df
   df.printSchema

data frame seems to be lost!
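
The data frame is not lost; `val df` is declared inside the `try` block and so is only visible there. Since `try`/`catch` is an expression in Scala, its result can be bound to a name declared outside the block. A minimal sketch in plain Scala follows, where `read` is a hypothetical stand-in for the DataFrame read and `readOption` is an illustrative name:

```scala
import java.io.FileNotFoundException

object TryScope {
  // Hypothetical stand-in for the DataFrame read.
  def read(path: String): Seq[String] =
    if (path == "/tmp/broadcast.xml") Seq("brand", "ocis_party_id")
    else throw new FileNotFoundException(path)

  // try/catch is an expression: bind its result instead of a val declared inside it.
  def readOption(path: String): Option[Seq[String]] =
    try {
      Some(read(path))
    } catch {
      case _: FileNotFoundException =>
        println(s"File $path not found")
        None
      case unknown: Exception =>
        println(s"Error encountered $unknown")
        None
    }

  def main(args: Array[String]): Unit = {
    val dfOpt = readOption("/tmp/broadcast.xml")
    // The value is visible here because dfOpt was declared outside the try block.
    dfOpt.foreach(cols => println(cols.mkString(", ")))
    println(readOption("/tmp/missing.xml").isDefined)
  }
}
```

In the shell session above the result was bound to `res6`, so something like `res6.foreach(_.printSchema)` should also reach the data frame.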

Thanks,


Dr Mich Talebzadeh







On Tue, 5 May 2020 at 18:08, Mich Talebzadeh 
wrote:

> Thanks Todd. This is what I did before creating DF on top of that file
>
> var exists = true
> exists = xmlDirExists(broadcastStagingConfig.xmlFilePath)
> if(!exists) {
>   println(s"\n Error: The xml file ${ broadcastStagingConfig.xmlFilePath}
> does not exist, aborting!\n")
>  sys.exit(1)
> }
> .
> .
> def xmlFileExists(hdfsDirectory: String): Boolean = {
>val hadoopConf = new org.apache.hadoop.conf.Configuration()
>val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
>fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory))
>  }
>
> And checked it. It works.
>
> Dr Mich Talebzadeh
>
>
>
>
>
>
>
> On Tue, 5 May 2020 at 17:54, Todd Nist  wrote:
>
>> Could you do something like this prior to calling the action.
>>
>> // Create FileSystem object from Hadoop Configuration
>> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
>> // This methods returns Boolean (true - if file exists, false - if file
>> doesn't exist
>> val fileExists = fs.exists(new Path(""))
>> if (fileExists) println("File exists!")
>> else println("File doesn't exist!")
>>
>> Not sure that will help you or not, just a thought.
>>
>> -Todd
>>
>>
>>
>>
>> On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thanks  Brandon!
>>>
>>> i should have remembered that.
>>>
>>> basically the code gets out with sys.exit(1)  if it cannot find the file
>>>
>>> I guess there is no easy way of validating DF except actioning it by
>>> show(1,0) etc and checking if it works?
>>>
>>> Regards,
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, 5 May 2020 at 16:41,

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh

Thanks Todd. This is what I did before creating DF on top of that file

val exists = xmlFileExists(broadcastStagingConfig.xmlFilePath)
if (!exists) {
  println(s"\n Error: The xml file ${broadcastStagingConfig.xmlFilePath} does not exist, aborting!\n")
  sys.exit(1)
}
.
.
def xmlFileExists(hdfsDirectory: String): Boolean = {
  val hadoopConf = new org.apache.hadoop.conf.Configuration()
  val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
  fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory))
}

And checked it. It works.

Dr Mich Talebzadeh







On Tue, 5 May 2020 at 17:54, Todd Nist  wrote:

> Could you do something like this prior to calling the action.
>
> // Create FileSystem object from Hadoop Configuration
> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
> // This methods returns Boolean (true - if file exists, false - if file
> doesn't exist
> val fileExists = fs.exists(new Path(""))
> if (fileExists) println("File exists!")
> else println("File doesn't exist!")
>
> Not sure that will help you or not, just a thought.
>
> -Todd
>
>
>
>
> On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh 
> wrote:
>
>> Thanks  Brandon!
>>
>> i should have remembered that.
>>
>> basically the code gets out with sys.exit(1)  if it cannot find the file
>>
>> I guess there is no easy way of validating DF except actioning it by
>> show(1,0) etc and checking if it works?
>>
>> Regards,
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>>
>>
>>
>>
>> On Tue, 5 May 2020 at 16:41, Brandon Geise 
>> wrote:
>>
>>> You could use the Hadoop API and check if the file exists.
>>>
>>>
>>>
>>> *From: *Mich Talebzadeh 
>>> *Date: *Tuesday, May 5, 2020 at 11:25 AM
>>> *To: *"user @spark" 
>>> *Subject: *Exception handling in Spark
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> As I understand exception handling in Spark only makes sense if one
>>> attempts an action as opposed to lazy transformations?
>>>
>>>
>>>
>>> Let us assume that I am reading an XML file from the HDFS directory  and
>>> create a dataframe DF on it
>>>
>>>
>>>
>>> val broadcastValue = "123456789"  // I assume this will be sent as a
>>> constant for the batch
>>>
>>> // Create a DF on top of XML
>>> val df = spark.read.
>>> format("com.databricks.spark.xml").
>>> option("rootTag", "hierarchy").
>>> option("rowTag", "sms_request").
>>> load("/tmp/broadcast.xml")
>>>
>>> val newDF = df.withColumn("broadcastid", lit(broadcastValue))
>>>
>>> newDF.createOrReplaceTempView("tmp")
>>>
>>>   // Put data in Hive table
>>>   //
>>>   sqltext = """
>>>   INSERT INTO TABLE michtest.BroadcastStaging PARTITION
>>> (broadcastid="123456", brand)
>>>   SELECT
>>>   ocis_party_id AS partyId
>>> , target_mobile_no AS phoneNumber
>>> , brand
>>> , broadcastid
>>>   FROM tmp
>>>   """
>>> //
>>>
>>> // Here I am performing a collection
>>>
>>> try  {
>>>
>>>  spark.sql(sqltext)
>>>
>>> } catch {
>>>
>>> case e: SQLException => e.printStackTrace
>>>
>>> sys.exit()
>>>
>>> }
>>>
>>>
>>>
>>> Now the issue I have is that what if the xml file  /tmp/broadcast.xml
>>> does not exist or deleted? I won't be able to catch the error until the
>>> hive table is populated. Of course I can write a shell script to check if
>>> the file exist before running the job or put small collection like
>>> df.show(1,0). Are there more general alternatives?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise

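Read is not an action in the strict sense, but load()/csv() resolves the path eagerly, so a missing file fails right at the read and you could wrap it in a Try (or whatever you want)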

scala> val df = Try(spark.read.csv("test"))

df: scala.util.Try[org.apache.spark.sql.DataFrame] = 
Failure(org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/test;)
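
Once the read is wrapped, the resulting `Try` can be handled in several ways besides pattern matching. The sketch below uses plain Scala with a hypothetical `readCsv` stand-in for `spark.read.csv`, so it runs without Spark; the `Try` methods themselves (`isFailure`, `toOption`, `getOrElse`, `recover`) are part of the standard library.

```scala
import scala.util.Try

object WorkingWithTry {
  // Hypothetical stand-in for spark.read.csv(path).
  def readCsv(path: String): Seq[String] =
    if (path == "test") throw new IllegalArgumentException(s"Path does not exist: $path")
    else Seq("a,b", "c,d")

  def main(args: Array[String]): Unit = {
    val attempt: Try[Seq[String]] = Try(readCsv("test"))

    println(attempt.isFailure)                    // check the outcome without unwrapping
    println(attempt.toOption)                     // drop the error, keep an Option
    println(attempt.getOrElse(Seq.empty[String])) // fall back to a default value
    // recover maps selected failures to a replacement value and leaves others untouched.
    val recovered = attempt.recover { case _: IllegalArgumentException => Seq("fallback") }
    println(recovered.get)
  }
}
```

Which of these fits depends on whether the job can continue without the input; for a mandatory input, failing fast is usually the clearer choice.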

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 12:45 PM
To: Brandon Geise 
Cc: "user @spark" 
Subject: Re: Exception handling in Spark

 

Thanks  Brandon!

 

i should have remembered that.

 

basically the code gets out with sys.exit(1)  if it cannot find the file

 

I guess there is no easy way of validating DF except actioning it by show(1,0) 
etc and checking if it works?

 

Regards,


Dr Mich Talebzadeh

 


 

 

 

On Tue, 5 May 2020 at 16:41, Brandon Geise  wrote:

You could use the Hadoop API and check if the file exists.

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 11:25 AM
To: "user @spark" 
Subject: Exception handling in Spark

 

Hi,

 

As I understand exception handling in Spark only makes sense if one attempts an 
action as opposed to lazy transformations?

 

Let us assume that I am reading an XML file from the HDFS directory  and create 
a dataframe DF on it

 

val broadcastValue = "123456789"  // I assume this will be sent as a constant 
for the batch

// Create a DF on top of XML
val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

newDF.createOrReplaceTempView("tmp")

  // Put data in Hive table
  //
  sqltext = """
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastid="123456", 
brand)
  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
, brand
, broadcastid
  FROM tmp
  """
//

// Here I am performing a collection 

try  {

 spark.sql(sqltext)

} catch {

case e: SQLException => e.printStackTrace

sys.exit()

}

 

Now the issue I have is that what if the xml file  /tmp/broadcast.xml does not 
exist or deleted? I won't be able to catch the error until the hive table is 
populated. Of course I can write a shell script to check if the file exist 
before running the job or put small collection like df.show(1,0). Are there 
more general alternatives?
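
A side note on the `catch` above: it only handles `java.sql.SQLException`. As far as I know, Spark reports problems such as a missing table or path through its own exception types (for example `org.apache.spark.sql.AnalysisException`), which are not subclasses of `SQLException`, so that handler would likely never fire. The sketch below shows the difference in plain Scala. `runInsert` is a hypothetical stand-in for `spark.sql(sqltext)`, and the `IllegalStateException` it throws is only a placeholder for whatever Spark raises.

```scala
import java.sql.SQLException
import scala.util.control.NonFatal

object CatchWidth {
  // Hypothetical stand-in for spark.sql(sqltext); fails with a non-SQLException.
  def runInsert(): Unit = throw new IllegalStateException("Table or view not found: tmp")

  // Too narrow: only java.sql.SQLException is handled, anything else propagates.
  def narrow(): String =
    try { runInsert(); "ok" }
    catch { case e: SQLException => s"handled: ${e.getMessage}" }

  // NonFatal matches ordinary exceptions but lets fatal errors such as OutOfMemoryError through.
  def broad(): String =
    try { runInsert(); "ok" }
    catch { case NonFatal(e) => s"handled: ${e.getMessage}" }

  def main(args: Array[String]): Unit = {
    println(broad())
    // narrow() rethrows the IllegalStateException, so it is wrapped in a Try for the demo.
    println(scala.util.Try(narrow()).isFailure)
  }
}
```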

 

Thanks

 

Dr Mich Talebzadeh

 


Re: Exception handling in Spark

2020-05-05 Thread Todd Nist

Could you do something like this prior to calling the action.

import org.apache.hadoop.fs.{FileSystem, Path}

// Create FileSystem object from Hadoop Configuration
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// This method returns a Boolean (true if the file exists, false if it doesn't)
val fileExists = fs.exists(new Path(""))
if (fileExists) println("File exists!")
else println("File doesn't exist!")

Not sure that will help you or not, just a thought.

-Todd
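
The existence check suggested above could be wrapped so that the job stops before any DataFrame is built. This is only a sketch under stated assumptions: the helper name requireFile is hypothetical, and the local file system is used so the example runs without a cluster. In a Spark job the FileSystem would come from spark.sparkContext.hadoopConfiguration as in the snippet above.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: Right(path) when the file exists, Left(message) otherwise.
def requireFile(fs: FileSystem, location: String): Either[String, Path] = {
  val path = new Path(location)
  if (fs.exists(path)) Right(path) else Left(s"Input file not found: $location")
}

// Local file system, used here only so the sketch runs without a cluster.
val localFs = FileSystem.getLocal(new Configuration())

requireFile(localFs, "/tmp/broadcast.xml") match {
  case Right(path) => println(s"Found $path, safe to go on and read it")
  case Left(msg)   => println(msg) // a real job might call sys.exit(1) here
}
```

Note that the check and the read are separate steps, so the file can still disappear in between. The check gives an early, readable failure but does not replace handling errors around the action itself.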




On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh 
wrote:

> Thanks Brandon!
>
> I should have remembered that.
>
> Basically, the code exits with sys.exit(1) if it cannot find the file.
>
> I guess there is no easy way of validating a DF other than running an
> action on it, such as show(1,0), and checking whether it works?

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh

Thanks Brandon!

I should have remembered that.

Basically, the code exits with sys.exit(1) if it cannot find the file.

I guess there is no easy way of validating a DF other than running an
action on it, such as show(1,0), and checking whether it works?

Regards,

Dr Mich Talebzadeh
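
The "build the DataFrame, then run a small action to see if it works" idea can be expressed with scala.util.Try. The sketch below is an assumption-laden illustration, not code from the thread: the helper name validated is hypothetical, and plain Scala collections stand in for the DataFrame so the example runs without Spark. Depending on the data source, load() itself may already fail for a missing path, in which case the Try captures that failure as well.

```scala
import scala.util.Try

// Hypothetical helper: build a value, then run a cheap probe against it,
// capturing any non-fatal exception thrown by either step.
def validated[A](build: => A)(probe: A => Any): Try[A] =
  Try { val a = build; probe(a); a }

// Stand-ins for a DataFrame: a non-empty and an empty sequence, probed with head.
val ok  = validated(Seq(1, 2, 3))(_.head)
val bad = validated(Seq.empty[Int])(_.head)
println(s"ok succeeded: ${ok.isSuccess}, bad failed: ${bad.isFailure}")

// With Spark, the same helper might be used like this (df.head(1) is a small action):
//   validated(spark.read.format("com.databricks.spark.xml")
//       .option("rowTag", "sms_request")
//       .load("/tmp/broadcast.xml"))(_.head(1)) match {
//     case scala.util.Success(df) => df.createOrReplaceTempView("tmp")
//     case scala.util.Failure(e)  => e.printStackTrace(); sys.exit(1)
//   }
```

The probe does cost one small job, so it is a trade-off between failing early and doing a little extra work on every run.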




On Tue, 5 May 2020 at 16:41, Brandon Geise  wrote:

> You could use the Hadoop API and check if the file exists.

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise

You could use the Hadoop API and check if the file exists.

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 11:25 AM
To: "user @spark" 
Subject: Exception handling in Spark

Hi,

As I understand it, exception handling in Spark only makes sense when one
attempts an action, as opposed to lazy transformations?

Let us assume that I am reading an XML file from an HDFS directory and
creating a DataFrame on it:

import java.sql.SQLException
import org.apache.spark.sql.functions.lit

val broadcastValue = "123456789"  // I assume this will be sent as a constant for the batch

// Create a DF on top of XML
val df = spark.read.
  format("com.databricks.spark.xml").
  option("rootTag", "hierarchy").
  option("rowTag", "sms_request").
  load("/tmp/broadcast.xml")

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

newDF.createOrReplaceTempView("tmp")

// Put data in Hive table
val sqltext = """
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastid="123456", brand)
  SELECT
    ocis_party_id AS partyId
  , target_mobile_no AS phoneNumber
  , brand
  , broadcastid
  FROM tmp
  """

// Here I am performing an action
try {
  spark.sql(sqltext)
} catch {
  case e: SQLException =>
    e.printStackTrace()
    sys.exit(1)  // non-zero exit status so the failure is visible to the caller
}

The issue I have is: what if the XML file /tmp/broadcast.xml does not exist or
has been deleted? I won't be able to catch the error until the Hive table is
populated. Of course, I can write a shell script to check whether the file
exists before running the job, or add a small action such as df.show(1,0).
Are there more general alternatives?

Thanks

Dr Mich Talebzadeh

