Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
Sure, just do: case Failure(e) => throw e
From: Mich Talebzadeh, Tuesday, May 5, 2020 at 6:36 PM: "Hi Brandon. In dealing with df case Failure(e) => throw new Exception…"
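
Putting the thread's suggestion together, a minimal sketch of the pattern under discussion; the HDFS path is illustrative, not from the thread:

import scala.util.{Try, Success, Failure}

val df = Try(spark.read.csv("hdfs://namenode/data/input.csv")) match {
  case Success(d) => d        // read succeeded, keep the DataFrame
  case Failure(e) => throw e  // rethrow the original exception unchanged
}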

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
Match needs to be lower case "match".
From: Mich Talebzadeh, Tuesday, May 5, 2020 at 6:13 PM: "scala> import scala.util.{Try, Success, Failure} import scala.util.{Try, Success, Fa…"

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
import scala.util.Try
import scala.util.Success
import scala.util.Failure
From: Mich Talebzadeh, Tuesday, May 5, 2020 at 6:11 PM: "This is what I get scala> val df = Try(spar…"

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
This is what I had in mind. Can you give this approach a try?

val df = Try(spark.read.csv("")) match {
  case Success(df) => df
  case Failure(e) => throw new Exception("foo")
}

From: Mich Talebzadeh, Tuesday, May 5, 2020 at 5:17 PM

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
Quoted message of Tuesday, May 5, 2020 at 12:45 PM: "Thanks Brandon! I should have remembered that. Basically the code exits with sys.exit(1) if it cannot find the file. I guess there is no easy way…"

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
You could use the Hadoop API and check if the file exists.
From: Mich Talebzadeh, Tuesday, May 5, 2020 at 11:25 AM: "Hi, as I understand it, exception handling in Spark only makes sense if one attempts an action, as opposed to…"
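
A minimal sketch of that check against the Hadoop FileSystem API; the path is illustrative:

import org.apache.hadoop.fs.{FileSystem, Path}

// Reuse the Hadoop configuration that the SparkSession already carries
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val input = new Path("hdfs://namenode/data/input.csv")

if (fs.exists(input)) {
  val df = spark.read.csv(input.toString)
} else {
  println(s"Input $input not found")  // handle the missing file instead of sys.exit(1)
}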

Re: How to print DataFrame.show(100) to text file at HDFS

2019-04-14 Thread Brandon Geise
Use .limit on the dataframe followed by .write.
On Apr 14, 2019, at 5:10 AM, Chetan Khatri wrote: "Nuthan, thank you for the reply. The solution proposed will give everything; for me it is like one DataFrame show(100) in 3000 lines of Scala Spark code. However, yarn logs --applicationId…"
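
A minimal sketch of that suggestion, assuming an illustrative HDFS output path:

// Persist only the first 100 rows instead of printing them with show(100)
df.limit(100)
  .coalesce(1)                       // optional: a single output file
  .write
  .mode("overwrite")
  .option("header", "true")
  .csv("hdfs://namenode/tmp/df_sample")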

Re: How to address seemingly low core utilization on a spark workload?

2018-11-15 Thread Brandon Geise
I recently came across this (haven't tried it out yet), but maybe it can help guide you to identify the root cause: https://github.com/groupon/sparklint
From: Vitaliy Pisarev, Thursday, November 15, 2018 at 10:08 AM: "How to address seemingly low…"

Re: Timestamp Difference/operations

2018-10-15 Thread Brandon Geise
How about select unix_timestamp(timestamp2) - unix_timestamp(timestamp1)?
From: Paras Agarwal, Monday, October 15, 2018 at 2:41 AM: "Thanks John, actually need full date and time difference, not just…"
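
A minimal sketch of the same expression in the DataFrame API; the column names are illustrative:

import org.apache.spark.sql.functions.{col, unix_timestamp}

// Whole-second difference between two timestamp columns
val withDiff = df.withColumn(
  "diff_seconds",
  unix_timestamp(col("timestamp2")) - unix_timestamp(col("timestamp1"))
)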

Re: CSV parser - how to parse column containing json data

2018-10-02 Thread Brandon Geise
CSV as well. As per your solution, I am creating a StructType only for the JSON field. So how am I going to mix and match here, i.e. do type inference for all fields except the JSON field and use the custom json_schema for the JSON field?
On Thu, Aug 30, 2018 at 5:29 PM Brandon Geise wrote: "If you…"
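
One way to read this question: let Spark infer the plain CSV columns and apply the explicit schema only to the JSON column via from_json. A minimal sketch under that assumption; the column names, schema, and path are illustrative:

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val json_schema = StructType(Array(
  StructField("x", StringType, true),
  StructField("y", StringType, true),
  StructField("z", IntegerType, true)))

// Let Spark infer the plain CSV columns...
val raw = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("hdfs://namenode/data/input.csv")

// ...then parse only the JSON column with the hand-written schema
val parsed = raw.withColumn("payload", from_json(col("json_col"), json_schema))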

Re: CSV parser - how to parse column containing json data

2018-08-30 Thread Brandon Geise
If you know your JSON schema you can create a struct and then apply that using from_json:

val json_schema = StructType(Array(
  StructField("x", StringType, true),
  StructField("y", StringType, true),
  StructField("z", IntegerType, true)))

.withColumn("_c3", …
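
The preview cuts off mid-expression; applying that struct to a column would look roughly like the sketch below. The _c3 column name comes from the quoted snippet; the path and the rest are assumptions:

import org.apache.spark.sql.functions.{col, from_json}

val parsed = spark.read
  .csv("hdfs://namenode/data/input.csv")
  .withColumn("_c3", from_json(col("_c3"), json_schema))  // json_schema as defined above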

from_json schema order

2018-08-15 Thread Brandon Geise
Hi, Can someone confirm whether ordering matters between the schema and underlying JSON string? Thanks, Brandon

Re: Union of multiple data frames

2018-04-05 Thread Brandon Geise
Maybe something like:

var finalDF = spark.sqlContext.emptyDataFrame
for (df <- dfs) {
  finalDF = finalDF.union(df)
}

where dfs is a Seq of dataframes.
From: Cesar, Thursday, April 5, 2018 at 2:17 PM: "Union of…"
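
A variant sketch of the same idea: because union on a zero-column emptyDataFrame can fail with a column-count mismatch, folding over the sequence avoids the seed frame entirely. It assumes dfs is a non-empty Seq[DataFrame] with identical schemas:

// Union an arbitrary number of DataFrames that share the same schema
val finalDF = dfs.reduce(_ union _)

// Or, to match columns by name rather than by position (Spark 2.3+):
val finalByName = dfs.reduce(_ unionByName _)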

Re: how to create all possible combinations from an array? how to join and explode row array?

2018-03-30 Thread Brandon Geise
Possibly instead of doing the initial grouping, just do a full outer join on zyzy. This is in Scala but should be easily convertible to Python.

val data = Array(("john", "red"), ("john", "blue"), ("john", "red"),
                 ("bill", "blue"), ("bill", "red"), ("sam", "green"))
val distData: …
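
The preview cuts off before the join itself; a minimal sketch of the self-join idea, assuming the goal is all color pairs per name (the column names are illustrative):

import spark.implicits._

val df = Seq(("john", "red"), ("john", "blue"), ("john", "red"),
             ("bill", "blue"), ("bill", "red"), ("sam", "green"))
  .toDF("name", "color")
  .distinct()

// Self-join on name to enumerate color combinations per person
val pairs = df.as("a")
  .join(df.as("b"), $"a.name" === $"b.name")
  .filter($"a.color" < $"b.color")  // keep each unordered pair once
  .select($"a.name", $"a.color".as("color1"), $"b.color".as("color2"))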

[Spark CSV DataframeWriter] Quote options for columns on write

2018-03-06 Thread Brandon Geise
My problem is related to the need to have all records in a specific column quoted when writing a CSV. I assumed that by setting the escapeQuotes option to false, fields would not have any type of quoting applied, even when the delimiter exists in the value. Unless I am…
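
For reference, a minimal sketch of the write options involved; quoteAll forces quoting of every value (to my knowledge the built-in CSV writer has no per-column quoting switch, so this applies to all columns, and the path is illustrative):

df.write
  .option("quoteAll", "true")       // quote every field, not only those containing the delimiter
  .option("escapeQuotes", "true")   // escape quote characters embedded in values
  .csv("hdfs://namenode/out/quoted_csv")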