Re: Subqueries

2017-12-29 Thread Nicholas Hakobian
This sounds like a perfect example of using windowing functions. Have you tried something like the following: select ACCT_ID, CR_RVKD_STAT_CD, ACCT_SFX_NUM, SCURT_FRD_STAT_CD, CLSD_REAS_CD from (select *, max(instnc_id) over () as max_inst_id FROM Stat_hist) where instnc_id = max_inst_id
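
A minimal Scala sketch of that window-function approach, assuming the table is registered as Stat_hist and using the column names from the thread (the SparkSession setup itself is illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.max

    val spark = SparkSession.builder().appName("latest-records").getOrCreate()
    import spark.implicits._

    // An empty window spec spans the whole table, so max(instnc_id) is the global
    // maximum attached to every row; note this pulls all rows into one partition.
    val latest = spark.table("Stat_hist")
      .withColumn("max_inst_id", max($"instnc_id").over(Window.partitionBy()))
      .where($"instnc_id" === $"max_inst_id")
      .select("ACCT_ID", "CR_RVKD_STAT_CD", "ACCT_SFX_NUM", "SCURT_FRD_STAT_CD", "CLSD_REAS_CD")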

Re: [Structured Streaming] Reuse computation result

2017-12-29 Thread Lalwani, Jayesh
There is no way to solve this within Spark. One option is to break up your application into multiple applications. The first application can filter and write the filtered results into a Kafka queue. The second application can read from the queue and sum. The third application can read from the queue and
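
A rough Scala sketch of the first stage of that split, assuming a Kafka broker at localhost:9092 and the topic names raw-events / filtered-events (the broker address, topic names, and filter condition are all illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("filter-stage").getOrCreate()
    import spark.implicits._

    // App 1: read the raw stream, apply the filter, publish surviving rows to Kafka.
    val filtered = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "raw-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS value")
      .where(length($"value") > 0)               // stand-in for the real filter condition

    filtered.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "filtered-events")
      .option("checkpointLocation", "/tmp/chk/filter-stage")
      .start()

The downstream applications would subscribe to filtered-events and run their own aggregations, each with its own checkpoint location.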

Subqueries

2017-12-29 Thread Lalwani, Jayesh
I have a table, and I want to find the latest records in it. The table has a column called instnc_id that is incremented every day. So I want to find the records that have the max instnc_id. I am trying to do this using subqueries, but it gives me an error. For example, when I try this
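
The query itself is cut off in the digest, but a typical subquery formulation of "rows with the max instnc_id" looks like the sketch below (illustrative only, not necessarily the poster's query; whether a given subquery form is accepted depends on the Spark version):

    // Scalar subquery in the WHERE clause, run through spark.sql
    val latest = spark.sql("""
      SELECT ACCT_ID, CR_RVKD_STAT_CD, ACCT_SFX_NUM, SCURT_FRD_STAT_CD, CLSD_REAS_CD
      FROM Stat_hist
      WHERE instnc_id = (SELECT max(instnc_id) FROM Stat_hist)
    """)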

Re: Spark on EMR suddenly stalling

2017-12-29 Thread Shushant Arora
you may have to recreate your cluster with the configuration below at EMR creation: "Configurations": [ { "Properties": { "maximizeResourceAllocation": "false" }, "Classification": "spark" } ]
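
For readability, the same classification laid out as it would appear in the configurations JSON supplied when the cluster is created (only the list quoted above; the rest of the create-cluster request is omitted):

    [
      {
        "Classification": "spark",
        "Properties": {
          "maximizeResourceAllocation": "false"
        }
      }
    ]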

Re: Spark on EMR suddenly stalling

2017-12-29 Thread Jeroen Miller
On 28 Dec 2017, at 19:25, Patrick Alwell wrote: > Dynamic allocation is great; but sometimes I've found explicitly setting the num executors, cores per executor, and memory per executor to be a better alternative. No difference with spark.dynamicAllocation.enabled
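
A sketch of what setting those values explicitly can look like; the numbers are placeholders, not a recommendation for this workload:

    import org.apache.spark.sql.SparkSession

    // Fixed resources instead of dynamic allocation. On EMR/YARN these usually come
    // from spark-submit flags (--num-executors, --executor-cores, --executor-memory)
    // or spark-defaults; setting them on the builder shows the equivalent properties.
    val spark = SparkSession.builder()
      .appName("explicit-resources")
      .config("spark.dynamicAllocation.enabled", "false")
      .config("spark.executor.instances", "10")
      .config("spark.executor.cores", "4")
      .config("spark.executor.memory", "8g")
      .getOrCreate()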

Fwd: Spark on EMR suddenly stalling

2017-12-29 Thread Jeroen Miller
Hello, Just a quick update, as I have not made much progress yet. On 28 Dec 2017, at 21:09, Gourav Sengupta wrote: > can you try to then use EMR version 5.10 or EMR version 5.11 instead? Same issue with EMR 5.11.0. Task 0 in one stage never finishes.

Custom line/record delimiter

2017-12-29 Thread sk skk
Hi, Do we have an option to write a CSV or text file with a custom record/line separator through Spark? I could not find any reference in the API. I have an issue while loading data into a warehouse, as one of the columns in the CSV has a newline character and the warehouse does not let me escape that
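
At the time of this thread there did not appear to be such an option (later releases added a lineSep option to some of the sources); a common workaround is to strip the embedded newlines from the offending column before writing. A minimal sketch, assuming df is the DataFrame being exported and notes is the column containing the newline characters (both names are illustrative):

    import org.apache.spark.sql.functions.{col, regexp_replace}

    // Replace carriage returns / line feeds inside the column so each CSV record
    // stays on a single physical line.
    val cleaned = df.withColumn("notes", regexp_replace(col("notes"), "[\\r\\n]+", " "))

    cleaned.write
      .option("header", "true")
      .csv("s3://my-bucket/cleaned-output/")   // output path is a placeholder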