Re: can we use mapGroupsWithState in raw sql?

2018-04-17 Thread Jungtaek Lim
Refined more: I just got rid of wrapping the fields into a struct, but the type of the UDAF result is still a struct. I need to extract the fields one by one, but I guess I just haven't found a function which does that. I crafted this code without an IDE and ran it from spark-shell, so there should be many
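For the unwrapping step, "struct_col.*" in a select will expand a struct column back into top-level fields. A minimal sketch, assuming a DataFrame df with the ID/AMOUNT/MY_TIMESTAMP schema from this thread and a hypothetical UDAF maxRow that returns its result as a struct:

import org.apache.spark.sql.functions._
import spark.implicits._

val aggregated = df
  .groupBy($"ID")
  .agg(maxRow($"AMOUNT", $"MY_TIMESTAMP").as("m"))

// "m.*" expands the struct result into top-level columns in one go,
// instead of extracting each field one by one
val unwrapped = aggregated.select($"ID", $"m.*")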

Re: can we use mapGroupsWithState in raw sql?

2018-04-17 Thread Jungtaek Lim
I think I missed something: a self-join is not needed if we define a UDAF and use it in the aggregation. Since all the fields need to be accessed, I can't find any approach other than wrapping the fields into a struct and unwrapping them afterwards. There doesn't seem to be a way to pass multiple fields to a UDAF, at
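A minimal sketch of the wrap-into-struct approach under the schema used elsewhere in this thread (the UDAF name and the max-by-AMOUNT rule are illustrative assumptions, not from the original post):

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.types._
import spark.implicits._

object MaxRow extends UserDefinedAggregateFunction {
  private val rowType = StructType(Seq(
    StructField("AMOUNT", IntegerType),
    StructField("MY_TIMESTAMP", DateType)))

  // All fields travel through the UDAF as a single struct column
  def inputSchema: StructType = StructType(Seq(StructField("row", rowType)))
  def bufferSchema: StructType = StructType(Seq(StructField("max", rowType)))
  def dataType: DataType = rowType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = null

  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    val candidate = input.getStruct(0)
    if (candidate != null) {
      val current = buffer.getStruct(0)
      // Keep whichever struct has the larger AMOUNT (field 0)
      if (current == null || candidate.getInt(0) > current.getInt(0))
        buffer(0) = candidate
    }
  }

  // The buffer has the same single-struct shape as the input,
  // so merging reduces to another update
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    update(buffer1, buffer2)

  def evaluate(buffer: Row): Any = buffer.getStruct(0)
}

// Wrap the fields into one struct argument, then aggregate per ID
val result = df.groupBy($"ID")
  .agg(MaxRow(struct($"AMOUNT", $"MY_TIMESTAMP")).as("m"))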

Re: Structured streaming: Tried to fetch $offset but the returned record offset was ${record.offset}"

2018-04-17 Thread Cody Koeninger
Is this possibly related to the recent post on https://issues.apache.org/jira/browse/SPARK-18057 ?

On Mon, Apr 16, 2018 at 11:57 AM, ARAVIND SETHURATHNAM <asethurath...@homeaway.com.invalid> wrote:
> Hi,
>
> We have several structured streaming jobs (spark version 2.2.0) consuming
> from kafka

Re: can we use mapGroupsWithState in raw sql?

2018-04-17 Thread Jungtaek Lim
That might be simple if you want to get aggregated values for both amount and my_timestamp:

val schema = StructType(Seq(
  StructField("ID", IntegerType, true),
  StructField("AMOUNT", IntegerType, true),
  StructField("MY_TIMESTAMP", DateType, true)
))
val query =
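The preview cuts off at "val query =". As a hedged sketch of how such a query could continue (the Kafka source, broker address, topic name, and console sink are all assumptions, not from the original post), one could parse the stream with the schema above and aggregate per ID:

import org.apache.spark.sql.functions._
import spark.implicits._

val query = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // assumed endpoint
  .option("subscribe", "events")                        // assumed topic
  .load()
  // Parse the Kafka value bytes as JSON using the schema defined above
  .select(from_json($"value".cast("string"), schema).as("data"))
  .select($"data.*")
  // Aggregate each column independently per ID
  .groupBy($"ID")
  .agg(max($"AMOUNT").as("AMOUNT"), max($"MY_TIMESTAMP").as("MY_TIMESTAMP"))
  .writeStream
  .outputMode("update")
  .format("console")
  .start()

Note that this aggregates amount and my_timestamp independently; picking the whole row with the max amount is the harder problem discussed elsewhere in this thread.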

spark hbase connector

2018-04-17 Thread Lian Jiang
Hi, my Spark jobs need to talk to HBase and I am not sure which Spark-HBase connector is recommended:
https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
https://phoenix.apache.org/phoenix_spark.html
Or is there a better solution? Appreciate any guidance.
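For what it's worth, a minimal sketch of writing a DataFrame through the first option (the Hortonworks shc connector); the catalog JSON and all table/column names here are illustrative assumptions:

import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// The catalog maps DataFrame columns to an HBase row key and column families
val catalog = """{
  "table":{"namespace":"default", "name":"my_table"},
  "rowkey":"key",
  "columns":{
    "id":{"cf":"rowkey", "col":"key", "type":"string"},
    "value":{"cf":"cf1", "col":"value", "type":"string"}
  }
}"""

df.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()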

Re: can we use mapGroupsWithState in raw sql?

2018-04-17 Thread kant kodali
Hi TD, thanks for that. The only reason I ask is that I don't see any alternative way to solve the problem below using raw SQL: how to select the max row for every group in Spark Structured Streaming 2.3.0, without using order by (since that requires complete mode) or mapGroupsWithState? *Input:*
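For reference, the typed-API route being discussed looks roughly like this; a minimal sketch, assuming an input Dataset[Event] named events (the case class and the max-by-amount rule are illustrative, not from the original post):

import java.sql.Date
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}
import spark.implicits._

case class Event(id: Int, amount: Int, myTimestamp: Date)

val maxPerGroup = events
  .groupByKey(_.id)
  .mapGroupsWithState[Event, Event](GroupStateTimeout.NoTimeout) {
    (id: Int, rows: Iterator[Event], state: GroupState[Event]) =>
      // Max among the rows that arrived in this trigger
      val candidate = rows.maxBy(_.amount)
      // Compare against the max remembered from earlier triggers
      val best = state.getOption
        .filter(_.amount >= candidate.amount)
        .getOrElse(candidate)
      state.update(best)
      best
  }

In Spark 2.3 such a query has to run in update output mode, and there is no raw-SQL equivalent of mapGroupsWithState, which is exactly the gap this thread is about.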

An exception causes different phenomena

2018-04-17 Thread big data
Hi all,

we have two environments for a Spark Streaming job, which consumes a Kafka topic to do calculations.

Now in one environment, the streaming job consumes a non-standard record from Kafka and throws an exception (not caught in the code), and then the streaming job goes down.

But in
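If the exception is thrown inside user code, one common pattern is to guard the per-record work so a single malformed message cannot take the job down. A minimal sketch, assuming a Kafka direct stream and hypothetical parse/process helpers:

stream.foreachRDD { rdd =>
  val parsed = rdd.flatMap { record =>
    try Some(parse(record.value()))   // parse() is an assumed helper
    catch {
      case e: Exception =>
        // Log and skip the bad record instead of letting the task fail
        None
    }
  }
  process(parsed)                     // assumed downstream processing
}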

Re: Warning from user@spark.apache.org

2018-04-17 Thread Ahmed B.S.B Seye
Hi, I confirm.

On Tue, Apr 17, 2018 at 01:29, Prasad Velagaleti wrote:
> Hello,
> I got a message saying that messages sent to me (my gmail id) from the
> mailing list got bounced? Wonder why?
>
> thanks,
> Prasad.
>
> On Mon, Apr 16, 2018 at 6:16 PM,

Re: pyspark execution

2018-04-17 Thread hemant singh
If it contains only SQL then you can use a function like the one below:

import subprocess

def run_sql(sql_file_path, your_db_name, location):
    # spark-sql expects --hivevar KEY=VALUE; the original post left the first
    # variable name blank, so DB_NAME here is an assumed name
    subprocess.call([
        "spark-sql", "-S",
        "--hivevar", "DB_NAME={}".format(your_db_name),
        "--hivevar", "LOCATION={}".format(location),
        "-f", sql_file_path,
    ])

If you have other pieces like Spark code and not only SQL