Refined more: I just got rid of wrapping the fields into a struct, but the
result type of the UDAF is still a struct. I need to extract the fields one
by one, but I guess I just haven't found a function which does that.
I crafted this code without an IDE and ran it from spark-shell, so there are
probably many typos.
I think I missed something: the self-join is not needed if I define a UDAF
and use it from an aggregation. Since it requires access to all the fields, I
can't find any approach other than wrapping the fields into a struct and
unwrapping them afterwards. There doesn't seem to be a way to pass multiple
fields into a UDAF, at least
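To make the wrap/unwrap idea concrete, a rough sketch (df, myUdaf and the
column names are placeholders, not my actual code):

import org.apache.spark.sql.functions.struct

// wrap the fields the UDAF needs into one struct column, aggregate,
// then expand the struct result back into columns with "result.*"
val unwrapped = df
  .groupBy($"id")
  .agg(myUdaf(struct($"amount", $"my_timestamp")).as("result"))
  .select($"id", $"result.*")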
Is this possibly related to the recent post on
https://issues.apache.org/jira/browse/SPARK-18057 ?
On Mon, Apr 16, 2018 at 11:57 AM, ARAVIND SETHURATHNAM
<asethurath...@homeaway.com.invalid> wrote:
> Hi,
>
> We have several structured streaming jobs (spark version 2.2.0) consuming
> from kafka
That might be simple if you want to get aggregated values for both amount
and my_timestamp:
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("ID", IntegerType, true),
  StructField("AMOUNT", IntegerType, true),
  StructField("MY_TIMESTAMP", DateType, true)
))

val query =
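A sketch of how the query might continue, assuming a streaming DataFrame df
read with that schema (the sink and output mode are placeholders):

import org.apache.spark.sql.functions.max

val query = df
  .groupBy($"ID")
  .agg(max($"AMOUNT").as("max_amount"),
       max($"MY_TIMESTAMP").as("max_my_timestamp"))
  .writeStream
  .outputMode("update")  // streaming aggregations need update or complete mode
  .format("console")
  .start()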
Hi,
My spark jobs need to talk to hbase and I am not sure which spark hbase
connector is recommended:
https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
https://phoenix.apache.org/phoenix_spark.html
Or is there any other better solution? Appreciate any guidance.
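For what it's worth, reading through the first (shc) connector looks roughly
like the sketch below; the table name, column family and types are made up
for illustration:

import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// the catalog JSON maps the HBase row key and column families to columns
val catalog = """{
  "table":{"namespace":"default", "name":"my_table"},
  "rowkey":"key",
  "columns":{
    "id":{"cf":"rowkey", "col":"key", "type":"string"},
    "amount":{"cf":"cf1", "col":"amount", "type":"int"}
  }
}"""

val hbaseDf = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()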
Hi TD,
Thanks for that. The only reason I ask is that I don't see any alternative
solution to the problem below using raw SQL.
How do I select the max row for every group in Spark structured streaming
2.3.0 without using order by (since that requires complete mode) or
mapGroupsWithState?
*Input:*
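Whatever the exact input, one trick that avoids order by is to take max over
a struct whose first field is the ordering column, then expand the winner
back into columns. A sketch, borrowing the column names from the schema
above:

import org.apache.spark.sql.functions.{max, struct}

// structs compare field by field, so putting the ordering column first
// makes max(...) pick out the entire winning row per group
val maxPerGroup = df
  .groupBy($"ID")
  .agg(max(struct($"MY_TIMESTAMP", $"AMOUNT")).as("winner"))
  .select($"ID", $"winner.*")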
> Hi all,
>
> we have two environments for a Spark streaming job, which consumes a Kafka
> topic to do calculations.
>
> Now in one environment, the Spark streaming job consumed non-standard
> data from Kafka and threw an exception (not caught in code), and then the
> streaming job went down.
>
> But in
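On the non-standard data point, one defensive pattern is to parse with
from_json, which returns null instead of throwing for records that don't
match the schema, and drop those nulls. A sketch, assuming JSON payloads
(kafkaDf and schema are placeholders):

import org.apache.spark.sql.functions.from_json

// malformed records become null rather than exceptions, so the job keeps
// running; bad rows could also be routed to a dead-letter sink instead
val parsed = kafkaDf
  .select(from_json($"value".cast("string"), schema).as("data"))
  .where($"data".isNotNull)
  .select($"data.*")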
hi,
I confirm.
On Tue, Apr 17, 2018 at 01:29, Prasad Velagaleti wrote:
> Hello,
> I got a message saying that messages sent to me (my gmail id) from the
> mailing list got bounced?
> Wonder why?
>
> thanks,
> Prasad.
>
> On Mon, Apr 16, 2018 at 6:16 PM,
If it contains only SQL then you can use a function like the one below
(DB_NAME here is just a placeholder hivevar name; use whatever name your
.sql file actually references):

import subprocess

def run_sql(sql_file_path, your_db_name, location):
    # run spark-sql in silent mode (-S), substituting hivevars into the script
    subprocess.call(["spark-sql", "-S", "--hivevar", "DB_NAME=" + your_db_name,
                     "--hivevar", "LOCATION=" + location, "-f", sql_file_path])
If you have other pieces like Spark code and not only SQL