Re: Structured Streaming + UDF - logic based on checking if a column is present in the Dataframe

2022-02-25 Thread Gourav Sengupta
… instead: df.select(convertStructToStr(*df.columns)) \ .write \ .format("console") \ .option("numRows", 100) \ .option("checkpointLocation", "/Users/karanalang/PycharmProjects/Kafka/checkpoint") \ .option("outputM…

Structured Streaming + UDF - logic based on checking if a column is present in the Dataframe

2022-02-23 Thread karan alang
… .option("checkpointLocation", "/Users/karanalang/PycharmProjects/Kafka/checkpoint") \ .option("outputMode", "complete") \ .save("output") Additional details in stackoverflow: https://stackoverflow.com/questions/71243726/structured-streaming-udf-logi…
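A minimal Scala sketch of the idea in this thread's subject, checking whether a column is present before building the transformation; the column name ("status") and the placeholder value are illustrative assumptions, not taken from the thread (the original poster works in PySpark).

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.lit

    // If the expected column is missing, add a null placeholder so downstream logic stays uniform.
    def withOptionalColumn(df: DataFrame): DataFrame =
      if (df.columns.contains("status")) df
      else df.withColumn("status", lit(null).cast("string"))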

Re: Checking if cascading graph computation is possible in Spark

2019-04-05 Thread Jason Nerothin
*I guess I was focusing on this:* #2 I want to do the above in an event-driven way, *without using the batches* (I tried micro batches, but I realised that's not what I want), i.e., *for each arriving event, or as soon as an event message comes on my stream, not by accumulating the events*. If you want …

Re: Checking if cascading graph computation is possible in Spark

2019-04-05 Thread Basavaraj
I have checked broadcast of accumulated values, but not arbitrary stateful streaming. But I am not sure how that helps here. On Fri, 5 Apr 2019, 10:13 pm Jason Nerothin wrote: > Have you looked at Arbitrary Stateful Streaming and Broadcast Accumulators? > On Fri, Apr 5, 2019 at 10:55 AM …

Re: Checking if cascading graph computation is possible in Spark

2019-04-05 Thread Jason Nerothin
Have you looked at Arbitrary Stateful Streaming and Broadcast Accumulators? On Fri, Apr 5, 2019 at 10:55 AM Basavaraj wrote: > Hi > Have two questions > #1 I am trying to process events in realtime, outcome of the processing has to find a node in the GraphX and update that node as well …
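A hedged Scala sketch of the arbitrary stateful streaming Jason points to: mapGroupsWithState keeps per-key state across micro-batches and emits an update for each key as new events arrive. The Event type, socket source, and running-count logic are illustrative assumptions only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

    case class Event(nodeId: String, value: Double)

    val spark = SparkSession.builder.appName("stateful-sketch").getOrCreate()
    import spark.implicits._

    // Parse lines like "node-1,3.5" from a socket source into Event records.
    val events = spark.readStream
      .format("socket").option("host", "localhost").option("port", 9999).load()
      .as[String]
      .map { line => val Array(id, v) = line.split(","); Event(id, v.toDouble) }

    // Keep a running count per node and emit the updated count whenever new events arrive.
    val perNode = events
      .groupByKey(_.nodeId)
      .mapGroupsWithState(GroupStateTimeout.NoTimeout) {
        (nodeId: String, batch: Iterator[Event], state: GroupState[Long]) =>
          val seen = state.getOption.getOrElse(0L) + batch.size
          state.update(seen)
          (nodeId, seen)
      }

    perNode.writeStream.outputMode(OutputMode.Update()).format("console").start().awaitTermination()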

Checking if cascading graph computation is possible in Spark

2019-04-05 Thread Basavaraj
Hi, have two questions. #1 I am trying to process events in realtime; the outcome of the processing has to find a node in the GraphX graph and update that node as well (in case of any anomaly or state change). If a node is updated, I have to update the related nodes as well; want to know if GraphX can help in …
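As a rough illustration (not the poster's code), GraphX's Pregel API can cascade an update from a changed vertex to its related vertices; the Boolean "dirty" flag and the one-directional propagation rule here are assumptions for the sketch.

    import org.apache.spark.graphx._

    // Vertices carry a Boolean "needs update" flag; the Int edge attribute is unused here.
    def propagate(graph: Graph[Boolean, Int]): Graph[Boolean, Int] =
      graph.pregel(initialMsg = false)(
        (_, dirty, msg) => dirty || msg,          // a vertex becomes dirty once it hears from a dirty neighbour
        triplet =>                                // dirty vertices notify neighbours that are not yet dirty
          if (triplet.srcAttr && !triplet.dstAttr) Iterator((triplet.dstId, true)) else Iterator.empty,
        _ || _                                    // merging messages: any dirty signal wins
      )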

RE: Checking for null values when mapping

2016-02-20 Thread Mich Talebzadeh

Re: Checking for null values when mapping

2016-02-20 Thread Chandeep Singh
Also, have you looked into Dos2Unix (http://dos2unix.sourceforge.net/)? …

Re: Checking for null values when mapping

2016-02-20 Thread Chandeep Singh

RE: Checking for null values when mapping

2016-02-20 Thread Mich Talebzadeh
(quoting Chandeep Singh) Also, have you looked into Dos2Unix (http://dos2unix.sourceforge.net/)? Has helped me …

Re: Checking for null values when mapping

2016-02-20 Thread Chandeep Singh

RE: Checking for null values when mapping

2016-02-20 Thread Mich Talebzadeh
(quoting Chandeep Singh) Looks like you're using substring just to get rid of the '?'. Why not use replace for that as well? …

Re: Checking for null values when mapping

2016-02-20 Thread Chandeep Singh
Looks like you're using substring just to get rid of the '?'. Why not use replace for that as well? And then you wouldn't run into issues with index out of bound. val a = "?1,187.50" val b = "" println(a.substring(1).replace(",", "")) // prints 1187.50 println(a.replace("?", "").replace(",", ""))

Re: Checking for null values when mapping

2016-02-20 Thread Ted Yu

RE: Checking for null values when mapping

2016-02-20 Thread Mich Talebzadeh

Re: Checking for null values when mapping

2016-02-20 Thread Michał Zieliński
You can use filter and isNotNull on Column before the map. On 20 February 2016 at 08:24, Mich Talebzadeh wrote: > I have a DF like below reading a csv file > val df = …
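A small sketch of that suggestion, assuming placeholder column names rather than the ones in Mich's CSV, and the df defined earlier in the thread:

    import org.apache.spark.sql.functions.col

    // Drop rows where either column is null before mapping over the remaining rows.
    val cleaned = df.filter(col("price").isNotNull && col("payment_date").isNotNull)
    // equivalently: df.na.drop(Seq("price", "payment_date"))
    val mapped  = cleaned.rdd.map(r => (r.getString(0), r.getString(1)))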

Checking for null values when mapping

2016-02-20 Thread Mich Talebzadeh
I have a DF like below reading a csv file val df = HiveContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "true").load("/data/stg/table2") val a = df.map(x => (x.getString(0), x.getString(1), x.getString(2).substring(1).replace(",",

Re: efficient checking the existence of an item in a rdd

2015-12-31 Thread domibd
… discussion below: http://apache-spark-user-list.1001560.n3.nabble.com/efficient-checking-the-existence-of-an-item-in-a-rdd-tp25839.html

Re: efficient checking the existence of an item in a rdd

2015-12-31 Thread Nick Peterson
…ge rdd in a parallelised way such that the process stops as soon as the item is found (if it is found)? Thanks a lot, Dominique
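A hedged sketch of one way to get the early-stop behaviour asked about in this thread: take(1) on a filtered RDD evaluates partitions incrementally, so the job can stop as soon as a match turns up.

    import org.apache.spark.rdd.RDD

    // True as soon as any element equals `item`; no full scan is needed once a match is found.
    def exists[T](rdd: RDD[T], item: T): Boolean =
      rdd.filter(_ == item).take(1).nonEmpty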

Re: performance when checking if data frame is empty or not

2015-09-09 Thread Ted Yu
Have you tried: df.rdd.isEmpty Cheers On Tue, Sep 8, 2015 at 1:22 PM, Axel Dahl wrote: > I have a join, that fails when one of the data frames is empty. > > To avoid this I am hoping to check if the dataframe is empty or not before > the join. > > The question is

performance when checking if data frame is empty or not

2015-09-08 Thread Axel Dahl
I have a join that fails when one of the data frames is empty. To avoid this I am hoping to check if the dataframe is empty or not before the join. The question is: what's the most performant way to do that? Should I do df.count() or df.first() or something else? Thanks in advance, -Axel
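A few hedged ways to express the check discussed in this thread; the relative cost can depend on the plan, but each of these avoids counting every row.

    val empty1 = df.rdd.isEmpty          // Ted's suggestion; RDD.isEmpty is take(1) under the hood
    val empty2 = df.head(1).isEmpty      // fetch at most one row through the optimized plan
    val empty3 = df.limit(1).count == 0  // same idea via limit; a plain df.count() scans everything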

Incorrect ACL checking for partitioned table in Spark SQL-1.4

2015-06-16 Thread Karthik Subramanian
http://apache-spark-user-list.1001560.n3.nabble.com/Incorrect-ACL-checking-for-partitioned-table-in-Spark-SQL-1-4-tp23355.html

Checking Data Integrity in Spark

2015-03-27 Thread Sathish Kumaran Vairavelu
Hello, I want to check if there is any way to check the data integrity of the data files. The use case is to perform data integrity checks on large files with 100+ columns and reject records (writing them to another file) that do not meet criteria such as NOT NULL, date format, etc. Since there are a lot of …

Re: Checking Data Integrity in Spark

2015-03-27 Thread Arush Kharbanda
It's not possible to configure Spark to do checks based on XMLs. You would need to write jobs to do the validations you need. On Fri, Mar 27, 2015 at 5:13 PM, Sathish Kumaran Vairavelu vsathishkuma...@gmail.com wrote: Hello, I want to check if there is any way to check the data integrity of …
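A rough sketch of the kind of validation job Arush describes, splitting records by rules such as NOT NULL and a date-format check and writing rejects to a separate location; the column names, regex, and output paths are illustrative assumptions.

    import org.apache.spark.sql.functions.{coalesce, col, lit, not}

    val rules = col("id").isNotNull &&
                col("amount").isNotNull &&
                col("event_date").rlike("""^\d{4}-\d{2}-\d{2}$""")   // simple date-format check

    val good = df.filter(rules)
    // coalesce turns a null rule result (e.g. rlike on a null column) into false, so the row is rejected
    val bad  = df.filter(not(coalesce(rules, lit(false))))

    good.write.mode("overwrite").parquet("/data/clean/input")
    bad.write.mode("overwrite").parquet("/data/rejects/input")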

Re: checking

2015-02-06 Thread Arush Kharbanda
Yes they are. On Fri, Feb 6, 2015 at 5:06 PM, Mohit Durgapal durgapalmo...@gmail.com wrote: Just wanted to know if my emails are reaching the user list. Regards, Mohit

Re: Checking spark cache percentage programatically. And how to clear cache.

2014-05-28 Thread Matei Zaharia
… internals. Matei. On May 28, 2014, at 5:32 PM, Sung Hwan Chung coded...@cs.stanford.edu wrote: Hi, is there a programmatic way of checking whether an RDD has been 100% cached or not? I'd like to do this to have two different pathways. Additionally, how do you clear the cache (e.g. if you want …
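For what it's worth, a hedged sketch using public APIs that exist for this (getRDDStorageInfo is a developer API, so its fields may change between versions); sc and rdd are assumed from the surrounding program.

    // Is this RDD fully cached? Compare cached partitions to total partitions.
    val info        = sc.getRDDStorageInfo.find(_.id == rdd.id)
    val fullyCached = info.exists(i => i.numCachedPartitions == i.numPartitions)

    // Clearing the cache for a single RDD (blocking until its blocks are removed):
    rdd.unpersist(blocking = true)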