Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-05 Thread Mich Talebzadeh
OK, I found a workaround. Basically, state is not kept across streams, and I have two streams: one is a business topic and the other one is created to shut down Spark Structured Streaming gracefully. I wanted to print the value of the most recent batch ID for the business topic called "md" here

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This might help https://docs.databricks.com/structured-streaming/foreach.html streamingDF.writeStream.foreachBatch(...) allows you to specify a function that is executed on the output data of every micro-batch of the streaming query. It takes two parameters: a DataFrame or Dataset that has the
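
A minimal sketch of this pattern in PySpark, assuming a streaming DataFrame named streamingDF already exists; the function name and sink path are illustrative:

    def send_to_sink(df, batch_id):
        # df is a regular (static) DataFrame holding one micro-batch
        print(f"processing batchId {batch_id} with {df.count()} rows")
        df.write.format("parquet").mode("append").save("/tmp/md_sink")

    streamingDF.writeStream.foreachBatch(send_to_sink).start()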

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
I am aware of your point that globals don't work in a distributed environment. With regard to your other point, these are two different topics, each with its own stream. The point of the second stream is to set the status to false, so it can gracefully shut down the main stream (the one called "md") here
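
A sketch of the two-stream shutdown pattern being described, assuming the control topic delivers a status column; all names are illustrative:

    def send_to_control(df, batch_id):
        # any control message with status == 'false' stops all active queries
        if df.filter(df["status"] == "false").count() > 0:
            for query in spark.streams.active:
                query.stop()

    control_df.writeStream.foreachBatch(send_to_control).start()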

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
I don't quite get it - aren't you applying to the same stream, and batches? Worst case, why not apply these as one function? Otherwise, how do you mean to associate one call with another? Globals don't help here. They aren't global beyond the driver, and which one would be which batch? On Sat, Mar

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
Thanks. They are different batchIds. From sendToControl, newtopic batchId is 76. From sendToSink, md, batchId is 563. As a matter of interest, why does a global variable not work?

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
It's the same batch ID already, no? Or why not simply put the logic of both in one function, or write one function that calls both? On Sat, Mar 4, 2023 at 2:07 PM Mich Talebzadeh wrote: > This is probably pretty straightforward, but somehow it does not look > that way > On Spark

How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This is probably pretty straightforward, but somehow it does not look that way. In Spark Structured Streaming, "foreachBatch" performs custom write logic on each micro-batch through a callback function. For example, foreachBatch(sendToSink) expects 2 parameters, first: the micro-batch as a DataFrame or

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Sean Owen
Why not just use STDDEV_SAMP? It's probably more accurate than the difference-of-squares calculation. You can write an aggregate UDF that calls numpy and register it for SQL, but it is already a built-in. On Thu, Dec 24, 2020 at 8:12 AM Mich Talebzadeh wrote: > Thanks for the feedback. > > I
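
A sketch of using the built-in directly from Spark SQL; the table and column names are illustrative:

    spark.sql("""
        SELECT Customer_ID,
               STDDEV_SAMP(order_amount) AS std_order_amount
        FROM   orders
        GROUP  BY Customer_ID
    """).show()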

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Mich Talebzadeh
Thanks for the feedback. I have a question here. I want to use numpy std as well, but just using SQL in PySpark, like below: sqltext = f""" SELECT rs.Customer_ID , rs.Number_of_orders , rs.Total_customer_amount , rs.Average_order ,

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Sean Owen
I don't know which one is 'correct' (it's not standard SQL?) or whether it's the sample stdev for a good reason or just historical now. But you can always call STDDEV_SAMP (in any DB) if needed. It's equivalent to numpy.std with ddof=1, the Bessel-corrected standard deviation. On Thu, Dec 24,
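
A quick sketch checking the equivalence stated here, assuming a local SparkSession:

    import numpy as np
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    df = spark.createDataFrame([(v,) for v in values], ["x"])

    spark_std = df.selectExpr("stddev_samp(x)").first()[0]
    numpy_std = np.std(values, ddof=1)  # Bessel-corrected sample stddev
    assert abs(spark_std - numpy_std) < 1e-9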

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Mich Talebzadeh
Well, the truth is that we had this discussion in 2016 :(. What Hive calls the Standard Deviation function, STDDEV, is a pointer to STDDEV_POP. This is incorrect and has not been rectified yet! Spark SQL, Oracle and Sybase point STDDEV to STDDEV_SAMP and not STDDEV_POP. Run a test on *Hive*: SELECT
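
A sketch of that kind of test, runnable in Spark; on Hive the first column would return the population value instead:

    # STDDEV resolves to STDDEV_SAMP in Spark SQL but to STDDEV_POP in Hive,
    # so the first column differs between the two engines
    spark.sql("""
        SELECT STDDEV(col), STDDEV_SAMP(col), STDDEV_POP(col)
        FROM   VALUES (1.0), (2.0), (3.0) AS t(col)
    """).show()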

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Sean Owen
Why do you want to use this function instead of the built-in stddev function? On Wed, Dec 23, 2020 at 2:52 PM Mich Talebzadeh wrote: > Hi, > > > This is a shot in the dark so to speak. > > > I would like to use the standard deviation std offered by numpy in > PySpark. I am using SQL for now > >

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Mich Talebzadeh
OK, thanks for the tip. I found this link from Databricks useful for Python: User-defined functions - Python (Databricks Documentation)

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Peyman Mohajerian
https://stackoverflow.com/questions/43484269/how-to-register-udf-to-use-in-sql-and-dataframe On Wed, Dec 23, 2020 at 12:52 PM Mich Talebzadeh wrote: > Hi, > > > This is a shot in the dark so to speak. > > > I would like to use the standard deviation std offered by numpy in > PySpark. I am using
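
A sketch of registering a numpy-backed aggregate for SQL use, assuming Spark 3.x with pandas and pyarrow installed; the table and column names are illustrative:

    import numpy as np
    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("double")
    def np_std(v: pd.Series) -> float:
        # Series-to-scalar pandas UDF wrapping numpy's sample stddev
        return float(np.std(v, ddof=1))

    spark.udf.register("np_std", np_std)
    spark.sql("SELECT Customer_ID, np_std(amount) FROM orders GROUP BY Customer_ID").show()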

Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Mich Talebzadeh
Hi, This is a shot in the dark, so to speak. I would like to use the standard deviation (std) offered by numpy in PySpark. I am using SQL for now. The code is as below: sqltext = f""" SELECT rs.Customer_ID , rs.Number_of_orders , rs.Total_customer_amount

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
You could have posted just the error, which is at the end of my response. Why are you trying to use WebHDFS? I'm not really sure how authentication works with that. But generally applications use HDFS (which uses a different URI scheme), and Spark should work fine with that. Error:

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Sure - I wanted to check with the admin before sharing. I’ve attached it now; does this help? Many thanks again, G Container: container_e34_1479877553404_0174_01_03 on hdp-node12.xcat.cluster_45454_1481228528201

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
Then you probably have a configuration error somewhere. Since you haven't actually posted the error you're seeing, it's kinda hard to help any further. On Thu, Dec 8, 2016 at 11:17 AM, Gerard Casey wrote: > Right. I’m confident that is setup correctly. > > I can run

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Right. I’m confident that is setup correctly. I can run the SparkPi test script. The main difference between it and my application is that it doesn’t access HDFS. > On 8 Dec 2016, at 18:43, Marcelo Vanzin wrote: > > On Wed, Dec 7, 2016 at 11:54 PM, Gerard Casey

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
On Wed, Dec 7, 2016 at 11:54 PM, Gerard Casey wrote: > To be specific, where exactly should spark.authenticate be set to true? spark.authenticate has nothing to do with kerberos. It's for authentication between different Spark processes belonging to the same app. --

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks Marcin, That seems to be the case. It explains why there is no documentation on this part too! To be specific, where exactly should spark.authenticate be set to true? Many thanks, Gerry > On 8 Dec 2016, at 08:46, Marcin Pastecki wrote: > > My understanding

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcin Pastecki
My understanding is that token generation is handled by Spark itself, as long as you were authenticated in Kerberos when submitting the job and spark.authenticate is set to true. The --keytab and --principal options should be used for "long"-running jobs, when you may need to do ticket renewal.
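
A sketch of that long-running form; the principal and keytab path are placeholders:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --principal etl_user@EXAMPLE.COM \
      --keytab /etc/security/keytabs/etl_user.keytab \
      --class graphx_sp \
      graphx_sp.jar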

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
I just read an interesting comment on Cloudera. What does it mean by "when the job is submitted, and you have a kinit, you will have a TOKEN to access HDFS; you would need to pass that on, or the KERBEROS ticket"? Reference

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks Marcelo. I’ve completely removed it. OK - even if I read/write from HDFS? Trying the SparkPi example now. G > On 7 Dec 2016, at 22:10, Marcelo Vanzin wrote: > > Have you removed all the code dealing with Kerberos that you posted? > You should not be setting

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcelo Vanzin
Have you removed all the code dealing with Kerberos that you posted? You should not be setting those principal / keytab configs. Literally all you have to do is log in with kinit and then run spark-submit. Try the SparkPi example, for instance, instead of your own code. If that doesn't work, you
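
Concretely, that sequence might look like the following; the principal and the examples jar path are placeholders that depend on the Spark distribution:

    kinit gerard@EXAMPLE.COM
    spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn --deploy-mode cluster \
      "$SPARK_HOME"/lib/spark-examples-*.jar 10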

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks. I’ve checked the TGT, principal and keytab. Where to next?! > On 7 Dec 2016, at 22:03, Marcelo Vanzin wrote: > > On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey > wrote: >> Can anyone point me to a tutorial or a run through of how to

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcelo Vanzin
On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey wrote: > Can anyone point me to a tutorial or a run through of how to use Spark with > Kerberos? This is proving to be quite confusing. Most search results on the > topic point to what needs inputted at the point of `sparks

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks Marcelo, it turns out I had missed setup steps in the actual file itself. Thanks to Richard for the help here; he pointed me to some Java implementations. I’m using the org.apache.hadoop.security API. I now have: /* graphx_sp.scala */ import scala.util.Try import scala.io.Source

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Marcelo Vanzin
That's not the error, that's just telling you the application failed. You have to look at the YARN logs for application_1479877553404_0041 to see why it failed. On Mon, Dec 5, 2016 at 10:44 AM, Gerard Casey wrote: > Thanks Marcelo, > > My understanding from a few

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Thanks Marcelo, My understanding from a few pointers is that this may be due to insufficient read permissions on the keytab or a corrupt keytab. I have checked the read permissions and they are OK. I can see that it is initially configuring correctly: INFO

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Marcelo Vanzin
There's generally an exception in these cases, and you haven't posted it, so it's hard to tell you what's wrong. The most probable cause, without the extra information the exception provides, is that you're using the wrong Hadoop configuration when submitting the job to YARN. On Mon, Dec 5, 2016

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Jorge Sánchez
Hi Gerard, have you tried running in yarn-client mode? If so, do you still get that same error? Regards. 2016-12-05 12:49 GMT+00:00 Gerard Casey : > Edit. From here I read that

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Edit. From here I read that you can pass a `keytab` option to spark-submit. I thus tried: spark-submit --class "graphx_sp" --master yarn --keytab /path/to/keytab --deploy-mode cluster --executor-memory 13G

Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Hello all, I am using Spark with Kerberos authentication. I can run my code using `spark-shell` fine, and I can also use `spark-submit` in local mode (e.g. --master local[16]). Both function as expected. Local mode: spark-submit --class "graphx_sp" --master local[16] --driver-memory

Running window functions in spark dataframe

2016-01-13 Thread rakesh sharma
Hi all, I am getting a HiveContext error when trying to run window functions like OVER with an ORDER BY clause. Any help on how to go about it? I am running Spark locally. Sent from Outlook Mobile

How can I know currently supported functions in Spark SQL

2015-08-06 Thread Netwaver
Hi All, I am using Spark 1.4.1, and I want to know how I can find the complete list of functions supported in Spark SQL; currently I only know 'sum', 'count', 'min', 'max'. Thanks a lot.

Re: Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Pedro Rodriguez
Worth noting that Spark 1.5 is extending that list of Spark SQL functions quite a bit. Not sure where in the docs they would be yet, but the JIRA is here: https://issues.apache.org/jira/browse/SPARK-8159 On Thu, Aug 6, 2015 at 7:27 PM, Netwaver wanglong_...@163.com wrote: Thanks for your kindly

Re: Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Netwaver
Thanks for your kind help. At 2015-08-06 19:28:10, Todd Nist tsind...@gmail.com wrote: They are covered here in the docs: http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.functions$ On Thu, Aug 6, 2015 at 5:52 AM, Netwaver wanglong_...@163.com wrote: Hi

Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Todd Nist
They are covered here in the docs: http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.functions$ On Thu, Aug 6, 2015 at 5:52 AM, Netwaver wanglong_...@163.com wrote: Hi All, I am using Spark 1.4.1, and I want to know how can I find the complete function

Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Ted Yu
Have you looked at this? http://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.sql.functions$ On Aug 6, 2015, at 2:52 AM, Netwaver wanglong_...@163.com wrote: Hi All, I am using Spark 1.4.1, and I want to know how can I find the complete function list
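
Beyond the docs, a sketch of asking the shell itself, assuming a HiveContext in Spark 1.4 so the statements are passed through to Hive:

    # From the PySpark shell: list registered functions, then describe one
    sqlContext.sql("SHOW FUNCTIONS").show(200)
    sqlContext.sql("DESCRIBE FUNCTION max").show()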

Functions in Spark SQL

2015-07-27 Thread vinod kumar
Hi, May I know how to use the functions mentioned in http://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.sql.functions$ in Spark SQL? When I use something like Select last(column) from tablename, I get an error like: 15/07/27 03:00:00 INFO exec.FunctionRegistry: Unable to lookup

Re: Functions in Spark SQL

2015-07-27 Thread fightf...@163.com
Hi, May I know how to use the functions mentioned in http://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.sql.functions$ in Spark SQL? When I use something like Select last(column) from tablename, I get an error like: 15/07/27 03:00:00 INFO exec.FunctionRegistry

Re: Functions in Spark SQL

2015-07-27 Thread vinod kumar
code here? And which version of Spark are you using? Best, Sun. -- fightf...@163.com *From:* vinod kumar vinodsachin...@gmail.com *Date:* 2015-07-27 15:04 *To:* User user@spark.apache.org *Subject:* Functions in Spark SQL Hi, May I know how to use

Support for Windowing and Analytics functions in Spark SQL

2015-06-22 Thread Sourav Mazumder
Hi, Though the documentation does not explicitly mention support for Windowing and Analytics functions in Spark SQL, it looks like they are not supported. I tried running a query like Select Lead(column name, 1) over (Partition By column name order by column name) from table name and I got an error saying

Re: Support for Windowing and Analytics functions in Spark SQL

2015-06-22 Thread ayan guha
1.4 supports it On 23 Jun 2015 02:59, Sourav Mazumder sourav.mazumde...@gmail.com wrote: Hi, Though the documentation does not explicitly mention support for Windowing and Analytics function in Spark SQL, looks like it is not supported. I tried running a query like Select Lead(column name,

RE: Support for Windowing and Analytics functions in Spark SQL

2015-06-22 Thread Cheng, Hao
Yes, it should be with HiveContext, not SQLContext. From: ayan guha [mailto:guha.a...@gmail.com] Sent: Tuesday, June 23, 2015 2:51 AM To: smazumder Cc: user Subject: Re: Support for Windowing and Analytics functions in Spark SQL 1.4 supports it On 23 Jun 2015 02:59, Sourav Mazumder
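
A sketch of what that looks like in PySpark 1.4; the table and column names are illustrative:

    from pyspark.sql import HiveContext

    sqlContext = HiveContext(sc)  # window functions require HiveContext in 1.4
    sqlContext.sql("""
        SELECT name,
               LEAD(amount, 1) OVER (PARTITION BY dept ORDER BY amount) AS next_amount
        FROM   sales
    """).show()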

Re: Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Masf
and Analytics functions supported in Spark SQL (with HiveContext or not)? For example, in Hive it is supported: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics Is there some tutorial or documentation where I can see all features supported by Spark SQL? Thanks!!! -- Regards

Re: Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Arush Kharbanda
function support in 1.4.0. But it's not a promise yet. Cheng On 3/26/15 7:27 PM, Arush Kharbanda wrote: It's not yet implemented. https://issues.apache.org/jira/browse/SPARK-1442 On Thu, Mar 26, 2015 at 4:39 PM, Masf masfwo...@gmail.com wrote: Hi. Are the Windowing and Analytics functions

Re: Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Arush Kharbanda
It's not yet implemented. https://issues.apache.org/jira/browse/SPARK-1442 On Thu, Mar 26, 2015 at 4:39 PM, Masf masfwo...@gmail.com wrote: Hi. Are the Windowing and Analytics functions supported in Spark SQL (with HiveContext or not)? For example, in Hive it is supported: https

Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Masf
Hi. Are the Windowing and Analytics functions supported in Spark SQL (with HiveContext or not)? For example, in Hive it is supported: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics Is there some tutorial or documentation where I can see all features supported by Spark

Re: Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Cheng Lian
, 2015 at 4:39 PM, Masf masfwo...@gmail.com wrote: Hi. Are the Windowing and Analytics functions supported in Spark SQL (with HiveContext or not)? For example, in Hive it is supported: https://cwiki.apache.org/confluence/display/Hive/LanguageManual

Re: Mathematical functions in spark sql

2015-01-27 Thread Ted Yu
double to int or something similar? Also it will be cool to get a list of functions supported by Spark SQL. Thanks!

Re: Mathematical functions in spark sql

2015-01-27 Thread Cheng Lian
...@gmail.com wrote: Hello everyone! I tried to execute select 2/3 and I get 0. Is there any way to cast double to int or something similar? Also it will be cool to get a list of functions supported by Spark

Mathematical functions in spark sql

2015-01-26 Thread 1esha
Hello everyone! I tried to execute select 2/3 and I get 0. Is there any way to cast double to int or something similar? Also it will be cool to get a list of functions supported by Spark SQL. Thanks!
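
For what it's worth, a sketch of the cast that avoids the integer division:

    # 2/3 is integer division when both operands are integers;
    # casting one side to DOUBLE yields 0.666...
    sqlContext.sql("SELECT CAST(2 AS DOUBLE) / 3").collect()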

Re: Mathematical functions in spark sql

2015-01-26 Thread Ted Yu
way to cast double to int or something similar? Also it will be cool to get a list of functions supported by Spark SQL. Thanks!

Re: Mathematical functions in spark sql

2015-01-26 Thread Alexey Romanchuk
:29 PM, 1esha alexey.romanc...@gmail.com wrote: Hello everyone! I tried to execute select 2/3 and I get 0. Is there any way to cast double to int or something similar? Also it will be cool to get a list of functions supported by Spark SQL. Thanks!

Re: Functions in Spark

2014-11-17 Thread Gerard Maas
One 'rule of thumb' is to use rdd.toDebugString and check the lineage for ShuffleRDD. As long as there's no need to restructure the RDD, operations can be pipelined on each partition. rdd.toDebugString is your friend :-) -kr, Gerard. On Mon, Nov 17, 2014 at 7:37 AM, Mukesh Jha
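
A small sketch of that check in PySpark; reduceByKey introduces the shuffle, so a shuffle boundary shows up in the second lineage but not the first:

    rdd = sc.parallelize(range(100)).map(lambda x: (x % 10, x))
    print(rdd.toDebugString().decode())     # narrow: map only, no shuffle
    summed = rdd.reduceByKey(lambda a, b: a + b)
    print(summed.toDebugString().decode())  # shuffle boundary appears here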

Functions in Spark

2014-11-16 Thread Deep Pradhan
Hi, Is there any way to know which of my functions performs better in Spark? In other words, say I have achieved the same thing using two different implementations. How do I judge which implementation is better than the other? Is processing time the only metric that we can use to claim the

Re: Functions in Spark

2014-11-16 Thread Samarth Mailinglist
Check this video out: https://www.youtube.com/watch?v=dmL0N3qfSc8&list=UURzsq7k4-kT-h3TDUBQ82-w On Mon, Nov 17, 2014 at 9:43 AM, Deep Pradhan pradhandeep1...@gmail.com wrote: Hi, Is there any way to know which of my functions performs better in Spark? In other words, say I have achieved the same

Re: Functions in Spark

2014-11-16 Thread Mukesh Jha
Thanks, I did go through the video and it was very informative, but I think I was looking for the Transformations section of https://spark.apache.org/docs/0.9.1/scala-programming-guide.html. On Mon, Nov 17, 2014 at 10:31 AM, Samarth Mailinglist mailinglistsama...@gmail.com wrote: Check this

Support for Percentile and Variance Aggregation functions in Spark with HiveContext

2014-07-25 Thread vinay . kashyap
Hi all, I am using Spark 1.0.0 with CDH 5.1.0. I want to aggregate the data in a raw table using a simple query like below: SELECT MIN(field1), MAX(field2), AVG(field3), PERCENTILE(field4), year, month, day FROM raw_data_table GROUP BY year, month, day. MIN, MAX and AVG functions work fine for
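
Hive's PERCENTILE UDAF also takes the desired percentile point as a second argument and expects an integral column, so the query would presumably look something like:

    SELECT MIN(field1), MAX(field2), AVG(field3),
           PERCENTILE(field4, 0.5),   -- median; field4 must be an integer type
           year, month, day
    FROM   raw_data_table
    GROUP  BY year, month, day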

Mechanics of passing functions to Spark?

2014-07-09 Thread Seref Arikan
Greetings, The documentation at http://spark.apache.org/docs/latest/programming-guide.html#passing-functions-to-spark says: Note that while it is also possible to pass a reference to a method in a class instance (as opposed to a singleton object), this requires sending the object that contains
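
In PySpark terms, the pitfall and the usual fix look roughly like this; the class and function names are illustrative:

    class MyClass(object):
        def func(self, s):
            return s
        def do_stuff(self, rdd):
            # Passing a bound method ships the whole MyClass instance
            return rdd.map(self.func)

    # Safer: a module-level function, or copy needed fields into locals first
    def standalone_func(s):
        return s

    result = rdd.map(standalone_func)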

How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Carter
will be displayed, but I don't know how to use these functions. Your help is greatly appreciated.

Re: How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Gerard Maas
for this RDD will be displayed, but I don't know how to use these functions. Your help is greatly appreciated.

Re: How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Carter
Thank you very much Gerard.

Re: How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Nicholas Chammas

Re: Using Java functions in Spark

2014-06-07 Thread Oleg Proudnikov
Increasing the number of partitions on the data file solved the problem. On 6 June 2014 18:46, Oleg Proudnikov oleg.proudni...@gmail.com wrote: Additional observation - the map and mapValues are pipelined and executed - as expected - in pairs. This means that there is a simple sequence of steps -
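
The fix described here, as a sketch; the path and partition count are illustrative:

    # Read with more partitions so each task holds fewer large values at once
    rdd = sc.textFile("/data/keys.txt", minPartitions=200)
    # or repartition an existing RDD before the heavy mapValues step
    rdd = rdd.repartition(200)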

Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov
Hi All, I am passing Java static methods into the RDD transformations map and mapValues. The first map is from a simple string K into a (K,V) pair where V is a Java ArrayList of large text strings, 50K each, read from Cassandra. mapValues processes these text blocks into very small ArrayLists.

Re: Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov
Additional observation - the map and mapValues are pipelined and executed - as expected - in pairs. This means that there is a simple sequence of steps - first read from Cassandra and then processing for each value of K. This is the exact behaviour of a normal Java loop with these two steps