Re: Jdbc Hook in Spark Batch Application

2020-12-24 Thread lec ssmi
Thanks. But the problem is that the classes referenced in the code would need to be modified, and I want to avoid changing the existing code. Gabor Somogyi wrote on Fri, Dec 25, 2020 at 12:16 AM: > One can wrap the JDBC driver, and in such a way everything can be sniffed. > > On Thu, 24 Dec 2020, 03:51 lec ssmi,

unsubscribe

2020-12-24 Thread Richardson, Jeff

Re: Jdbc Hook in Spark Batch Application

2020-12-24 Thread Gabor Somogyi
One can wrap the JDBC driver, and in such a way everything can be sniffed. On Thu, 24 Dec 2020, 03:51 lec ssmi, wrote: > Hi: > guys, I have some Spark programs that have database connection > operations. I want to acquire the connection information, such as JDBC > connection properties, but

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Sean Owen
Why not just use STDDEV_SAMP? It's probably more accurate than the differences-of-squares calculation. You can write an aggregate UDF that calls numpy and register it for SQL, but it is already a built-in. On Thu, Dec 24, 2020 at 8:12 AM Mich Talebzadeh wrote: > Thanks for the feedback. > > I
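
To illustrate the accuracy point, here is a minimal sketch in plain Python. It assumes that "differences-of-squares" refers to the textbook one-pass formula (sum of squares minus n times the squared mean); the function names and the data are hypothetical and do not come from the thread. The one-pass form subtracts two very large, nearly equal numbers, so it loses precision when the values are large relative to their spread, while a two-pass calculation does not.

```python
def one_pass_var(xs):
    """Sample variance via the differences-of-squares formula (illustrative only)."""
    n = len(xs)
    mean = sum(xs) / n
    sum_sq = sum(x * x for x in xs)
    # Subtracting two huge, nearly equal floats cancels most significant digits.
    return (sum_sq - n * mean * mean) / (n - 1)


def two_pass_var(xs):
    """Sample variance from squared deviations around the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: large values with a small spread.
# The exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("one-pass :", one_pass_var(data))
print("two-pass :", two_pass_var(data))
```

On this input the two-pass result is exact, while the one-pass result cannot be, because the squares are around 1e18 and doubles at that magnitude are spaced more than 100 apart.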

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Mich Talebzadeh
Thanks for the feedback. I have a question here. I want to use numpy's std as well, but using only SQL in PySpark, like below: sqltext = f""" SELECT rs.Customer_ID, rs.Number_of_orders, rs.Total_customer_amount, rs.Average_order,

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Sean Owen
I don't know which one is 'correct' (it's not standard SQL?) or whether it's the sample stdev for a good reason or just historical now. But you can always call STDDEV_SAMP (in any DB) if needed. It's equivalent to numpy.std with ddof=1, the Bessel-corrected standard deviation. On Thu, Dec 24,
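
The sample versus population distinction can be checked with a small sketch. To stay dependency-free it uses Python's standard `statistics` module instead of numpy: `statistics.stdev` divides by n - 1 (the counterpart of `numpy.std(..., ddof=1)` and STDDEV_SAMP), and `statistics.pstdev` divides by n (the counterpart of numpy's default `ddof=0` and STDDEV_POP). The `orders` values are made up for illustration.

```python
import math
import statistics

# Hypothetical order amounts; mean is 5, sum of squared deviations is 32.
orders = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(orders)

pop_std = statistics.pstdev(orders)   # divides by n      (STDDEV_POP)
samp_std = statistics.stdev(orders)   # divides by n - 1  (STDDEV_SAMP)

# Bessel's correction relates the two by a factor of sqrt(n / (n - 1)).
rescaled = pop_std * math.sqrt(n / (n - 1))

print("population:", pop_std)
print("sample    :", samp_std)
print("rescaled  :", rescaled)
```

The sample value is always the larger of the two, and the gap shrinks as n grows, which is why the choice matters most for small groups.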

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Mich Talebzadeh
Well, the truth is that we had this discussion in 2016 :(. What Hive calls the standard deviation function, STDDEV, is a pointer to STDDEV_POP. This is incorrect and has not been rectified yet! Spark SQL, Oracle and Sybase point STDDEV to STDDEV_SAMP and not STDDEV_POP. Run a test on *Hive* SELECT