Question on List to DF

2022-03-15 Thread Bitfox
I am wondering why the list in scala spark can be converted into a dataframe directly?
scala> val df = List("apple","orange","cherry").toDF("fruit")
df: org.apache.spark.sql.DataFrame = [fruit: string]
scala> df.show
+------+
| fruit|
+------+
| apple|
|orange|
|cherry|
+------+
I

Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-15 Thread Sean Owen
Are you just trying to avoid writing the function call 30 times? Just put this in a loop over all the columns instead, which adds a new corr col every time to a list. On Tue, Mar 15, 2022, 10:30 PM wrote: > Hi all, > > I am stuck at a correlation calculation problem. I have a dataframe like >

calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-15 Thread ckgppl_yan
Hi all, I am stuck at a correlation calculation problem. I have a dataframe like below: groupid datacol1 datacol2 datacol3 datacol* corr_co112345123465242175289325371235335315 I want to calculate the correlation between all datacol columns and corr_col column by each

Re: pivoting panda dataframe

2022-03-15 Thread Mich Talebzadeh
Thanks. I don't want to use Spark; otherwise I could do this:
p_dfm = df.toPandas()  # converting Spark DF to pandas DF
Can I do it without using Spark? view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

Re: pivoting panda dataframe

2022-03-15 Thread Bjørn Jørgensen
Column bind in R is concatenate in pandas: https://www.datasciencemadesimple.com/append-concatenate-columns-python-pandas-column-bind/ Please start a new thread for each question. On Tue, 15 Mar 2022, 22:59, Andrew Davidson wrote: > Many many thanks! > > > > I have been looking for a pyspark

Re: pivoting panda dataframe

2022-03-15 Thread Bjørn Jørgensen
You have a pyspark dataframe and you want to convert it to pandas? Convert it first to pandas API on Spark:
pf01 = f01.to_pandas_on_spark()
Then convert it to pandas:
pf01 = pf01.to_pandas()
Or? On Tue, 15 Mar 2022, 22:56, Mich Talebzadeh wrote: > Thanks everyone. > > I want to do the

Re: pivoting panda dataframe

2022-03-15 Thread Andrew Davidson
Many many thanks! I have been looking for a pyspark data frame column_bind() solution for several months. Hopefully pyspark.pandas works. The only other solution I was aware of was to use spark.dataframe.join(). This does not scale, for obvious reasons. Andy From: Bjørn Jørgensen Date:

Re: pivoting panda dataframe

2022-03-15 Thread Mich Talebzadeh
Thanks everyone. I want to do the following in pandas and numpy without using spark. This is what I do in spark to generate some random data using class UsedFunctions (not important).
class UsedFunctions:
    def randomString(self, length):
        letters = string.ascii_letters
        result_str =

Re: Continuous ML model training in stream mode

2022-03-15 Thread Artemis User
Thanks Sean! Well, it looks like we have to abandon our structured streaming model to use DStream for this, or do you see a possibility to use structured streaming with ml instead of mllib? On 3/15/22 4:51 PM, Sean Owen wrote: There is a streaming k-means example in Spark.

Re: pivoting panda dataframe

2022-03-15 Thread Bjørn Jørgensen
Hi Andrew. Mich asked, and I answered transpose() https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.transpose.html . And now you are asking in the same thread about pandas API on Spark and transform(). Apache Spark has pandas API on

Re: How Spark establishes connectivity to Hive

2022-03-15 Thread Artemis User
I guess it really depends on your configuration. The Hive metastore provides just the metadata/schema data for your database, not the actual data storage. Hive runs on top of Hadoop. If you configure your Spark to run on the same Hadoop cluster using YARN, your SQL dataframe in Spark

Re: Continuous ML model training in stream mode

2022-03-15 Thread Sean Owen
There is a streaming k-means example in Spark. https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means On Tue, Mar 15, 2022, 3:46 PM Artemis User wrote: > Has anyone done any experiments of training an ML model using stream > data? especially for unsupervised models? Any

Continuous ML model training in stream mode

2022-03-15 Thread Artemis User
Has anyone done any experiments of training an ML model using stream data, especially for unsupervised models? Any suggestions/references are highly appreciated...

Re: pivoting panda dataframe

2022-03-15 Thread Andrew Davidson
Hi Bjørn, I have been looking for a spark transform for a while. Can you send me a link to the pyspark function? I assume pandas transform is not really an option. I think it will try to pull the entire dataframe into the driver's memory. Kind regards, Andy. P.S. My real problem is that spark

Re: pivoting panda dataframe

2022-03-15 Thread Bjørn Jørgensen
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.transpose.html We have that transpose in the pandas API on Spark too. You also have stack() and multilevel: https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html On Tue, 15 Mar 2022 at 17:50, Mich Talebzadeh <
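A minimal pandas-only sketch of the transpose approach, with no Spark involved. The frame contents and the names metric, mon and tue are invented for illustration; the idea is to move the column holding the row labels into the index and then transpose, so that each former row label becomes a column heading.

```python
import pandas as pd

# Made-up example data; the column names are illustrative only.
df = pd.DataFrame(
    {"metric": ["open", "close"], "mon": [10, 12], "tue": [11, 13]}
)

# Use the "metric" values as the index, then transpose so that each
# former row label ("open", "close") becomes a column heading.
pivoted = df.set_index("metric").T
print(pivoted)
```

If the frame has no column that should serve as headings, a plain df.T also works; the existing index values then become the column names.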

pivoting panda dataframe

2022-03-15 Thread Mich Talebzadeh
Hi, is it possible to pivot a pandas dataframe by making the rows the column headings? Thanks