Re: Reply: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-16 Thread Enrico Minack
of columns to a new data frame. It seems that there is no direct API to do this. - Original message - From: Sean Owen To: ckgppl_...@sina.cn Cc: user Subject: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame Date: 2022-03-16 11:55 Are you just

Reply: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-16 Thread ckgppl_yan
columns and one specific column after groupby the spark data frame Date: 2022-03-16 11:55 Are you just trying to avoid writing the function call 30 times? Just put this in a loop over all the columns instead, adding a new corr column to a list each time. On Tue, Mar 15, 2022, 10:30 PM wrote

Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-15 Thread Sean Owen
Are you just trying to avoid writing the function call 30 times? Just put this in a loop over all the columns instead, adding a new corr column to a list each time. On Tue, Mar 15, 2022, 10:30 PM wrote:

calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-15 Thread ckgppl_yan
Hi all, I am stuck at a correlation calculation problem. I have a dataframe with columns groupid, datacol1, datacol2, datacol3, ..., datacol*, and corr_col. I want to calculate the correlation between all datacol columns and the corr_col column by each group.
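
A minimal sketch of the loop approach Sean Owen suggests above, in Scala; the toy rows and the groupid / datacolN / corr_col column names are assumptions taken from this question, not the original code:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.corr

    val spark = SparkSession.builder().appName("corr-per-group").getOrCreate()
    import spark.implicits._

    // Toy stand-in for the real dataframe; column names follow the question.
    val df = Seq(
      (1, 1.0, 2.0, 3.0, 5.0),
      (1, 2.0, 3.0, 4.0, 6.0),
      (2, 4.0, 2.0, 1.0, 7.0),
      (2, 8.0, 9.0, 3.0, 2.0)
    ).toDF("groupid", "datacol1", "datacol2", "datacol3", "corr_col")

    // Build one corr(...) aggregate per data column instead of writing the call 30 times.
    val dataCols = df.columns.filter(_.startsWith("datacol"))
    val corrExprs = dataCols.map(c => corr(c, "corr_col").alias(s"corr_$c"))

    // One row per groupid, one correlation value per data column.
    df.groupBy("groupid").agg(corrExprs.head, corrExprs.tail: _*).show()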

Re: Type Casting Error in Spark Data Frame

2018-01-31 Thread vijay.bvp
Assuming the MessageHelper.sqlMapping schema is correctly mapped to the input JSON (it would help if the schema and a sample JSON were shared), here is the explode function with dataframes; similar functionality is available with SQL. import sparkSession.implicits._ import org.apache.spark.sql.functions._
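
A short sketch of the explode-with-dataframes approach the reply refers to; since the actual schema and sample JSON were not shared, the deviceId / readings columns below are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode}

    val spark = SparkSession.builder().appName("explode-dataframe").getOrCreate()
    import spark.implicits._

    // Placeholder nested structure standing in for the parsed JSON.
    val df = Seq(
      ("device-1", Seq(("2018-01-29", 42.0), ("2018-01-30", 17.5)))
    ).toDF("deviceId", "readings")

    // explode turns each array element into its own row; the struct fields
    // can then be selected like ordinary columns.
    val exploded = df
      .select(col("deviceId"), explode(col("readings")).as("reading"))
      .select(col("deviceId"), col("reading._1").as("day"), col("reading._2").as("value"))

    exploded.show()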

Re: Type Casting Error in Spark Data Frame

2018-01-29 Thread Jean Georges Perrin
You can try to create new columns with the nested value. > On Jan 29, 2018, at 15:26, Arnav kumar wrote:
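
A tiny sketch of that idea, pulling nested struct fields up into top-level columns; the data column and its fields are invented here because the original schema is not in the archive preview:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("nested-to-columns").getOrCreate()
    import spark.implicits._

    case class Data(deviceId: String, eventTime: String)

    // Placeholder for the JSON-parsed dataframe.
    val parsed = Seq(("m1", Data("device-1", "2018-01-29T15:26:00"))).toDF("messageId", "data")

    // Promote the nested fields to ordinary top-level columns.
    val flattened = parsed
      .withColumn("deviceId", col("data.deviceId"))
      .withColumn("eventTime", col("data.eventTime"))

    flattened.show()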

Re: Type Casting Error in Spark Data Frame

2018-01-29 Thread Patrick McCarthy
You can't select from an array like that; try instead using 'lateral view explode' in the query for that element, or, before the SQL stage, (py)spark.sql.functions.explode. On Mon, Jan 29, 2018 at 4:26 PM, Arnav kumar wrote:
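
A minimal sketch of the LATERAL VIEW explode variant mentioned here, written in Scala against a temp view; the table and column names are assumptions for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("lateral-view-explode").getOrCreate()
    import spark.implicits._

    // Stand-in for the JSON-derived table used in the SQL stage.
    Seq(("device-1", Seq(1.0, 2.0, 3.0)))
      .toDF("deviceId", "readings")
      .createOrReplaceTempView("events")

    // LATERAL VIEW explode produces one output row per array element,
    // so individual elements can be selected like ordinary columns.
    spark.sql(
      """SELECT deviceId, reading
        |FROM events
        |LATERAL VIEW explode(readings) r AS reading""".stripMargin
    ).show()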

Type Casting Error in Spark Data Frame

2018-01-29 Thread Arnav kumar
Hello Experts, I would need your advice in resolving the below issue when I am trying to retrieve the data from a dataframe. Can you please let me know where I am going wrong. code: // create the dataframe by parsing the json // Message Helper describes the JSON Struct // data out is the
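
The code itself is cut off in the archive, so as a hedged illustration of the setup the comments describe (parsing a JSON string column with an explicit schema), here is a generic sketch; the hand-written schema below merely stands in for MessageHelper.sqlMapping:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("parse-json").getOrCreate()
    import spark.implicits._

    // Hand-written stand-in for the MessageHelper.sqlMapping schema.
    val schema = new StructType()
      .add("deviceId", StringType)
      .add("readings", ArrayType(new StructType()
        .add("day", StringType)
        .add("value", DoubleType)))

    val raw = Seq(
      """{"deviceId":"device-1","readings":[{"day":"2018-01-29","value":42.0}]}"""
    ).toDF("json")

    // Parse the JSON string into a typed struct, then flatten it to columns.
    val parsed = raw.select(from_json($"json", schema).as("data")).select("data.*")
    parsed.printSchema()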

Re: Spark Data Frame. PreSorted partitions

2017-11-28 Thread Michael Artz
custom DataSource for Spark Data Frame API and have a question: If I have a `SELECT * FROM table1 ORDER BY some_column` query I can sort data inside a partition in my data source. Do I have a built-in option to tell Spark that the data from each partition is already sorted?

Spark Data Frame. PreSorted partitions

2017-11-28 Thread Николай Ижиков
Hello, guys! I am working on an implementation of a custom DataSource for the Spark Data Frame API and have a question: If I have a `SELECT * FROM table1 ORDER BY some_column` query I can sort data inside a partition in my data source. Do I have a built-in option to tell Spark that data from each partition

Re: Spark Data Frame Writer - Range Partitioning

2017-07-25 Thread Jain, Nishit
ain, Nishit" <nja...@underarmour.com<mailto:nja...@underarmour.com>>, "user@spark.apache.org<mailto:user@spark.apache.org>" <user@spark.apache.org<mailto:user@spark.apache.org>> Subject: Re: Spark Data Frame Writer - Range Partiotioning How about creating a part

Re: Spark Data Frame Writer - Range Partitioning

2017-07-21 Thread ayan guha
How about creating a partition column and using it? On Sat, 22 Jul 2017 at 2:47 am, Jain, Nishit <nja...@underarmour.com> wrote:

Spark Data Frame Writer - Range Partitioning

2017-07-21 Thread Jain, Nishit
Is it possible to have the Spark Data Frame Writer write based on range partitioning? For example: I have 10 distinct values for column_a, say 1 to 10. df.write.partitionBy("column_a") The above code will by default create 10 folders: column_a=1, column_a=2, ..., column_a=10. I want to see if it
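
A minimal sketch of the "create a partition column" idea suggested above: derive an explicit range bucket and partition the output by it. The bucket boundaries, output path, and format below are assumptions for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, when}

    val spark = SparkSession.builder().appName("range-partitioned-write").getOrCreate()
    import spark.implicits._

    // Toy data with column_a in 1..10, standing in for the real dataframe.
    val df = (1 to 10).map(i => (i, s"row-$i")).toDF("column_a", "payload")

    // Derive a range bucket so the writer creates one folder per range
    // instead of one folder per distinct value.
    val withRange = df.withColumn(
      "column_a_range",
      when(col("column_a") <= 5, "1-5").otherwise("6-10"))

    withRange.write
      .partitionBy("column_a_range")
      .parquet("/tmp/range_partitioned_output") // placeholder output path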

Re: Spark data frame map problem

2017-03-23 Thread Yan Facai
Could you give more details of your code? On Wed, Mar 22, 2017 at 2:40 AM, Shashank Mandil <mandil.shash...@gmail.com> wrote:

Spark data frame map problem

2017-03-21 Thread Shashank Mandil
Hi All, I have a spark data frame which has 992 rows in it. When I run a map on this data frame I expect the map to run for all 992 rows. Since the mapper runs on an executor on a cluster, I did a distributed count of the number of rows the mapper is being run on. dataframe.map(r
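
The snippet cuts off at dataframe.map(r, so here is a hedged sketch of one way to count, from the driver, how many rows the map function actually runs on, using a LongAccumulator; the toy dataframe and transformation are placeholders, not the original code:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("count-mapped-rows").getOrCreate()
    import spark.implicits._

    // Toy stand-in for the 992-row dataframe.
    val df = spark.range(992).toDF("id")

    // Accumulator: incremented on the executors, readable on the driver.
    val mappedRows = spark.sparkContext.longAccumulator("mappedRows")

    val mapped = df.map { r =>
      mappedRows.add(1)   // count every row the map function sees
      r.getLong(0) * 2    // placeholder transformation
    }

    // map is lazy: the accumulator is only populated after an action runs.
    mapped.count()
    println(s"rows seen by map: ${mappedRows.value}")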

RE: as.Date can't be applied to Spark data frame in SparkR

2016-09-19 Thread xingye
Update: the job can finish, but it takes a long time on 10M rows of data. Is there a better solution? From: xing_ma...@hotmail.com To: user@spark.apache.org Subject: as.Date can't be applied to Spark data frame in SparkR Date: Tue, 20 Sep 2016 10:22:17 +0800 Hi, all I've noticed that as.Date can't

as.Date can't be applied to Spark data frame in SparkR

2016-09-19 Thread xingye
Hi, all I've noticed that as.Date can't be applied to a Spark data frame. I've created the following UDF and used dapply to change an integer column "aa" to a date with origin 1960-01-01. change_date <- function(df){ df <- as.POSIXlt(as.Date(df$aa, origin = "1960-01-01"))

Re: Spark data frame

2015-12-22 Thread Dean Wampler
Dean Wampler <deanwamp...@gmail.com> Cc: Gaurav Agarwal <gaurav130...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org> Subject: Re: Spark data frame Dean, RDD in memory and then the collect() resulting in a collection, where both

Re: Spark data frame

2015-12-22 Thread Dean Wampler
You can call the collect() method to return a collection, but be careful. If your data is too big to fit in the driver's memory, it will crash. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition (O'Reilly) Typesafe

Re: Spark data frame

2015-12-22 Thread Silvio Fiorito
"user@spark.apache.org" <user@spark.apache.org> Subject: Re: Spark data frame Dean, RDD in memory and then the collect() resulting in a collection, where both are alive at the same time. (Again not sure how Tungsten plays in to t

Spark data frame

2015-12-22 Thread Gaurav Agarwal
We are able to retrieve the data frame by filtering the RDD object. I need to convert that data frame into a Java POJO. Any idea how to do that?
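
A hedged sketch of one way to get from a data frame to Java-style POJOs, combining a bean encoder with the collect() call discussed above (written against the newer SparkSession API rather than the 2015-era one); the Person bean is invented for illustration, and collect() pulls everything to the driver, so the caution above applies:

    import org.apache.spark.sql.{Encoders, SparkSession}
    import scala.beans.BeanProperty

    // Invented POJO standing in for the real domain class.
    class Person {
      @BeanProperty var name: String = _
      @BeanProperty var age: Int = _
    }

    val spark = SparkSession.builder().appName("df-to-pojo").getOrCreate()
    import spark.implicits._

    val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

    // Map rows onto the bean by field name, then collect to the driver.
    val people: Array[Person] = df.as(Encoders.bean(classOf[Person])).collect()

    people.foreach(p => println(s"${p.getName} is ${p.getAge}"))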

Re: spark data frame write.mode("append") bug

2015-12-12 Thread sri hari kali charan Tummala
https://github.com/kali786516/ScalaDB/blob/master/src/main/java/com/kali/db/SaprkSourceToTargetBulkLoad.scala Spring Config File: https://github.com/kali786516/ScalaDB/blob/master/src/main/resources/SourceToTargetBulkLoad.xml

Re: spark data frame write.mode("append") bug

2015-12-12 Thread kali.tumm...@gmail.com
also include the database name. Try(conn.prepareStatement(s"SELECT 1 FROM $table where 1=2").executeQuery().next()).isSuccess } Thanks
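
A small, hedged reconstruction of the existence check shown in the snippet, qualifying the table with its database name so that append mode does not mistake an existing table for a missing one; the JDBC URL and credentials are placeholders, not from the original post:

    import java.sql.DriverManager
    import scala.util.Try

    // Placeholder connection details.
    val url = "jdbc:mysql://dbhost:3306"
    val table = "mydb.target_table" // database-qualified table name

    val conn = DriverManager.getConnection(url, "user", "password")
    try {
      // Returns no rows but fails if the table does not exist,
      // which makes it a cheap existence probe.
      val exists = Try(
        conn.prepareStatement(s"SELECT 1 FROM $table where 1=2").executeQuery().next()
      ).isSuccess
      println(s"table exists: $exists")
    } finally {
      conn.close()
    }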

Re: spark data frame write.mode("append") bug

2015-12-12 Thread Michael Armbrust
all // SQL database systems, considering "table" could also include the database name. Try(conn.prepareStatement(s"SELECT 1 FROM $table where 1=2").executeQuery().next()).isSuccess } Thanks

spark data frame write.mode("append") bug

2015-12-09 Thread kali.tumm...@gmail.com
https://github.com/kali786516/ScalaDB/blob/master/src/main/resources/SourceToTargetBulkLoad.xml Thanks Sri

Re: spark data frame write.mode("append") bug

2015-12-09 Thread Seongduk Cheon
https://github.com/kali786516/ScalaDB/blob/master/src/main/java/com/kali/db/SaprkSourceToTargetBulkLoad.scala Spring Config File: https://github.com/kali786516/ScalaDB/blob/master/src/main/resources/SourceToTargetBulkLoad.xml Thanks Sri

Re: Hive ORC Malformed while loading into spark data frame

2015-10-04 Thread Umesh Kacha
If you generate the table using Hive but read it with a data frame, it may have some compatibility issues. Thanks, Zhan Zhang
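
As a hedged illustration of the compatibility point above, here are the two read paths side by side, written against the newer SparkSession API; the table name and warehouse path are placeholders, and reading through the metastore is a commonly suggested alternative rather than a fix confirmed in this thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("orc-read-paths")
      .enableHiveSupport()
      .getOrCreate()

    // Reading the raw files: every file under the path must be valid ORC,
    // otherwise the read fails with errors such as "Malformed ORC ... Invalid postscript".
    val fromFiles = spark.read.format("orc").load("/user/hive/warehouse/partorc")

    // Reading through the Hive metastore lets Spark use the table definition
    // (format, partitions) instead of inferring it from the directory contents.
    val fromMetastore = spark.table("default.partorc") // placeholder table name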

Re: Hive ORC Malformed while loading into spark data frame

2015-10-03 Thread Umesh Kacha
partitions. It works well; I can read the data back into the Hive table using the hive console. But if I try to further process the ORC files generated by the Spark job by loading them into a dataframe, then I get the following exception

Re: Hive ORC Malformed while loading into spark data frame

2015-09-30 Thread Umesh Kacha
which creates hive tables in ORC format with partitions. It works well; I can read the data back into the Hive table using the hive console. But if I try to further process the ORC files generated by the Spark job by loading them into a dataframe, then

Re: Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread Hortonworks
/user/hive/warehouse/partorc/part_tiny.txt. Invalid postscript. Dataframe df = hiveContext.read().format("orc").load(to/path); Please guide.

Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread unk1102

Re: Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread Umesh Kacha
part_tiny.txt. Invalid postscript. Dataframe df = hiveContext.read().format("orc").load(to/path); Please guide.

Re: Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread Hortonworks

Re: Hive ORC Malformed while loading into spark data frame

2015-09-29 Thread Umesh Kacha
st:9000/user/hive/warehouse/partorc/part_tiny.txt. Invalid postscript. Dataframe df = hiveContext.read().format("orc").load(to/path); Please guide.