[ https://issues.apache.org/jira/browse/SPARK-26901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-26901. --------------------------------- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23810 [https://github.com/apache/spark/pull/23810] > Vectorized gapply should not prune columns > ------------------------------------------ > > Key: SPARK-26901 > URL: https://issues.apache.org/jira/browse/SPARK-26901 > Project: Spark > Issue Type: Bug > Components: R, SQL > Affects Versions: 3.0.0 > Reporter: Hyukjin Kwon > Assignee: Hyukjin Kwon > Priority: Major > Fix For: 3.0.0 > > > Currently, if some columns can be pushed, it's being pushed through > {{FlatMapGroupsInRWithArrow}}. > {code} > explain(count(gapply(df, > "gear", > function(key, group) { > data.frame(gear = key[[1]], disp = mean(group$disp)) > }, > structType("gear double, disp double"))), TRUE) > {code} > {code} > *(4) HashAggregate(keys=[], functions=[count(1)], output=[count#64L]) > +- Exchange SinglePartition > +- *(3) HashAggregate(keys=[], functions=[partial_count(1)], > output=[count#67L]) > +- *(3) Project > +- FlatMapGroupsInRWithArrow [...] > +- *(2) Sort [gear#9 ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(gear#9, 200) > +- *(1) Project [gear#9] > +- *(1) Scan ExistingRDD > arrow[mpg#0,cyl#1,disp#2,hp#3,drat#4,wt#5,qsec#6,vs#7,am#8,gear#9,carb#10] > {code} > This causes to send corrupt values R workers when the R native functions are > executed. > {code} > c(5, 5, 5, 5, 5) > c(7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, > 7.90505033345994e-323, 2.47032822920623e-323) > c(0, 0, 0, 0, 2.05578399548861e-314) > c(3.4483079184909e-313, 3.4483079184909e-313, 3.4483079184909e-313, > 5.31146529464635e-315, 0) > c(0, 0, 0, 0, -2.63230705887168e+228) > c(5, 5, 5, 0, 2.47032822920623e-323) > c(7.90505033345994e-323, 7.90505033345994e-323, 0, 0, 4.17777978645388e-314) > c(0, 0, 0, 0, -2.18328530492023e+219) > c(3.4483079184909e-313, 5.31146529464635e-315, 0, 0, -2.63230127529109e+228) > c(0, 0, 0, 0, 2.47032822920623e-323) > c(5, 0, 0, 0, 4.17777978645388e-314) > c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3) > c(7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, > 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, > 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, > 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, > 7.90505033345994e-323, 7.90505033345994e-323, 2.47032822920623e-323) > c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.05578399548861e-314) > c(3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, > 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, > 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, > 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, > 3.4482690635875e-313, 5.30757980430645e-315, 0) > c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2.73302088532611e+228) > c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 2.47032822920623e-323) > c(7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, > 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, > 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, > 7.90505033345994e-323, 7.90505033345994e-323, 7.90505033345994e-323, 0, 0, > 4.17777978645388e-314) > c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.04669129845114e+219) > c(3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, > 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, > 3.4482690635875e-313, 3.4482690635875e-313, 3.4482690635875e-313, > 3.4482690635875e-313, 3.4482690635875e-313, 5.30757980430645e-315, 0, 0, > -2.73301510174552e+228) > c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.47032822920623e-323) > c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 4.17777978645388e-314) > {code} > which should be: > {code} > c(21, 21, 22.8, 24.4, 22.8, 19.2, 17.8, 32.4, 30.4, 33.9, 27.3, 21.4) > c(6, 6, 4, 4, 4, 6, 6, 4, 4, 4, 4, 4) > c(160, 160, 108, 146.7, 140.8, 167.6, 167.6, 78.7, 75.7, 71.1, 79, 121) > c(110, 110, 93, 62, 95, 123, 123, 66, 52, 65, 66, 109) > c(3.9, 3.9, 3.85, 3.69, 3.92, 3.92, 3.92, 4.08, 4.93, 4.22, 4.08, 4.11) > c(2.62, 2.875, 2.32, 3.19, 3.15, 3.44, 3.44, 2.2, 1.615, 1.835, 1.935, 2.78) > c(16.46, 17.02, 18.61, 20, 22.9, 18.3, 18.9, 19.47, 18.52, 19.9, 18.9, 18.6) > c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) > c(1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1) > c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4) > c(4, 4, 1, 2, 2, 4, 4, 1, 2, 1, 1, 2) > c(26, 30.4, 15.8, 19.7, 15) > c(4, 4, 8, 6, 8) > c(120.3, 95.1, 351, 145, 301) > c(91, 113, 264, 175, 335) > c(4.43, 3.77, 4.22, 3.62, 3.54) > c(2.14, 1.513, 3.17, 2.77, 3.57) > c(16.7, 16.9, 14.5, 15.5, 14.6) > c(0, 1, 0, 0, 0) > c(1, 1, 1, 1, 1) > c(5, 5, 5, 5, 5) > c(2, 2, 4, 6, 8) > c(21.4, 18.7, 18.1, 14.3, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 21.5, 15.5, > 15.2, 13.3, 19.2) > c(6, 8, 6, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 8) > c(258, 360, 225, 360, 275.8, 275.8, 275.8, 472, 460, 440, 120.1, 318, 304, > 350, 400) > c(110, 175, 105, 245, 180, 180, 180, 205, 215, 230, 97, 150, 150, 245, 175) > c(3.08, 3.15, 2.76, 3.21, 3.07, 3.07, 3.07, 2.93, 3, 3.23, 3.7, 2.76, 3.15, > 3.73, 3.08) > c(3.215, 3.44, 3.46, 3.57, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.465, > 3.52, 3.435, 3.84, 3.845) > c(19.44, 17.02, 20.22, 15.84, 17.4, 17.6, 18, 17.98, 17.82, 17.42, 20.01, > 16.87, 17.3, 15.41, 17.05) > c(1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0) > c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) > c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3) > c(1, 2, 1, 4, 3, 3, 3, 4, 4, 4, 1, 2, 2, 4, 2) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org