[jira] [Resolved] (SPARK-43627) Enable pyspark.pandas.spark.functions.skew in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-43627.
-----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41604
[https://github.com/apache/spark/pull/41604]

> Enable pyspark.pandas.spark.functions.skew in Spark Connect.
>
> Key: SPARK-43627
> URL: https://issues.apache.org/jira/browse/SPARK-43627
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.5.0
>
> Enable pyspark.pandas.spark.functions.skew in Spark Connect.
[jira] [Resolved] (SPARK-43626) Enable pyspark.pandas.spark.functions.kurt in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-43626.
-----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41604
[https://github.com/apache/spark/pull/41604]

> Enable pyspark.pandas.spark.functions.kurt in Spark Connect.
>
> Key: SPARK-43626
> URL: https://issues.apache.org/jira/browse/SPARK-43626
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.5.0
>
> Enable pyspark.pandas.spark.functions.kurt in Spark Connect.
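For a sense of what these two tickets enable, here is a minimal sketch (not part of the original messages) of the user-facing calls that route through pyspark.pandas.spark.functions; it assumes Spark 3.5 with the linked PR merged and an active Spark Connect session:

{code:python}
import pyspark.pandas as ps

# Assumes a Spark Connect session is already active, e.g. created via
# SparkSession.builder.remote("sc://localhost").getOrCreate().
psser = ps.Series([1.0, 2.0, 2.0, 5.0])

# Series.skew() and Series.kurt() dispatch to
# pyspark.pandas.spark.functions.skew / .kurt, which SPARK-43627 and
# SPARK-43626 enable under Spark Connect.
print(psser.skew())
print(psser.kurt())
{code}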
[jira] [Created] (SPARK-44064) Maven test `ProductAggSuite` aborted
Yang Jie created SPARK-44064:
--------------------------------
Summary: Maven test `ProductAggSuite` aborted
Key: SPARK-44064
URL: https://issues.apache.org/jira/browse/SPARK-44064
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.0
Reporter: Yang Jie

Running

{code:java}
./build/mvn -DskipTests -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl clean install
build/mvn test -pl sql/catalyst
{code}

aborts with

{code:java}
ProductAggSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.variable(javaCode.scala:64)
  at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.isNullVariable(javaCode.scala:77)
  at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.$anonfun$create$1(GenerateSafeProjection.scala:156)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:153)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1369)
{code}
[jira] [Updated] (SPARK-44064) Maven test `ProductAggSuite` aborted
[ https://issues.apache.org/jira/browse/SPARK-44064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie updated SPARK-44064:
-----------------------------
    Issue Type: Bug  (was: Improvement)

> Maven test `ProductAggSuite` aborted
>
> Key: SPARK-44064
> URL: https://issues.apache.org/jira/browse/SPARK-44064
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Major
>
> Running
> {code:java}
> ./build/mvn -DskipTests -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl clean install
> build/mvn test -pl sql/catalyst
> {code}
> aborts with
> {code:java}
> ProductAggSuite:
> *** RUN ABORTED ***
>   java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$
>   at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.variable(javaCode.scala:64)
>   at org.apache.spark.sql.catalyst.expressions.codegen.JavaCode$.isNullVariable(javaCode.scala:77)
>   at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:196)
>   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.$anonfun$create$1(GenerateSafeProjection.scala:156)
>   at scala.collection.immutable.List.map(List.scala:293)
>   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:153)
>   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1369)
> {code}
[jira] [Commented] (SPARK-44058) Remove deprecated API usage in HiveShim.scala
[ https://issues.apache.org/jira/browse/SPARK-44058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732851#comment-17732851 ]

Yuming Wang commented on SPARK-44058:
-------------------------------------
This is used to connect to Hive metastore 0.12.

> Remove deprecated API usage in HiveShim.scala
>
> Key: SPARK-44058
> URL: https://issues.apache.org/jira/browse/SPARK-44058
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 3.4.0
> Reporter: Aman Raj
> Priority: Major
>
> Spark's HiveShim.scala calls this particular method in Hive:
> {code:java}
> createPartitionMethod.invoke(
>   hive,
>   table,
>   spec,
>   location,
>   params, // partParams
>   null, // inputFormat
>   null, // outputFormat
>   -1: JInteger, // numBuckets
>   null, // cols
>   null, // serializationLib
>   null, // serdeParams
>   null, // bucketCols
>   null) // sortCols
> }
> {code}
> We do not have any such implementation of createPartition in Hive. We only have this definition:
> {code:java}
> public Partition createPartition(Table tbl, Map<String, String> partSpec) throws HiveException {
>   try {
>     org.apache.hadoop.hive.metastore.api.Partition part =
>         Partition.createMetaPartitionObject(tbl, partSpec, null);
>     AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl);
>     part.setWriteId(tableSnapshot != null ? tableSnapshot.getWriteId() : 0);
>     return new Partition(tbl, getMSC().add_partition(part));
>   } catch (Exception e) {
>     LOG.error(StringUtils.stringifyException(e));
>     throw new HiveException(e);
>   }
> }
> {code}
> *The 12-parameter implementation was removed in HIVE-5951.*
>
> The issue is that this 12-parameter implementation of the createPartition method was added in Hive 0.12 and then removed in Hive 0.13. When Hive 0.12 was used in Spark, the SPARK-15334 commit added this 12-parameter invocation. But after Hive migrated to newer APIs, this was never updated in Spark OSS, which looks to us like a bug on the Spark end.
>
> We need to migrate to the newest implementation of the Hive createPartition method, otherwise this flow can break.
[jira] [Commented] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732843#comment-17732843 ]

Abhijeet Singh commented on SPARK-43259:
----------------------------------------
I want to work on this issue and have raised a PR for it: https://github.com/apache/spark/pull/41607

> Assign a name to the error class _LEGACY_ERROR_TEMP_2024
>
> Key: SPARK-43259
> URL: https://issues.apache.org/jira/browse/SPARK-43259
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the examples in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't exist yet. Check exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error text message, so tech editors can modify the error format in error-classes.json without breaking Spark's internal tests. Migrate other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is not clear. Propose a solution to users for how to avoid and fix such kinds of errors.
> Please look at the PRs below as examples:
> * [https://github.com/apache/spark/pull/38685]
> * [https://github.com/apache/spark/pull/38656]
> * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-43937) Add ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng updated SPARK-43937:
----------------------------------
    Summary: Add ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python  (was: Add not,ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python)

> Add ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python
>
> Key: SPARK-43937
> URL: https://issues.apache.org/jira/browse/SPARK-43937
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> Add following functions:
> * -not-
> * -if-
> * ifnull
> * isnotnull
> * equal_null
> * nullif
> * nvl
> * nvl2
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
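For illustration, a sketch of the Python side of this ticket, assuming Spark 3.5 (where these functions landed in pyspark.sql.functions) and an active SparkSession named spark:

{code:python}
from pyspark.sql import functions as F

df = spark.createDataFrame([(None, 2), (3, 4)], ["a", "b"])

df.select(
    F.ifnull("a", "b").alias("ifnull"),         # b when a is null, else a
    F.nvl("a", "b").alias("nvl"),               # same semantics as ifnull
    F.nvl2("a", "b", F.lit(-1)).alias("nvl2"),  # b when a is not null, else -1
    F.nullif("a", "b").alias("nullif"),         # null when a equals b, else a
    F.equal_null("a", "b").alias("equal_null"), # null-safe equality test
    F.isnotnull("a").alias("isnotnull"),
).show()
{code}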
[jira] [Updated] (SPARK-43937) Add not,ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng updated SPARK-43937:
----------------------------------
    Description:
Add following functions:
 * -not-
 * -if-
 * ifnull
 * isnotnull
 * equal_null
 * nullif
 * nvl
 * nvl2

to:
 * Scala API
 * Python API
 * Spark Connect Scala Client
 * Spark Connect Python Client

  was:
Add following functions:
 * not
 * if
 * ifnull
 * isnotnull
 * equal_null
 * nullif
 * nvl
 * nvl2

to:
 * Scala API
 * Python API
 * Spark Connect Scala Client
 * Spark Connect Python Client

> Add not,ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python
>
> Key: SPARK-43937
> URL: https://issues.apache.org/jira/browse/SPARK-43937
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> Add following functions:
> * -not-
> * -if-
> * ifnull
> * isnotnull
> * equal_null
> * nullif
> * nvl
> * nvl2
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
[jira] [Updated] (SPARK-43937) Add not,ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng updated SPARK-43937:
----------------------------------
    Summary: Add not,ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python  (was: Add not,if,ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python)

> Add not,ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python
>
> Key: SPARK-43937
> URL: https://issues.apache.org/jira/browse/SPARK-43937
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> Add following functions:
> * not
> * if
> * ifnull
> * isnotnull
> * equal_null
> * nullif
> * nvl
> * nvl2
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
[jira] [Commented] (SPARK-44058) Remove deprecated API usage in HiveShim.scala
[ https://issues.apache.org/jira/browse/SPARK-44058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732840#comment-17732840 ]

Aman Raj commented on SPARK-44058:
----------------------------------
[~yumwang] In that case, this createPartition function is not required, right?

> Remove deprecated API usage in HiveShim.scala
>
> Key: SPARK-44058
> URL: https://issues.apache.org/jira/browse/SPARK-44058
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 3.4.0
> Reporter: Aman Raj
> Priority: Major
>
> Spark's HiveShim.scala calls this particular method in Hive:
> {code:java}
> createPartitionMethod.invoke(
>   hive,
>   table,
>   spec,
>   location,
>   params, // partParams
>   null, // inputFormat
>   null, // outputFormat
>   -1: JInteger, // numBuckets
>   null, // cols
>   null, // serializationLib
>   null, // serdeParams
>   null, // bucketCols
>   null) // sortCols
> }
> {code}
> We do not have any such implementation of createPartition in Hive. We only have this definition:
> {code:java}
> public Partition createPartition(Table tbl, Map<String, String> partSpec) throws HiveException {
>   try {
>     org.apache.hadoop.hive.metastore.api.Partition part =
>         Partition.createMetaPartitionObject(tbl, partSpec, null);
>     AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl);
>     part.setWriteId(tableSnapshot != null ? tableSnapshot.getWriteId() : 0);
>     return new Partition(tbl, getMSC().add_partition(part));
>   } catch (Exception e) {
>     LOG.error(StringUtils.stringifyException(e));
>     throw new HiveException(e);
>   }
> }
> {code}
> *The 12-parameter implementation was removed in HIVE-5951.*
>
> The issue is that this 12-parameter implementation of the createPartition method was added in Hive 0.12 and then removed in Hive 0.13. When Hive 0.12 was used in Spark, the SPARK-15334 commit added this 12-parameter invocation. But after Hive migrated to newer APIs, this was never updated in Spark OSS, which looks to us like a bug on the Spark end.
>
> We need to migrate to the newest implementation of the Hive createPartition method, otherwise this flow can break.
[jira] [Resolved] (SPARK-44063) Revert SPARK-44047
[ https://issues.apache.org/jira/browse/SPARK-44063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BingKun Pan resolved SPARK-44063.
---------------------------------
    Resolution: Won't Fix

It was an issue with my local environment.

> Revert SPARK-44047
>
> Key: SPARK-44063
> URL: https://issues.apache.org/jira/browse/SPARK-44063
> Project: Spark
> Issue Type: Bug
> Components: Build, Connect
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Created] (SPARK-44063) Revert SPARK-44047
BingKun Pan created SPARK-44063:
-----------------------------------
Summary: Revert SPARK-44047
Key: SPARK-44063
URL: https://issues.apache.org/jira/browse/SPARK-44063
Project: Spark
Issue Type: Bug
Components: Build, Connect
Affects Versions: 3.5.0
Reporter: BingKun Pan
[jira] [Commented] (SPARK-43926) Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732825#comment-17732825 ]

Ruifeng Zheng commented on SPARK-43926:
---------------------------------------
[~ivoson] Many thanks, please go ahead.

> Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python
>
> Key: SPARK-43926
> URL: https://issues.apache.org/jira/browse/SPARK-43926
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> Add following functions:
> * array_agg
> * array_size
> * cardinality
> * count_min_sketch
> * named_struct
> * json_array_length
> * json_object_keys
> * mask
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
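As a rough usage sketch for the Python additions listed above (assuming Spark 3.5 with the ticket's work merged and an active SparkSession named spark):

{code:python}
from pyspark.sql import functions as F

df = spark.createDataFrame([(1, [1, 2, 3], '["a","b"]')], ["id", "arr", "js"])

df.select(
    F.array_size("arr").alias("array_size"),      # 3
    F.cardinality("arr").alias("cardinality"),    # 3
    F.json_array_length("js").alias("json_len"),  # 2
    F.json_object_keys(F.lit('{"k1":1,"k2":2}')).alias("keys"),
    F.mask(F.lit("AbCD123-@$#")).alias("masked"), # default masking characters
).show()

# array_agg is an aggregate function (it collects values across rows):
df.agg(F.array_agg("id").alias("ids")).show()
{code}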
[jira] [Assigned] (SPARK-43627) Enable pyspark.pandas.spark.functions.skew in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-43627:
-------------------------------------
    Assignee: Ruifeng Zheng

> Enable pyspark.pandas.spark.functions.skew in Spark Connect.
>
> Key: SPARK-43627
> URL: https://issues.apache.org/jira/browse/SPARK-43627
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Ruifeng Zheng
> Priority: Major
>
> Enable pyspark.pandas.spark.functions.skew in Spark Connect.
[jira] [Assigned] (SPARK-43626) Enable pyspark.pandas.spark.functions.kurt in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-43626:
-------------------------------------
    Assignee: Ruifeng Zheng

> Enable pyspark.pandas.spark.functions.kurt in Spark Connect.
>
> Key: SPARK-43626
> URL: https://issues.apache.org/jira/browse/SPARK-43626
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Ruifeng Zheng
> Priority: Major
>
> Enable pyspark.pandas.spark.functions.kurt in Spark Connect.
[jira] [Resolved] (SPARK-43941) Add any_value, approx_percentile,count_if,first_value,histogram_numeric,last_value to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-43941.
-----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41588
[https://github.com/apache/spark/pull/41588]

> Add any_value, approx_percentile,count_if,first_value,histogram_numeric,last_value to Scala and Python
>
> Key: SPARK-43941
> URL: https://issues.apache.org/jira/browse/SPARK-43941
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: jiaan.geng
> Priority: Major
> Fix For: 3.5.0
>
> Add following functions:
> * any_value
> * approx_percentile
> * count_if
> * first_value
> * histogram_numeric
> * last_value
> * reduce
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
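A usage sketch for the Python side of the functions above (assuming Spark 3.5, where the linked PR landed, and an active SparkSession named spark):

{code:python}
from pyspark.sql import functions as F

df = spark.createDataFrame([(1,), (5,), (None,)], ["x"])

df.agg(
    F.any_value("x").alias("any_value"),          # an arbitrary value of x
    F.count_if(F.col("x") > 1).alias("count_if"), # 1 (only 5 > 1)
    F.first_value("x", ignoreNulls=True).alias("first_value"),
    F.last_value("x", ignoreNulls=True).alias("last_value"),
    F.approx_percentile("x", 0.5).alias("approx_median"),
).show()
{code}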
[jira] [Assigned] (SPARK-43941) Add any_value, approx_percentile,count_if,first_value,histogram_numeric,last_value to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-43941:
-------------------------------------
    Assignee: jiaan.geng

> Add any_value, approx_percentile,count_if,first_value,histogram_numeric,last_value to Scala and Python
>
> Key: SPARK-43941
> URL: https://issues.apache.org/jira/browse/SPARK-43941
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: jiaan.geng
> Priority: Major
>
> Add following functions:
> * any_value
> * approx_percentile
> * count_if
> * first_value
> * histogram_numeric
> * last_value
> * reduce
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
[jira] [Resolved] (SPARK-43659) Enable OpsOnDiffFramesEnabledSlowParityTests.test_series_eq
[ https://issues.apache.org/jira/browse/SPARK-43659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-43659.
-----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41582
[https://github.com/apache/spark/pull/41582]

> Enable OpsOnDiffFramesEnabledSlowParityTests.test_series_eq
>
> Key: SPARK-43659
> URL: https://issues.apache.org/jira/browse/SPARK-43659
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Enable OpsOnDiffFramesEnabledSlowParityTests.test_series_eq
[jira] [Assigned] (SPARK-43659) Enable OpsOnDiffFramesEnabledSlowParityTests.test_series_eq
[ https://issues.apache.org/jira/browse/SPARK-43659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-43659:
-------------------------------------
    Assignee: Haejoon Lee

> Enable OpsOnDiffFramesEnabledSlowParityTests.test_series_eq
>
> Key: SPARK-43659
> URL: https://issues.apache.org/jira/browse/SPARK-43659
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
>
> Enable OpsOnDiffFramesEnabledSlowParityTests.test_series_eq
[jira] [Commented] (SPARK-44058) Remove deprecated API usage in HiveShim.scala
[ https://issues.apache.org/jira/browse/SPARK-44058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732806#comment-17732806 ]

Yuming Wang commented on SPARK-44058:
-------------------------------------
For Hive 0.13 and later, we use https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L753-L768.

> Remove deprecated API usage in HiveShim.scala
>
> Key: SPARK-44058
> URL: https://issues.apache.org/jira/browse/SPARK-44058
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 3.4.0
> Reporter: Aman Raj
> Priority: Major
>
> Spark's HiveShim.scala calls this particular method in Hive:
> {code:java}
> createPartitionMethod.invoke(
>   hive,
>   table,
>   spec,
>   location,
>   params, // partParams
>   null, // inputFormat
>   null, // outputFormat
>   -1: JInteger, // numBuckets
>   null, // cols
>   null, // serializationLib
>   null, // serdeParams
>   null, // bucketCols
>   null) // sortCols
> }
> {code}
> We do not have any such implementation of createPartition in Hive. We only have this definition:
> {code:java}
> public Partition createPartition(Table tbl, Map<String, String> partSpec) throws HiveException {
>   try {
>     org.apache.hadoop.hive.metastore.api.Partition part =
>         Partition.createMetaPartitionObject(tbl, partSpec, null);
>     AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl);
>     part.setWriteId(tableSnapshot != null ? tableSnapshot.getWriteId() : 0);
>     return new Partition(tbl, getMSC().add_partition(part));
>   } catch (Exception e) {
>     LOG.error(StringUtils.stringifyException(e));
>     throw new HiveException(e);
>   }
> }
> {code}
> *The 12-parameter implementation was removed in HIVE-5951.*
>
> The issue is that this 12-parameter implementation of the createPartition method was added in Hive 0.12 and then removed in Hive 0.13. When Hive 0.12 was used in Spark, the SPARK-15334 commit added this 12-parameter invocation. But after Hive migrated to newer APIs, this was never updated in Spark OSS, which looks to us like a bug on the Spark end.
>
> We need to migrate to the newest implementation of the Hive createPartition method, otherwise this flow can break.
[jira] [Assigned] (SPARK-43975) DataSource V2: Handle UPDATE commands for group-based sources
[ https://issues.apache.org/jira/browse/SPARK-43975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-43975:
-------------------------------------
    Assignee: Anton Okolnychyi

> DataSource V2: Handle UPDATE commands for group-based sources
>
> Key: SPARK-43975
> URL: https://issues.apache.org/jira/browse/SPARK-43975
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Anton Okolnychyi
> Assignee: Anton Okolnychyi
> Priority: Major
>
> We need to handle UPDATE commands for group-based sources.
[jira] [Resolved] (SPARK-43975) DataSource V2: Handle UPDATE commands for group-based sources
[ https://issues.apache.org/jira/browse/SPARK-43975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-43975.
-----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41600
[https://github.com/apache/spark/pull/41600]

> DataSource V2: Handle UPDATE commands for group-based sources
>
> Key: SPARK-43975
> URL: https://issues.apache.org/jira/browse/SPARK-43975
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Anton Okolnychyi
> Assignee: Anton Okolnychyi
> Priority: Major
> Fix For: 3.5.0
>
> We need to handle UPDATE commands for group-based sources.
[jira] [Created] (SPARK-44062) Add PySparkTestBase unit test class
Amanda Liu created SPARK-44062:
----------------------------------
Summary: Add PySparkTestBase unit test class
Key: SPARK-44062
URL: https://issues.apache.org/jira/browse/SPARK-44062
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu

SPIP: https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
[jira] [Created] (SPARK-44061) Add assert_df_equality util function
Amanda Liu created SPARK-44061:
----------------------------------
Summary: Add assert_df_equality util function
Key: SPARK-44061
URL: https://issues.apache.org/jira/browse/SPARK-44061
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu

SPIP: https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
[jira] [Updated] (SPARK-44060) Code-gen for build side outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-44060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szehon Ho updated SPARK-44060:
------------------------------
    Description:
Here, build side outer join means LEFT OUTER join with build left, or RIGHT OUTER join with build right.

As a followup for https://github.com/apache/spark/pull/41398 (non-codegen build-side outer shuffled hash join), this task is to add code-gen for it.

> Code-gen for build side outer shuffled hash join
>
> Key: SPARK-44060
> URL: https://issues.apache.org/jira/browse/SPARK-44060
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Szehon Ho
> Priority: Major
>
> Here, build side outer join means LEFT OUTER join with build left, or RIGHT OUTER join with build right.
> As a followup for https://github.com/apache/spark/pull/41398 (non-codegen build-side outer shuffled hash join), this task is to add code-gen for it.
[jira] [Updated] (SPARK-44060) Code-gen for build side outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-44060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szehon Ho updated SPARK-44060:
------------------------------
    Description:
Here, build side outer join means LEFT OUTER join with build left, or RIGHT OUTER join with build right.

As a followup for https://github.com/apache/spark/pull/41398/ SPARK-36612 (non-codegen build-side outer shuffled hash join), this task is to add code-gen for it.

  was:
Here, build side outer join means LEFT OUTER join with build left, or RIGHT OUTER join with build right.

As a followup for https://github.com/apache/spark/pull/41398 (non-codegen build-side outer shuffled hash join), this task is to add code-gen for it.

> Code-gen for build side outer shuffled hash join
>
> Key: SPARK-44060
> URL: https://issues.apache.org/jira/browse/SPARK-44060
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Szehon Ho
> Priority: Major
>
> Here, build side outer join means LEFT OUTER join with build left, or RIGHT OUTER join with build right.
> As a followup for https://github.com/apache/spark/pull/41398/ SPARK-36612 (non-codegen build-side outer shuffled hash join), this task is to add code-gen for it.
[jira] [Created] (SPARK-44060) Code-gen for build side outer shuffled hash join
Szehon Ho created SPARK-44060:
---------------------------------
Summary: Code-gen for build side outer shuffled hash join
Key: SPARK-44060
URL: https://issues.apache.org/jira/browse/SPARK-44060
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Szehon Ho
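For orientation, a sketch (assuming an active SparkSession named spark) of how to reproduce the plan shape this ticket targets: hinting the left side of a LEFT OUTER join should yield the ShuffledHashJoin LeftOuter/BuildLeft node that SPARK-36612 made possible without codegen, which this ticket would cover with generated code:

{code:python}
left = spark.range(0, 1000).withColumnRenamed("id", "k")
right = spark.range(0, 10).withColumnRenamed("id", "k")

# Hint the LEFT OUTER join's left (build) side into a shuffled hash join.
joined = left.hint("SHUFFLE_HASH").join(right, "k", "left_outer")
joined.explain()  # expect: ShuffledHashJoin ... LeftOuter, BuildLeft
{code}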
[jira] [Updated] (SPARK-43440) Support registration of an Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-43440:
---------------------------------
    Summary: Support registration of an Arrow Python UDF  (was: Support registration of an Arrow-optimized Python UDF)

> Support registration of an Arrow Python UDF
>
> Key: SPARK-43440
> URL: https://issues.apache.org/jira/browse/SPARK-43440
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.5.0
>
> Currently, when users register an Arrow-optimized Python UDF, it is registered as a pickled Python UDF and thus executed without Arrow optimization.
> We should support registering Arrow-optimized Python UDFs and execute them with Arrow optimization.
[jira] [Updated] (SPARK-43893) Non-atomic data type support in Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-43893:
---------------------------------
    Summary: Non-atomic data type support in Arrow Python UDF  (was: Non-atomic data type support in Arrow-optimized Python UDF)

> Non-atomic data type support in Arrow Python UDF
>
> Key: SPARK-43893
> URL: https://issues.apache.org/jira/browse/SPARK-43893
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.5.0
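As a sketch of what non-atomic type support means in practice (assuming Spark 3.5 with this change merged and an active SparkSession named spark), an Arrow Python UDF can return a StructType value, which previously only worked for atomic return types on the Arrow path:

{code:python}
from pyspark.sql.functions import udf

# Arrow-optimized UDF returning a struct; the dict keys map to the
# declared struct fields.
@udf(returnType="name string, length int", useArrow=True)
def describe(s):
    return {"name": s, "length": len(s)}

spark.createDataFrame([("spark",)], ["s"]).select(describe("s")).show()
{code}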
[jira] [Updated] (SPARK-43412) Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-43412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-43412:
---------------------------------
    Summary: Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow Python UDFs  (was: Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs)

> Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow Python UDFs
>
> Key: SPARK-43412
> URL: https://issues.apache.org/jira/browse/SPARK-43412
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.5.0
>
> We are about to improve nested non-atomic input/output support of an Arrow-optimized Python UDF.
> However, it currently shares the same EvalType as a pickled Python UDF while sharing its implementation with a Pandas UDF.
> Introducing a dedicated EvalType isolates the changes to Arrow-optimized Python UDFs.
[jira] [Updated] (SPARK-43082) Arrow Python UDFs in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-43082:
---------------------------------
    Summary: Arrow Python UDFs in Spark Connect  (was: Arrow-optimized Python UDFs in Spark Connect)

> Arrow Python UDFs in Spark Connect
>
> Key: SPARK-43082
> URL: https://issues.apache.org/jira/browse/SPARK-43082
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.5.0
>
> Implement Arrow-optimized Python UDFs in Spark Connect.
[jira] [Updated] (SPARK-42893) Block Arrow Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-42893:
---------------------------------
    Summary: Block Arrow Python UDFs  (was: Block Arrow-optimized Python UDFs)

> Block Arrow Python UDFs
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.4.0
>
> Considering the upcoming improvements on the result inconsistencies between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we'd better block the feature; otherwise, users who try out the feature will see behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced in Spark 3.4, we'd better ensure the feature is ready in both vanilla PySpark and SCPC at the same time for compatibility.
[jira] [Updated] (SPARK-40307) Introduce Arrow Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-40307:
---------------------------------
    Summary: Introduce Arrow Python UDFs  (was: Introduce Arrow-optimized Python UDFs)

> Introduce Arrow Python UDFs
>
> Key: SPARK-40307
> URL: https://issues.apache.org/jira/browse/SPARK-40307
> Project: Spark
> Issue Type: Umbrella
> Components: PySpark
> Affects Versions: 3.4.0, 3.5.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
>
> Python user-defined functions (UDFs) enable users to run arbitrary code against PySpark columns. They use Pickle for (de)serialization and execute row by row.
> One major performance bottleneck of Python UDFs is (de)serialization, that is, the data interchange between the worker JVM and the spawned Python subprocess which actually executes the UDF. We should seek an alternative for the (de)serialization: Arrow, which is already used in the (de)serialization of Pandas UDFs.
> There should be two ways to enable/disable the Arrow optimization for Python UDFs:
> - the Spark configuration `spark.sql.execution.pythonUDF.arrow.enabled`, disabled by default.
> - the `useArrow` parameter of the `udf` function, None by default.
> The Spark configuration takes effect only when `useArrow` is None. Otherwise, `useArrow` decides whether a specific user-defined function is optimized by Arrow or not.
> The reason we introduce these two ways is to provide both a convenient per-Spark-session control and a finer-grained per-UDF control of the Arrow optimization for Python UDFs.
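The two controls described in the ticket look like this in practice (a sketch assuming Spark 3.5 and an active SparkSession named spark):

{code:python}
from pyspark.sql.functions import udf

# Per-session control: applies to UDFs whose useArrow is left unset (None).
spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")

# Per-UDF control: useArrow overrides the session configuration.
@udf(returnType="long", useArrow=True)
def add_one(x):
    return x + 1

spark.range(3).select(add_one("id")).show()
{code}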
[jira] [Updated] (SPARK-43903) Improve ArrayType input support in Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-43903:
---------------------------------
    Summary: Improve ArrayType input support in Arrow Python UDF  (was: Improve ArrayType input support in Arrow-optimized Python UDF)

> Improve ArrayType input support in Arrow Python UDF
>
> Key: SPARK-43903
> URL: https://issues.apache.org/jira/browse/SPARK-43903
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Xinrong Meng
> Priority: Major
[jira] [Updated] (SPARK-43903) Improve ArrayType input support in Arrow-optimized Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-43903:
---------------------------------
    Summary: Improve ArrayType input support in Arrow-optimized Python UDF  (was: Non-atomic data type support in Arrow-optimized Python UDF)

> Improve ArrayType input support in Arrow-optimized Python UDF
>
> Key: SPARK-43903
> URL: https://issues.apache.org/jira/browse/SPARK-43903
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Xinrong Meng
> Priority: Major
[jira] [Updated] (SPARK-43893) Non-atomic data type support in Arrow-optimized Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-43893:
---------------------------------
    Summary: Non-atomic data type support in Arrow-optimized Python UDF  (was: StructType input/output support in Arrow-optimized Python UDF)

> Non-atomic data type support in Arrow-optimized Python UDF
>
> Key: SPARK-43893
> URL: https://issues.apache.org/jira/browse/SPARK-43893
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.5.0
[jira] [Commented] (SPARK-44057) Mark all `local-cluster` tests as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-44057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732751#comment-17732751 ]

GridGain Integration commented on SPARK-44057:
----------------------------------------------
User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41601

> Mark all `local-cluster` tests as `ExtendedSQLTest`
>
> Key: SPARK-44057
> URL: https://issues.apache.org/jira/browse/SPARK-44057
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Minor
> Fix For: 3.5.0
>
> This issue aims to mark all `local-cluster` tests as `ExtendedSQLTest`.
> https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/251144/signedlogcontent/12?urlExpires=2023-06-14T17%3A11%3A50.2399742Z&urlSigningMethod=HMACV1&urlSignature=%2FHTlrgaHtF2Jv65vw%2Fj4SzT69etebI0swSSM6dXC0tk%3D
> {code}
> $ git grep local-cluster sql/core/
> sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala:    val session = SparkSession.builder().master("local-cluster[3, 1, 1024]").getOrCreate()
> sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala:    val session = SparkSession.builder().master("local-cluster[3, 1, 1024]").getOrCreate()
> sql/core/src/test/scala/org/apache/spark/sql/execution/BroadcastExchangeSuite.scala:// Additional tests run in 'local-cluster' mode.
> sql/core/src/test/scala/org/apache/spark/sql/execution/BroadcastExchangeSuite.scala:      .setMaster("local-cluster[2,1,1024]")
> sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala:      "--master", "local-cluster[1,1,1024]",
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala:   * Create a new [[SparkSession]] running in local-cluster mode with unsafe and codegen enabled.
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala:      .master("local-cluster[2,1,1024]")
> sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala: * Tests in this suite we need to run Spark in local-cluster mode. In particular, the use of
> sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala:   * Create a new [[SparkSession]] running in local-cluster mode with unsafe and codegen enabled.
> sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala:      .master("local-cluster[2,1,512]")
> sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDDSuite.scala:    .config(sparkConf.setMaster("local-cluster[2, 1, 1024]"))
> sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala:  // Create a new [[SparkSession]] running in local-cluster mode.
> sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala:      .master("local-cluster[2,1,1024]")
> {code}
[jira] [Assigned] (SPARK-44057) Mark all `local-cluster` tests as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-44057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-44057:
-------------------------------------
    Assignee: Dongjoon Hyun

> Mark all `local-cluster` tests as `ExtendedSQLTest`
>
> Key: SPARK-44057
> URL: https://issues.apache.org/jira/browse/SPARK-44057
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Minor
>
> This issue aims to mark all `local-cluster` tests as `ExtendedSQLTest`.
> https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/251144/signedlogcontent/12?urlExpires=2023-06-14T17%3A11%3A50.2399742Z&urlSigningMethod=HMACV1&urlSignature=%2FHTlrgaHtF2Jv65vw%2Fj4SzT69etebI0swSSM6dXC0tk%3D
> {code}
> $ git grep local-cluster sql/core/
> sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala:    val session = SparkSession.builder().master("local-cluster[3, 1, 1024]").getOrCreate()
> sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala:    val session = SparkSession.builder().master("local-cluster[3, 1, 1024]").getOrCreate()
> sql/core/src/test/scala/org/apache/spark/sql/execution/BroadcastExchangeSuite.scala:// Additional tests run in 'local-cluster' mode.
> sql/core/src/test/scala/org/apache/spark/sql/execution/BroadcastExchangeSuite.scala:      .setMaster("local-cluster[2,1,1024]")
> sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala:      "--master", "local-cluster[1,1,1024]",
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala:   * Create a new [[SparkSession]] running in local-cluster mode with unsafe and codegen enabled.
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala:      .master("local-cluster[2,1,1024]")
> sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala: * Tests in this suite we need to run Spark in local-cluster mode. In particular, the use of
> sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala:   * Create a new [[SparkSession]] running in local-cluster mode with unsafe and codegen enabled.
> sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala:      .master("local-cluster[2,1,512]")
> sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDDSuite.scala:    .config(sparkConf.setMaster("local-cluster[2, 1, 1024]"))
> sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala:  // Create a new [[SparkSession]] running in local-cluster mode.
> sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala:      .master("local-cluster[2,1,1024]")
> {code}
[jira] [Resolved] (SPARK-44057) Mark all `local-cluster` tests as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-44057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-44057.
-----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41601
[https://github.com/apache/spark/pull/41601]

> Mark all `local-cluster` tests as `ExtendedSQLTest`
>
> Key: SPARK-44057
> URL: https://issues.apache.org/jira/browse/SPARK-44057
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Minor
> Fix For: 3.5.0
>
> This issue aims to mark all `local-cluster` tests as `ExtendedSQLTest`.
> https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/251144/signedlogcontent/12?urlExpires=2023-06-14T17%3A11%3A50.2399742Z&urlSigningMethod=HMACV1&urlSignature=%2FHTlrgaHtF2Jv65vw%2Fj4SzT69etebI0swSSM6dXC0tk%3D
> {code}
> $ git grep local-cluster sql/core/
> sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala:    val session = SparkSession.builder().master("local-cluster[3, 1, 1024]").getOrCreate()
> sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala:    val session = SparkSession.builder().master("local-cluster[3, 1, 1024]").getOrCreate()
> sql/core/src/test/scala/org/apache/spark/sql/execution/BroadcastExchangeSuite.scala:// Additional tests run in 'local-cluster' mode.
> sql/core/src/test/scala/org/apache/spark/sql/execution/BroadcastExchangeSuite.scala:      .setMaster("local-cluster[2,1,1024]")
> sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala:      "--master", "local-cluster[1,1,1024]",
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala:   * Create a new [[SparkSession]] running in local-cluster mode with unsafe and codegen enabled.
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala:      .master("local-cluster[2,1,1024]")
> sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala: * Tests in this suite we need to run Spark in local-cluster mode. In particular, the use of
> sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala:   * Create a new [[SparkSession]] running in local-cluster mode with unsafe and codegen enabled.
> sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala:      .master("local-cluster[2,1,512]")
> sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDDSuite.scala:    .config(sparkConf.setMaster("local-cluster[2, 1, 1024]"))
> sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala:  // Create a new [[SparkSession]] running in local-cluster mode.
> sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala:      .master("local-cluster[2,1,1024]")
> {code}
[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9
[ https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732747#comment-17732747 ]

Dongjoon Hyun commented on SPARK-44041:
---------------------------------------
Nice! Looking forward to seeing it.

> Upgrade ammonite to 2.5.9
>
> Key: SPARK-44041
> URL: https://issues.apache.org/jira/browse/SPARK-44041
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Major
>
> To support Scala 2.12.18 & 2.13.11.
> A tag already exists: [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9]
[jira] [Updated] (SPARK-44059) Add named argument support for SQL functions
[ https://issues.apache.org/jira/browse/SPARK-44059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Yu updated SPARK-44059:
-------------------------------
    Description:
Today, there is increasing demand for named argument functions, especially as we continue to introduce longer and longer parameter lists in our SQL functions. In these functions, many arguments could have default values, making it wasteful to require specifying them all. This is an umbrella ticket to track the smaller subtasks to be completed for implementing this feature.

Issues currently tracked:
https://issues.apache.org/jira/browse/SPARK-43922

  was:
Today, there is increasing demand for named argument functions, especially as we continue to introduce longer and longer parameter lists in our SQL functions. In these functions, many arguments could have default values, making it wasteful to require specifying them all. This is an umbrella ticket to track the smaller subtasks to be completed for implementing this feature.

> Add named argument support for SQL functions
>
> Key: SPARK-44059
> URL: https://issues.apache.org/jira/browse/SPARK-44059
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core, SQL
> Affects Versions: 3.5.0
> Reporter: Richard Yu
> Priority: Major
>
> Today, there is increasing demand for named argument functions, especially as we continue to introduce longer and longer parameter lists in our SQL functions. In these functions, many arguments could have default values, making it wasteful to require specifying them all. This is an umbrella ticket to track the smaller subtasks to be completed for implementing this feature.
> Issues currently tracked:
> https://issues.apache.org/jira/browse/SPARK-43922
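To make the goal concrete, a sketch of the target syntax (hypothetical at the time of this ticket, since the feature had not yet landed; mask is shown only as a plausible candidate function with many defaulted parameters, and spark is assumed to be an active SparkSession):

{code:python}
# Named arguments would let a caller set only the parameters of interest,
# in any order, leaving the rest at their defaults.
spark.sql(
    "SELECT mask('AbCD123-@$#', lowerChar => 'q', digitChar => 'd')"
).show()
{code}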
[jira] [Created] (SPARK-44059) Add named argument support for SQL functions
Richard Yu created SPARK-44059:
----------------------------------
Summary: Add named argument support for SQL functions
Key: SPARK-44059
URL: https://issues.apache.org/jira/browse/SPARK-44059
Project: Spark
Issue Type: New Feature
Components: Spark Core, SQL
Affects Versions: 3.5.0
Reporter: Richard Yu

Today, there is increasing demand for named argument functions, especially as we continue to introduce longer and longer parameter lists in our SQL functions. In these functions, many arguments could have default values, making it wasteful to require specifying them all. This is an umbrella ticket to track the smaller subtasks to be completed for implementing this feature.
[jira] [Updated] (SPARK-44058) Remove deprecated API usage in HiveShim.scala
[ https://issues.apache.org/jira/browse/SPARK-44058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Raj updated SPARK-44058:
-----------------------------
    Description:
Spark's HiveShim.scala calls this particular method in Hive:
{code:java}
createPartitionMethod.invoke(
  hive,
  table,
  spec,
  location,
  params, // partParams
  null, // inputFormat
  null, // outputFormat
  -1: JInteger, // numBuckets
  null, // cols
  null, // serializationLib
  null, // serdeParams
  null, // bucketCols
  null) // sortCols
}
{code}
We do not have any such implementation of createPartition in Hive. We only have this definition:
{code:java}
public Partition createPartition(Table tbl, Map<String, String> partSpec) throws HiveException {
  try {
    org.apache.hadoop.hive.metastore.api.Partition part =
        Partition.createMetaPartitionObject(tbl, partSpec, null);
    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl);
    part.setWriteId(tableSnapshot != null ? tableSnapshot.getWriteId() : 0);
    return new Partition(tbl, getMSC().add_partition(part));
  } catch (Exception e) {
    LOG.error(StringUtils.stringifyException(e));
    throw new HiveException(e);
  }
}
{code}
*The 12-parameter implementation was removed in HIVE-5951.*

The issue is that this 12-parameter implementation of the createPartition method was added in Hive 0.12 and then removed in Hive 0.13. When Hive 0.12 was used in Spark, the SPARK-15334 commit added this 12-parameter invocation. But after Hive migrated to newer APIs, this was never updated in Spark OSS, which looks to us like a bug on the Spark end.

We need to migrate to the newest implementation of the Hive createPartition method, otherwise this flow can break.

  was:
Spark's HiveShim.scala calls this particular method in Hive:
{code:java}
createPartitionMethod.invoke(
  hive,
  table,
  spec,
  location,
  params, // partParams
  null, // inputFormat
  null, // outputFormat
  -1: JInteger, // numBuckets
  null, // cols
  null, // serializationLib
  null, // serdeParams
  null, // bucketCols
  null) // sortCols
}
{code}
We do not have any such implementation of createPartition in Hive. We only have this definition:
{code:java}
public Partition createPartition(Table tbl, Map<String, String> partSpec) throws HiveException {
  try {
    org.apache.hadoop.hive.metastore.api.Partition part =
        Partition.createMetaPartitionObject(tbl, partSpec, null);
    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl);
    part.setWriteId(tableSnapshot != null ? tableSnapshot.getWriteId() : 0);
    return new Partition(tbl, getMSC().add_partition(part));
  } catch (Exception e) {
    LOG.error(StringUtils.stringifyException(e));
    throw new HiveException(e);
  }
}
{code}
The issue is that this 12-parameter implementation of the createPartition method was added in Hive 0.12 and then removed in Hive 0.13. When Hive 0.12 was used in Spark, the [SPARK-15334] commit added this 12-parameter invocation. But after Hive migrated to newer APIs, this was never updated in Spark OSS, which looks to us like a bug on the Spark end.

We need to migrate to the newest implementation of the Hive createPartition method, otherwise this flow can break.

> Remove deprecated API usage in HiveShim.scala
>
> Key: SPARK-44058
> URL: https://issues.apache.org/jira/browse/SPARK-44058
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 3.4.0
> Reporter: Aman Raj
> Priority: Major
>
> Spark's HiveShim.scala calls this particular method in Hive:
> {code:java}
> createPartitionMethod.invoke(
>   hive,
>   table,
>   spec,
>   location,
>   params, // partParams
>   null, // inputFormat
>   null, // outputFormat
>   -1: JInteger, // numBuckets
>   null, // cols
>   null, // serializationLib
>   null, // serdeParams
>   null, // bucketCols
>   null) // sortCols
> }
> {code}
> We do not have any such implementation of createPartition in Hive. We only have this definition:
> {code:java}
> public Partition createPartition(Table tbl, Map<String, String> partSpec) throws HiveException {
>   try {
>     org.apache.hadoop.hive.metastore.api.Partition part =
>         Partition.createMetaPartitionObject(tbl, partSpec, null);
>     AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl);
>     part.setWriteId(tableSnapshot != null ? tableSnapshot.getWriteId() : 0);
>     return new Partition(tbl, getMSC().add_partition(part));
>   } catch (Exception e) {
>     LOG.error(StringUtils.stringifyException(e));
>     throw new HiveException(e);
>   }
> }
> {code}
> *The 12-parameter implementation was removed in HIVE-5951.*
>
> The issue is that this 12-parameter implementation of the createPartition method was added in Hive 0.12 and then removed in Hive 0.13. When Hive 0.12 was used in Spark, the SPARK-15334 commit added this 12-parameter invocation. But after Hive migrated to newer APIs, this was never updated in Spark OSS, which looks to us like a bug on the Spark end.
>
> We need to migrate to the newest implementation of the Hive createPartition method, otherwise this flow can break.
[jira] [Created] (SPARK-44058) Remove deprecated API usage in HiveShim.scala
Aman Raj created SPARK-44058: Summary: Remove deprecated API usage in HiveShim.scala Key: SPARK-44058 URL: https://issues.apache.org/jira/browse/SPARK-44058 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 3.4.0 Reporter: Aman Raj Spark's HiveShim.scala invokes this particular method in Hive:
{code:scala}
createPartitionMethod.invoke(
  hive,
  table,
  spec,
  location,
  params,       // partParams
  null,         // inputFormat
  null,         // outputFormat
  -1: JInteger, // numBuckets
  null,         // cols
  null,         // serializationLib
  null,         // serdeParams
  null,         // bucketCols
  null)         // sortCols
{code}
Hive no longer has any such overload of createPartition. The only remaining definition is:
{code:java}
public Partition createPartition(Table tbl, Map<String, String> partSpec) throws HiveException {
  try {
    org.apache.hadoop.hive.metastore.api.Partition part =
        Partition.createMetaPartitionObject(tbl, partSpec, null);
    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl);
    part.setWriteId(tableSnapshot != null ? tableSnapshot.getWriteId() : 0);
    return new Partition(tbl, getMSC().add_partition(part));
  } catch (Exception e) {
    LOG.error(StringUtils.stringifyException(e));
    throw new HiveException(e);
  }
}
{code}
The issue is that this 12-parameter overload of createPartition was added in Hive 0.12 and removed in Hive 0.13. When Spark used Hive 0.12, the SPARK-15334 commit added the 12-parameter invocation, but after Hive migrated to the newer API this was never updated in Spark OSS, which looks to us like a bug on the Spark side. We need to migrate to the current implementation of Hive's createPartition method, otherwise this flow can break. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
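As a sketch of what the migration could look like, below is a minimal reflection-based lookup of the surviving two-argument overload, in the style of HiveShim (the helper and its names are illustrative assumptions, not the actual fix):
{code:scala}
import java.util.{Map => JMap}
import org.apache.hadoop.hive.ql.metadata.{Hive, Table}

// Resolve and invoke the two-argument createPartition(Table, Map) overload
// that current Hive releases still ship, instead of the 12-argument variant
// removed in HIVE-5951.
def createPartition(hive: Hive, table: Table, spec: JMap[String, String]): Unit = {
  val method = hive.getClass.getMethod(
    "createPartition", classOf[Table], classOf[JMap[_, _]])
  method.invoke(hive, table, spec)
}
{code}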
[jira] [Created] (SPARK-44057) Mark all `local-cluster` tests as `ExtendedSQLTest`
Dongjoon Hyun created SPARK-44057: - Summary: Mark all `local-cluster` tests as `ExtendedSQLTest` Key: SPARK-44057 URL: https://issues.apache.org/jira/browse/SPARK-44057 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 3.5.0 Reporter: Dongjoon Hyun This issue aims to mark all `local-cluster` tests as `ExtendedSQLTest` https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/251144/signedlogcontent/12?urlExpires=2023-06-14T17%3A11%3A50.2399742Z&urlSigningMethod=HMACV1&urlSignature=%2FHTlrgaHtF2Jv65vw%2Fj4SzT69etebI0swSSM6dXC0tk%3D {code} $ git grep local-cluster sql/core/ sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala: val session = SparkSession.builder().master("local-cluster[3, 1, 1024]").getOrCreate() sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala: val session = SparkSession.builder().master("local-cluster[3, 1, 1024]").getOrCreate() sql/core/src/test/scala/org/apache/spark/sql/execution/BroadcastExchangeSuite.scala:// Additional tests run in 'local-cluster' mode. sql/core/src/test/scala/org/apache/spark/sql/execution/BroadcastExchangeSuite.scala: .setMaster("local-cluster[2,1,1024]") sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala: "--master", "local-cluster[1,1,1024]", sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala: * Create a new [[SparkSession]] running in local-cluster mode with unsafe and codegen enabled. sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala: .master("local-cluster[2,1,1024]") sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala: * Tests in this suite we need to run Spark in local-cluster mode. In particular, the use of sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala: * Create a new [[SparkSession]] running in local-cluster mode with unsafe and codegen enabled. sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala: .master("local-cluster[2,1,512]") sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDDSuite.scala: .config(sparkConf.setMaster("local-cluster[2, 1, 1024]")) sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala: // Create a new [[SparkSession]] running in local-cluster mode. sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala: .master("local-cluster[2,1,1024]") {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
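As a sketch, tagging such a suite is a one-line annotation with the existing `org.apache.spark.tags.ExtendedSQLTest` tag (the suite name below is just an example):
{code:scala}
import org.apache.spark.SparkFunSuite
import org.apache.spark.tags.ExtendedSQLTest

// Marking a suite that starts a `local-cluster` master so the CI scheduler
// can route it together with the other extended SQL tests.
@ExtendedSQLTest
class ExampleLocalClusterSuite extends SparkFunSuite {
  // tests using .master("local-cluster[2, 1, 1024]") go here
}
{code}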
[jira] [Commented] (SPARK-44053) Update ORC to 1.8.4
[ https://issues.apache.org/jira/browse/SPARK-44053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732668#comment-17732668 ] Dongjoon Hyun commented on SPARK-44053: --- The Apache ORC 1.9.0 PR will arrive later this month. > Update ORC to 1.8.4 > --- > > Key: SPARK-44053 > URL: https://issues.apache.org/jira/browse/SPARK-44053 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.1, 3.5.0 >Reporter: Yiqun Zhang >Assignee: Yiqun Zhang >Priority: Major > Fix For: 3.4.1, 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44053) Update ORC to 1.8.4
[ https://issues.apache.org/jira/browse/SPARK-44053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44053: -- Affects Version/s: 3.5.0 > Update ORC to 1.8.4 > --- > > Key: SPARK-44053 > URL: https://issues.apache.org/jira/browse/SPARK-44053 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.1, 3.5.0 >Reporter: Yiqun Zhang >Assignee: Yiqun Zhang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44053) Update ORC to 1.8.4
[ https://issues.apache.org/jira/browse/SPARK-44053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44053: -- Fix Version/s: 3.4.1 > Update ORC to 1.8.4 > --- > > Key: SPARK-44053 > URL: https://issues.apache.org/jira/browse/SPARK-44053 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.1, 3.5.0 >Reporter: Yiqun Zhang >Assignee: Yiqun Zhang >Priority: Major > Fix For: 3.4.1, 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44056) Improve error message when UDF execution fails
Rob Reeves created SPARK-44056: -- Summary: Improve error message when UDF execution fails Key: SPARK-44056 URL: https://issues.apache.org/jira/browse/SPARK-44056 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.1 Reporter: Rob Reeves If a user has multiple UDFs defined with the same method signature, it is hard to figure out which one caused the issue from the function class alone. For example, in Spark 3.1.1:
{code}
Caused by: org.apache.spark.SparkException: Failed to execute user defined function(UDFRegistration$$Lambda$666/1969461119: (bigint, string) => string)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.subExpr_0$(Unknown Source)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
 at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3(basicPhysicalOperators.scala:249)
 at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3$adapted(basicPhysicalOperators.scala:248)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)
 at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
 at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
 at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
 at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
 at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:131)
 at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
 at org.apache.spark.scheduler.Task.run(Task.scala:131)
 at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:523)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1535)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:526)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
{code}
This is the end of the stack trace; it is not truncated. If the SQL API is used, the ScalaUDF will have a name. It should be part of the error to help with debugging. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
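As a sketch of the SQL-API case the ticket refers to, a UDF registered under an explicit name gives the error path something far more identifiable than an anonymous lambda class (the `redact` name and function body below are made up for the example):
{code:scala}
// Registering the UDF under an explicit name; if execution fails, an error
// message carrying the registered name ("redact") would pinpoint the UDF,
// unlike the UDFRegistration$$Lambda$... class in the stack trace above.
spark.udf.register("redact", (id: Long, name: String) => name.take(1) + "***")
spark.sql("SELECT redact(id, name) FROM people").show()
{code}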
[jira] [Resolved] (SPARK-44053) Update ORC to 1.8.4
[ https://issues.apache.org/jira/browse/SPARK-44053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44053. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41593 [https://github.com/apache/spark/pull/41593] > Update ORC to 1.8.4 > --- > > Key: SPARK-44053 > URL: https://issues.apache.org/jira/browse/SPARK-44053 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.1 >Reporter: Yiqun Zhang >Assignee: Yiqun Zhang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44053) Update ORC to 1.8.4
[ https://issues.apache.org/jira/browse/SPARK-44053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44053: - Assignee: Yiqun Zhang > Update ORC to 1.8.4 > --- > > Key: SPARK-44053 > URL: https://issues.apache.org/jira/browse/SPARK-44053 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.1 >Reporter: Yiqun Zhang >Assignee: Yiqun Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44055) Remove redundant `override` from `CheckpointRDD`
Yang Jie created SPARK-44055: Summary: Remove redundant `override` from `CheckpointRDD` Key: SPARK-44055 URL: https://issues.apache.org/jira/browse/SPARK-44055 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-43819) Barrier Executor Stage Not Retried on Task Failure
[ https://issues.apache.org/jira/browse/SPARK-43819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Tieman closed SPARK-43819. -- > Barrier Executor Stage Not Retried on Task Failure > -- > > Key: SPARK-43819 > URL: https://issues.apache.org/jira/browse/SPARK-43819 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.3.2 >Reporter: Matthew Tieman >Priority: Major > > When running a stage using barrier executor, the expectation is that a > failure in a task will result in the stage being retried. However, if an > exception is thrown from a task, the stage is not retried and the job fails. > Running the pyspark code below will cause a single task to fail, failing the > stage without retrying. > {code:java} > def test_func(index: int) -> list: > if index == 0: > raise RuntimeError("Thrown from test func") > return [] > start_rdd = sc.parallelize([i for i in range(10)], 10) > result = start_rdd.barrier().mapPartitionsWithIndex(lambda i, c: test_func(i)) > result.collect(){code} > > This failure is seen running locally via the pyspark shell and on a K8s > cluster. > > Stack trace from local execution: > {noformat} > Traceback (most recent call last): > File "", line 1, in > File "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/rdd.py", > line 1197, in collect > sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", > line 1321, in __call__ > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/sql/utils.py", > line 190, in deco > return f(*a, **kw) > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", > line 326, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. > : org.apache.spark.SparkException: Job aborted due to stage failure: Could > not recover from a failed barrier ResultStage. Most recent failure reason: > Stage failed because barrier task ResultTask(0, 0) finished unsuccessfully. 
> org.apache.spark.api.python.PythonException: Traceback (most recent call > last): > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", > line 686, in main > process() > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", > line 676, in process > out_iter = func(split_index, iterator) > File "", line 1, in > File "", line 3, in test_func > RuntimeError: Thrown from test func > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:559) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:765) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:747) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:512) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at > org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) > at scala.collection.TraversableOnce.to(TraversableOnce.scala:366) > at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) > at > org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) > at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358) > at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358) > at > org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) > at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345) > at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339) > at > org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) > at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1021) > at > org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2268) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:9
[jira] [Commented] (SPARK-43819) Barrier Executor Stage Not Retried on Task Failure
[ https://issues.apache.org/jira/browse/SPARK-43819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732599#comment-17732599 ] Matthew Tieman commented on SPARK-43819: After further debugging, I found the issue was a combination of incorrect configuration and expectations. First, the incorrect expectation was that the stage would be retried on task failure. However, this only happens if the failure happens in a shuffle map stage; if the failure happens in a result stage, the job is aborted. Next, the misconfiguration was in the {{SparkApplication}} resource submitted to the K8s Spark operator: specifically, the {{restartPolicy}} was set to {{Never}}. The combination of the barrier failing the job and the Spark operator being told not to retry applications on failure led to the issue. The solution was to configure a restart policy with an appropriate number of retry attempts. > Barrier Executor Stage Not Retried on Task Failure > -- > > Key: SPARK-43819 > URL: https://issues.apache.org/jira/browse/SPARK-43819 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.3.2 >Reporter: Matthew Tieman >Priority: Major > > When running a stage using barrier executor, the expectation is that a > failure in a task will result in the stage being retried. However, if an > exception is thrown from a task, the stage is not retried and the job fails. > Running the pyspark code below will cause a single task to fail, failing the > stage without retrying. > {code:java} > def test_func(index: int) -> list: > if index == 0: > raise RuntimeError("Thrown from test func") > return [] > start_rdd = sc.parallelize([i for i in range(10)], 10) > result = start_rdd.barrier().mapPartitionsWithIndex(lambda i, c: test_func(i)) > result.collect(){code} > > This failure is seen running locally via the pyspark shell and on a K8s > cluster. > > Stack trace from local execution: > {noformat} > Traceback (most recent call last): > File "", line 1, in > File "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/rdd.py", > line 1197, in collect > sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", > line 1321, in __call__ > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/sql/utils.py", > line 190, in deco > return f(*a, **kw) > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", > line 326, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. > : org.apache.spark.SparkException: Job aborted due to stage failure: Could > not recover from a failed barrier ResultStage. Most recent failure reason: > Stage failed because barrier task ResultTask(0, 0) finished unsuccessfully. 
> org.apache.spark.api.python.PythonException: Traceback (most recent call > last): > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", > line 686, in main > process() > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", > line 676, in process > out_iter = func(split_index, iterator) > File "", line 1, in > File "", line 3, in test_func > RuntimeError: Thrown from test func > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:559) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:765) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:747) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:512) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at > org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) > at scala.collection.TraversableOnce.to(TraversableOnce.scala:366) > at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) > at > o
[jira] [Resolved] (SPARK-43819) Barrier Executor Stage Not Retried on Task Failure
[ https://issues.apache.org/jira/browse/SPARK-43819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Tieman resolved SPARK-43819. Resolution: Not A Problem > Barrier Executor Stage Not Retried on Task Failure > -- > > Key: SPARK-43819 > URL: https://issues.apache.org/jira/browse/SPARK-43819 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.3.2 >Reporter: Matthew Tieman >Priority: Major > > When running a stage using barrier executor, the expectation is that a > failure in a task will result in the stage being retried. However, if an > exception is thrown from a task, the stage is not retried and the job fails. > Running the pyspark code below will cause a single task to fail, failing the > stage without retrying. > {code:java} > def test_func(index: int) -> list: > if index == 0: > raise RuntimeError("Thrown from test func") > return [] > start_rdd = sc.parallelize([i for i in range(10)], 10) > result = start_rdd.barrier().mapPartitionsWithIndex(lambda i, c: test_func(i)) > result.collect(){code} > > This failure is seen running locally via the pyspark shell and on a K8s > cluster. > > Stack trace from local execution: > {noformat} > Traceback (most recent call last): > File "", line 1, in > File "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/rdd.py", > line 1197, in collect > sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", > line 1321, in __call__ > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/sql/utils.py", > line 190, in deco > return f(*a, **kw) > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", > line 326, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. > : org.apache.spark.SparkException: Job aborted due to stage failure: Could > not recover from a failed barrier ResultStage. Most recent failure reason: > Stage failed because barrier task ResultTask(0, 0) finished unsuccessfully. 
> org.apache.spark.api.python.PythonException: Traceback (most recent call > last): > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", > line 686, in main > process() > File > "/opt/homebrew/anaconda3/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", > line 676, in process > out_iter = func(split_index, iterator) > File "", line 1, in > File "", line 3, in test_func > RuntimeError: Thrown from test func > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:559) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:765) > at > org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:747) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:512) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at > org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) > at scala.collection.TraversableOnce.to(TraversableOnce.scala:366) > at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) > at > org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) > at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358) > at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358) > at > org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) > at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345) > at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339) > at > org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) > at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1021) > at > org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2268) > at org.apache.spark.scheduler.Res
[jira] [Commented] (SPARK-44041) Upgrade ammonite to 2.5.9
[ https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732572#comment-17732572 ] Yang Jie commented on SPARK-44041: -- I will give a pr when it can be downloaded from Maven > Upgrade ammonite to 2.5.9 > - > > Key: SPARK-44041 > URL: https://issues.apache.org/jira/browse/SPARK-44041 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > For support Scala 2.12.18 & 2.13.11 > > already has a tag : > [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-44041) Upgrade ammonite to 2.5.9
[ https://issues.apache.org/jira/browse/SPARK-44041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732572#comment-17732572 ] Yang Jie edited comment on SPARK-44041 at 6/14/23 3:03 PM: --- I will give a pr when it can be downloaded by Maven was (Author: luciferyang): I will give a pr when it can be downloaded from Maven > Upgrade ammonite to 2.5.9 > - > > Key: SPARK-44041 > URL: https://issues.apache.org/jira/browse/SPARK-44041 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > For support Scala 2.12.18 & 2.13.11 > > already has a tag : > [https://github.com/com-lihaoyi/Ammonite/releases/tag/2.5.9] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44054) Make test cases inherit SparkFunSuite have a default timeout
Yang Jie created SPARK-44054: Summary: Make test cases inherit SparkFunSuite have a default timeout Key: SPARK-44054 URL: https://issues.apache.org/jira/browse/SPARK-44054 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.5.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
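A minimal sketch of one way such a default could be wired in, using ScalaTest's `TimeLimitedTests` trait (the trait name and the 10-minute bound are assumptions for illustration, not necessarily this ticket's approach):
{code:scala}
import org.scalatest.concurrent.TimeLimitedTests
import org.scalatest.time.SpanSugar._

// Mixing TimeLimitedTests into a shared base suite gives every test case a
// default upper bound; individual suites can still override timeLimit.
trait DefaultTestTimeout extends TimeLimitedTests {
  override val timeLimit = 10.minutes
}
{code}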
[jira] [Assigned] (SPARK-44047) Upgrade google guava for connect from 31.0.1-jre to 32.0.1-jre
[ https://issues.apache.org/jira/browse/SPARK-44047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-44047: Assignee: BingKun Pan > Upgrade google guava for connect from 31.0.1-jre to 32.0.1-jre > -- > > Key: SPARK-44047 > URL: https://issues.apache.org/jira/browse/SPARK-44047 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44047) Upgrade google guava for connect from 31.0.1-jre to 32.0.1-jre
[ https://issues.apache.org/jira/browse/SPARK-44047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-44047. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41581 [https://github.com/apache/spark/pull/41581] > Upgrade google guava for connect from 31.0.1-jre to 32.0.1-jre > -- > > Key: SPARK-44047 > URL: https://issues.apache.org/jira/browse/SPARK-44047 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44051) Split `pyspark.pandas.tests.connect.data_type_ops.test_parity_num_ops`
[ https://issues.apache.org/jira/browse/SPARK-44051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-44051. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41591 [https://github.com/apache/spark/pull/41591] > Split `pyspark.pandas.tests.connect.data_type_ops.test_parity_num_ops` > -- > > Key: SPARK-44051 > URL: https://issues.apache.org/jira/browse/SPARK-44051 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark, Tests >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44051) Split `pyspark.pandas.tests.connect.data_type_ops.test_parity_num_ops`
[ https://issues.apache.org/jira/browse/SPARK-44051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-44051: - Assignee: Ruifeng Zheng > Split `pyspark.pandas.tests.connect.data_type_ops.test_parity_num_ops` > -- > > Key: SPARK-44051 > URL: https://issues.apache.org/jira/browse/SPARK-44051 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark, Tests >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44004) Assign name & improve error message for frequent LEGACY errors.
[ https://issues.apache.org/jira/browse/SPARK-44004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732465#comment-17732465 ] ASF GitHub Bot commented on SPARK-44004: User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/41504 > Assign name & improve error message for frequent LEGACY errors. > --- > > Key: SPARK-44004 > URL: https://issues.apache.org/jira/browse/SPARK-44004 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > This addresses _LEGACY_ERROR_TEMP_1333, _LEGACY_ERROR_TEMP_2331, > _LEGACY_ERROR_TEMP_0023, _LEGACY_ERROR_TEMP_1157, _LEGACY_ERROR_TEMP_2308, > _LEGACY_ERROR_TEMP_1051, _LEGACY_ERROR_TEMP_1029, _LEGACY_ERROR_TEMP_1318 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44004) Assign name & improve error message for frequent LEGACY errors.
[ https://issues.apache.org/jira/browse/SPARK-44004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732464#comment-17732464 ] ASF GitHub Bot commented on SPARK-44004: User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/41504 > Assign name & improve error message for frequent LEGACY errors. > --- > > Key: SPARK-44004 > URL: https://issues.apache.org/jira/browse/SPARK-44004 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > This addresses _LEGACY_ERROR_TEMP_1333, _LEGACY_ERROR_TEMP_2331, > _LEGACY_ERROR_TEMP_0023, _LEGACY_ERROR_TEMP_1157, _LEGACY_ERROR_TEMP_2308, > _LEGACY_ERROR_TEMP_1051, _LEGACY_ERROR_TEMP_1029, _LEGACY_ERROR_TEMP_1318 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732461#comment-17732461 ] ASF GitHub Bot commented on SPARK-44040: User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/41576 > Incorrect result after count distinct > - > > Key: SPARK-44040 > URL: https://issues.apache.org/jira/browse/SPARK-44040 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Aleksandr Aleksandrov >Priority: Critical > > When I try to call count after the distinct function on a Decimal null field, > Spark returns an incorrect result starting from Spark 3.4.0. > A minimal example to reproduce: > import org.apache.spark.sql.types._ > import org.apache.spark.sql.\{Column, DataFrame, Dataset, Row, SparkSession} > import org.apache.spark.sql.types.\{StringType, StructField, StructType} > val schema = StructType( Array( > StructField("money", DecimalType(38,6), true), > StructField("reference_id", StringType, true) > )) > val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) > val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) > val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", > lit("df2")) > val unionDF: DataFrame = aggDf.union(aggDf1) > unionDF.select("money").distinct.show // returns the correct result > unionDF.select("money").distinct.count // returns 2 instead of 1 > unionDF.select("money").distinct.count == 1 // returns false > This block of code returns an assertion error and, after that, an incorrect > count (in Spark 3.2.1 everything works fine and I get the correct result = 1): > *scala> unionDF.select("money").distinct.show // returns the correct result* > java.lang.AssertionError: assertion failed: > Decimal$DecimalIsFractional > while compiling: > during phase: globalPhase=terminal, enteringPhase=jvm > library version: version 2.12.17 > compiler version: version 2.12.17 > reconstructed args: -classpath > /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar > -Yrepl-class-based -Yrepl-outdir > /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 > last tree to typer: TypeTree(class Byte) > tree position: line 6 of > tree tpe: Byte > symbol: (final abstract) class Byte in package scala > symbol definition: final abstract class Byte extends (a ClassSymbol) > symbol package: scala > symbol owners: class Byte > call site: constructor $eval in object $eval in package $line19 > == Source file context for tree position == > 3 > 4 object $eval { > 5 lazy val $result = > $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 > 6 lazy val $print: _root_.java.lang.String = { > 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw > 8 > 9"" > at > scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) > at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) > at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) > at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) > 
at > scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) > at > scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) > at > scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802) > at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799) > at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805) > at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28) > at > scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645) > at > scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413) > at > scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357) > at scala.reflect.internal.pickling.UnPickler$Scan.at(UnP
[jira] [Commented] (SPARK-43915) Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]
[ https://issues.apache.org/jira/browse/SPARK-43915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732459#comment-17732459 ] ASF GitHub Bot commented on SPARK-43915: User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/41553 > Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445] > -- > > Key: SPARK-43915 > URL: https://issues.apache.org/jira/browse/SPARK-43915 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44044) Improve Error message for SQL Window functions
[ https://issues.apache.org/jira/browse/SPARK-44044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732456#comment-17732456 ] ASF GitHub Bot commented on SPARK-44044: User 'siying' has created a pull request for this issue: https://github.com/apache/spark/pull/41578 > Improve Error message for SQL Window functions > -- > > Key: SPARK-44044 > URL: https://issues.apache.org/jira/browse/SPARK-44044 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Siying Dong >Priority: Trivial > > Right now, if a window spec is used with a streaming query, the error message > looks like the following: > Non-time-based windows are not supported on streaming DataFrames/Datasets; > Window [... > The message isn't very helpful for identifying what the problem is, and some > customers and even support engineers have been confused by it. It is suggested > that we call out the aggregate function over the window spec so that users > can locate the part of the query that caused the problem more easily. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44044) Improve Error message for SQL Window functions
[ https://issues.apache.org/jira/browse/SPARK-44044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732455#comment-17732455 ] ASF GitHub Bot commented on SPARK-44044: User 'siying' has created a pull request for this issue: https://github.com/apache/spark/pull/41578 > Improve Error message for SQL Window functions > -- > > Key: SPARK-44044 > URL: https://issues.apache.org/jira/browse/SPARK-44044 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Siying Dong >Priority: Trivial > > Right now, if a window spec is used with a streaming query, the error message > looks like the following: > Non-time-based windows are not supported on streaming DataFrames/Datasets; > Window [... > The message isn't very helpful for identifying what the problem is, and some > customers and even support engineers have been confused by it. It is suggested > that we call out the aggregate function over the window spec so that users > can locate the part of the query that caused the problem more easily. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
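For context, a sketch of the kind of query that hits this error: a non-time-based window spec applied to a streaming DataFrame (the rate source and column names below are just for the example):
{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val stream = spark.readStream.format("rate").load()  // columns: timestamp, value

// A row-based window over a streaming DataFrame fails analysis with the
// "Non-time-based windows are not supported..." error; naming the aggregate
// function (row_number here) in that message would make the offending part
// of the query easier to locate.
val w = Window.partitionBy(col("value") % 10).orderBy(col("timestamp"))
val ranked = stream.withColumn("rn", row_number().over(w))
// The error surfaces when the streaming query is started, e.g.
// ranked.writeStream.format("console").start()
{code}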
[jira] [Updated] (SPARK-44053) Update ORC to 1.8.4
[ https://issues.apache.org/jira/browse/SPARK-44053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Zhang updated SPARK-44053: Affects Version/s: 3.4.1 (was: 3.5.0) > Update ORC to 1.8.4 > --- > > Key: SPARK-44053 > URL: https://issues.apache.org/jira/browse/SPARK-44053 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.1 >Reporter: Yiqun Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44053) Update ORC to 1.8.4
[ https://issues.apache.org/jira/browse/SPARK-44053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732447#comment-17732447 ] Yiqun Zhang commented on SPARK-44053: - Our plan is for Spark 3.4.1 to upgrade to ORC 1.8.4 and for Spark 3.5.0 to upgrade to ORC 1.9.0, so I set the affected version to 3.4.1 [~yumwang] :) > Update ORC to 1.8.4 > --- > > Key: SPARK-44053 > URL: https://issues.apache.org/jira/browse/SPARK-44053 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0 >Reporter: Yiqun Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44053) Update ORC to 1.8.4
[ https://issues.apache.org/jira/browse/SPARK-44053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44053: Affects Version/s: 3.5.0 (was: 3.4.1) > Update ORC to 1.8.4 > --- > > Key: SPARK-44053 > URL: https://issues.apache.org/jira/browse/SPARK-44053 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0 >Reporter: Yiqun Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44053) Update ORC to 1.8.4
Yiqun Zhang created SPARK-44053: --- Summary: Update ORC to 1.8.4 Key: SPARK-44053 URL: https://issues.apache.org/jira/browse/SPARK-44053 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.4.1 Reporter: Yiqun Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43645) Enable pyspark.pandas.spark.functions.stddev in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43645: - Assignee: Ruifeng Zheng > Enable pyspark.pandas.spark.functions.stddev in Spark Connect. > -- > > Key: SPARK-43645 > URL: https://issues.apache.org/jira/browse/SPARK-43645 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > > Enable pyspark.pandas.spark.functions.stddev in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43931) Add make_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43931: - Assignee: BingKun Pan > Add make_* functions to Scala and Python > > > Key: SPARK-43931 > URL: https://issues.apache.org/jira/browse/SPARK-43931 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: BingKun Pan >Priority: Major > > Add following functions: > * make_dt_interval > * make_interval > * make_timestamp > * make_timestamp_ltz > * make_timestamp_ntz > * make_ym_interval > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43931) Add make_* functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43931. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41477 [https://github.com/apache/spark/pull/41477] > Add make_* functions to Scala and Python > > > Key: SPARK-43931 > URL: https://issues.apache.org/jira/browse/SPARK-43931 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: BingKun Pan >Priority: Major > Fix For: 3.5.0 > > > Add following functions: > * make_dt_interval > * make_interval > * make_timestamp > * make_timestamp_ltz > * make_timestamp_ntz > * make_ym_interval > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43622) Enable pyspark.pandas.spark.functions.var in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43622. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41589 [https://github.com/apache/spark/pull/41589] > Enable pyspark.pandas.spark.functions.var in Spark Connect. > --- > > Key: SPARK-43622 > URL: https://issues.apache.org/jira/browse/SPARK-43622 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > > Enable pyspark.pandas.spark.functions.var in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43645) Enable pyspark.pandas.spark.functions.stddev in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43645. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41589 [https://github.com/apache/spark/pull/41589] > Enable pyspark.pandas.spark.functions.stddev in Spark Connect. > -- > > Key: SPARK-43645 > URL: https://issues.apache.org/jira/browse/SPARK-43645 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > > Enable pyspark.pandas.spark.functions.stddev in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44035) Split `pyspark.pandas.tests.connect.test_parity_ops_on_diff_frames_slow`
[ https://issues.apache.org/jira/browse/SPARK-44035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-44035. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41565 [https://github.com/apache/spark/pull/41565] > Split `pyspark.pandas.tests.connect.test_parity_ops_on_diff_frames_slow` > > > Key: SPARK-44035 > URL: https://issues.apache.org/jira/browse/SPARK-44035 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark, Tests >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44035) Split `pyspark.pandas.tests.connect.test_parity_ops_on_diff_frames_slow`
[ https://issues.apache.org/jira/browse/SPARK-44035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-44035: - Assignee: Ruifeng Zheng > Split `pyspark.pandas.tests.connect.test_parity_ops_on_diff_frames_slow` > > > Key: SPARK-44035 > URL: https://issues.apache.org/jira/browse/SPARK-44035 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark, Tests >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44048) Remove sql-migration-old.md
[ https://issues.apache.org/jira/browse/SPARK-44048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44048: - Assignee: Yuming Wang > Remove sql-migration-old.md > --- > > Key: SPARK-44048 > URL: https://issues.apache.org/jira/browse/SPARK-44048 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44048) Remove sql-migration-old.md
[ https://issues.apache.org/jira/browse/SPARK-44048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44048. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41583 [https://github.com/apache/spark/pull/41583] > Remove sql-migration-old.md > --- > > Key: SPARK-44048 > URL: https://issues.apache.org/jira/browse/SPARK-44048 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43963) DataSource V2: Handle MERGE commands for group-based sources
[ https://issues.apache.org/jira/browse/SPARK-43963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43963: - Assignee: Anton Okolnychyi > DataSource V2: Handle MERGE commands for group-based sources > > > Key: SPARK-43963 > URL: https://issues.apache.org/jira/browse/SPARK-43963 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > > We need to handle MERGE commands for group-based sources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43963) DataSource V2: Handle MERGE commands for group-based sources
[ https://issues.apache.org/jira/browse/SPARK-43963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43963. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41577 [https://github.com/apache/spark/pull/41577] > DataSource V2: Handle MERGE commands for group-based sources > > > Key: SPARK-43963 > URL: https://issues.apache.org/jira/browse/SPARK-43963 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.5.0 > > > We need to handle MERGE commands for group-based sources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44049) Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup
[ https://issues.apache.org/jira/browse/SPARK-44049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44049. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41586 [https://github.com/apache/spark/pull/41586] > Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup > -- > > Key: SPARK-44049 > URL: https://issues.apache.org/jira/browse/SPARK-44049 > Project: Spark > Issue Type: Test > Components: Kubernetes, Tests >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44049) Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup
[ https://issues.apache.org/jira/browse/SPARK-44049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44049: - Assignee: Dongjoon Hyun > Fix KubernetesSuite to use `inNamespace` for validating driver pod cleanup > -- > > Key: SPARK-44049 > URL: https://issues.apache.org/jira/browse/SPARK-44049 > Project: Spark > Issue Type: Test > Components: Kubernetes, Tests >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44052) Add util to get proper Column or DataFrame class for Spark Connect.
Haejoon Lee created SPARK-44052: --- Summary: Add util to get proper Column or DataFrame class for Spark Connect. Key: SPARK-44052 URL: https://issues.apache.org/jira/browse/SPARK-44052 Project: Spark Issue Type: Sub-task Components: Connect, Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee There are many codes are duplicated to get proper PySparkColumn or PySparkDataFrame, so it would be great if we have util function to deduplicate these codes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org