[jira] [Commented] (SPARK-37161) RowToColumnConverter support AnsiIntervalType
[ https://issues.apache.org/jira/browse/SPARK-37161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436232#comment-17436232 ] Apache Spark commented on SPARK-37161: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/34446

> RowToColumnConverter support AnsiIntervalType
> --
>
> Key: SPARK-37161
> URL: https://issues.apache.org/jira/browse/SPARK-37161
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: PengLei
> Priority: Major
>
> Currently, we have a RowToColumnConverter for all data types except AnsiIntervalType:
> {code:java}
> val core = dataType match {
>   case BinaryType => BinaryConverter
>   case BooleanType => BooleanConverter
>   case ByteType => ByteConverter
>   case ShortType => ShortConverter
>   case IntegerType | DateType => IntConverter
>   case FloatType => FloatConverter
>   case LongType | TimestampType => LongConverter
>   case DoubleType => DoubleConverter
>   case StringType => StringConverter
>   case CalendarIntervalType => CalendarConverter
>   case at: ArrayType => ArrayConverter(getConverterForType(at.elementType, at.containsNull))
>   case st: StructType => new StructConverter(st.fields.map(
>     (f) => getConverterForType(f.dataType, f.nullable)))
>   case dt: DecimalType => new DecimalConverter(dt)
>   case mt: MapType => MapConverter(getConverterForType(mt.keyType, nullable = false),
>     getConverterForType(mt.valueType, mt.valueContainsNull))
>   case unknown => throw QueryExecutionErrors.unsupportedDataTypeError(unknown.toString)
> }
> if (nullable) {
>   dataType match {
>     case CalendarIntervalType => new StructNullableTypeConverter(core)
>     case st: StructType => new StructNullableTypeConverter(core)
>     case _ => new BasicNullableTypeConverter(core)
>   }
> } else {
>   core
> }
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
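For context, the match above is a type-to-converter dispatch: ANSI interval types have fixed-width physical encodings (year-month intervals as an int of months, day-time intervals as a long of microseconds), so the natural fix is to route them to the existing primitive converters. A minimal Python model of that dispatch, purely as an illustration (the names mirror the Scala snippet; this is not Spark's actual code):

```python
# Hypothetical model of the RowToColumnConverter dispatch table.
CONVERTERS = {
    "BinaryType": "BinaryConverter",
    "BooleanType": "BooleanConverter",
    "ByteType": "ByteConverter",
    "ShortType": "ShortConverter",
    "IntegerType": "IntConverter",
    "DateType": "IntConverter",
    "FloatType": "FloatConverter",
    "LongType": "LongConverter",
    "TimestampType": "LongConverter",
    "DoubleType": "DoubleConverter",
    "StringType": "StringConverter",
    # The gist of the proposed change: ANSI intervals reuse primitive
    # converters, since their physical storage is a plain int / long.
    "YearMonthIntervalType": "IntConverter",   # months stored as int
    "DayTimeIntervalType": "LongConverter",    # microseconds stored as long
}

def get_converter(data_type: str) -> str:
    """Look up a converter, failing loudly for unsupported types
    (mirrors QueryExecutionErrors.unsupportedDataTypeError)."""
    try:
        return CONVERTERS[data_type]
    except KeyError:
        raise ValueError(f"unsupported data type: {data_type}")
```

With this table, an unsupported type raises immediately instead of silently producing bad columnar data, which matches the behavior of the Scala `case unknown => throw ...` arm.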
[jira] [Assigned] (SPARK-37161) RowToColumnConverter support AnsiIntervalType
[ https://issues.apache.org/jira/browse/SPARK-37161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37161: Assignee: Apache Spark
[jira] [Assigned] (SPARK-37161) RowToColumnConverter support AnsiIntervalType
[ https://issues.apache.org/jira/browse/SPARK-37161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37161: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-37161) RowToColumnConverter support AnsiIntervalType
[ https://issues.apache.org/jira/browse/SPARK-37161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436231#comment-17436231 ] Apache Spark commented on SPARK-37161: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/34446
[jira] [Commented] (SPARK-36646) Push down group by partition column for Aggregate (Min/Max/Count) for Parquet
[ https://issues.apache.org/jira/browse/SPARK-36646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436228#comment-17436228 ] Apache Spark commented on SPARK-36646: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/34445 > Push down group by partition column for Aggregate (Min/Max/Count) for Parquet > - > > Key: SPARK-36646 > URL: https://issues.apache.org/jira/browse/SPARK-36646 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.3.0 > Reporter: Huaxin Gao > Priority: Major > > If an Aggregate (Min/Max/Count) over Parquet is grouped by a partition column, push down the group by as well.
[jira] [Commented] (SPARK-32567) Code-gen for full outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-32567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436215#comment-17436215 ] Apache Spark commented on SPARK-32567: -- User 'c21' has created a pull request for this issue: https://github.com/apache/spark/pull/3 > Code-gen for full outer shuffled hash join > -- > > Key: SPARK-32567 > URL: https://issues.apache.org/jira/browse/SPARK-32567 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.1.0 > Reporter: Cheng Su > Priority: Minor > > As a followup for [https://github.com/apache/spark/pull/29342] (non-codegen full outer shuffled hash join), this task is to add code-gen for it.
[jira] [Commented] (SPARK-32567) Code-gen for full outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-32567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436214#comment-17436214 ] Apache Spark commented on SPARK-32567: -- User 'c21' has created a pull request for this issue: https://github.com/apache/spark/pull/3
[jira] [Assigned] (SPARK-32567) Code-gen for full outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-32567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32567: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-32567) Code-gen for full outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-32567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32567: Assignee: Apache Spark
[jira] [Commented] (SPARK-37168) Improve error messages for SQL functions and operators under ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-37168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436179#comment-17436179 ] Apache Spark commented on SPARK-37168: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/34443 > Improve error messages for SQL functions and operators under ANSI mode > -- > > Key: SPARK-37168 > URL: https://issues.apache.org/jira/browse/SPARK-37168 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.3.0 > Reporter: Allison Wang > Priority: Major > > Make error messages more actionable when ANSI mode is enabled.
[jira] [Assigned] (SPARK-37168) Improve error messages for SQL functions and operators under ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-37168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37168: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-37168) Improve error messages for SQL functions and operators under ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-37168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37168: Assignee: Apache Spark
[jira] [Created] (SPARK-37168) Improve error messages for SQL functions and operators under ANSI mode
Allison Wang created SPARK-37168: Summary: Improve error messages for SQL functions and operators under ANSI mode Key: SPARK-37168 URL: https://issues.apache.org/jira/browse/SPARK-37168 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Allison Wang Make error messages more actionable when ANSI mode is enabled.
[jira] [Resolved] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)
[ https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-37117. Fix Version/s: 3.3.0 3.2.1 Assignee: Gidon Gershinsky Resolution: Fixed > Can't read files in one of Parquet encryption modes (external keymaterial) > --- > > Key: SPARK-37117 > URL: https://issues.apache.org/jira/browse/SPARK-37117 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.2.0 > Reporter: Gidon Gershinsky > Assignee: Gidon Gershinsky > Priority: Major > Fix For: 3.2.1, 3.3.0 > > > Parquet encryption has a number of modes. One of them is "external keymaterial", which keeps encrypted data keys in a separate file (as opposed to inside the Parquet file). Upon reading, the Spark Parquet connector does not pass the file path, which causes an NPE.
[jira] [Created] (SPARK-37167) Add benchmark for aggregate push down
Cheng Su created SPARK-37167: Summary: Add benchmark for aggregate push down Key: SPARK-37167 URL: https://issues.apache.org/jira/browse/SPARK-37167 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Cheng Su As we added aggregate push down for Parquet and ORC, let's also add a micro benchmark for both file formats, similar to filter push down and nested schema pruning.
[jira] [Comment Edited] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436144#comment-17436144 ] Naresh edited comment on SPARK-26365 at 10/29/21, 7:41 PM: --- [~oscar.bonilla] Any plans to prioritize this issue? This will definitely block Spark usage with K8s > spark-submit for k8s cluster doesn't propagate exit code > > > Key: SPARK-26365 > URL: https://issues.apache.org/jira/browse/SPARK-26365 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Spark Submit > Affects Versions: 2.3.2, 2.4.0 > Reporter: Oscar Bonilla > Priority: Minor > Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, spark-3.0.0-raise-exception-k8s-failure.patch > > > When launching apps using spark-submit in a Kubernetes cluster, if the Spark application fails (returns exit code = 1, for example), spark-submit will still exit gracefully and return exit code = 0. > This is problematic, since there's no way to know if there's been a problem with the Spark application.
[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436144#comment-17436144 ] Naresh commented on SPARK-26365: [~oscar.bonilla] Any plans to prioritize this issue? This will definitely block Spark usage with K8s
[jira] [Commented] (SPARK-36844) Window function "first" (unboundedFollowing) appears significantly slower than "last" (unboundedPreceding) in identical circumstances
[ https://issues.apache.org/jira/browse/SPARK-36844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436116#comment-17436116 ] Tanel Kiis commented on SPARK-36844: Hello, I also hit this issue a while back and found that it is explained a bit in this code comment: https://github.com/apache/spark/blob/abf9675a7559d5666e40f25098334b5edbf8c0c3/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala#L609-L611 So it is not the fault of the first aggregator, but of the UnboundedFollowing window frame. There are definitely some optimizations that could be done. If I followed your code correctly, then I think you would be better off using the [lead|https://spark.apache.org/docs/latest/api/sql/index.html#lead] and [lag|https://spark.apache.org/docs/latest/api/sql/index.html#lag] window functions. With those you can drop the .rowsBetween(...) part from your window specs. > Window function "first" (unboundedFollowing) appears significantly slower > than "last" (unboundedPreceding) in identical circumstances > - > > Key: SPARK-36844 > URL: https://issues.apache.org/jira/browse/SPARK-36844 > Project: Spark > Issue Type: Bug > Components: PySpark, Windows > Affects Versions: 3.1.1 > Reporter: Alain Bryden > Priority: Minor > Attachments: Physical Plan 2 - workaround.png, Pysical Plan.png > > > I originally posted a question on SO because I thought perhaps I was doing something wrong: > [https://stackoverflow.com/questions/69308560|https://stackoverflow.com/questions/69308560/spark-first-window-function-is-taking-much-longer-than-last?noredirect=1#comment122505685_69308560] > Perhaps I am, but I'm now fairly convinced that there's something wonky with the implementation of `first` that's causing it to unnecessarily have a much worse complexity than `last`. > > More or less copy-pasted from the above post: > I was working on a pyspark routine to interpolate the missing values in a configuration table.
> Imagine a table of configuration values that go from 0 to 50,000. The user specifies a few data points in between (say at 0, 50, 100, 500, 2000, 50) and we interpolate the remainder. My solution mostly follows [this blog post|https://walkenho.github.io/interpolating-time-series-p2-spark/] quite closely, except I'm not using any UDFs.
> In troubleshooting the performance of this (takes ~3 minutes) I found that one particular window function is taking all of the time, and everything else I'm doing takes mere seconds.
> Here is the main area of interest - where I use window functions to fill in the previous and next user-supplied configuration values:
> {code:python}
> from pyspark.sql import Window, functions as F
>
> # Create partition windows that are required to generate new rows from the ones provided
> win_last = Window.partitionBy('PORT_TYPE', 'loss_process').orderBy('rank').rowsBetween(Window.unboundedPreceding, 0)
> win_next = Window.partitionBy('PORT_TYPE', 'loss_process').orderBy('rank').rowsBetween(0, Window.unboundedFollowing)
>
> # Join back in the provided config table to populate the "known" scale factors
> df_part1 = (df_scale_factors_template
>     .join(df_users_config, ['PORT_TYPE', 'loss_process', 'rank'], 'leftouter')
>     # Add computed columns that can look up the prior config and next config for each missing value
>     .withColumn('last_rank', F.last(F.col('rank'), ignorenulls=True).over(win_last))
>     .withColumn('last_sf', F.last(F.col('scale_factor'), ignorenulls=True).over(win_last))
> ).cache()
> debug_log_dataframe(df_part1, 'df_part1')  # Force a .count() and time Part1
>
> df_part2 = (df_part1
>     .withColumn('next_rank', F.first(F.col('rank'), ignorenulls=True).over(win_next))
>     .withColumn('next_sf', F.first(F.col('scale_factor'), ignorenulls=True).over(win_next))
> ).cache()
> debug_log_dataframe(df_part2, 'df_part2')  # Force a .count() and time Part2
>
> df_part3 = (df_part2
>     # Implements standard linear interpolation: y = y1 + ((y2-y1)/(x2-x1)) * (x-x1)
>     .withColumn('scale_factor',
>         F.when(F.col('last_rank') == F.col('next_rank'), F.col('last_sf'))  # Handle div/0 case
>         .otherwise(F.col('last_sf') + ((F.col('next_sf') - F.col('last_sf')) / (F.col('next_rank') - F.col('last_rank'))) * (F.col('rank') - F.col('last_rank'))))
>     .select('PORT_TYPE', 'loss_process', 'rank', 'scale_factor')
> ).cache()
> debug_log_dataframe(df_part3, 'df_part3', explain=True)
> {code}
> The above used to be a single chained dataframe statement, but I've since split it into 3 parts so that I could isolate the part that's taking so long.
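To see the interpolation logic above independent of Spark, here is a small pure-Python sketch. It mirrors the two window fills (last known value at-or-before each row, first known value at-or-after) followed by the same linear-interpolation formula; the data shape is hypothetical, and it assumes the first and last ranks always have known values, as in the reporter's setup:

```python
def interpolate(ranks, sfs):
    """Fill missing scale factors by linear interpolation.

    ranks: sorted list of rank positions.
    sfs:   scale factors aligned with ranks; None marks a missing value.
    Assumes the first and last entries of sfs are not None.
    """
    known = [(r, s) for r, s in zip(ranks, sfs) if s is not None]
    out = []
    for r, s in zip(ranks, sfs):
        if s is not None:
            out.append(s)
            continue
        # Equivalent of F.last(...).over(win_last): last known point <= r.
        x1, y1 = max((k for k in known if k[0] <= r), key=lambda k: k[0])
        # Equivalent of F.first(...).over(win_next): first known point >= r.
        x2, y2 = min((k for k in known if k[0] >= r), key=lambda k: k[0])
        # y = y1 + ((y2-y1)/(x2-x1)) * (x-x1), guarding the div/0 case.
        out.append(y1 if x1 == x2 else y1 + (y2 - y1) / (x2 - x1) * (r - x1))
    return out
```

In Spark the per-row scan over `known` is what the window frame does implicitly; the issue is that the UnboundedFollowing frame recomputes far more than this sketch suggests, which is why `lead`/`lag`-style rewrites help.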
[jira] [Created] (SPARK-37166) SPIP: Storage Partitioned Join
Chao Sun created SPARK-37166: Summary: SPIP: Storage Partitioned Join Key: SPARK-37166 URL: https://issues.apache.org/jira/browse/SPARK-37166 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.3.0 Reporter: Chao Sun This JIRA tracks the SPIP for storage partitioned join.
[jira] [Commented] (SPARK-37165) Add REPEATABLE in TABLESAMPLE to specify seed
[ https://issues.apache.org/jira/browse/SPARK-37165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436052#comment-17436052 ] Apache Spark commented on SPARK-37165: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/34442 > Add REPEATABLE in TABLESAMPLE to specify seed > - > > Key: SPARK-37165 > URL: https://issues.apache.org/jira/browse/SPARK-37165 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.3.0 > Reporter: Huaxin Gao > Priority: Minor >
[jira] [Assigned] (SPARK-37165) Add REPEATABLE in TABLESAMPLE to specify seed
[ https://issues.apache.org/jira/browse/SPARK-37165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37165: Assignee: Apache Spark
[jira] [Assigned] (SPARK-37165) Add REPEATABLE in TABLESAMPLE to specify seed
[ https://issues.apache.org/jira/browse/SPARK-37165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37165: Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-37165) Add REPEATABLE in TABLESAMPLE to specify seed
Huaxin Gao created SPARK-37165: -- Summary: Add REPEATABLE in TABLESAMPLE to specify seed Key: SPARK-37165 URL: https://issues.apache.org/jira/browse/SPARK-37165 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Huaxin Gao
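The REPEATABLE clause proposed here lets a TABLESAMPLE fix its random seed so repeated runs select the same rows. The concept can be illustrated in plain Python (Bernoulli-style per-row sampling with the standard `random` module; the function name and data are hypothetical, not Spark code):

```python
import random

def sample_rows(rows, fraction, seed):
    """Keep each row independently with probability `fraction`.

    A fixed seed makes the selection deterministic, which is the point
    of specifying a seed for a sample: reproducible test data and
    debuggable query results across runs.
    """
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]
```

Two calls with the same seed return the identical subset, whereas an unseeded sample would generally differ run to run.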
[jira] [Commented] (SPARK-37164) Add ExpressionBuilder for functions with complex overloads
[ https://issues.apache.org/jira/browse/SPARK-37164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436037#comment-17436037 ] Apache Spark commented on SPARK-37164: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/34441 > Add ExpressionBuilder for functions with complex overloads > -- > > Key: SPARK-37164 > URL: https://issues.apache.org/jira/browse/SPARK-37164 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.3.0 > Reporter: Wenchen Fan > Priority: Major >
[jira] [Assigned] (SPARK-37164) Add ExpressionBuilder for functions with complex overloads
[ https://issues.apache.org/jira/browse/SPARK-37164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37164: Assignee: Apache Spark
[jira] [Assigned] (SPARK-37164) Add ExpressionBuilder for functions with complex overloads
[ https://issues.apache.org/jira/browse/SPARK-37164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37164: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-37164) Add ExpressionBuilder for functions with complex overloads
[ https://issues.apache.org/jira/browse/SPARK-37164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436036#comment-17436036 ] Apache Spark commented on SPARK-37164: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/34441
[jira] [Resolved] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py
[ https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37140. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34413 [https://github.com/apache/spark/pull/34413] > Inline type hints for python/pyspark/resultiterable.py > -- > > Key: SPARK-37140 > URL: https://issues.apache.org/jira/browse/SPARK-37140 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py
[ https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37140: -- Assignee: dch nguyen > Inline type hints for python/pyspark/resultiterable.py > -- > > Key: SPARK-37140 > URL: https://issues.apache.org/jira/browse/SPARK-37140 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37164) Add ExpressionBuilder for functions with complex overloads
Wenchen Fan created SPARK-37164: --- Summary: Add ExpressionBuilder for functions with complex overloads Key: SPARK-37164 URL: https://issues.apache.org/jira/browse/SPARK-37164 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37163) Disallow casting Date as Numeric types
[ https://issues.apache.org/jira/browse/SPARK-37163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435952#comment-17435952 ] Apache Spark commented on SPARK-37163: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/34440 > Disallow casting Date as Numeric types > -- > > Key: SPARK-37163 > URL: https://issues.apache.org/jira/browse/SPARK-37163 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Currently, Date type values can be cast as Numeric types. However, the result > is always NULL. > On the other hand, Numeric values can't be cast as Date type. > It doesn't make sense to keep the behavior of casting Date to null numeric > types. I suggest to disallow the conversion. We can have a legacy flag > `spark.sql.legacy.allowCastDateAsNumeric` if users really want to fall back > to the legacy behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37163) Disallow casting Date as Numeric types
[ https://issues.apache.org/jira/browse/SPARK-37163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37163: Assignee: Gengliang Wang (was: Apache Spark) > Disallow casting Date as Numeric types > -- > > Key: SPARK-37163 > URL: https://issues.apache.org/jira/browse/SPARK-37163 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Currently, Date type values can be cast as Numeric types. However, the result > is always NULL. > On the other hand, Numeric values can't be cast as Date type. > It doesn't make sense to keep the behavior of casting Date to null numeric > types. I suggest to disallow the conversion. We can have a legacy flag > `spark.sql.legacy.allowCastDateAsNumeric` if users really want to fall back > to the legacy behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37163) Disallow casting Date as Numeric types
[ https://issues.apache.org/jira/browse/SPARK-37163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37163: Assignee: Apache Spark (was: Gengliang Wang) > Disallow casting Date as Numeric types > -- > > Key: SPARK-37163 > URL: https://issues.apache.org/jira/browse/SPARK-37163 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Currently, Date type values can be cast as Numeric types. However, the result > is always NULL. > On the other hand, Numeric values can't be cast as Date type. > It doesn't make sense to keep the behavior of casting Date to null numeric > types. I suggest to disallow the conversion. We can have a legacy flag > `spark.sql.legacy.allowCastDateAsNumeric` if users really want to fall back > to the legacy behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37163) Disallow casting Date as Numeric types
[ https://issues.apache.org/jira/browse/SPARK-37163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435950#comment-17435950 ] Apache Spark commented on SPARK-37163: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/34440 > Disallow casting Date as Numeric types > -- > > Key: SPARK-37163 > URL: https://issues.apache.org/jira/browse/SPARK-37163 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Currently, Date type values can be cast as Numeric types. However, the result > is always NULL. > On the other hand, Numeric values can't be cast as Date type. > It doesn't make sense to keep the behavior of casting Date to null numeric > types. I suggest to disallow the conversion. We can have a legacy flag > `spark.sql.legacy.allowCastDateAsNumeric` if users really want to fall back > to the legacy behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37163) Disallow casting Date as Numeric types
Gengliang Wang created SPARK-37163: -- Summary: Disallow casting Date as Numeric types Key: SPARK-37163 URL: https://issues.apache.org/jira/browse/SPARK-37163 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Currently, Date type values can be cast as Numeric types. However, the result is always NULL. On the other hand, Numeric values can't be cast as Date type. It doesn't make sense to keep a cast that always produces NULL. I suggest disallowing the conversion. We can have a legacy flag `spark.sql.legacy.allowCastDateAsNumeric` if users really want to fall back to the legacy behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
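The inconsistency described above can be sketched in spark-shell; per the ticket, the date-to-numeric direction silently yields NULL while the reverse direction is already rejected:

{code:java}
// Today: the cast is allowed, but the result is always NULL.
spark.sql("SELECT CAST(DATE'2021-10-29' AS INT) AS i").show()

// Today: the reverse cast is already rejected at analysis time,
// which is why the proposal is to disallow the first one too.
// spark.sql("SELECT CAST(1 AS DATE)")   // AnalysisException
{code}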
[jira] [Comment Edited] (SPARK-36800) The create table as select statement verifies the valid column name
[ https://issues.apache.org/jira/browse/SPARK-36800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435947#comment-17435947 ] dohongdayi edited comment on SPARK-36800 at 10/29/21, 11:54 AM: I guess the image might be similar to: !SparkIssue.png! was (Author: dohongdayi): The image might be like: !SparkIssue.png! > The create table as select statement verifies the valid column name > > > Key: SPARK-36800 > URL: https://issues.apache.org/jira/browse/SPARK-36800 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: melin >Priority: Trivial > Attachments: SparkIssue.png > > > If the column name output by the select is not a valid column name, the error message is > not very clear; it is recommended to add a column-name check > {code:java} > create table tdl_demo_dd as select 1+1{code} > !image-2021-09-19-17-25-02-239.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36800) The create table as select statement verifies the valid column name
[ https://issues.apache.org/jira/browse/SPARK-36800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435947#comment-17435947 ] dohongdayi commented on SPARK-36800: The image might be like: !SparkIssue.png! > The create table as select statement verifies the valid column name > > > Key: SPARK-36800 > URL: https://issues.apache.org/jira/browse/SPARK-36800 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: melin >Priority: Trivial > Attachments: SparkIssue.png > > > If the column name output by the select is not a valid column name, the error message is > not very clear; it is recommended to add a column-name check > {code:java} > create table tdl_demo_dd as select 1+1{code} > !image-2021-09-19-17-25-02-239.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36800) The create table as select statement verifies the valid column name
[ https://issues.apache.org/jira/browse/SPARK-36800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dohongdayi updated SPARK-36800: --- Attachment: SparkIssue.png > The create table as select statement verifies the valid column name > > > Key: SPARK-36800 > URL: https://issues.apache.org/jira/browse/SPARK-36800 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: melin >Priority: Trivial > Attachments: SparkIssue.png > > > If the column name output by the select is not a valid column name, the error message is > not very clear; it is recommended to add a column-name check > {code:java} > create table tdl_demo_dd as select 1+1{code} > !image-2021-09-19-17-25-02-239.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
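Until such a check exists, the usual workaround for the statement quoted in the report is to alias the expression so the CTAS output has a valid column name (the alias `col1` below is just illustrative):

{code:java}
// Fails: the auto-generated column name "(1 + 1)" is not a valid
// column name for the target table, and the error is confusing.
spark.sql("CREATE TABLE tdl_demo_dd AS SELECT 1 + 1")

// Works: an explicit alias gives the column a valid name.
spark.sql("CREATE TABLE tdl_demo_dd AS SELECT 1 + 1 AS col1")
{code}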
[jira] [Updated] (SPARK-37162) Web UI tasks table column sort order is wrong
[ https://issues.apache.org/jira/browse/SPARK-37162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lichuanliang updated SPARK-37162: - Attachment: spark-3.2.0-web-ui.png > Web UI tasks table column sort order is wrong > - > > Key: SPARK-37162 > URL: https://issues.apache.org/jira/browse/SPARK-37162 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.0 >Reporter: Lichuanliang >Priority: Minor > Attachments: spark-3.2.0-web-ui.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37162) Web UI tasks table column sort order is wrong
[ https://issues.apache.org/jira/browse/SPARK-37162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lichuanliang updated SPARK-37162: - Description: (was: !image-2021-10-29-19-20-23-667.png!) > Web UI tasks table column sort order is wrong > - > > Key: SPARK-37162 > URL: https://issues.apache.org/jira/browse/SPARK-37162 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.0 >Reporter: Lichuanliang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37162) Web UI tasks table column sort order is wrong
Lichuanliang created SPARK-37162: Summary: Web UI tasks table column sort order is wrong Key: SPARK-37162 URL: https://issues.apache.org/jira/browse/SPARK-37162 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.2.0 Reporter: Lichuanliang !image-2021-10-29-19-20-23-667.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37095) Inline type hints for files in python/pyspark/broadcast.py
[ https://issues.apache.org/jira/browse/SPARK-37095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37095: Assignee: Apache Spark > Inline type hints for files in python/pyspark/broadcast.py > -- > > Key: SPARK-37095 > URL: https://issues.apache.org/jira/browse/SPARK-37095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37095) Inline type hints for files in python/pyspark/broadcast.py
[ https://issues.apache.org/jira/browse/SPARK-37095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37095: Assignee: (was: Apache Spark) > Inline type hints for files in python/pyspark/broadcast.py > -- > > Key: SPARK-37095 > URL: https://issues.apache.org/jira/browse/SPARK-37095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37095) Inline type hints for files in python/pyspark/broadcast.py
[ https://issues.apache.org/jira/browse/SPARK-37095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435917#comment-17435917 ] Apache Spark commented on SPARK-37095: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34439 > Inline type hints for files in python/pyspark/broadcast.py > -- > > Key: SPARK-37095 > URL: https://issues.apache.org/jira/browse/SPARK-37095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37095) Inline type hints for files in python/pyspark/broadcast.py
[ https://issues.apache.org/jira/browse/SPARK-37095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435914#comment-17435914 ] Apache Spark commented on SPARK-37095: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34439 > Inline type hints for files in python/pyspark/broadcast.py > -- > > Key: SPARK-37095 > URL: https://issues.apache.org/jira/browse/SPARK-37095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37161) RowToColumnConverter support AnsiIntervalType
[ https://issues.apache.org/jira/browse/SPARK-37161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435880#comment-17435880 ] PengLei commented on SPARK-37161: - working on this > RowToColumnConverter support AnsiIntervalType > -- > > Key: SPARK-37161 > URL: https://issues.apache.org/jira/browse/SPARK-37161 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > > currently, we have RowToColumnConverter for all data types except > AnsiIntervalType > {code:java} > // code placeholder > val core = dataType match { > case BinaryType => BinaryConverter > case BooleanType => BooleanConverter > case ByteType => ByteConverter > case ShortType => ShortConverter > case IntegerType | DateType => IntConverter > case FloatType => FloatConverter > case LongType | TimestampType => LongConverter > case DoubleType => DoubleConverter > case StringType => StringConverter > case CalendarIntervalType => CalendarConverter > case at: ArrayType => ArrayConverter(getConverterForType(at.elementType, > at.containsNull)) > case st: StructType => new StructConverter(st.fields.map( > (f) => getConverterForType(f.dataType, f.nullable))) > case dt: DecimalType => new DecimalConverter(dt) > case mt: MapType => MapConverter(getConverterForType(mt.keyType, nullable = > false), > getConverterForType(mt.valueType, mt.valueContainsNull)) > case unknown => throw > QueryExecutionErrors.unsupportedDataTypeError(unknown.toString) > } > if (nullable) { > dataType match { > case CalendarIntervalType => new StructNullableTypeConverter(core) > case st: StructType => new StructNullableTypeConverter(core) > case _ => new BasicNullableTypeConverter(core) > } > } else { > core > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37161) RowToColumnConverter support AnsiIntervalType
PengLei created SPARK-37161:
---
Summary: RowToColumnConverter support AnsiIntervalType
Key: SPARK-37161
URL: https://issues.apache.org/jira/browse/SPARK-37161
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.3.0
Reporter: PengLei

Currently, we have a RowToColumnConverter for all data types except AnsiIntervalType:

{code:java}
val core = dataType match {
  case BinaryType => BinaryConverter
  case BooleanType => BooleanConverter
  case ByteType => ByteConverter
  case ShortType => ShortConverter
  case IntegerType | DateType => IntConverter
  case FloatType => FloatConverter
  case LongType | TimestampType => LongConverter
  case DoubleType => DoubleConverter
  case StringType => StringConverter
  case CalendarIntervalType => CalendarConverter
  case at: ArrayType => ArrayConverter(getConverterForType(at.elementType, at.containsNull))
  case st: StructType => new StructConverter(st.fields.map(
    (f) => getConverterForType(f.dataType, f.nullable)))
  case dt: DecimalType => new DecimalConverter(dt)
  case mt: MapType => MapConverter(getConverterForType(mt.keyType, nullable = false),
    getConverterForType(mt.valueType, mt.valueContainsNull))
  case unknown => throw QueryExecutionErrors.unsupportedDataTypeError(unknown.toString)
}
if (nullable) {
  dataType match {
    case CalendarIntervalType => new StructNullableTypeConverter(core)
    case st: StructType => new StructNullableTypeConverter(core)
    case _ => new BasicNullableTypeConverter(core)
  }
} else {
  core
}
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
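A likely shape of the fix (a sketch, not the merged patch from the linked PR): ANSI year-month intervals are physically stored as an Int number of months and day-time intervals as a Long number of microseconds, so they can reuse the existing primitive converters:

{code:java}
// Hypothetical additional match arms for getConverterForType;
// the converter names mirror the existing cases above.
case _: YearMonthIntervalType => IntConverter   // backed by Int (months)
case _: DayTimeIntervalType => LongConverter    // backed by Long (microseconds)
{code}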
[jira] [Assigned] (SPARK-37157) Inline type hints for python/pyspark/util.py
[ https://issues.apache.org/jira/browse/SPARK-37157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37157: Assignee: Apache Spark > Inline type hints for python/pyspark/util.py > > > Key: SPARK-37157 > URL: https://issues.apache.org/jira/browse/SPARK-37157 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37157) Inline type hints for python/pyspark/util.py
[ https://issues.apache.org/jira/browse/SPARK-37157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37157: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/util.py > > > Key: SPARK-37157 > URL: https://issues.apache.org/jira/browse/SPARK-37157 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37157) Inline type hints for python/pyspark/util.py
[ https://issues.apache.org/jira/browse/SPARK-37157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435876#comment-17435876 ] Apache Spark commented on SPARK-37157: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34438 > Inline type hints for python/pyspark/util.py > > > Key: SPARK-37157 > URL: https://issues.apache.org/jira/browse/SPARK-37157 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37156) Inline type hints for python/pyspark/storagelevel.py
[ https://issues.apache.org/jira/browse/SPARK-37156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37156: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/storagelevel.py > > > Key: SPARK-37156 > URL: https://issues.apache.org/jira/browse/SPARK-37156 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37156) Inline type hints for python/pyspark/storagelevel.py
[ https://issues.apache.org/jira/browse/SPARK-37156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37156: Assignee: Apache Spark > Inline type hints for python/pyspark/storagelevel.py > > > Key: SPARK-37156 > URL: https://issues.apache.org/jira/browse/SPARK-37156 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37156) Inline type hints for python/pyspark/storagelevel.py
[ https://issues.apache.org/jira/browse/SPARK-37156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435875#comment-17435875 ] Apache Spark commented on SPARK-37156: -- User 'dchvn' has created a pull request for this issue: https://github.com/apache/spark/pull/34437 > Inline type hints for python/pyspark/storagelevel.py > > > Key: SPARK-37156 > URL: https://issues.apache.org/jira/browse/SPARK-37156 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36975) Refactor HiveClientImpl collect hive client call logic
[ https://issues.apache.org/jira/browse/SPARK-36975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36975. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34400 [https://github.com/apache/spark/pull/34400] > Refactor HiveClientImpl collect hive client call logic > -- > > Key: SPARK-36975 > URL: https://issues.apache.org/jira/browse/SPARK-36975 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > Currently, we treat one call of withHiveState as one Hive client call, which is > too weird; it needs refactoring. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36975) Refactor HiveClientImpl collect hive client call logic
[ https://issues.apache.org/jira/browse/SPARK-36975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36975: --- Assignee: angerszhu > Refactor HiveClientImpl collect hive client call logic > -- > > Key: SPARK-36975 > URL: https://issues.apache.org/jira/browse/SPARK-36975 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > Currently, we treat one call of withHiveState as one Hive client call, which is > too weird; it needs refactoring. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37149) Improve error messages for arithmetic overflow under ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37149. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34427 [https://github.com/apache/spark/pull/34427] > Improve error messages for arithmetic overflow under ANSI mode > -- > > Key: SPARK-37149 > URL: https://issues.apache.org/jira/browse/SPARK-37149 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.3.0 > > > Improve error messages for arithmetic overflow exceptions. We can instruct > users to 1) turn off ANSI mode or 2) use `try_` functions if applicable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
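The two escape hatches the improved messages should point at can be sketched in spark-shell as follows (`try_add` is the `try_` counterpart of `+`):

{code:java}
spark.conf.set("spark.sql.ansi.enabled", "true")

// Under ANSI mode this addition overflows Int and throws an
// ArithmeticException instead of wrapping around:
spark.sql("SELECT 2147483647 + CAST(1 AS INT)")

// Option 1: turn ANSI mode off again.
// spark.conf.set("spark.sql.ansi.enabled", "false")

// Option 2: use the try_ variant, which returns NULL on overflow:
spark.sql("SELECT try_add(2147483647, CAST(1 AS INT))")
{code}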
[jira] [Assigned] (SPARK-37149) Improve error messages for arithmetic overflow under ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-37149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37149: --- Assignee: Allison Wang > Improve error messages for arithmetic overflow under ANSI mode > -- > > Key: SPARK-37149 > URL: https://issues.apache.org/jira/browse/SPARK-37149 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Improve error messages for arithmetic overflow exceptions. We can instruct > users to 1) turn off ANSI mode or 2) use `try_` functions if applicable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37160) Add a config to optionally disable padding for char type
[ https://issues.apache.org/jira/browse/SPARK-37160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435851#comment-17435851 ] Apache Spark commented on SPARK-37160: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/34436 > Add a config to optionally disable padding for char type > --- > > Key: SPARK-37160 > URL: https://issues.apache.org/jira/browse/SPARK-37160 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37160) Add a config to optionally disable padding for char type
[ https://issues.apache.org/jira/browse/SPARK-37160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37160: Assignee: Wenchen Fan (was: Apache Spark) > Add a config to optionally disable padding for char type > --- > > Key: SPARK-37160 > URL: https://issues.apache.org/jira/browse/SPARK-37160 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37160) Add a config to optionally disable padding for char type
[ https://issues.apache.org/jira/browse/SPARK-37160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37160: Assignee: Apache Spark (was: Wenchen Fan) > Add a config to optionally disable padding for char type > --- > > Key: SPARK-37160 > URL: https://issues.apache.org/jira/browse/SPARK-37160 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37160) Add a config to optionally disable padding for char type
Wenchen Fan created SPARK-37160: --- Summary: Add a config to optionally disable padding for char type Key: SPARK-37160 URL: https://issues.apache.org/jira/browse/SPARK-37160 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
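For context, the padding being made optional is the trailing-blank padding that `CHAR(n)` values receive today; a sketch of the current default behavior (the new config's name is not given in this ticket, so none is shown):

{code:java}
spark.sql("CREATE TABLE t (c CHAR(5)) USING parquet")
spark.sql("INSERT INTO t VALUES ('ab')")

// With padding (the current default), 'ab' is read back
// blank-padded to the declared length, i.e. as 'ab   '.
spark.sql("SELECT c, length(c) FROM t").show()
{code}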
[jira] [Assigned] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
[jira] [Assigned] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17

[ https://issues.apache.org/jira/browse/SPARK-37159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37159:
------------------------------------

    Assignee: Apache Spark  (was: Kousuke Saruta)
[jira] [Assigned] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37159:
------------------------------------

    Assignee: Kousuke Saruta  (was: Apache Spark)
[jira] [Commented] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435819#comment-17435819 ]

Apache Spark commented on SPARK-37159:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/34425
[jira] [Commented] (SPARK-37094) Inline type hints for files in python/pyspark
[ https://issues.apache.org/jira/browse/SPARK-37094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435818#comment-17435818 ]

Byron Hsu commented on SPARK-37094:
-----------------------------------

[~dchvn] sure! feel free to take it!

> Inline type hints for files in python/pyspark
> ---------------------------------------------
>
>                 Key: SPARK-37094
>                 URL: https://issues.apache.org/jira/browse/SPARK-37094
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: dch nguyen
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
Kousuke Saruta created SPARK-37159:
--------------------------------------

             Summary: Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
                 Key: SPARK-37159
                 URL: https://issues.apache.org/jira/browse/SPARK-37159
             Project: Spark
          Issue Type: Bug
          Components: SQL, Tests
    Affects Versions: 3.3.0
            Reporter: Kousuke Saruta
            Assignee: Kousuke Saruta

SPARK-37105 seems to have fixed most of the tests in `sql/hive` for Java 17, but not `HiveExternalCatalogVersionsSuite`.

{code}
[info] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED *** (42 seconds, 526 milliseconds)
[info]   spark-submit returned with exit code 1.
[info]   Command line: '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test-spark-d86af275-0c40-4b47-9cab-defa92a5ffa7/spark-3.2.0/bin/spark-submit' '--name' 'prepare testing tables' '--master' 'local[2]' '--conf' 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' '--conf' 'spark.sql.hive.metastore.version=2.3' '--conf' 'spark.sql.hive.metastore.jars=maven' '--conf' 'spark.sql.warehouse.dir=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' '-Dderby.system.home=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test15166225869206697603.py'
[info]
[info]   2021-10-28 06:07:18.486 - stderr> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[info]   2021-10-28 06:07:18.49 - stderr> 21/10/28 22:07:18 INFO SparkContext: Running Spark version 3.2.0
[info]   2021-10-28 06:07:18.537 - stderr> 21/10/28 22:07:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info]   2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO ResourceUtils: ==
[info]   2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO ResourceUtils: No custom resources configured for spark.driver.
[info]   2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO ResourceUtils: ==
[info]   2021-10-28 06:07:18.617 - stderr> 21/10/28 22:07:18 INFO SparkContext: Submitted application: prepare testing tables
[info]   2021-10-28 06:07:18.632 - stderr> 21/10/28 22:07:18 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
[info]   2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO ResourceProfile: Limiting resource is cpu
[info]   2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO ResourceProfileManager: Added ResourceProfile id: 0
[info]   2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing view acls to: kou
[info]   2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing modify acls to: kou
[info]   2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing view acls groups to:
[info]   2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO SecurityManager: Changing modify acls groups to:
[info]   2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kou); groups with view permissions: Set(); users with modify permissions: Set(kou); groups with modify permissions: Set()
[info]   2021-10-28 06:07:18.886 - stderr> 21/10/28 22:07:18 INFO Utils: Successfully started service 'sparkDriver' on port 35867.
[info]   2021-10-28 06:07:18.906 - stderr> 21/10/28 22:07:18 INFO SparkEnv: Registering MapOutputTracker
[info]   2021-10-28 06:07:18.93 - stderr> 21/10/28 22:07:18 INFO SparkEnv: Registering BlockManagerMaster
[info]   2021-10-28 06:07:18.943 - stderr> 21/10/28 22:07:18 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
[info]   2021-10-28 06:07:18.944 - stderr> 21/10/28 22:07:18 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
[info]   2021-10-28 06:07:18.945 - stdout> Traceback (most recent call last):
[info]   2021-10-28 06:07:18.946 - stdout>   File "/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsS
{code}
[jira] [Assigned] (SPARK-37155) Inline type hints for python/pyspark/statcounter.py
[ https://issues.apache.org/jira/browse/SPARK-37155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37155:
------------------------------------

    Assignee:     (was: Apache Spark)

> Inline type hints for python/pyspark/statcounter.py
> ---------------------------------------------------
>
>                 Key: SPARK-37155
>                 URL: https://issues.apache.org/jira/browse/SPARK-37155
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Byron Hsu
>            Priority: Major
[jira] [Commented] (SPARK-37155) Inline type hints for python/pyspark/statcounter.py
[ https://issues.apache.org/jira/browse/SPARK-37155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435815#comment-17435815 ]

Apache Spark commented on SPARK-37155:
--------------------------------------

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34435
[jira] [Assigned] (SPARK-37155) Inline type hints for python/pyspark/statcounter.py
[ https://issues.apache.org/jira/browse/SPARK-37155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37155:
------------------------------------

    Assignee: Apache Spark
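For context on what these SPARK-37155 tickets are about: "inlining type hints" means moving annotations out of separate `.pyi` stub files and into the `.py` source itself. A minimal sketch of the end state, using a hypothetical toy `StatCounter`-like class (not Spark's actual `pyspark/statcounter.py` implementation), might look like:

```python
# Toy illustration of inline type hints: annotations live directly in the
# .py source, so type checkers need no companion .pyi stub file.
from typing import Iterable


class StatCounter:
    """Tracks a running count and mean of merged values (toy version)."""

    def __init__(self, values: Iterable[float] = ()) -> None:
        self.n: int = 0
        self.mu: float = 0.0
        for v in values:
            self.merge(v)

    def merge(self, value: float) -> "StatCounter":
        # Welford-style incremental mean update.
        self.n += 1
        self.mu += (value - self.mu) / self.n
        return self

    def mean(self) -> float:
        return self.mu
```

With the hints inline, tools such as mypy check the implementation and its annotations together, instead of trusting a stub that can silently drift out of sync with the code.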
[jira] [Commented] (SPARK-37094) Inline type hints for files in python/pyspark
[ https://issues.apache.org/jira/browse/SPARK-37094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435810#comment-17435810 ]

dch nguyen commented on SPARK-37094:
------------------------------------

[~ByronHsu] I worked on some of the issues (statcounter, storagelevel, and util) last week but haven't created PRs yet, so I will create them soon. Sorry for this!