[jira] [Commented] (SPARK-47223) Update usage of deprecated Thread.getId() to Thread.threadId()
[ https://issues.apache.org/jira/browse/SPARK-47223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1788#comment-1788 ] Neil Gupta commented on SPARK-47223: Update: Realized that the minimum Java version here is Java 17, so this method cannot be used, since it only exists in Java 19+. > Update usage of deprecated Thread.getId() to Thread.threadId() > -- > > Key: SPARK-47223 > URL: https://issues.apache.org/jira/browse/SPARK-47223 > Project: Spark > Issue Type: Request > Components: Spark Core, SQL > Affects Versions: 3.5.1 > Reporter: Neil Gupta > Priority: Trivial > Labels: pull-request-available > Fix For: 3.5.1 > > > Update usage of deprecated Thread.getId() to Thread.threadId(). > > Currently in Spark, there are still multiple references to the deprecated method [Thread.getId()|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#getId()], given that the current version uses Java 21. Java officially requests that any usage be switched to the [Thread.threadId()|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#threadId()] method instead. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47223) Update usage of deprecated Thread.getId() to Thread.threadId()
[ https://issues.apache.org/jira/browse/SPARK-47223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821925#comment-17821925 ] Neil Gupta commented on SPARK-47223: I can take a stab at this one myself.
[jira] [Created] (SPARK-47223) Update usage of deprecated Thread.getId() to Thread.threadId()
Neil Gupta created SPARK-47223: -- Summary: Update usage of deprecated Thread.getId() to Thread.threadId() Key: SPARK-47223 URL: https://issues.apache.org/jira/browse/SPARK-47223 Project: Spark Issue Type: Request Components: Spark Core, SQL Affects Versions: 3.5.1 Reporter: Neil Gupta Fix For: 3.5.1 Update usage of deprecated Thread.getId() to Thread.threadId(). Currently in Spark, there are still multiple references to the deprecated method [Thread.getId()|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#getId()], given that the current version uses Java 21. Java officially requests that any usage be switched to the [Thread.threadId()|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#threadId()] method instead.
[jira] [Updated] (SPARK-47223) Update usage of deprecated Thread.getId() to Thread.threadId()
[ https://issues.apache.org/jira/browse/SPARK-47223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Gupta updated SPARK-47223: --- Description: Update usage of deprecated Thread.getId() to Thread.threadId(). Currently in Spark, there are still multiple references to the deprecated method [Thread.getId()|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#getId()], given that the current version uses Java 21. Java officially requests that any usage be switched to the [Thread.threadId()|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#threadId()] method instead. was: Update usage of deprecated Thread.getId() to Thread.threadId(). Currently in Spark, there are multiple references still to the deprecated method [`Thread.getId()`|[https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#getId()]|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#getId()],] given that the current version is using Java 21. Java officially requests any type of usage to be switched to the [`Thread.threadId()`|[https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#threadId()]] method instead.
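As a hedged illustration of the migration (not code from the ticket), the change is a one-line substitution per call site. Since a later comment notes the minimum supported Java is 17 while threadId() only exists from Java 19, this sketch calls the deprecated getId() so it compiles on Java 17, and shows the Java 19+ replacement in a comment:

```java
public class ThreadIdDemo {
    // Thread.getId() is deprecated since Java 19 in favor of Thread.threadId(),
    // which returns the same value but is declared final so subclasses
    // cannot override it.
    static long currentThreadId() {
        long id = Thread.currentThread().getId();
        // On Java 19+, the drop-in replacement would be:
        // long id = Thread.currentThread().threadId();
        return id;
    }

    public static void main(String[] args) {
        // Thread ids are always positive.
        System.out.println(currentThreadId() > 0);
    }
}
```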
[jira] [Commented] (SPARK-39104) Null Pointer Exception on unpersist call
[ https://issues.apache.org/jira/browse/SPARK-39104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533546#comment-17533546 ] Neil Gupta commented on SPARK-39104: Do you have reproduction steps? > Null Pointer Exception on unpersist call > --- > > Key: SPARK-39104 > URL: https://issues.apache.org/jira/browse/SPARK-39104 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 3.2.1 > Reporter: Denis > Priority: Major > > DataFrame.unpersist call fails with NPE > > {code:java} > java.lang.NullPointerException > at > org.apache.spark.sql.execution.columnar.CachedRDDBuilder.isCachedRDDLoaded(InMemoryRelation.scala:247) > at > org.apache.spark.sql.execution.columnar.CachedRDDBuilder.isCachedColumnBuffersLoaded(InMemoryRelation.scala:241) > at > org.apache.spark.sql.execution.CacheManager.$anonfun$uncacheQuery$8(CacheManager.scala:189) > at > org.apache.spark.sql.execution.CacheManager.$anonfun$uncacheQuery$8$adapted(CacheManager.scala:176) > at > scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:304) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303) > at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297) > at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108) > at scala.collection.TraversableLike.filter(TraversableLike.scala:395) > at scala.collection.TraversableLike.filter$(TraversableLike.scala:395) > at scala.collection.AbstractTraversable.filter(Traversable.scala:108) > at > org.apache.spark.sql.execution.CacheManager.recacheByCondition(CacheManager.scala:219) > at > org.apache.spark.sql.execution.CacheManager.uncacheQuery(CacheManager.scala:176) > at org.apache.spark.sql.Dataset.unpersist(Dataset.scala:3220) > at org.apache.spark.sql.Dataset.unpersist(Dataset.scala:3231){code} > Looks like synchronization is required for > org.apache.spark.sql.execution.columnar.CachedRDDBuilder#isCachedColumnBuffersLoaded > > {code:scala} > def isCachedColumnBuffersLoaded: Boolean = { > _cachedColumnBuffers != null && isCachedRDDLoaded > } > def isCachedRDDLoaded: Boolean = { > _cachedColumnBuffersAreLoaded || { > val bmMaster = SparkEnv.get.blockManager.master > val rddLoaded = _cachedColumnBuffers.partitions.forall { partition => > bmMaster.getBlockStatus(RDDBlockId(_cachedColumnBuffers.id, > partition.index), false) > .exists { case (_, blockStatus) => blockStatus.isCached } > } > if (rddLoaded) { > _cachedColumnBuffersAreLoaded = rddLoaded > } > rddLoaded > } > } {code} > isCachedRDDLoaded relies on the _cachedColumnBuffers != null check while it can > be changed concurrently from another thread.
[jira] [Comment Edited] (SPARK-39104) Null Pointer Exception on unpersist call
[ https://issues.apache.org/jira/browse/SPARK-39104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533546#comment-17533546 ] Neil Gupta edited comment on SPARK-39104 at 5/9/22 12:06 AM: - Hi Denis, do you have reproduction steps? was (Author: neilagupta): Do you have reproduction steps?
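The race described in the report — a field is null-checked and then dereferenced while another thread may clear it — can be sketched outside Spark. The class and field names below are hypothetical stand-ins, not Spark's:

```java
// Hypothetical sketch of the check-then-act race behind the reported NPE:
// a field that is null-checked and then dereferenced can be set to null
// by another thread in between. Reading the field once into a local
// variable removes that window, since the check and the use then
// observe the same reference.
class CachedBuffersSketch {
    private volatile int[] buffers = new int[] {1, 2, 3};

    // Unsafe pattern: 'buffers' is read twice, so it may become null
    // between the null check and the dereference.
    boolean isLoadedUnsafe() {
        return buffers != null && buffers.length > 0;
    }

    // Safer pattern: capture the field once in a local.
    boolean isLoadedSafe() {
        int[] local = buffers;
        return local != null && local.length > 0;
    }

    // Simulates the concurrent uncache that clears the field.
    void clear() {
        buffers = null;
    }
}
```

This is the minimal-change mitigation; full synchronization (as the report suggests) would also serialize the surrounding load/uncache logic rather than just making the single check consistent.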
[jira] [Commented] (SPARK-39091) SQL Expression traits don't compose due to nodePatterns being final.
[ https://issues.apache.org/jira/browse/SPARK-39091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531282#comment-17531282 ] Neil Gupta commented on SPARK-39091: Tried to implement a fix - [https://github.com/apache/spark/pull/36441]. It might need to be expanded a little. > SQL Expression traits don't compose due to nodePatterns being final. > > > Key: SPARK-39091 > URL: https://issues.apache.org/jira/browse/SPARK-39091 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.2.0, 3.2.1 > Reporter: Huw > Priority: Major > > In Spark 3.1 I have an expression which contains these parts: > {code:scala} > case class MyExploder( > arrays: Expression,// Array[AnyDataType] > asOfDate: Expression, // LambdaFunction[AnyDataType -> TimestampType] > extractor: Expression, // TimestampType > ) extends HigherOrderFunction with Generator with TimeZoneAwareExpression { > override def arguments: Seq[Expression] = > Seq(arrays, asOfDate) > override def argumentTypes: Seq[AbstractDataType] = > Seq(ArrayType, TimestampType) > override def functions: Seq[Expression] = > Seq(extractor) > override def functionTypes = > Seq(TimestampType) > }{code} > > This is a grossly simplified example. The extractor is a lambda which can > gather information from a nested array, and explodes based on some business > logic. > When upgrading to Spark 3.2, however, this can't work anymore, because they > have conflicting final values for nodePatterns. > {code:scala} > trait HigherOrderFunction extends Expression with ExpectsInputTypes { > final override val nodePatterns: Seq[TreePattern] = Seq(HIGH_ORDER_FUNCTION) > } {code} > > We get this error. > {noformat} > value nodePatterns in trait TimeZoneAwareExpression of type > Seq[org.apache.spark.sql.catalyst.trees.TreePattern.TreePattern] cannot > override final member{noformat} > > This blocks us from upgrading. What's doubly annoying is that the actual > value of the member appears to be the same. > > Thank you for your time.
[jira] [Comment Edited] (SPARK-39091) SQL Expression traits don't compose due to nodePatterns being final.
[ https://issues.apache.org/jira/browse/SPARK-39091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531282#comment-17531282 ] Neil Gupta edited comment on SPARK-39091 at 5/3/22 4:28 PM: Tried to implement a fix - [https://github.com/apache/spark/pull/36441]. It might need to be expanded to other traits; for now it only covers a few, especially those in your example. was (Author: neilagupta): Tried to implement a fix - [https://github.com/apache/spark/pull/36441.] It might need to be expanded a little.
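Scala's trait linearization is what makes the conflict surface here, but the underlying restriction can be sketched in plain Java (the names below are hypothetical analogues, not Spark's classes): a member declared final in one supertype cannot be redefined by a composing subtype, even when the redefinition would yield the identical value.

```java
// Hypothetical sketch of the restriction behind the
// "cannot override final member" error from the report.
class HigherOrderFunctionLike {
    // Analogous to: final override val nodePatterns = Seq(HIGH_ORDER_FUNCTION)
    final String nodePatterns() {
        return "HIGH_ORDER_FUNCTION";
    }
}

class MyExploderLike extends HigherOrderFunctionLike {
    // Uncommenting this fails to compile ("overridden method is final"),
    // even though it would return the very same value -- mirroring the
    // complaint that the conflicting trait values appear identical:
    // @Override String nodePatterns() { return "HIGH_ORDER_FUNCTION"; }
}
```

In Scala the situation is stricter still, because two independent traits can each pin their own final value and a class mixing both has no legal override to write at all.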