[ 
https://issues.apache.org/jira/browse/SPARK-26851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins updated SPARK-26851:
----------------------------------
    Description: 
In CachedRDDBuilder, {{cachedColumnBuffers}} uses double-checked locking to 
lazily initialize {{_cachedColumnBuffers}}. Also, clearCache uses 
double-checked locking to likely avoid synchronization when 
{{_cachedColumnBuffers}} is still null.

However, the resource (in this case, {{_cachedColumnBuffers}}) is not declared 
as volatile, which could cause some visibility problems in both methods. 
{{cachedColumnBuffers}} may see a null reference when there is actually a 
constructed RDD, and {{clearCache}} may pick up a reference to partially 
constructed RDD.

>From Java Concurrency in Practice by Brian Goetz et al:
{quote}Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to 
work if resource is made volatile, and the performance impact of this is small 
since volatile reads are usually only slightly more expensive than nonvolatile 
reads.
{quote}
There are comments in other documentation that volatile is not needed if the 
resourceĀ is immutable. While an RDD is immutable from a Spark user's point of 
view, it may not be from a JVM's point of view, since not all internal fields 
are final.

I've marked this as minor since the race conditions are highly unlikely.

  was:
In CachedRDDBuilder, {{cachedColumnBuffers}} uses double-checked locking to 
lazily initialize {{_cachedColumnBuffers}}. Also, clearCache uses 
double-checked locking to likely avoid synchronization when 
{{_cachedColumnBuffers}} is still null.

However, the resource (in this case, {{_cachedColumnBuffers}}) is not declared 
as volatile, which could cause some visibility problems in both methods. 
{{cachedColumnBuffers}} may see a null reference when there is actually a 
constructed RDD, and {{clearCache}} may pick up a reference to partially 
constructed RDD.

>From Java Concurrency in Practice by Brian Goetz et al:
{quote}Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to 
work if resource is made volatile, and the performance impact of this is small 
since volatile reads are usually only slightly more expensive than nonvolatile 
reads.
{quote}
There are comments in other documentation that volatile is not needed if the 
resource in question (again, in this case {{_cachedColumnBuffers}}) is 
immutable. While an RDD is immutable from a Spark user's point of view, it may 
not be from a JVM's point of view, since not all internal fields are final.

I've marked this as minor since the race conditions are highly unlikely.


> CachedRDDBuilder only partially implements double-checked locking
> -----------------------------------------------------------------
>
>                 Key: SPARK-26851
>                 URL: https://issues.apache.org/jira/browse/SPARK-26851
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: Bruce Robbins
>            Priority: Minor
>
> In CachedRDDBuilder, {{cachedColumnBuffers}} uses double-checked locking to 
> lazily initialize {{_cachedColumnBuffers}}. Also, clearCache uses 
> double-checked locking to likely avoid synchronization when 
> {{_cachedColumnBuffers}} is still null.
> However, the resource (in this case, {{_cachedColumnBuffers}}) is not 
> declared as volatile, which could cause some visibility problems in both 
> methods. {{cachedColumnBuffers}} may see a null reference when there is 
> actually a constructed RDD, and {{clearCache}} may pick up a reference to 
> partially constructed RDD.
> From Java Concurrency in Practice by Brian Goetz et al:
> {quote}Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to 
> work if resource is made volatile, and the performance impact of this is 
> small since volatile reads are usually only slightly more expensive than 
> nonvolatile reads.
> {quote}
> There are comments in other documentation that volatile is not needed if the 
> resourceĀ is immutable. While an RDD is immutable from a Spark user's point of 
> view, it may not be from a JVM's point of view, since not all internal fields 
> are final.
> I've marked this as minor since the race conditions are highly unlikely.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to