[ https://issues.apache.org/jira/browse/SPARK-36810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated SPARK-36810:
--------------------------------
    Description: 
In short, with HDFS HA and the use of an [Observer 
Namenode|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html],
read-after-write consistency is only guaranteed when both the write and the 
read happen from the same client.

But if the write happens on an executor and the read happens on the driver, the 
reads can be stale, causing application failures. This can be fixed by calling 
`FileSystem.msync` before any read call where the client suspects the write may 
have happened elsewhere.
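For illustration, here is a minimal driver-side sketch of that workaround (not the Spark-internal fix this issue proposes; the object and method names are made up for the example, and it assumes a Hadoop client version that exposes `FileSystem.msync`):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: call this on the driver before reading paths that
// executors have just written, so an Observer NameNode serves fresh metadata.
object ObserverSafeRead {

  def listAfterExecutorWrites(outputDir: String, hadoopConf: Configuration): Array[Path] = {
    val path = new Path(outputDir)
    val fs = FileSystem.get(path.toUri, hadoopConf)
    try {
      // msync() fetches the latest transaction id from the Active NameNode, so
      // a subsequent read routed to an Observer reflects the executors' writes.
      fs.msync()
    } catch {
      // Non-HDFS or non-HA file systems may not support msync; reads there are
      // already consistent, so the call can be skipped.
      case _: UnsupportedOperationException =>
    }
    fs.listStatus(path).map(_.getPath)
  }
}
{code}

A fix inside Spark would presumably issue the same msync on the driver before such reads, rather than leaving it to applications.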

This issue is discussed in greater detail in this [spark-dev 
discussion|https://mail-archives.apache.org/mod_mbox/spark-dev/202108.mbox/browser].
 

  was:
In short, with HDFS HA and the use of an [Observer 
Namenode|[https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html],]
read-after-write consistency is only guaranteed when both the write and the 
read happen from the same client.

But if the write happens on an executor and the read happens on the driver, the 
reads can be stale, causing application failures. This can be fixed by calling 
`FileSystem.msync` before any read call where the client suspects the write may 
have happened elsewhere.

This issue is discussed in greater detail in this [spark-dev 
discussion|https://mail-archives.apache.org/mod_mbox/spark-dev/202108.mbox/browser].
 


> Handle HDFS read inconsistencies in Spark when Observer Namenode is used
> ------------------------------------------------------------------------
>
>                 Key: SPARK-36810
>                 URL: https://issues.apache.org/jira/browse/SPARK-36810
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.2.0
>            Reporter: Venkata krishnan Sowrirajan
>            Priority: Major
>
> In short, with HDFS HA and the use of an [Observer 
> Namenode|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html],
> read-after-write consistency is only guaranteed when both the write and 
> the read happen from the same client.
> But if the write happens on an executor and the read happens on the driver, 
> the reads can be stale, causing application failures. This can be fixed by 
> calling `FileSystem.msync` before any read call where the client suspects the 
> write may have happened elsewhere.
> This issue is discussed in greater detail in this [spark-dev 
> discussion|https://mail-archives.apache.org/mod_mbox/spark-dev/202108.mbox/browser].
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
