[jira] [Commented] (SPARK-21762) FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible

2017-10-11 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201418#comment-16201418
 ] 

Dongjoon Hyun commented on SPARK-21762:
---

Since this is a regression like SPARK-22258, I updated the priority.

> FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new 
> file isn't yet visible
>
> Key: SPARK-21762
> URL: https://issues.apache.org/jira/browse/SPARK-21762
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: object stores without complete creation consistency 
> (this includes AWS S3's caching of negative GET results)
> Reporter: Steve Loughran
>
> The metrics collection of SPARK-20703 can trigger premature failure if the 
> newly written object isn't actually visible yet, that is, if a 
> {{getFileStatus(path)}} call made after {{writer.close()}} throws a 
> {{FileNotFoundException}}.
> Strictly speaking, not having a file immediately visible goes against the 
> fundamental expectations of the Hadoop FS APIs, namely fully consistent data & 
> metadata across all operations, with immediate global visibility of all 
> changes. However, not all object stores make that guarantee, be it only for 
> newly created data or for updated blobs. And so spurious FNFEs can get raised, 
> ones which *should* have gone away by the time the actual task is committed. 
> Or if they haven't, the job is in deep trouble anyway.
> What to do?
> # Leave as is: fail fast & so catch blobstores/blobstore clients which don't 
> behave as required. One issue here: will that trigger retries, and what 
> happens then?
> # Swallow the FNFE and hope the file is observable later.
> # Swallow all IOEs and hope that whatever problem the FS has is transient.
> Options 2 & 3 aren't going to collect metrics in the event of a FNFE, or at 
> least, not the counter of bytes written.
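
A minimal sketch of option 2 from the list above, written against the JDK only so that it stands alone. It is not the code in the pull request. The names TolerantFileSize, LengthProbe and probe are hypothetical; LengthProbe is an assumed stand-in for a call such as getFileStatus(path) followed by reading the file length. A FileNotFoundException is treated as "size unknown", so the bytes-written counter is skipped for that file, while any other IOException still propagates (swallowing those too would be option 3).

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Optional;

/** Sketch of option 2: a file that is not yet visible yields no size instead of a failure. */
final class TolerantFileSize {

    /** Hypothetical stand-in for a filesystem status lookup that returns the file length. */
    @FunctionalInterface
    interface LengthProbe {
        long length(String path) throws IOException;
    }

    private TolerantFileSize() {
    }

    /**
     * Returns the length of the file, or Optional.empty() if the file is not visible.
     * IOExceptions other than FileNotFoundException are rethrown to the caller.
     */
    static Optional<Long> probe(LengthProbe fs, String path) throws IOException {
        try {
            return Optional.of(fs.length(path));
        } catch (FileNotFoundException e) {
            // The newly created object is not visible yet: report "unknown"
            // so the caller can skip the bytes-written metric for this file.
            return Optional.empty();
        }
    }
}
```

A caller would add the value to its byte counter only when it is present, for example probe(fs, path).ifPresent(len -> bytes.add(len)), where bytes is whatever accumulator the caller already keeps. The trade-off is the one stated in the description: the task no longer fails on a spurious FNFE, but the bytes-written metric under-counts for any file that was not visible at probe time.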



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21762) FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible

2017-08-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144045#comment-16144045
 ] 

Apache Spark commented on SPARK-21762:
--

User 'steveloughran' has created a pull request for this issue:
https://github.com/apache/spark/pull/18979

> FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new 
> file isn't yet visible
>
> Key: SPARK-21762
> URL: https://issues.apache.org/jira/browse/SPARK-21762
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: object stores without complete creation consistency 
> (this includes AWS S3's caching of negative GET results)
> Reporter: Steve Loughran
> Priority: Minor






[jira] [Commented] (SPARK-21762) FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible

2017-08-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131190#comment-16131190
 ] 

Steve Loughran commented on SPARK-21762:


SPARK-20703 simplifies this, especially testing, as the metrics collection is 
isolated from FileFormatWriter. The same problem exists, though: if you are 
getting any create inconsistency, the metrics probes trigger failures on a 
condition which may have cleared by the time task commit actually takes place.

> FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new 
> file isn't yet visible
>
> Key: SPARK-21762
> URL: https://issues.apache.org/jira/browse/SPARK-21762
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: object stores without complete creation consistency 
> (this includes AWS S3's caching of negative GET results)
> Reporter: Steve Loughran
> Priority: Minor


