[GitHub] spark pull request #17327: [SPARK-19721][SS][BRANCH-2.1] Good error message ...

lw-lin Thu, 16 Mar 2017 18:41:54 -0700

GitHub user lw-lin opened a pull request:

    https://github.com/apache/spark/pull/17327


    [SPARK-19721][SS][BRANCH-2.1] Good error message for version mismatch in log files

    ## Problem
    
    Structured streaming writes a version identifier (usually `v1`) into
    several of its logs. However, in the places where we check that identifier
    on read, a mismatch produces a confusing error message.
    
    ## What changes were proposed in this pull request?
    
    This patch makes two major changes:
    1. adds a `parseVersion(...)` method and uses it to fix version checking
    in the following places (no other place needs this check):
    ```
    HDFSMetadataLog
      - CompactibleFileStreamLog  ------------> fixed with this patch
        - FileStreamSourceLog  ---------------> inherited the fix of `CompactibleFileStreamLog`
        - FileStreamSinkLog  -----------------> inherited the fix of `CompactibleFileStreamLog`
      - OffsetSeqLog  ------------------------> fixed with this patch
      - anonymous subclass in KafkaSource  ---> fixed with this patch
    ```
    
    2. changes the type of `FileStreamSinkLog.VERSION`,
    `FileStreamSourceLog.VERSION` etc. from `String` to `Int`, so that we can
    identify newer versions via `version > 1` instead of `version != "v1"`
        - note this does not break backward compatibility -- we still write
    out `"v1"` and read back `"v1"`
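    
    The two changes above can be sketched together as follows. This is a
    minimal illustration, not the code in this patch: the parameter names
    `text` and `maxSupportedVersion`, the enclosing object, and the wording of
    the "malformed" message are assumptions. Only the `UnsupportedLogVersion`
    wording is taken from the exception message shown in this description.
    
    ```scala
    import scala.util.Try

    object VersionCheckSketch {
      // Assumed signature; the description only names `parseVersion(...)`.
      def parseVersion(text: String, maxSupportedVersion: Int): Int = {
        // Assumed wording for input that is not of the form "v<positive int>".
        def malformed: Nothing = throw new IllegalStateException(
          s"Log file was malformed: failed to read correct log version from $text.")

        if (!text.startsWith("v")) malformed
        // Everything after the leading 'v' must parse as a positive integer.
        val version = Try(text.substring(1).toInt).getOrElse(malformed)
        if (version <= 0) malformed
        // Comparing Ints lets us tell "newer than supported" apart from "garbage".
        if (version > maxSupportedVersion) {
          throw new IllegalStateException(
            s"UnsupportedLogVersion: maximum supported log version is v$maxSupportedVersion, " +
              s"but encountered v$version. The log file was produced by a newer version of Spark " +
              "and cannot be read by this version. Please upgrade.")
        }
        version
      }
    }
    ```
    
    With `VERSION` held as an `Int`, a writer would still emit `"v" + VERSION`
    (that is, `"v1"`), and a reader would call something like
    `parseVersion(line, VERSION)`, so the on-disk format stays the same.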
    
    ## Exception message with this patch
    ```
    java.lang.IllegalStateException: Failed to read log file /private/var/folders/nn/82rmvkk568sd8p3p8tb33trw0000gn/T/spark-86867b65-0069-4ef1-b0eb-d8bd258ff5b8/0. UnsupportedLogVersion: maximum supported log version is v1, but encountered v99. The log file was produced by a newer version of Spark and cannot be read by this version. Please upgrade.
        at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:202)
        at org.apache.spark.sql.execution.streaming.OffsetSeqLogSuite$$anonfun$3$$anonfun$apply$mcV$sp$2.apply(OffsetSeqLogSuite.scala:78)
        at org.apache.spark.sql.execution.streaming.OffsetSeqLogSuite$$anonfun$3$$anonfun$apply$mcV$sp$2.apply(OffsetSeqLogSuite.scala:75)
        at org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:133)
        at org.apache.spark.sql.execution.streaming.OffsetSeqLogSuite.withTempDir(OffsetSeqLogSuite.scala:26)
        at org.apache.spark.sql.execution.streaming.OffsetSeqLogSuite$$anonfun$3.apply$mcV$sp(OffsetSeqLogSuite.scala:75)
        at org.apache.spark.sql.execution.streaming.OffsetSeqLogSuite$$anonfun$3.apply(OffsetSeqLogSuite.scala:75)
        at org.apache.spark.sql.execution.streaming.OffsetSeqLogSuite$$anonfun$3.apply(OffsetSeqLogSuite.scala:75)
        at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
        at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    ```
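    
    The message has two parts: the path of the log file being read, followed
    by the version error. One way to produce that shape is to catch the
    version error at the read site and rethrow it with the path prepended. The
    sketch below assumes that approach; `LogReadSketch` and `readWithPath` are
    hypothetical names for illustration, not Spark API and not necessarily how
    this patch does it.
    
    ```scala
    object LogReadSketch {
      // Hypothetical helper: evaluates `deserialize` and, if it fails with an
      // IllegalStateException, rethrows with the file path added to the message
      // while keeping the original exception as the cause.
      def readWithPath[T](path: String)(deserialize: => T): T =
        try deserialize
        catch {
          case ise: IllegalStateException =>
            throw new IllegalStateException(
              s"Failed to read log file $path. ${ise.getMessage}", ise)
        }
    }
    ```
    
    Keeping the original exception as the cause means the stack trace still
    points at the version check that failed, while the top-level message tells
    the user which file to look at.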
    
    
    ## How was this patch tested?
    
    unit tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark good-msg-2.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17327
    
----
commit daabb27aa32cb19c157e19081f6d08ff368bb42b
Author: Liwei Lin <lwl...@gmail.com>
Date:   2017-02-25T03:46:35Z

    Fix

----

