[ 
https://issues.apache.org/jira/browse/CASSANDRA-7582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076540#comment-14076540
 ] 

Jonathan Ellis commented on CASSANDRA-7582:
-------------------------------------------

I see two actual classes of CL errors:

# Table is dropped and we are replaying stale data that should also have been 
dropped.  Blocking startup is the Wrong Solution.
# Hardware problem caused a checksum mismatch.  Blocking startup is the Wrong 
Solution.

Granted that blocking startup can help prevent user errors during PIT recovery, 
that's an entirely hypothetical situation today; PIT is only nominally usable.  
(Fork the JVM every time a CL segment finishes?  Yeah.)  So let's not optimize 
for that at the expense of scenarios we see frequently.

I think we should roll back 7125 until we can do it right.  Doing it right 
probably means remembering old cfids in 2.1.x; then we can get paranoid about 
seeing them in the CL for 3.0.  (Getting paranoid in the same version as we 
start remembering is bad for obvious reasons.)
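
To make the proposal concrete, here is a rough sketch of one reading of it. 
None of these names exist in Cassandra: ReplayPolicySketch, Action and the 
"paranoid" flag are hypothetical and only illustrate the decision made for 
each mutation during replay.  A cfid remembered as dropped is skipped 
quietly.  A cfid that was never known fails startup only when paranoid mode 
is on, which would be the release after the one that starts remembering.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch, not Cassandra code: decides what commit log replay
// should do with a mutation, based on the cfid it refers to.
final class ReplayPolicySketch {
    enum Action { APPLY, SKIP, FAIL }

    private final Set<UUID> liveCfIds;     // tables that exist in the schema now
    private final Set<UUID> droppedCfIds;  // cfids remembered from dropped tables
    private final boolean paranoid;        // assumed: off in 2.1.x, on in 3.0

    ReplayPolicySketch(Set<UUID> liveCfIds, Set<UUID> droppedCfIds, boolean paranoid) {
        this.liveCfIds = new HashSet<>(liveCfIds);
        this.droppedCfIds = new HashSet<>(droppedCfIds);
        this.paranoid = paranoid;
    }

    Action decide(UUID cfId) {
        if (liveCfIds.contains(cfId)) {
            return Action.APPLY;           // normal replay
        }
        if (droppedCfIds.contains(cfId)) {
            return Action.SKIP;            // stale data for a dropped table
        }
        // Never-seen cfid: only block startup once the dropped set can be
        // trusted to be complete, i.e. a release after remembering began.
        return paranoid ? Action.FAIL : Action.SKIP;
    }
}
```

With paranoid off, an upgraded node whose old segments mention a table 
dropped before any cfids were remembered would still start, which is the 
reason for splitting the change across two versions.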

> 2.1 multi-dc upgrade errors
> ---------------------------
>
>                 Key: CASSANDRA-7582
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7582
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ryan McGuire
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.1
>
>
> Multi-dc upgrade [was working from 2.0 -> 2.1 fairly 
> recently|http://cassci.datastax.com/job/cassandra_upgrade_dtest/55/testReport/upgrade_through_versions_test/TestUpgrade_from_cassandra_2_0_latest_tag_to_cassandra_2_1_HEAD/],
>  but is currently failing.
> Running 
> upgrade_through_versions_test.py:TestUpgrade_from_cassandra_2_0_HEAD_to_cassandra_2_1_HEAD.bootstrap_multidc_test
>  I get the following errors when starting 2.1 upgraded from 2.0:
> {code}
> ERROR [main] 2014-07-21 23:54:20,862 CommitLog.java:143 - Commit log replay 
> failed due to replaying a mutation for a missing table. This error can be 
> ignored by providing -Dcassandra.commitlog.stop_on_missing_tables=false on 
> the command line
> ERROR [main] 2014-07-21 23:54:20,869 CassandraDaemon.java:474 - Exception 
> encountered during startup
> java.lang.RuntimeException: 
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
> cfId=a1b676f3-0c5d-3276-bfd5-07cf43397004
>         at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) 
> [main/:na]
>         at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457)
>  [main/:na]
>         at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546) 
> [main/:na]
> Caused by: org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't 
> find cfId=a1b676f3-0c5d-3276-bfd5-07cf43397004
>         at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:164)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:97)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:353)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:333)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:365)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:98)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:137) 
> ~[main/:na]
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:115) 
> ~[main/:na]
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
