[jira] [Commented] (TEPHRA-179) Tephra transaction manager breaks on zookeeper restart
[ https://issues.apache.org/jira/browse/TEPHRA-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485968#comment-15485968 ]

ASF GitHub Bot commented on TEPHRA-179:
---

Github user poornachandra commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/10#discussion_r78485992

--- Diff: tephra-core/src/main/java/org/apache/tephra/runtime/DefaultTransactionManagerProvider.java ---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tephra.runtime;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.inject.AbstractModule;
+import com.google.inject.Guice;
+import com.google.inject.Inject;
+import com.google.inject.Injector;
+import com.google.inject.Module;
+import com.google.inject.Provider;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.tephra.TransactionManager;
+import org.apache.twill.zookeeper.ZKClient;
+import org.apache.twill.zookeeper.ZKClientService;
+
+/**
+ * A provider for {@link TransactionManager} that provides a new instance every time.
+ */
+public class DefaultTransactionManagerProvider implements Provider {
+  private final Configuration conf;
+  private final ZKClientService zkClientService;
+
+  @Inject
+  public DefaultTransactionManagerProvider(Configuration conf, ZKClientService zkClientService) {
+    this.conf = conf;
+    this.zkClientService = zkClientService;
+  }
+
+  @Override
+  public TransactionManager get() {
+    // Create a new injector every time since Guice services cannot be restarted TEPHRA-179
+    Injector injector = Guice.createInjector(
--- End diff --

This will require more decoupling. We'll need to break the class hierarchy into HA Transaction Service, Transaction Service and Transaction Manager. For now I have added some get methods to help testing in PR https://github.com/apache/incubator-tephra/pull/11

> Tephra transaction manager breaks on zookeeper restart
> --
>
> Key: TEPHRA-179
> URL: https://issues.apache.org/jira/browse/TEPHRA-179
> Project: Tephra
> Issue Type: Bug
> Components: manager
> Affects Versions: 0.8.0-incubating
> Environment: OpenJDK 8 (JDK) on Alpine Linux 3.4 in Docker
> Reporter: Francis Chuang
> Assignee: Ali Anwar
> Fix For: 0.9.0-incubating
>
>
> I am running HBase 1.2.2 with Phoenix 4.8.0 with the tephra transaction
> server in 1 docker container. In another docker container, I have Zookeeper
> 3.4.8 managed by Netflix Exhibitor.
> When everything first starts, I am able to create transactional tables and run
> transactional queries.
> However, once Exhibitor restarts zookeeper and tephra reconnects to
> zookeeper, it no longer works correctly.
> Running transactional queries results in this error:
> {code}
> Error: Error -1 (0) : Error while executing SQL "CREATE TABLE my_table321
> (k BIGINT PRIMARY KEY, v VARCHAR) TRANSACTIONAL=true": Remote driver error:
> RuntimeException: java.lang.Exception: Thrift error for
> org.apache.tephra.distributed.TransactionServiceClient$2@2361d7ab: Internal
> error processing startShort -> Exception: Thrift error for
> org.apache.tephra.distributed.TransactionServiceClient$2@2361d7ab: Internal
> error processing startShort -> TApplicationException: Internal error
> processing startShort
> SQLState: 0
> ErrorCode: -1
> {code}
> This is the full log:
> {code}
> Fri Sep 2 00:26:50 UTC 2016 Starting tephra service on
> m9edd51-hmaster1.m9edd51
> -f: file size (blocks) unlimited
> -t: cpu time (seconds) unlimited
> -d: data seg size (kb) unlimited
> -s: stack size (kb) 8192
> -c: core file size (blocks) unlimited
> -m: resident set size (kb) unlimited
> -l: locked memory (kb) 64
> -p: processes unlimited
> -n: file descriptors 65536
> -v: address space (kb) unlimited
> -w: locks unlimited
> -e: scheduling priority 0
> -r: real-time priority 0
> Command: /usr/lib/jvm/java-1.8-openjdk/bin/java -XX:+UseConcMarkSweepGC -cp
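The three-way split poornachandra describes (an HA layer, a Transaction Service, and a long-lived Transaction Manager) can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not Tephra's code: `TxManagerSketch`, `TxServiceSketch` and `HaTxServiceSketch` are hypothetical names, `leader()`/`follower()` are assumed to be driven by ZooKeeper leader election, and the one-shot lifecycle mirrors Guava's rule that a stopped `Service` cannot be started again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the long-lived transaction manager. It holds the
// state that must survive leadership changes, so it can be a singleton.
final class TxManagerSketch {
  private long nextTxId = 1;

  synchronized long startShort() {
    return nextTxId++;
  }
}

// Hypothetical one-shot service: it may be started once and stopped once,
// mirroring Guava's rule that a stopped Service cannot be restarted.
final class TxServiceSketch {
  private enum State { NEW, RUNNING, TERMINATED }

  private final TxManagerSketch manager;
  private State state = State.NEW;

  TxServiceSketch(TxManagerSketch manager) {
    this.manager = manager;
  }

  synchronized void start() {
    if (state != State.NEW) {
      throw new IllegalStateException("cannot start a service in state " + state);
    }
    state = State.RUNNING;
  }

  synchronized void stop() {
    state = State.TERMINATED;
  }

  synchronized long startShort() {
    if (state != State.RUNNING) {
      throw new IllegalStateException("service is not running");
    }
    return manager.startShort();
  }
}

// Hypothetical HA layer: every leadership term gets a brand-new
// TxServiceSketch instead of an attempt to restart the previous one.
final class HaTxServiceSketch {
  private final TxManagerSketch manager;
  private final List<TxServiceSketch> terms = new ArrayList<>();
  private TxServiceSketch current;

  HaTxServiceSketch(TxManagerSketch manager) {
    this.manager = manager;
  }

  synchronized void leader() {
    follower(); // stop any service left over from a previous term
    current = new TxServiceSketch(manager);
    current.start();
    terms.add(current);
  }

  synchronized void follower() {
    if (current != null) {
      current.stop();
      current = null;
    }
  }

  synchronized long startShort() {
    if (current == null) {
      throw new IllegalStateException("not the leader");
    }
    return current.startShort();
  }

  synchronized int termCount() {
    return terms.size();
  }
}
```

In this sketch only the thin per-term wrapper is recreated, so the singleton manager keeps its state across a lost and regained ZooKeeper session; whether the real manager should instead reload its state per term is a separate design choice the sketch does not model.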
[jira] [Commented] (TEPHRA-179) Tephra transaction manager breaks on zookeeper restart
[ https://issues.apache.org/jira/browse/TEPHRA-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484575#comment-15484575 ]

ASF GitHub Bot commented on TEPHRA-179:
---

Github user chtyim commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/10#discussion_r78407204

--- Diff: tephra-core/src/main/java/org/apache/tephra/runtime/DefaultTransactionManagerProvider.java ---
@@ -0,0 +1,71 @@
+/**
+ * A provider for {@link TransactionManager} that provides a new instance every time.
+ */
+public class DefaultTransactionManagerProvider implements Provider {
+  private final Configuration conf;
+  private final ZKClientService zkClientService;
+
+  @Inject
+  public DefaultTransactionManagerProvider(Configuration conf, ZKClientService zkClientService) {
+    this.conf = conf;
+    this.zkClientService = zkClientService;
+  }
+
+  @Override
+  public TransactionManager get() {
+    // Create a new injector every time since Guice services cannot be restarted TEPHRA-179
+    Injector injector = Guice.createInjector(
--- End diff --

This is quite hacky, in that a usual provider doesn't create instances with a different injector. If all we need is a new instance, why not `new` it directly here?
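chtyim's alternative, building the instance with `new` inside the provider rather than creating a second injector, might look roughly like this. The sketch uses `java.util.function.Supplier` to stay dependency-free (Guice's `Provider<T>` has the same single `T get()` method); `ConfSketch`, `ManagerSketch` and `DirectManagerProvider` are hypothetical stand-ins, and the real `TransactionManager` constructor signature is not shown here.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the injected Configuration.
final class ConfSketch {
  final String zkQuorum;

  ConfSketch(String zkQuorum) {
    this.zkQuorum = zkQuorum;
  }
}

// Hypothetical stand-in for TransactionManager.
final class ManagerSketch {
  final ConfSketch conf;

  ManagerSketch(ConfSketch conf) {
    this.conf = conf;
  }
}

// The provider receives its dependencies once (in Guice this would be an
// @Inject constructor) and calls `new` on every get(), so no second
// injector is needed to obtain a fresh instance.
final class DirectManagerProvider implements Supplier<ManagerSketch> {
  private final ConfSketch conf;

  DirectManagerProvider(ConfSketch conf) {
    this.conf = conf;
  }

  @Override
  public ManagerSketch get() {
    return new ManagerSketch(conf);
  }
}
```

The trade-off is that every constructor argument must be injected into the provider itself, whereas the nested-injector approach lets Guice assemble the whole object graph again on each call.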
[jira] [Commented] (TEPHRA-179) Tephra transaction manager breaks on zookeeper restart
[ https://issues.apache.org/jira/browse/TEPHRA-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484568#comment-15484568 ]

ASF GitHub Bot commented on TEPHRA-179:
---

Github user chtyim commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/10#discussion_r78406517

--- Diff: tephra-core/src/main/java/org/apache/tephra/distributed/TransactionService.java ---
@@ -153,4 +193,31 @@ protected void internalStop() {
       }
     }
   }
+
+  private void undoRegister() {
+    if (cancelDiscovery != null) {
+      cancelDiscovery.cancel();
+    }
+  }
+
+  private void doRegister() {
--- End diff --

`register`?
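A sketch of the reviewer's naming suggestion (`register` instead of `doRegister`, paired here with a hypothetical `unregister` in place of `undoRegister`), with the cancel handle reset after use so repeated calls are harmless. `RegistrySketch` and `CancellableSketch` are hypothetical in-memory stand-ins for the ZooKeeper-backed discovery service and the handle stored in `cancelDiscovery`; they are not Twill's API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical handle returned by a registration; cancel() withdraws it.
interface CancellableSketch {
  void cancel();
}

// Hypothetical in-memory registry standing in for ZooKeeper-backed discovery.
final class RegistrySketch {
  final Set<String> endpoints = new HashSet<>();

  synchronized CancellableSketch register(String endpoint) {
    endpoints.add(endpoint);
    return () -> remove(endpoint);
  }

  private synchronized void remove(String endpoint) {
    endpoints.remove(endpoint);
  }
}

final class AnnouncingServiceSketch {
  private final RegistrySketch registry;
  private final String endpoint;
  private CancellableSketch cancelDiscovery;

  AnnouncingServiceSketch(RegistrySketch registry, String endpoint) {
    this.registry = registry;
    this.endpoint = endpoint;
  }

  // Announce this instance; withdraw any earlier registration first so a
  // repeated call (e.g. a second leader() callback) leaves exactly one entry.
  synchronized void register() {
    unregister();
    cancelDiscovery = registry.register(endpoint);
  }

  // Withdraw the announcement; safe to call when nothing is registered.
  synchronized void unregister() {
    if (cancelDiscovery != null) {
      cancelDiscovery.cancel();
      cancelDiscovery = null;
    }
  }
}
```

Calling `unregister()` at the start of `register()` is an assumption of this sketch to make it idempotent, not something the quoted diff shows.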
[jira] [Commented] (TEPHRA-179) Tephra transaction manager breaks on zookeeper restart
[ https://issues.apache.org/jira/browse/TEPHRA-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484555#comment-15484555 ]

ASF GitHub Bot commented on TEPHRA-179:
---

Github user chtyim commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/10#discussion_r78405767

--- Diff: tephra-core/src/main/java/org/apache/tephra/distributed/TransactionService.java ---
@@ -42,28 +45,64 @@ import java.util.concurrent.ExecutionException;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.TimeoutException;
+import javax.annotation.Nullable;
 /**
  *
  */
-public final class TransactionService extends InMemoryTransactionService {
+public class TransactionService extends AbstractService {
--- End diff --

Add a javadoc to this class explaining what it does and when it should be used.
[jira] [Commented] (TEPHRA-179) Tephra transaction manager breaks on zookeeper restart
[ https://issues.apache.org/jira/browse/TEPHRA-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478777#comment-15478777 ]

ASF GitHub Bot commented on TEPHRA-179:
---

GitHub user poornachandra opened a pull request:

https://github.com/apache/incubator-tephra/pull/10

TEPHRA-179 Transaction service high availability changes

Restructuring the Transaction Service classes to allow for HA restart while binding Transaction Manager and other classes as singletons.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/poornachandra/incubator-tephra feature/tx-service-ha

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-tephra/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10

commit 4b2bfe6a6733440fa73c7f5e00ff499662387911
Author: poorna
Date: 2016-09-09T23:44:22Z

TEPHRA-179 Transaction service high availability changes

commit 7efff83009675a512f39cf8b0e94d1c6a1cde20a
Author: poorna
Date: 2016-09-10T00:24:26Z

Add HA test
[jira] [Commented] (TEPHRA-179) Tephra transaction manager breaks on zookeeper restart
[ https://issues.apache.org/jira/browse/TEPHRA-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471730#comment-15471730 ]

ASF GitHub Bot commented on TEPHRA-179:
---

Github user poornachandra commented on a diff in the pull request:

https://github.com/apache/incubator-tephra/pull/2#discussion_r77899348

--- Diff: tephra-core/src/main/java/org/apache/tephra/runtime/TransactionDistributedModule.java ---
@@ -41,14 +41,15 @@
   @Override
   protected void configure() {
+    // some of these classes need to be non-singleton in order to create a new instance during leader() in
+    // TransactionService
     bind(SnapshotCodecProvider.class).in(Singleton.class);
-    bind(TransactionStateStorage.class).annotatedWith(Names.named("persist"))
-      .to(HDFSTransactionStateStorage.class).in(Singleton.class);
-    bind(TransactionStateStorage.class).toProvider(TransactionStateStorageProvider.class).in(Singleton.class);
+    bind(TransactionStateStorage.class).annotatedWith(Names.named("persist")).to(HDFSTransactionStateStorage.class);
+    bind(TransactionStateStorage.class).toProvider(TransactionStateStorageProvider.class);
-    bind(TransactionManager.class).in(Singleton.class);
-    bind(TransactionSystemClient.class).to(TransactionServiceClient.class).in(Singleton.class);
--- End diff --

Since `TransactionServiceClient` can contain a pool of Thrift clients, it is better to keep it as a singleton.
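To illustrate why a client that holds a pool of Thrift connections is better bound as a singleton: with singleton scope every injection point shares one pool, rather than each consumer opening its own. The sketch below is a hypothetical, dependency-free analogue of Guice's `.in(Singleton.class)` scope built with double-checked locking; `PooledClientSketch` and `SingletonSupplier` are illustrative names, and the pooled "connections" are placeholder objects.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical client that owns a fixed-size pool of connections.
final class PooledClientSketch {
  static final AtomicInteger POOLS_CREATED = new AtomicInteger();
  final BlockingQueue<Object> pool;

  PooledClientSketch(int size) {
    POOLS_CREATED.incrementAndGet();
    pool = new ArrayBlockingQueue<>(size);
    for (int i = 0; i < size; i++) {
      pool.add(new Object()); // placeholder for an open Thrift connection
    }
  }
}

// Thread-safe lazy singleton: the factory runs at most once, and every
// caller afterwards receives the same instance (and therefore the same pool).
final class SingletonSupplier<T> implements Supplier<T> {
  private final Supplier<T> factory;
  private volatile T instance;

  SingletonSupplier(Supplier<T> factory) {
    this.factory = factory;
  }

  @Override
  public T get() {
    T result = instance;
    if (result == null) {
      synchronized (this) {
        result = instance;
        if (result == null) {
          instance = result = factory.get();
        }
      }
    }
    return result;
  }
}
```

Without the singleton wrapper, each `get()` would build a new `PooledClientSketch` and open another full set of connections.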