Re: HConnectionManager leaks with zookeeper connection: Too many connections from /my.tomcat.server.com - max is 60
Hi, the problem is gone. I did what you said :) Thanks!

2014-12-13 22:38 GMT+03:00 Serega Sheypak serega.shey...@gmail.com:
Great, I'll refactor the code and report back.

2014-12-13 22:36 GMT+03:00 Stack st...@duboce.net:
On Sat, Dec 13, 2014 at 11:33 AM, Serega Sheypak serega.shey...@gmail.com wrote:

So the idea is:
1. Instantiate HConnection using HConnectionManager once.
2. Create an HTable instance for each Servlet.doPost and close it after the operation is done.
Is that correct?

Yes.

Are region locations cached in this case? Are connections to regions/ZK reused?

Yes.

Can I harm put/get data because of servlet concurrency? Each connection could be used in many threads for different HTable instances against the same HBase table?

It's a bug if the above has issues.
St.Ack

2014-12-13 22:21 GMT+03:00 lars hofhansl la...@apache.org:
Note also that the createConnection part is somewhat expensive (it creates a new thread pool for use with Puts, also does a ZK lookup, etc.). If possible, create the connection ahead of time and only get/close an HTable per request/thread.
-- Lars

From: Serega Sheypak serega.shey...@gmail.com
To: user user@hbase.apache.org
Sent: Friday, December 12, 2014 11:45 AM
Subject: Re: HConnectionManager leaks with zookeeper connection: Too many connections from /my.tomcat.server.com - max is 60

I have 10K doPost/doGet requests per second. The servlet is NOT single-threaded. Each doPost/doGet invokes these lines (encapsulated in a DAO):

  16   HConnection connection = HConnectionManager.createConnection(config);
  17   HTableInterface table = connection.getTable(TableName.valueOf("table1"));

and

  24   } finally {
  25     table.close();
  26     connection.close();
  27   }

I assumed that this static construction

  16   HConnection connection = HConnectionManager.createConnection(config);

correctly handles multi-threaded access somewhere deep inside. Right now I don't understand what I am doing wrong. Try wrapping each of your requests in a Runnable to emulate multi-threaded pressure on ZK. Your code is linear and mine is not; it's concurrent. Thanks

2014-12-12 22:28 GMT+03:00 Stack st...@duboce.net:
I cannot reproduce. I stood up a cdh5.2 server and copy/pasted your code, adding in a put for each cycle. I ran the loop 1000 times and got no complaint from ZK. Tell me more (Is the servlet using the single-threaded model? Is a single Configuration being used, or are new ones created per servlet invocation?). Below is code and output. (For better perf, cache the connection.)
St.Ack

   1 package org.apache.hadoop.hbase;
   2
   3 import java.io.IOException;
   4
   5 import org.apache.hadoop.conf.Configuration;
   6 import org.apache.hadoop.hbase.client.HConnection;
   7 import org.apache.hadoop.hbase.client.HConnectionManager;
   8 import org.apache.hadoop.hbase.client.HTableInterface;
   9 import org.apache.hadoop.hbase.client.Put;
  10 import org.apache.hadoop.hbase.util.Bytes;
  11
  12 public class TestConnnection {
  13   public static void main(String[] args) throws IOException {
  14     Configuration config = HBaseConfiguration.create();
  15     for (int i = 0; i < 1000; i++) {
  16       HConnection connection = HConnectionManager.createConnection(config);
  17       HTableInterface table = connection.getTable(TableName.valueOf("table1"));
  18       byte[] cf = Bytes.toBytes("t");
  19       try {
  20         byte[] bytes = Bytes.toBytes(i);
  21         Put p = new Put(bytes);
  22         p.add(cf, cf, bytes);
  23         table.put(p);
  24       } finally {
  25         table.close();
  26         connection.close();
  27       }
  28       System.out.println("i=" + i);
  29     }
  30   }
  31 }

2014-12-12 11:26:10,397 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x70dfa475 connecting to ZooKeeper ensemble=localhost:2181
2014-12-12 11:26:10,397 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=9 watcher=hconnection-0x70dfa475, quorum=localhost:2181, baseZNode=/hbase
2014-12-12 11:26:10,398 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-12-12 11:26:10,398 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2014-12-12 11:26:10,401 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181,
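To make the recommended pattern concrete, here is a minimal sketch using the same 0.98-era client API as the thread: one process-wide HConnection created in Servlet.init(), one short-lived HTable per request. The class name, table name "table1", and the "row" request parameter are illustrative assumptions, not from the thread.

  import java.io.IOException;

  import javax.servlet.ServletException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.HConnection;
  import org.apache.hadoop.hbase.client.HConnectionManager;
  import org.apache.hadoop.hbase.client.HTableInterface;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PutServlet extends HttpServlet {
    // One HConnection per process: it is thread-safe and holds the
    // ZK session, region location cache, and worker thread pool.
    private HConnection connection;

    @Override
    public void init() throws ServletException {
      try {
        Configuration config = HBaseConfiguration.create();
        connection = HConnectionManager.createConnection(config);
      } catch (IOException e) {
        throw new ServletException(e);
      }
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
        throws IOException {
      // HTable instances are cheap but NOT thread-safe:
      // create one per request and close it when done.
      HTableInterface table = connection.getTable(TableName.valueOf("table1"));
      try {
        byte[] row = Bytes.toBytes(req.getParameter("row"));
        Put p = new Put(row);
        p.add(Bytes.toBytes("t"), Bytes.toBytes("t"), row);
        table.put(p);
      } finally {
        table.close();  // releases the table; does not close the shared connection
      }
    }

    @Override
    public void destroy() {
      try {
        connection.close();  // close the shared connection once, at shutdown
      } catch (IOException e) {
        // ignore at shutdown
      }
    }
  }

The point of the split: the connection is expensive and shared across all servlet threads, while the table handle stays request-scoped, which is exactly the refactoring that made the "too many connections" complaint go away.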
Region Server Thread with a Single High Idle CPU
I've done a bit of digging and hope someone can shed some light on my particular issue. After each restart, one and only one of my region servers (seemingly chosen at random) is plagued with a single maxed-out CPU core and a read-request chart registering around 40k read requests per second. The remaining 13 dance around 2-5% CPU and sit promptly at 0 reads/s. Is this normal? And why?

Cluster stats:
- 1 master / 14 nodes
- Quad/dual-core 2.8-3.0 GHz, 6-8 GB RAM each, single 500 GB drive

Table stats:
- ~150 tables total, split into 590 regions
- 1 table w/ 2.5 billion rows, 500 columns
- 4 tables w/ 60-250 million rows, 1000 columns
- ~145 tables w/ 100K rows, 25 columns
- Standard key template: abcd123456

Versions:
- CDH: 5.1.2-1.cdh5.1.2.p0.3
- HBase version: 0.98.1-cdh5.1.2, rUnknown
- HBase compiled: Mon Aug 25 19:33:59 PDT 2014, jenkins
- Hadoop version: 2.3.0-cdh5.1.2, r8e266e052e423af592871e2dfe09d54c03f6a0e8
- Hadoop compiled: 2014-08-26T01:36Z, jenkins

Cheers,
Jon
Re: Region Server Thread with a Single High Idle CPU
Sounds like your access patterns are not balanced well; you have a hotspot. Have a look at the metrics emitted from that machine. They will tell you which region is winning the popularity contest.

On Monday, December 15, 2014, uamadman uamadm...@gmail.com wrote: [...]
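If the web UI does not make it obvious, one way to ask the region server directly is over JMX. The following is a hedged sketch, assuming a 0.98-style per-region metrics bean (Hadoop:service=HBase,name=RegionServer,sub=Regions) whose attribute names end in readRequestCount, plus a placeholder hostname and JMX port; verify the bean and attribute naming against your build before relying on it.

  import javax.management.MBeanAttributeInfo;
  import javax.management.MBeanServerConnection;
  import javax.management.ObjectName;
  import javax.management.remote.JMXConnector;
  import javax.management.remote.JMXConnectorFactory;
  import javax.management.remote.JMXServiceURL;

  // Dumps per-region read request counts from a region server's JMX
  // endpoint so the hot region stands out. The host and port (10102)
  // are assumptions; use whatever JMX port your RS is started with.
  public class HotRegionFinder {
    public static void main(String[] args) throws Exception {
      JMXServiceURL url = new JMXServiceURL(
          "service:jmx:rmi:///jndi/rmi://my.regionserver.com:10102/jmxrmi");
      JMXConnector jmxc = JMXConnectorFactory.connect(url);
      try {
        MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
        // In 0.98, per-region metrics are flattened attributes such as
        // Namespace_default_table_t1_region_<encoded>_metric_readRequestCount.
        ObjectName regions = new ObjectName(
            "Hadoop:service=HBase,name=RegionServer,sub=Regions");
        for (MBeanAttributeInfo attr
            : mbsc.getMBeanInfo(regions).getAttributes()) {
          if (attr.getName().endsWith("readRequestCount")) {
            System.out.println(attr.getName() + " = "
                + mbsc.getAttribute(regions, attr.getName()));
          }
        }
      } finally {
        jmxc.close();
      }
    }
  }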
Upcoming meetups: Jan+Feb 2015
On January 15th, we're meeting at AppDynamics in San Francisco. We have some nice talks lined up [1]. On Feb 17th, let's meet around Strata+Hadoop World in San Jose [2]. If you are interested in hosting or speaking, write the organizers. Thanks, St.Ack

1. http://www.meetup.com/hbaseusergroup/events/218744798/
2. http://www.meetup.com/hbaseusergroup/events/219260093/
0.94 going forward
Over the past few months the rate of change into 0.94 has slowed significantly. 0.94.25 was released on Nov 15th, and since then we have had only 4 changes. This could mean one of two things: (1) 0.94 is very stable now, or (2) nobody is using it (at least nobody is contributing to it anymore). If anybody out there is still using 0.94 and is not planning to upgrade to 0.98 or later soon (which will require downtime), please speak up. Otherwise it might be time to think about EOL'ing 0.94. It's not actually much work to do these releases, especially when they are so small, but I'd like to continue only if they are actually used. In any case, I am going to spin 0.94.26 with the current 4 fixes today or tomorrow. -- Lars
Re: HConnectionManager leaks with zookeeper connection: Too many connections from /my.tomcat.server.com - max is 60
Excellent! Should be quite a bit faster too.
-- Lars

From: Serega Sheypak serega.shey...@gmail.com
To: user user@hbase.apache.org
Cc: lars hofhansl la...@apache.org
Sent: Monday, December 15, 2014 5:57 AM
Subject: Re: HConnectionManager leaks with zookeeper connection: Too many connections from /my.tomcat.server.com - max is 60

Hi, the problem is gone. I did what you said :) Thanks! [...]
Re: 0.94 going forward
Given that CDH4 is HBase 0.94, I don't believe nobody is using it. For our clients the majority is on 0.94 (versus 0.96 and up). So I am going with (1): it's very stable!

On Mon, Dec 15, 2014 at 1:53 PM, lars hofhansl la...@apache.org wrote: [...]
Re: HConnectionManager leaks with zookeeper connection: Too many connections from /my.tomcat.server.com - max is 60
Was: 200-300 ms per request. Now: 80 ms per request. (A request here is the full trip from servlet to HBase and back to the response.)

2014-12-15 22:40 GMT+03:00 lars hofhansl la...@apache.org:
Excellent! Should be quite a bit faster too.
-- Lars
[...]
Re: Region Server Thread with a Single High Idle CPU
This is the catalog table: it stores the list of all regions of all tables and their locations. It is normal for reads on this table to be high while a cluster is recovering, as all the region locations are loaded into the region servers. http://hbase.apache.org/book/arch.catalog.html
-Pere

On Mon, Dec 15, 2014 at 12:21 PM, uamadman uamadm...@gmail.com wrote:
hbase:meta,,1.1588230740620282632 is the offending table. I currently do not know what this means yet, but my basic understanding is that it is probably an index of sorts.
Jon
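Since hbase:meta is just a table, you can scan it to see what it holds: one row per region, keyed by table name, region start key, and region id. A minimal sketch using the same 0.98-era client API as elsewhere in this digest (cluster config comes from the classpath; the output format is illustrative):

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.HConnection;
  import org.apache.hadoop.hbase.client.HConnectionManager;
  import org.apache.hadoop.hbase.client.HTableInterface;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  // Lists the rows of hbase:meta (one per region) so you can see what
  // the "offending table" actually contains.
  public class MetaPeek {
    public static void main(String[] args) throws IOException {
      Configuration config = HBaseConfiguration.create();
      HConnection connection = HConnectionManager.createConnection(config);
      HTableInterface meta = connection.getTable(TableName.META_TABLE_NAME);
      try {
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("info"));  // region info, locations
        ResultScanner scanner = meta.getScanner(scan);
        for (Result r : scanner) {
          // Row key encodes: table name, region start key, region id.
          System.out.println(Bytes.toStringBinary(r.getRow()));
        }
        scanner.close();
      } finally {
        meta.close();
        connection.close();
      }
    }
  }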
Re: 0.94 going forward
Hi Lars, thanks for bringing this up for discussion. From my experience I can tell that 0.94 is very stable, but that shouldn't be a blocker when considering EOL'ing it. Are you considering any specific timeframe for that?
thanks,
esteban.
-- Cloudera, Inc.

On Mon, Dec 15, 2014 at 11:46 AM, Koert Kuipers ko...@tresata.com wrote: [...]
Re: Region Server Thread with a Single High Idle CPU
Meta is getting pegged? Sounds like your client applications are not being friendly. Are you reusing cluster configurations? You should have one per process for its lifetime. Basically, how often are you calling HConnectionFactory.createConnection()?

On Mon, Dec 15, 2014 at 12:30 PM, Pere Kyle p...@whisper.sh wrote: [...]
Re: Region Server Thread with a Single High Idle CPU
Minor correction: HConnectionFactory is in the upcoming 1.0 release. How often is HConnectionManager.createConnection() called?
Cheers

On Mon, Dec 15, 2014 at 3:36 PM, Nick Dimiduk ndimi...@gmail.com wrote: [...]
Trying to import data
I am trying to import data into an HBase table and tried the following as an example:

  bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,b,c datatsv hdfs://data.tsv

This command complains about data.tsv not existing in HDFS, yet when I run hadoop fs -ls, I do see the file. Then I tried the following with data.tsv as a local file:

  bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,b,c datatsv /home/hadoop/BigdataEDW/data.tsv

and get the following (I tried both without and with the HBase table datatsv created):

2014-12-15 17:52:25,449 INFO [main] client.RMProxy: Connecting to ResourceManager at rtr-dev-spark4/10.153.24.132:8032
2014-12-15 17:52:25,559 INFO [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-15 17:52:25,562 INFO [main] Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2014-12-15 17:52:25,562 INFO [main] Configuration.deprecation: mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-12-15 17:52:25,565 INFO [main] Configuration.deprecation: dfs.permissions is deprecated. Instead, use dfs.permissions.enabled
2014-12-15 17:52:25,567 INFO [main] Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
2014-12-15 17:52:25,569 INFO [main] Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2014-12-15 17:52:25,571 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-12-15 17:52:25,604 INFO [main] mapreduce.TableOutputFormat: Created table instance for datatsv
2014-12-15 17:52:26,806 INFO [main] ipc.Client: Retrying connect to server: rtr-dev-spark4/10.153.24.132:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-12-15 17:52:27,809 INFO [main] ipc.Client: Retrying connect to server: rtr-dev-spark4/10.153.24.132:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

Help is appreciated.
Thanks,
Latha
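A few hedged observations from the output above: hdfs://data.tsv is not a fully qualified HDFS URI (it has no namenode authority or path); ImportTsv requires every column other than HBASE_ROW_KEY to be a family:qualifier pair, so bare b,c would be rejected once the input is found; and the retry loop against 10.153.24.132:8032 suggests the YARN ResourceManager is down or unreachable, which ImportTsv needs because it runs a MapReduce job. A sketch of a more conventional invocation, where the namenode address, paths, and column family d are assumptions:

  # Local files must first be copied into HDFS; ImportTsv reads from HDFS:
  hadoop fs -put /home/hadoop/BigdataEDW/data.tsv /user/hadoop/data.tsv

  # Fully qualified URI form (namenode host/port are placeholders):
  bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,d:b,d:c \
    datatsv hdfs://namenode:8020/user/hadoop/data.tsv

  # Or a path relative to the default filesystem, as listed by hadoop fs -ls:
  bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,d:b,d:c \
    datatsv /user/hadoop/data.tsv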
Re: 0.94 going forward
Looking for guidance on how to do a zero-downtime upgrade from 0.94 -> 0.98 (or 1.0 if it launches soon). As soon as we can figure this out, we will migrate over.

On Mon, Dec 15, 2014 at 1:37 PM, Esteban Gutierrez este...@cloudera.com wrote: [...]
Re: 0.94 going forward
A zero-downtime upgrade from 0.94 won't be possible. See http://hbase.apache.org/book.html#d0e5199

On Mon, Dec 15, 2014 at 4:44 PM, Jeremy Carroll phobos...@gmail.com wrote: [...]

--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: 0.94 going forward
Which is why I feel that a lot of customers are still on 0.94: they are pretty much trapped unless they want to take downtime for their site. Any type of guidance would be helpful. We are currently in the process of designing our own system to deal with this.

On Mon, Dec 15, 2014 at 4:47 PM, Andrew Purtell apurt...@apache.org wrote: [...]
Re: 0.94 going forward
Does replication and snapshot export work from 0.94.6+ to a 0.96 or 0.98 cluster? Presuming it does, shouldn't a site be able to use a multiple-cluster setup to do a cutover of a client application? That doesn't help with needing downtime to do the eventual upgrade, but it mitigates the impact on the downstream app.
-- Sean

On Dec 15, 2014 6:51 PM, Jeremy Carroll phobos...@gmail.com wrote: [...]
RE: 0.94 going forward
Thanks, Lars. We have customers still using 0.94. It is indeed stable now.
Jieshan.

From: Sean Busbey [bus...@cloudera.com]
Sent: Tuesday, December 16, 2014 9:04 AM
To: user
Subject: Re: 0.94 going forward
[...]
Re: 0.94 going forward
Replication from 0.94 to 0.96+ won't work out of the box unless you use a bridge: HBASE-9360. Maybe before EOL'ing 0.94 we should ship it as part of 0.94.
cheers,
esteban.
-- Cloudera, Inc.

On Mon, Dec 15, 2014 at 5:23 PM, Bijieshan bijies...@huawei.com wrote: [...]
Re: 0.94 going forward
Nope :( Replication uses RPC, and that was changed to protobufs. AFAIK snapshots can also not be exported between 0.94 and 0.98. We have a really shitty story here.

From: Sean Busbey bus...@cloudera.com
To: user user@hbase.apache.org
Sent: Monday, December 15, 2014 5:04 PM
Subject: Re: 0.94 going forward
[...]
Re: 0.94 going forward
Yep. That's also why I've been doing 0.94 releases all this time. 0.92 had a no-downtime path to 0.94, and 0.96 had a no-downtime path to 0.98, so both could be EOL'ed with relatively little annoyance. 0.94 is different, as going to 0.96 or later (including 0.98) is a big change and requires downtime.

If there's a desire, I'm happy to continue to do 0.94 releases for a while. Backporting fixes will become more tedious over time, though, as the code lines diverge. Maybe a bigger push could be to add a zero-downtime story (somehow) to 0.94, release that in a last release, and then EOL 0.94.

Maybe we can brainstorm how a zero-downtime cutover can be done. It's tricky from multiple angles:
- Replication between 0.94 and 0.98 does not work (there's a gateway process that supposedly does that, but it's not known to be very reliable).
- Snapshots cannot be exported between 0.94 and 0.98 (please correct me if I'm wrong here).
- Clients cannot load the 0.94 and 0.98 client at the same time into the same JVM (classloader, to be specific, so if you have OSGi you might be OK).

I'd be happy to hear what design you have in mind. Thrift should still work between both versions in the same way; we'd need to confirm this. Matteo, if you're listening... do you have an inkling about how hard it would be to export a 0.94 snapshot to 0.98?

-- Lars

From: Jeremy Carroll phobos...@gmail.com
To: user@hbase.apache.org user@hbase.apache.org
Sent: Monday, December 15, 2014 4:49 PM
Subject: Re: 0.94 going forward
[...]
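For reference on the snapshot angle: within a single version line, snapshot export is done with the ExportSnapshot tool, sketched below with placeholder snapshot and cluster names; whether a 0.98 cluster can restore what a 0.94 cluster exported is exactly the open question Lars raises.

  # On the source cluster, take a snapshot (HBase shell):
  snapshot 'mytable', 'mytable-snap'

  # Then copy the snapshot files to the destination cluster's HBase root dir:
  bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot mytable-snap \
    -copy-to hdfs://dest-namenode:8020/hbase \
    -mappers 16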