[jira] [Commented] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201366#comment-16201366 ] Christopher Tubbs commented on ACCUMULO-4561:
--
That's interesting. The Thrift client should have a saner failure behavior when the TCP response is not in the expected protocol. Jetty detects the protocol mismatch and reacts reasonably using the only protocol it knows; there's no reason Thrift shouldn't behave similarly. For a client, the response should be to throw an Exception if the protocol doesn't match. For a server, it should respond with an error message in its native protocol, as Jetty did with the HTTP 400 error. This might be something we need to escalate to the upstream Thrift developers, if it's not something already built into the Thrift client that we're not handling properly. The next step should be to check whether our client code is failing to properly handle a relevant exception coming from the Thrift library.

> Crash when using ping on a non-existing server
> ----------------------------------------------
>
> Key: ACCUMULO-4561
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4561
> Project: Accumulo
> Issue Type: Bug
> Components: shell
> Affects Versions: 2.0.0
> Reporter: Luis Tavarez
>
> While working on ACCUMULO-4558, I tried running
> {code}ping -ts localhost:9995{code} (localhost:9995 does not have a tserver
> on my setup.)
> And it caused the shell to exit (crashed) and show the following message.
> {code}#
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> # Executing /bin/sh -c "kill -9 25561"...
> /home/lmtavar/git/uno/bin/uno: line 44: 25561 Killed
> "$ACCUMULO_HOME"/bin/accumulo shell -u "$ACCUMULO_USER" -p
> "$ACCUMULO_PASSWORD" "${@:2}"
> {code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ACCUMULO-4718) Accumulo-testing classes are broken
[ https://issues.apache.org/jira/browse/ACCUMULO-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201026#comment-16201026 ] Michael Miller commented on ACCUMULO-4718:
--
Well, not quite... the accumulo-testing client appears to be configured properly, but when it calls TableOperations().importDirectory() in BulkPlusOne, it is out of sync. The call to TableOperations causes a FileNotFoundException when importDirectory() checks the path against the Hadoop FileSystem that it pulls from CachedConfiguration.getInstance(). The fs I have set in the accumulo-testing client returns "hdfs://localhost:8020" while the FileSystem in TableOperations returns "file:///". Here is the stacktrace:

{code}
2017-10-11 14:14:51,830 [randomwalk.bulk.BulkPlusOne] ERROR: org.apache.accumulo.core.client.AccumuloException: Bulk import directory /tmp/bulk_f96ee0a1-e125-4b00-acf2-10a774cdfe57 does not exist!
org.apache.accumulo.core.client.AccumuloException: Bulk import directory /tmp/bulk_f96ee0a1-e125-4b00-acf2-10a774cdfe57 does not exist!
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.checkPath(TableOperationsImpl.java:1101)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.importDirectory(TableOperationsImpl.java:1123)
	at org.apache.accumulo.testing.core.randomwalk.bulk.BulkPlusOne.bulkLoadLots(BulkPlusOne.java:102)
	at org.apache.accumulo.testing.core.randomwalk.bulk.BulkPlusOne.runLater(BulkPlusOne.java:120)
	at org.apache.accumulo.testing.core.randomwalk.bulk.BulkTest.lambda$visit$0(BulkTest.java:31)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
	at java.lang.Thread.run(Thread.java:748)
{code}

Meanwhile, if I connect to the Uno instance I have running, I can see that directory and its contents.

> Accumulo-testing classes are broken
> -----------------------------------
>
> Key: ACCUMULO-4718
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4718
> Project: Accumulo
> Issue Type: Bug
> Reporter: Michael Miller
> Assignee: Michael Miller
> Fix For: 2.0.0
>
> Multiple changes to 2.0 over the past 6 months or so have been left out of
> the accumulo-testing repo. Update the testing classes with these changes so
> they can be run against 2.0
[jira] [Assigned] (ACCUMULO-4170) ClientConfiguration javadoc difficult to read
[ https://issues.apache.org/jira/browse/ACCUMULO-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Owens reassigned ACCUMULO-4170:
Assignee: Mark Owens

> ClientConfiguration javadoc difficult to read
> ---------------------------------------------
>
> Key: ACCUMULO-4170
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4170
> Project: Accumulo
> Issue Type: Bug
> Components: client, docs
> Affects Versions: 1.7.1
> Reporter: Mike Drob
> Assignee: Mark Owens
> Priority: Trivial
> Labels: newbie
>
> The docs displayed on
> https://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/ClientConfiguration.html#loadDefault%28%29
> are difficult to read because the list is displayed in-line. We could use
> proper list formatting to improve readability.
[GitHub] joshelser commented on issue #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on issue #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#issuecomment-335938107 No problem, dude. Thanks for the contribution! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] adamjshook commented on issue #305: [ACCUMULO-4591] Add replication latency metrics
adamjshook commented on issue #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#issuecomment-335930968 @joshelser Thanks, if it wouldn't be too much of a hassle.
[jira] [Commented] (ACCUMULO-4718) Accumulo-testing classes are broken
[ https://issues.apache.org/jira/browse/ACCUMULO-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200722#comment-16200722 ] Michael Miller commented on ACCUMULO-4718:
--
I solved the configuration issue... conf/accumulo-testing.properties requires the HDFS root path to be set properly:

{noformat}
# HDFS root path. Should match 'fs.defaultFS' property in Hadoop's core-site.xml
test.common.hdfs.root=hdfs://localhost:8020
{noformat}

It looks like this requirement could be eliminated with completion of ACCUMULO-4717.
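For reference, the Hadoop side of that pairing is the fs.defaultFS property in core-site.xml; a minimal fragment looks like the following (hdfs://localhost:8020 is just the common single-node default used above, and the file location depends on your deployment):

```xml
<!-- $HADOOP_HOME/etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```

When test.common.hdfs.root does not match this value, the testing client resolves bulk-import paths against the wrong FileSystem (e.g. file:///), which matches the mismatch described earlier in this issue.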
[GitHub] adamjshook commented on issue #305: [ACCUMULO-4591] Add replication latency metrics
adamjshook commented on issue #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#issuecomment-335895480 @joshelser Back at you!
[jira] [Comment Edited] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200425#comment-16200425 ] Mark Owens edited comment on ACCUMULO-4561 at 10/11/17 5:51 PM:
--
These crashes appear to occur when sending a ping request to ports that have Jetty listening. I ran an nmap scan on my local machine looking for open ports and then ran the accumulo shell ping command against each open port (closed ports return connection refused). Note that all these tests were run on 2.0.0-SNAPSHOT. My results are listed below:

{noformat}
TServer port on local instance:
9997/tcp   open  palace-6?
>>> localhost:9997:OK

Following ports all returned the same response:
2181/tcp   open  eforward?
4560/tcp   open  unknown
5355/tcp   open  llmnr?
8030/tcp   open  hadoop-ipc  Hadoop IPC
8031/tcp   open  hadoop-ipc  Hadoop IPC
8032/tcp   open  hadoop-ipc  Hadoop IPC
8033/tcp   open  hadoop-ipc  Hadoop IPC
8040/tcp   open  hadoop-ipc  Hadoop IPC
9000/tcp   open  hadoop-ipc  Hadoop IPC
34737/tcp  open  unknown
39473/tcp  open  hadoop-ipc  Hadoop IPC
50010/tcp  open  unknown
50020/tcp  open  hadoop-ipc  Hadoop IPC
>>> localhost:8031 ERROR org.apache.thrift.transport.TTransportException

9998/tcp   open  distinct32?
/tcp  open  abyss?
10001/tcp  open  scp-config?
>>> localhost:9998 ERROR org.apache.thrift.TApplicationException: Invalid method name: 'getTabletServerStatus'

13562/tcp  open  unknown
>>> localhost:13562 ERROR org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:36716 remote=localhost/127.0.0.1:13562]

Jetty ports:
8042/tcp   open  httpJetty 6.1.26
8088/tcp   open  httpJetty 6.1.26
9995/tcp   open  httpJetty 9.3.21.v20170918
44263/tcp  open  httpJetty 6.1.26
50070/tcp  open  httpJetty 6.1.26
50090/tcp  open  httpJetty 6.1.26
>>> #
>>> # java.lang.OutOfMemoryError: Java heap space
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>> # Executing /bin/sh -c "kill -9 7693"...
>>> Killed

This port returned a different response after a timeout:
50075/tcp  open  httpJetty 6.1.26
>>> localhost:50075 ERROR org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:37190 remote=localhost/127.0.0.1:50075]
{noformat}

I have no feel for how often the 'ping -ts' command is run, or how often it would be given an invalid port. I would assume a user would only supply a port they suspected to be a tserver, so I suspect this situation would not arise very often.

I also noticed that if I stop the tablet servers after I'm in the shell and then run the ping command, the shell never returns the prompt to the user. Ctrl-C'ing at that point exits the shell as well. I would think that should be fixed, since the purpose of ping is to retrieve the status of a tablet server. Has that behavior been documented and/or verified previously?
[jira] [Commented] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200641#comment-16200641 ] Mark Owens commented on ACCUMULO-4561:
--
A little more info: the monitor.log records the following warning when the ping command targets a Jetty port:

{noformat}
2017-10-11 13:09:25,039 [http.HttpParser] WARN : Illegal character 0x0 in state=START for buffer HeapByteBuffer@3249e354[p=1,l=179,c=8192,r=178]={\x00<<<\x00\x00\xAf\x82!\x01\x15getTabletS...d92138d\x00,\x16\x00\x16\x00\x00\x00>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
2017-10-11 13:09:25,039 [http.HttpParser] WARN : bad HTTP parsed: 400 Illegal character 0x0 for HttpChannelOverHttp@4676385e{r=0,c=false,a=IDLE,uri=null}
{noformat}

The returned HTTP payload (via wireshark) is:

{noformat}
0000  48 54 54 50 2f 31 2e 31 20 34 30 30 20 49 6c 6c  HTTP/1.1 400 Ill
0010  65 67 61 6c 20 63 68 61 72 61 63 74 65 72 20 30  egal character 0
0020  78 30 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67  x0..Content-Leng
0030  74 68 3a 20 30 0d 0a 43 6f 6e 6e 65 63 74 69 6f  th: 0..Connectio
0040  6e 3a 20 63 6c 6f 73 65 0d 0a 53 65 72 76 65 72  n: close..Server
0050  3a 20 4a 65 74 74 79 28 39 2e 33 2e 32 31 2e 76  : Jetty(9.3.21.v
0060  32 30 31 37 30 39 31 38 29 0d 0a 0d 0a           20170918)
{noformat}
[jira] [Commented] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200561#comment-16200561 ] Keith Turner commented on ACCUMULO-4561:
--
The ping command makes a thrift connection to the destination. In this case the destination does not speak thrift, and it sends something back; I'm not sure what. I suspect the response from Jetty is read by the thrift code and used to try to allocate a byte array that exceeds available memory. My memory is fuzzy on this, but I think on the server side in Accumulo we use the Thrift framed transport and limit the maximum byte array it will allocate. I think this is configurable. I am not sure if this can be done on the thrift client side, though.
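The theory above is easy to demonstrate without Thrift at all: a framed transport reads a 4-byte big-endian length prefix and then allocates a buffer of that size, and the first four bytes of any HTTP response are the ASCII characters "HTTP", which decode to roughly 1.2 GB. A minimal sketch of the guard a size-limited framed transport can apply, using only the JDK (the FrameCheck class and MAX_FRAME_SIZE cap are hypothetical names for illustration, not Thrift's or Accumulo's actual API):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FrameCheck {
    static final int MAX_FRAME_SIZE = 16 * 1024 * 1024; // illustrative 16 MB cap

    // A framed transport reads a 4-byte big-endian length prefix before
    // allocating the frame buffer. Refuse absurd sizes instead of allocating.
    static int readFrameSize(byte[] lengthPrefix) {
        int size = ByteBuffer.wrap(lengthPrefix).getInt();
        if (size < 0 || size > MAX_FRAME_SIZE) {
            throw new IllegalStateException(
                "frame size " + size + " exceeds limit " + MAX_FRAME_SIZE);
        }
        return size;
    }

    public static void main(String[] args) {
        // Jetty's reply begins with the ASCII bytes "HTTP". Read as a frame
        // length they decode to 1,213,486,160 (~1.1 GiB); allocating that
        // blindly is one plausible route to the OutOfMemoryError in this issue.
        byte[] reply = "HTTP/1.1 400 Illegal character 0x0".getBytes(StandardCharsets.US_ASCII);
        int bogus = ByteBuffer.wrap(reply, 0, 4).getInt();
        System.out.println("\"HTTP\" as a frame length: " + bogus + " bytes");
        try {
            readFrameSize(new byte[] { reply[0], reply[1], reply[2], reply[3] });
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If the client-side transport enforced such a cap, the ping against a Jetty port would fail with an exception instead of exhausting the heap.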
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144061407 ## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ## @@ -124,4 +142,55 @@ protected int getNumConfiguredPeers() { protected int getMaxReplicationThreads() { return replicationUtil.getMaxReplicationThreads(master.getMasterMonitorInfo()); } + + protected void addReplicationQueueTimeMetrics() { +// Exit early if replication table is offline +if (TableState.ONLINE != Tables.getTableState(master.getInstance(), ReplicationTable.ID)) { + return; +} + +// Exit early if we have no replication peers configured +if (replicationUtil.getPeers().isEmpty()) { + return; +} + +Set paths = replicationUtil.getPendingReplicationPaths(); + +// We'll take a snap of the current time and use this as a diff between any deleted +// file's modification time and now. The reported latency will be off by at most a +// number of seconds equal to the metric polling period +long currentTime = System.currentTimeMillis(); + +// Iterate through all the pending paths and update the mod time if we don't know it yet +for (Path path : paths) { + if (!pathModTimes.containsKey(path)) { +try { + pathModTimes.put(path, master.getFileSystem().getFileStatus(path).getModificationTime()); +} catch (IOException e) { + // Ignore all IOExceptions + // Either the system is unavailable or the file was deleted + // since the initial scan and this check +} + } +} + +// Remove all currently pending files +Set deletedPaths = new HashSet<>(pathModTimes.keySet()); +deletedPaths.removeAll(paths); + +// Exit early if we have no replicated files to report on +if (deletedPaths.isEmpty()) { + return; +} + +replicationQueueTimeStat.resetMinMax(); + +for (Path path : deletedPaths) { + // Remove this path and add the latency + long modTime = pathModTimes.remove(path); Review comment: It's just the 
unboxing you really have to worry about. Something like the following would be fine:

```java
Long modTime = pathModTimes.remove(path);
if (modTime != null) {
  ...
}
```

Java will handle the unboxing for you once you've guaranteed that `modTime` is definitely non-null.
[GitHub] adamjshook commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
adamjshook commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144060426

## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ##

```java
long modTime = pathModTimes.remove(path);
```

Review comment: I was actually thinking how best to handle this. It'd be an implementation error if the path to be removed wasn't in `pathModTimes`, since `deletedPaths` is a subset of `pathModTimes`. Maybe check for existence and log an error if it doesn't exist?
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144058257

## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ##

```java
long modTime = pathModTimes.remove(path);
```

Review comment: This will throw an error if `pathModTimes` ever does not have a mapping for `path`, because you're unboxing the `Long` into a `long`.
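The unboxing hazard being described can be reproduced in miniature (String keys stand in for Hadoop Path objects; the map name mirrors the pathModTimes field from the diff above):

```java
import java.util.HashMap;
import java.util.Map;

public class UnboxingDemo {
    public static void main(String[] args) {
        Map<String, Long> pathModTimes = new HashMap<>();
        pathModTimes.put("/tmp/bulk/file1.rf", 1_507_730_000_000L);

        // Safe pattern: keep the boxed Long and null-check before unboxing.
        Long modTime = pathModTimes.remove("/tmp/bulk/file1.rf");
        if (modTime != null) {
            long queueTime = 1_507_730_005_000L - modTime; // unboxes after the null check
            System.out.println("queueTime=" + queueTime);  // queueTime=5000
        }

        // Hazardous pattern: remove() returns null for an absent key, and
        // assigning that null to a primitive long auto-unboxes via
        // Long.longValue(), throwing a NullPointerException.
        try {
            long missing = pathModTimes.remove("/tmp/bulk/no-such-file.rf");
            System.out.println(missing); // never reached
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing an absent mapping");
        }
    }
}
```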
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144058388

## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ##

```java
protected void addReplicationQueueTimeMetrics() {
```

Review comment: Could you add a simple test case for this, please?
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144058516 ## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ## @@ -26,38 +29,51 @@ import org.apache.accumulo.master.Master; import org.apache.accumulo.server.metrics.Metrics; import org.apache.accumulo.server.replication.ReplicationUtil; +import org.apache.hadoop.fs.Path; import org.apache.hadoop.metrics2.MetricsCollector; import org.apache.hadoop.metrics2.MetricsRecordBuilder; import org.apache.hadoop.metrics2.MetricsSource; import org.apache.hadoop.metrics2.MetricsSystem; import org.apache.hadoop.metrics2.lib.Interns; import org.apache.hadoop.metrics2.lib.MetricsRegistry; +import org.apache.hadoop.metrics2.lib.MutableQuantiles; +import org.apache.hadoop.metrics2.lib.MutableStat; /** * */ public class Metrics2ReplicationMetrics implements Metrics, MetricsSource { public static final String NAME = MASTER_NAME + ",sub=Replication", DESCRIPTION = "Data-Center Replication Metrics", CONTEXT = "master", RECORD = "MasterReplication"; - public static final String PENDING_FILES = "filesPendingReplication", NUM_PEERS = "numPeers", MAX_REPLICATION_THREADS = "maxReplicationThreads"; + public static final String PENDING_FILES = "filesPendingReplication", NUM_PEERS = "numPeers", MAX_REPLICATION_THREADS = "maxReplicationThreads", + REPLICATION_QUEUE_TIME_QUANTILES = "replicationQueue10m", REPLICATION_QUEUE_TIME = "replicationQueue"; private final Master master; private final MetricsSystem system; private final MetricsRegistry registry; private final ReplicationUtil replicationUtil; + private final MutableQuantiles replicationQueueTimeQuantiles; + private final MutableStat replicationQueueTimeStat; + private final Map pathModTimes; Metrics2ReplicationMetrics(Master master, MetricsSystem system) { this.master = master; this.system = system; +pathModTimes = new 
HashMap<>(); + registry = new MetricsRegistry(Interns.info(NAME, DESCRIPTION)); replicationUtil = new ReplicationUtil(master); +replicationQueueTimeQuantiles = registry.newQuantiles(REPLICATION_QUEUE_TIME_QUANTILES, "replication queue time quantiles in milliseconds", "ops",

Review comment: nit: capitalize "Replication", please.
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144057720

## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ##

```java
} catch (IOException e) {
  // Ignore all IOExceptions
  // Either the system is unavailable or the file was deleted
  // since the initial scan and this check
}
```

Review comment: Add a `log.trace` in case you/someone ever runs into trouble and wants to try to debug this.
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
URL: https://github.com/apache/accumulo/pull/305#discussion_r144058956

File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java

@@ -26,38 +29,51 @@
 import org.apache.accumulo.master.Master;
 import org.apache.accumulo.server.metrics.Metrics;
 import org.apache.accumulo.server.replication.ReplicationUtil;
+import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.metrics2.MetricsCollector;
 import org.apache.hadoop.metrics2.MetricsRecordBuilder;
 import org.apache.hadoop.metrics2.MetricsSource;
 import org.apache.hadoop.metrics2.MetricsSystem;
 import org.apache.hadoop.metrics2.lib.Interns;
 import org.apache.hadoop.metrics2.lib.MetricsRegistry;
+import org.apache.hadoop.metrics2.lib.MutableQuantiles;
+import org.apache.hadoop.metrics2.lib.MutableStat;

 /**
  *
  */
 public class Metrics2ReplicationMetrics implements Metrics, MetricsSource {
   public static final String NAME = MASTER_NAME + ",sub=Replication", DESCRIPTION = "Data-Center Replication Metrics", CONTEXT = "master", RECORD = "MasterReplication";
-  public static final String PENDING_FILES = "filesPendingReplication", NUM_PEERS = "numPeers", MAX_REPLICATION_THREADS = "maxReplicationThreads";
+  public static final String PENDING_FILES = "filesPendingReplication", NUM_PEERS = "numPeers", MAX_REPLICATION_THREADS = "maxReplicationThreads",
+      REPLICATION_QUEUE_TIME_QUANTILES = "replicationQueue10m", REPLICATION_QUEUE_TIME = "replicationQueue";

   private final Master master;
   private final MetricsSystem system;
   private final MetricsRegistry registry;
   private final ReplicationUtil replicationUtil;
+  private final MutableQuantiles replicationQueueTimeQuantiles;
+  private final MutableStat replicationQueueTimeStat;
+  private final Map pathModTimes;

   Metrics2ReplicationMetrics(Master master, MetricsSystem system) {
     this.master = master;
     this.system = system;
+    pathModTimes = new HashMap<>();
+
     registry = new MetricsRegistry(Interns.info(NAME, DESCRIPTION));
     replicationUtil = new ReplicationUtil(master);
+    replicationQueueTimeQuantiles = registry.newQuantiles(REPLICATION_QUEUE_TIME_QUANTILES, "replication queue time quantiles in milliseconds", "ops",
+        "latency", 600);
+    replicationQueueTimeStat = registry.newStat(REPLICATION_QUEUE_TIME, "replication queue time stat in milliseconds", "ops", "latency", true);

Review comment: nit: "statistics" instead of "stat" (and capitalize "Replication" too)
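The queue-time bookkeeping under review (the pathModTimes map feeding the new stat) can be sketched in isolation. This is a simplified, hypothetical illustration, not the actual Metrics2ReplicationMetrics code: the class and method names below are invented, and the Hadoop Metrics2 registry calls are replaced by plain return values so the sketch is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remember each file's modification time the first
// time it is seen pending replication, and report the elapsed queue time
// when it finishes replicating. In the real patch these durations would
// feed a MutableQuantiles and a MutableStat instead of being returned.
public class QueueTimeTracker {
    private final Map<String, Long> pathModTimes = new HashMap<>();

    // Called when a file is observed in the pending-replication set.
    public void filePending(String path, long modTimeMillis) {
        // Keep the earliest observed mod time; later sightings don't reset it.
        pathModTimes.putIfAbsent(path, modTimeMillis);
    }

    // Called when a file is no longer pending. Returns queue time in
    // milliseconds, or -1 if the file was never tracked.
    public long fileReplicated(String path, long nowMillis) {
        Long modTime = pathModTimes.remove(path);
        return modTime == null ? -1L : nowMillis - modTime;
    }

    public static void main(String[] args) {
        QueueTimeTracker tracker = new QueueTimeTracker();
        tracker.filePending("/accumulo/wal/f1", 1_000L);
        tracker.filePending("/accumulo/wal/f1", 2_000L); // ignored: already tracked
        System.out.println(tracker.fileReplicated("/accumulo/wal/f1", 5_000L)); // 4000
    }
}
```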
[jira] [Commented] (ACCUMULO-4591) Replication Latency Metrics2
[ https://issues.apache.org/jira/browse/ACCUMULO-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200503#comment-16200503 ] Adam J Shook commented on ACCUMULO-4591: [~elserj], would you have time in the near future to take a look at the new PR?
> Replication Latency Metrics2
>
> Key: ACCUMULO-4591
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4591
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Noe
> Assignee: Adam J Shook
> Labels: pull-request-available
> Fix For: 2.0.0
>
> Attachments: Screen Shot 2017-02-23 at 9.31.06 AM.png
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Currently, Files Pending Replication is the only available insight into the state of replication. Latency of replication has been a great concern. Without a latency metric, users cannot determine which configuration settings reduce or increase replication latency.
[jira] [Comment Edited] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200425#comment-16200425 ] Mark Owens edited comment on ACCUMULO-4561 at 10/11/17 3:10 PM:
These crashes appear to occur when sending a ping request to ports that have Jetty listening. I ran an nmap scan on my local machine looking for open ports, then ran the accumulo shell ping command against each open port (closed ports return connection refused). Note that all of these tests were run on the 2.0.0-SNAPSHOT. My results are listed below:
{{TServer port on local instance:
9997/tcp open palace-6?
>>> localhost:9997:OK

The following ports all returned the same response:
2181/tcp open eforward?
4560/tcp open unknown
5355/tcp open llmnr?
8030/tcp open hadoop-ipc Hadoop IPC
8031/tcp open hadoop-ipc Hadoop IPC
8032/tcp open hadoop-ipc Hadoop IPC
8033/tcp open hadoop-ipc Hadoop IPC
8040/tcp open hadoop-ipc Hadoop IPC
9000/tcp open hadoop-ipc Hadoop IPC
34737/tcp open unknown
39473/tcp open hadoop-ipc Hadoop IPC
50010/tcp open unknown
50020/tcp open hadoop-ipc Hadoop IPC
>>> localhost:8031 ERROR org.apache.thrift.transport.TTransportException

9998/tcp open distinct32?
/tcp open abyss?
10001/tcp open scp-config?
>>> localhost:9998 ERROR org.apache.thrift.TApplicationException: Invalid method name: 'getTabletServerStatus'

13562/tcp open unknown
>>> localhost:13562 ERROR org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:36716 remote=localhost/127.0.0.1:13562]

Jetty ports:
8042/tcp open httpJetty 6.1.26
8088/tcp open httpJetty 6.1.26
9995/tcp open httpJetty 9.3.21.v20170918
44263/tcp open httpJetty 6.1.26
50070/tcp open httpJetty 6.1.26
50090/tcp open httpJetty 6.1.26
>>> #
>>> # java.lang.OutOfMemoryError: Java heap space
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>> # Executing /bin/sh -c "kill -9 7693"...
>>> Killed

This port returned a different response after a timeout:
50075/tcp open httpJetty 6.1.26
>>> localhost:50075 ERROR org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:37190 remote=localhost/127.0.0.1:50075]}}
I have no feel for how often the 'ping -ts ' command is run, or how often it would be given an invalid port. I would assume a user would only supply a port they suspected belonged to a tserver, so I suspect this situation would not happen very often.
I also noticed that if I stop the tablet servers after I'm in the shell and then run the ping command, the shell never returns the prompt to the user. Ctrl-C'ing at that point exits the shell as well. I would think that should be fixed, since the purpose of the ping is to retrieve the status of a tablet server. Has that behavior been documented and/or verified previously?
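A note on the OutOfMemoryError seen against the Jetty ports above: one plausible mechanism, assuming the client side uses a Thrift framed transport, is that the first four bytes of the peer's HTTP reply get read as a big-endian frame length. The ASCII bytes "HTTP" decode to roughly 1.2 GB, which the client would then try to buffer. The class below is illustrative only; it is not the actual Accumulo or Thrift code path, just a demonstration of the byte arithmetic.

```java
// Illustrative sketch: decode the first 4 bytes of a response as a
// big-endian frame length, the way a framed transport would.
public class FrameSizeDemo {
    static int frameLength(byte[] header) {
        // Big-endian 32-bit integer from the first four bytes.
        return ((header[0] & 0xff) << 24)
             | ((header[1] & 0xff) << 16)
             | ((header[2] & 0xff) << 8)
             |  (header[3] & 0xff);
    }

    public static void main(String[] args) {
        // A Jetty HTTP error reply starts with the ASCII bytes "HTTP".
        byte[] httpReply = "HTTP/1.1 400 Bad Request".getBytes();
        byte[] first4 = {httpReply[0], httpReply[1], httpReply[2], httpReply[3]};
        // 0x48545450 = 1213486160 bytes, about 1.13 GiB -- a buffer that
        // large would plausibly blow a default shell heap.
        System.out.println(frameLength(first4)); // 1213486160
    }
}
```

If this is the failure mode, a frame-size sanity limit on the client (or the upstream Thrift handling Christopher mentioned) would turn the OOM into an ordinary exception.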
[jira] [Created] (ACCUMULO-4721) Document rfile-info in the user manual
Michael Wall created ACCUMULO-4721:
Summary: Document rfile-info in the user manual
Key: ACCUMULO-4721
URL: https://issues.apache.org/jira/browse/ACCUMULO-4721
Project: Accumulo
Issue Type: Bug
Affects Versions: 1.8.1, 1.7.3, 2.0.0
Reporter: Michael Wall
Priority: Trivial

Currently the 'old school' PrintInfo is documented at http://accumulo.apache.org/1.8/accumulo_user_manual.html#_tools. We should also document the 'rfile-info' command, which is easier to remember than org.apache.accumulo.core.file.rfile.PrintInfo.